Image quality validation is usually an afterthought — something you add after a production bug. This guide shows you how to make it a first-class CI/CD gate, so bad images are caught before they ever reach your staging environment, your ML training run, or your users.
We'll wire up imageguard to GitHub Actions, write a pytest-based quality gate, and show how to make it part of an MLOps data validation step.
Use cases for image quality checks in CI/CD
- Static asset validation: Ensure product images committed to your repo or uploaded to S3 meet minimum quality standards before they go live.
- ML dataset validation: Before triggering a training run, validate the new batch of training images — catch bad data before it wastes GPU time.
- Test fixture validation: Your test suite uses sample images. Make sure they haven't been accidentally replaced with low-quality versions.
- Content moderation pre-filter: Run quality checks before sending images to expensive moderation APIs.
Step 1: Create the validation script
Create scripts/validate_images.py:
#!/usr/bin/env python3
"""CI image quality gate.

Exit 0 if all images pass; exit 1 with a report if any fail.
Usage: python validate_images.py [folder] [--min-score 0.4]
"""
import argparse
import sys
from pathlib import Path

from imageguard import validate

EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("folder", help="Image folder to validate")
    parser.add_argument("--min-score", type=float, default=0.4)
    args = parser.parse_args()

    images = [p for p in Path(args.folder).rglob("*") if p.suffix.lower() in EXTENSIONS]
    failures = []
    for img in images:
        result = validate(img)
        if result.score < args.min_score:
            failures.append((img, result))

    if failures:
        print(f"❌ {len(failures)}/{len(images)} images failed quality check:\n")
        for path, r in failures:
            print(f"  {path.name}: score={r.score:.2f} reason={r.reason} issues={r.issues}")
        sys.exit(1)
    print(f"✅ All {len(images)} images passed (min score: {args.min_score})")

if __name__ == "__main__":
    main()
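The gate's core decision is a single threshold comparison. Factored out as a pure function (a hypothetical refactor for illustration, not part of the script above), the rule is easy to unit-test on its own:

```python
def partition_by_score(scores: dict, min_score: float = 0.4):
    """Split a {path: score} mapping into (passed, failed) dicts,
    using the same rule as the script: fail when score < min_score."""
    passed = {p: s for p, s in scores.items() if s >= min_score}
    failed = {p: s for p, s in scores.items() if s < min_score}
    return passed, failed

# partition_by_score({"ok.jpg": 0.9, "blurry.jpg": 0.1})
# → ({"ok.jpg": 0.9}, {"blurry.jpg": 0.1})
```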
Step 2: GitHub Actions workflow
Create .github/workflows/image-quality.yml:
name: Image Quality Gate

on:
  push:
    paths:
      - 'assets/images/**'
      - 'data/images/**'
  pull_request:
    paths:
      - 'assets/images/**'

jobs:
  validate-images:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install imageguard
        run: pip install imageguard
      - name: Run image quality gate
        run: python scripts/validate_images.py assets/images/ --min-score 0.4
Pro tip: For ML dataset pipelines, trigger this workflow on pushes to your data/ directory path. That way every new data batch is automatically validated before it can be used in a training run.
Step 3: pytest integration for test fixtures
# tests/test_image_fixtures.py
import pytest
from pathlib import Path

from imageguard import validate

FIXTURES_DIR = Path("tests/fixtures/images")

@pytest.mark.parametrize("image_path", list(FIXTURES_DIR.glob("*.jpg")))
def test_fixture_image_quality(image_path):
    result = validate(image_path)
    assert result.ok, (
        f"Test fixture {image_path.name} has quality issues: "
        f"{result.reason} (score={result.score:.2f}, issues={result.issues})"
    )
Step 4: MLOps data validation step (DVC / Prefect / Airflow)
# dvc stage or Prefect task
from pathlib import Path

from imageguard import validate

def validate_dataset_stage(data_dir: str, reject_threshold: float = 0.45) -> dict:
    images = list(Path(data_dir).rglob("*.jpg"))
    results = {"total": len(images), "passed": 0, "rejected": 0, "rejection_reasons": {}}
    for img in images:
        r = validate(img)
        if r.score >= reject_threshold:
            results["passed"] += 1
        else:
            results["rejected"] += 1
            results["rejection_reasons"][r.reason] = (
                results["rejection_reasons"].get(r.reason, 0) + 1
            )

    reject_rate = results["rejected"] / max(1, results["total"])
    if reject_rate > 0.20:  # fail the pipeline if more than 20% of images are rejected
        raise ValueError(
            f"Data quality too low: {reject_rate:.0%} rejected. "
            f"Reasons: {results['rejection_reasons']}"
        )
    return results
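The failure condition at the end is plain rate arithmetic: with the defaults, a batch of 100 images fails the stage once 21 or more are rejected (21/100 = 21% > 20%). Isolated as a helper (hypothetical name, same rule as the stage above):

```python
def batch_fails(total: int, rejected: int, max_reject_rate: float = 0.20):
    """Return (fails, reject_rate) using the same rule as the stage:
    fail when the rejected fraction exceeds max_reject_rate.
    max(1, total) guards against division by zero on an empty batch."""
    rate = rejected / max(1, total)
    return rate > max_reject_rate, rate

# batch_fails(100, 25) → (True, 0.25)
```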
imageguard — the validation library behind these examples
imageguard is an open-source Python package: pip install imageguard, with no API keys and no cloud dependency. It's used in production at changeimageto.com.