Image quality validation is usually an afterthought — something you add after a production bug. This guide shows you how to make it a first-class CI/CD gate, so bad images are caught before they ever reach your staging environment, your ML training run, or your users.
We'll wire up imageguard to GitHub Actions, write a pytest-based quality gate, and show how to make it part of an MLOps data validation step.
Use cases for image quality checks in CI/CD
- Static asset validation: Ensure product images committed to your repo or uploaded to S3 meet minimum quality standards before they go live.
- ML dataset validation: Before triggering a training run, validate the new batch of training images — catch bad data before it wastes GPU time.
- Test fixture validation: Your test suite uses sample images. Make sure they haven't been accidentally replaced with low-quality versions.
- Content moderation pre-filter: Run quality checks before sending images to expensive moderation APIs.
Step 1: Create the validation script
Create scripts/validate_images.py:
#!/usr/bin/env python3
"""CI image quality gate.

Exit 0 if all images pass; exit 1 with a report if any fail.
Usage: python validate_images.py [folder] [--min-score 0.4]
"""
import argparse
import sys
from pathlib import Path

from imageguard import validate

EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("folder", help="Image folder to validate")
    parser.add_argument("--min-score", type=float, default=0.4)
    args = parser.parse_args()

    images = [p for p in Path(args.folder).rglob("*") if p.suffix.lower() in EXTENSIONS]
    failures = []
    for img in images:
        result = validate(img)
        if result.score < args.min_score:
            failures.append((img, result))

    if failures:
        print(f"❌ {len(failures)}/{len(images)} images failed quality check:\n")
        for path, r in failures:
            print(f"  {path.name}: score={r.score:.2f} reason={r.reason} issues={r.issues}")
        sys.exit(1)
    print(f"✅ All {len(images)} images passed (min score: {args.min_score})")

if __name__ == "__main__":
    main()
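The gate's core decision is a single threshold comparison. Factored out as a pure function (a hypothetical refactor for illustration, not part of the script above), the rule is easy to unit-test on its own:

```python
def partition_by_score(scores: dict, min_score: float = 0.4):
    """Split a {path: score} mapping into (passed, failed) dicts,
    using the same rule as the script: fail when score < min_score."""
    passed = {p: s for p, s in scores.items() if s >= min_score}
    failed = {p: s for p, s in scores.items() if s < min_score}
    return passed, failed

# partition_by_score({"ok.jpg": 0.9, "blurry.jpg": 0.1})
# → ({"ok.jpg": 0.9}, {"blurry.jpg": 0.1})
```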
Step 2: GitHub Actions workflow
Create .github/workflows/image-quality.yml:
name: Image Quality Gate

on:
  push:
    paths:
      - 'assets/images/**'
      - 'data/images/**'
  pull_request:
    paths:
      - 'assets/images/**'

jobs:
  validate-images:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install imageguard
        run: pip install imageguard
      - name: Run image quality gate
        run: python scripts/validate_images.py assets/images/ --min-score 0.4
Pro tip: For ML dataset pipelines, trigger this workflow on pushes to your data/ directory path. That way every new data batch is automatically validated before it can be used in a training run.
Step 3: pytest integration for test fixtures
# tests/test_image_fixtures.py
import pytest
from pathlib import Path

from imageguard import validate

FIXTURES_DIR = Path("tests/fixtures/images")

@pytest.mark.parametrize("image_path", list(FIXTURES_DIR.glob("*.jpg")))
def test_fixture_image_quality(image_path):
    result = validate(image_path)
    assert result.ok, (
        f"Test fixture {image_path.name} has quality issues: "
        f"{result.reason} (score={result.score:.2f}, issues={result.issues})"
    )
Step 4: MLOps data validation step (DVC / Prefect / Airflow)
# dvc stage or Prefect task
from pathlib import Path

from imageguard import validate

def validate_dataset_stage(data_dir: str, reject_threshold: float = 0.45) -> dict:
    images = list(Path(data_dir).rglob("*.jpg"))
    results = {"total": len(images), "passed": 0, "rejected": 0, "rejection_reasons": {}}
    for img in images:
        r = validate(img)
        if r.score >= reject_threshold:
            results["passed"] += 1
        else:
            results["rejected"] += 1
            results["rejection_reasons"][r.reason] = (
                results["rejection_reasons"].get(r.reason, 0) + 1
            )

    reject_rate = results["rejected"] / max(1, results["total"])
    if reject_rate > 0.20:  # fail the pipeline if more than 20% of images are rejected
        raise ValueError(
            f"Data quality too low: {reject_rate:.0%} rejected. "
            f"Reasons: {results['rejection_reasons']}"
        )
    return results
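The failure condition at the end is plain rate arithmetic: with the defaults, a batch of 100 images fails the stage once 21 or more are rejected (21/100 = 21% > 20%). Isolated as a helper (hypothetical name, same rule as the stage above):

```python
def batch_fails(total: int, rejected: int, max_reject_rate: float = 0.20):
    """Return (fails, reject_rate) using the same rule as the stage:
    fail when the rejected fraction exceeds max_reject_rate.
    max(1, total) guards against division by zero on an empty batch."""
    rate = rejected / max(1, total)
    return rate > max_reject_rate, rate

# batch_fails(100, 25) → (True, 0.25)
```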
imageguard — the validation library behind these examples
imageguard is an open-source Python package: pip install imageguard, with no API keys and no cloud dependency. It's used in production at changeimageto.com.