Every call to a vision AI API — GPT-4o Vision, Google Cloud Vision, AWS Rekognition, Azure Computer Vision — costs money and takes time. When you send a bad image, you still pay, you still wait, and you get a bad result. In high-volume pipelines, that waste adds up quickly.
This checklist covers everything you should validate before making the API call. It applies to any vision AI service and can be automated in Python in about 10 lines of code.
The checklist
- ✓ Image loads without error — Check that the file can be decoded. Corrupted files, truncated uploads, and wrong extensions are more common than you'd think. Use `cv2.imread()` or PIL and verify the result is not `None`.
- ✓ Correct format — Most APIs accept JPEG, PNG, and WebP. GIF, TIFF, BMP, and HEIC may or may not be supported. Convert to JPEG before the call if in doubt.
- ✓ File size within limit — GPT-4o Vision: 20 MB. Google Vision: 10 MB. AWS Rekognition: 15 MB. Larger files should be compressed or resized first.
- ✓ Minimum resolution — Anything below 100×100 will give poor results from most vision models. OCR needs at least 300×300. Face recognition needs at least 160×160.
- ✓ Not blurry — A blurry image gives blurry API results. OCR engines will miss characters; object detectors will miss objects; face detectors may not detect at all.
- ✓ Properly exposed — Overexposed (all white) and underexposed (all black) regions contain no usable information. Check that no more than 60% of the image is pure dark or pure light.
- ✓ No severe compression artefacts — Heavily re-compressed JPEGs carry artefacts a model sees even when the image looks fine on screen. For OCR especially, blockiness at 8×8 block boundaries confuses character segmentation.
- ✓ Aspect ratio is reasonable — Extreme panoramas (> 3:1 ratio) and tall slivers confuse many models. Crop or pad to a reasonable aspect ratio if needed.
Automate the checklist in Python
```python
from imageguard import validate
import os

def preprocess_for_api(image_path: str, api: str = "general") -> str | None:
    """Validate image and return path if ready, None if rejected."""
    # File-level checks
    if not os.path.exists(image_path):
        return None
    if os.path.getsize(image_path) > 10 * 1024 * 1024:  # 10 MB
        return None

    # Quality checks
    thresholds = {
        "ocr": {"blur_score": 60.0, "resolution_score": 70.0},
        "face": {"blur_score": 50.0, "resolution_score": 65.0},
        "general": {},  # use defaults
    }.get(api, {})

    result = validate(image_path, thresholds=thresholds)
    return image_path if result.ok else None

# Usage
ready = preprocess_for_api("scan.jpg", api="ocr")
if ready:
    response = ocr_api.call(ready)  # safe to proceed
else:
    log_skipped("scan.jpg", reason="quality_check_failed")
```
Per-API minimum requirements
| API | Min resolution | Max file size | Formats |
|---|---|---|---|
| GPT-4o Vision | No hard minimum | 20 MB | JPEG, PNG, WEBP, GIF |
| Google Cloud Vision | No hard minimum | 10 MB | JPEG, PNG, WEBP, BMP, GIF |
| AWS Rekognition | 80×80 (faces) | 5 MB (image bytes); 15 MB (S3) | JPEG, PNG |
| Azure Computer Vision | 50×50 | 4 MB (free); 20 MB (paid) | JPEG, PNG, BMP, TIFF |
| Tesseract OCR (local) | 300 DPI recommended | No limit | JPEG, PNG, TIFF, BMP |
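When an image exceeds one of the size limits above, converting to JPEG and stepping the quality down usually gets it under the cap without a visible loss. This Pillow sketch is one way to do it; the quality ladder and the fallback resize are arbitrary choices, not recommendations from any of the vendors.

```python
from io import BytesIO
from PIL import Image

def fit_to_limit(path: str, max_bytes: int = 10 * 1024 * 1024) -> bytes:
    """Re-encode an image as JPEG, lowering quality until it fits max_bytes."""
    img = Image.open(path)
    if img.mode != "RGB":                 # JPEG has no alpha channel
        img = img.convert("RGB")
    for quality in (95, 85, 75, 60):
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        if buf.tell() <= max_bytes:
            return buf.getvalue()
    # Still too large: halve the dimensions and encode once more
    img = img.resize((img.width // 2, img.height // 2))
    buf = BytesIO()
    img.save(buf, format="JPEG", quality=60)
    return buf.getvalue()
```

Re-encoding in memory avoids writing temporary files when the bytes go straight into the API request body.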
The cost of skipping validation
In a pipeline processing 10,000 images per day at $0.001 per API call, a 10% bad-image rate costs $1/day in wasted credits — but worse, it costs you the time to debug the downstream failures those bad results create.
Validation with imageguard runs in ~20–50ms per image. At 10,000 images/day that's less than 10 minutes of compute — a negligible price to eliminate an entire class of production bugs.
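The back-of-envelope figure above is easy to reproduce; the rates are the article's own assumptions, not measured numbers.

```python
def daily_waste(images_per_day: int, cost_per_call: float, bad_rate: float) -> float:
    """Dollars per day spent on API calls that were doomed before they were made."""
    return images_per_day * cost_per_call * bad_rate

print(daily_waste(10_000, 0.001, 0.10))  # 1.0 -> $1/day in wasted credits
```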
imageguard — automated image validation in Python
One call checks blur, noise, resolution, exposure, compression, and pixelation. Returns a simple pass/fail with reason. Open-source on GitHub.
View on GitHub →