Synthetic Data Generation
show_samples() Function
Visualize generated samples with bounding boxes to assess dataset quality and detection accuracy.
Overview
The show_samples() function displays generated images with their corresponding bounding boxes and labels. This is essential for quality assessment, debugging detection issues, and verifying that your dataset meets your requirements.
Key Features
- Visual Inspection: See images with bounding boxes
- Quality Assessment: Verify detection accuracy
- Debugging Tool: Identify detection issues
Function Signature
def show_samples(self, num_samples: int = 3) -> None:
Parameters
num_samples (int) - Default: 3
Number of sample images to display. The function will randomly select this many images from your dataset to show.
# Show default number of samples (3)
detection_dataset.show_samples()

# Show specific number of samples
detection_dataset.show_samples(num_samples=5)

# Show single sample
detection_dataset.show_samples(num_samples=1)

# Show many samples for thorough review
detection_dataset.show_samples(num_samples=10)
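The parameter description says the images are chosen at random. A minimal sketch of what such a selection might look like, assuming the dataset can be treated as a list of image paths. `pick_samples` and its `seed` argument are hypothetical and not part of cvpal; the library's actual selection logic may differ.

```python
import random
from typing import List, Optional, Sequence


def pick_samples(image_paths: Sequence[str], num_samples: int = 3,
                 seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: choose up to num_samples distinct paths at random."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    rng = random.Random(seed)  # passing a seed makes the selection reproducible
    # Clamp so that asking for more samples than exist does not raise.
    k = min(num_samples, len(image_paths))
    return rng.sample(list(image_paths), k)
```

Under this assumption, two consecutive calls draw independent samples, so the same image can appear in both reviews.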
What You'll See
Visual Output
The function displays images with:
- Bounding boxes - Rectangles around detected objects
- Labels - Object class names above each box
- Confidence scores - Detection confidence (if available)
- Image information - File path and dimensions
# Example output format (illustrative):
# Image: /path/to/image001.jpg
# Dimensions: 512x512
# Objects detected: 2
#   cat   (confidence 0.95)
#   chair (confidence 0.87)
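To judge whether a drawn rectangle matches its label file, it helps to know how a YOLO-style annotation maps to pixels. The sketch below assumes the `"yolo"` output type uses the common text format, where each box is stored as normalised `x_center y_center width height` values in the range 0 to 1. That is an assumption about cvpal's output, and `yolo_to_pixel_box` is a hypothetical helper.

```python
from typing import Tuple


def yolo_to_pixel_box(x_center: float, y_center: float,
                      width: float, height: float,
                      img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalised YOLO box to (x_min, y_min, x_max, y_max) in pixels."""
    # YOLO stores the box centre and size; corners are centre +/- half the size.
    x_min = round((x_center - width / 2) * img_w)
    y_min = round((y_center - height / 2) * img_h)
    x_max = round((x_center + width / 2) * img_w)
    y_max = round((y_center + height / 2) * img_h)
    # Clamp to the image so boxes that spill over the edge stay drawable.
    x_min, x_max = max(0, x_min), min(img_w, x_max)
    y_min, y_max = max(0, y_min), min(img_h, y_max)
    return x_min, y_min, x_max, y_max


# A box centred in a 512x512 image that covers half of each dimension.
print(yolo_to_pixel_box(0.5, 0.5, 0.5, 0.5, 512, 512))  # (128, 128, 384, 384)
```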
Basic Examples
Quick Quality Check
from cvpal.generate import DetectionDataset

# Initialize and generate dataset
detection_dataset = DetectionDataset()
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo",
)

# Quick visual check
detection_dataset.show_samples(num_samples=3)
Comprehensive Review
# Generate larger dataset
detection_dataset.generate(
    prompt="a person walking a dog in a park",
    num_images=10,
    labels=["person", "dog"],
    output_type="yolo",
    overwrite=False,
)

# Review all samples
detection_dataset.show_samples(num_samples=10)

# Or review in batches. Selection is random, so batches may overlap.
detection_dataset.show_samples(num_samples=5)  # One random batch of 5
detection_dataset.show_samples(num_samples=5)  # Another random batch of 5
Quality Assessment Workflow
# Complete quality assessment workflow
def assess_dataset_quality(detection_dataset):
    print("=== Dataset Quality Assessment ===")

    # 1. Check for empty images
    empty_images = detection_dataset.isnull()
    print(f"Empty images: {len(empty_images)}")

    # 2. Show samples for visual inspection
    print("\nVisual inspection:")
    detection_dataset.show_samples(num_samples=5)

    # 3. Ask for user feedback (in interactive environment)
    print("\nReview the samples above.")
    print("Check for:")
    print(" - Correct object detection")
    print(" - Accurate bounding boxes")
    print(" - Proper label assignment")
    print(" - Image quality")

    # 4. Clean up if needed
    if len(empty_images) > 0:
        print(f"\nFound {len(empty_images)} empty images. Cleaning up...")
        detection_dataset.dropna()
        print("Cleanup complete.")

    print("\n=== Assessment Complete ===")

# Use the assessment workflow
assess_dataset_quality(detection_dataset)
Advanced Usage
Batch Quality Monitoring
Monitor quality across multiple generation batches:
# Monitor quality across batches
prompts = [
    "a cat sitting on a chair",
    "a dog running in a park",
    "a person riding a bicycle",
]

for i, prompt in enumerate(prompts):
    print(f"\n=== Batch {i+1}: {prompt} ===")

    # Generate batch (one label per prompt, matched by position)
    detection_dataset.generate(
        prompt=prompt,
        num_images=3,
        labels=["cat", "dog", "person"][i:i+1],
        output_type="yolo",
        overwrite=False,
    )

    # Check quality
    empty_images = detection_dataset.isnull()
    print(f"Generated 3 images, {len(empty_images)} empty")

    # Visual inspection
    print("Sample review:")
    detection_dataset.show_samples(num_samples=2)

    # Clean up if needed
    if len(empty_images) > 0:
        print("Cleaning up empty images...")
        detection_dataset.dropna()

    print(f"Batch {i+1} complete.")
Comparative Analysis
Compare samples before and after cleanup:
# Compare before and after cleanup
def compare_before_after(detection_dataset):
    print("=== Before Cleanup ===")

    # Show samples before cleanup
    empty_images = detection_dataset.isnull()
    print(f"Empty images: {len(empty_images)}")
    detection_dataset.show_samples(num_samples=3)

    # Clean up
    if len(empty_images) > 0:
        print("\nCleaning up...")
        detection_dataset.dropna()

        print("\n=== After Cleanup ===")
        final_empty = detection_dataset.isnull()
        print(f"Empty images: {len(final_empty)}")
        detection_dataset.show_samples(num_samples=3)

        print(f"\nRemoved {len(empty_images) - len(final_empty)} empty images")
    else:
        print("No cleanup needed - dataset is already clean!")

# Use comparison
compare_before_after(detection_dataset)
What to Look For
Good Signs
- Bounding boxes tightly fit objects
- Labels match visible objects
- High confidence scores
- Clear, high-quality images
- Consistent detection across samples
- Appropriate number of objects per image
Warning Signs
- Loose or incorrect bounding boxes
- Wrong or missing labels
- Low confidence scores
- Blurry or poor quality images
- Inconsistent detection
- Too many or too few objects
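Some of these signs can be pre-screened before looking at the images. The sketch below flags low confidence scores and unusual object counts, assuming each detection is available as a dictionary with a `label` and an optional `confidence`. The data layout, the `review_flags` helper, and the thresholds are all hypothetical and should be tuned to the dataset; box tightness and image sharpness still need visual review.

```python
from typing import Dict, List


def review_flags(detections: List[Dict], min_conf: float = 0.5,
                 min_objects: int = 1, max_objects: int = 10) -> List[str]:
    """Hypothetical pre-screen: return warning strings for one image."""
    flags = []
    if len(detections) < min_objects:
        flags.append("too few objects")
    if len(detections) > max_objects:
        flags.append("too many objects")
    # Confidence may be absent; a missing score is not treated as low.
    low = [d["label"] for d in detections
           if d.get("confidence", 1.0) < min_conf]
    if low:
        flags.append("low confidence: " + ", ".join(low))
    return flags


# The two detections from the example output raise no flags.
print(review_flags([{"label": "cat", "confidence": 0.95},
                    {"label": "chair", "confidence": 0.87}]))  # []
```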
Troubleshooting
No Images Displayed
Issue: show_samples() runs but no images appear.
Solutions: Check if dataset has images, verify file paths, ensure display environment supports image rendering, or try reducing num_samples.
Missing Bounding Boxes
Issue: Images show but no bounding boxes are displayed.
Solutions: Check if objects were detected, verify label files exist, ensure detection threshold isn't too high, or check label format.
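Checking label files by hand gets tedious for larger datasets. A minimal sketch of an automated check, assuming labels are stored as standard YOLO text files (one `class_id x_center y_center width height` line per box, coordinates normalised to 0 to 1). The file layout is an assumption about cvpal's output, and `check_yolo_label_file` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def check_yolo_label_file(label_path: Path, num_classes: int) -> List[str]:
    """Hypothetical check: return a list of problems found in one label file."""
    name = label_path.name
    if not label_path.exists():
        return [f"{name}: file is missing"]
    lines = [ln for ln in label_path.read_text().splitlines() if ln.strip()]
    if not lines:
        return [f"{name}: no boxes (empty label file)"]

    problems = []
    for n, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 5:
            problems.append(f"{name}:{n}: expected 5 fields, got {len(parts)}")
            continue
        try:
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
        except ValueError:
            problems.append(f"{name}:{n}: non-numeric value")
            continue
        if not 0 <= class_id < num_classes:
            problems.append(f"{name}:{n}: class id {class_id} out of range")
        if any(not 0.0 <= c <= 1.0 for c in coords):
            problems.append(f"{name}:{n}: coordinate outside [0, 1]")
    return problems
```

An empty result means the file parsed cleanly; it does not confirm that the boxes are accurate, which is what the visual review is for.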
Incorrect Labels
Issue: Bounding boxes appear but with wrong labels.
Solutions: Check label mapping, verify detection model accuracy, improve prompts, or adjust detection parameters.
Best Practices
Recommended Usage
- Use after each generation batch
- Start with small num_samples (3-5)
- Review systematically
- Document quality issues
- Use before final dataset export
Common Mistakes
- Not reviewing samples regularly
- Using too many samples at once
- Ignoring quality issues
- Not documenting problems
- Skipping visual inspection
Integration with Other Functions
Complete Quality Control Pipeline
def complete_quality_pipeline(detection_dataset):
    """Complete pipeline: Generate -> Show -> Check -> Clean -> Verify"""
    print("=== Complete Quality Control Pipeline ===")

    # 1. Generate dataset
    detection_dataset.generate(
        prompt="a cat sitting on a chair",
        num_images=8,
        labels=["cat", "chair"],
        output_type="yolo",
    )

    # 2. Visual inspection
    print("\nStep 1: Visual Inspection")
    detection_dataset.show_samples(num_samples=4)

    # 3. Check for empty images
    print("\nStep 2: Empty Image Check")
    empty_images = detection_dataset.isnull()
    print(f"Empty images found: {len(empty_images)}")

    # 4. Clean up if needed
    if len(empty_images) > 0:
        print("\nStep 3: Cleanup")
        print(f"Removing {len(empty_images)} empty images...")
        detection_dataset.dropna()

    # 5. Final verification
    print("\nStep 4: Final Verification")
    final_empty = detection_dataset.isnull()
    print(f"Final empty images: {len(final_empty)}")

    # 6. Show final samples
    print("\nStep 5: Final Sample Review")
    detection_dataset.show_samples(num_samples=3)

    print("\nQuality control pipeline complete!")

# Use the complete pipeline
complete_quality_pipeline(detection_dataset)