Synthetic Data Generation

show_samples() Function

Visualize generated samples with bounding boxes to assess dataset quality and detection accuracy.

Overview

The show_samples() function displays generated images with their corresponding bounding boxes and labels. This is essential for quality assessment, debugging detection issues, and verifying that your dataset meets your requirements.

Key Features

Visual Inspection

See images with bounding boxes

Quality Assessment

Verify detection accuracy

Debugging Tool

Identify detection issues

Function Signature

python
def show_samples(self, num_samples: int = 3) -> None:

Parameters

num_samples (int) - Default: 3

Number of sample images to display. The function will randomly select this many images from your dataset to show.

python
# Show default number of samples (3)
detection_dataset.show_samples()
# Show specific number of samples
detection_dataset.show_samples(num_samples=5)
# Show single sample
detection_dataset.show_samples(num_samples=1)
# Show many samples for thorough review
detection_dataset.show_samples(num_samples=10)
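
The source says only that images are selected at random; it does not say what happens when num_samples is larger than the dataset. The selection step can be pictured roughly as follows. This is a minimal sketch that works on a plain list of image paths, and the helper `pick_samples` is hypothetical, not part of cvpal. Clamping to the number of available images is an assumption about sensible behavior.

```python
import random
from typing import List, Optional


def pick_samples(image_paths: List[str], num_samples: int = 3,
                 seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: randomly choose up to num_samples distinct paths."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    rng = random.Random(seed)  # a local RNG keeps the choice reproducible
    # Clamp so that asking for more images than exist does not raise.
    k = min(num_samples, len(image_paths))
    return rng.sample(image_paths, k)


paths = [f"image{i:03d}.jpg" for i in range(5)]
print(pick_samples(paths, num_samples=3, seed=0))
print(len(pick_samples(paths, num_samples=10)))  # clamped to the 5 available
```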

What You'll See

Visual Output

The function displays images with:

  • Bounding boxes - Rectangles around detected objects
  • Labels - Object class names above each box
  • Confidence scores - Detection confidence (if available)
  • Image information - File path and dimensions
text
# Example output format:
# ┌─────────────────────────────────────┐
# │ Image: /path/to/image001.jpg        │
# │ Dimensions: 512x512                 │
# │ Objects detected: 2                 │
# │                                     │
# │ ┌─────────┐                         │
# │ │ cat     │  ┌─────────┐            │
# │ │ 0.95    │  │ chair   │            │
# │ └─────────┘  │ 0.87    │            │
# │              └─────────┘            │
# └─────────────────────────────────────┘
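
To draw boxes like the ones sketched above, a viewer has to turn each stored annotation into pixel coordinates. The example below is a minimal sketch of that conversion. It assumes the common YOLO text format (class index, then center x, center y, width and height, all normalized to the 0-1 range). Whether cvpal stores its labels exactly this way is an assumption, and `yolo_to_pixel_box` is a hypothetical helper.

```python
from typing import Tuple


def yolo_to_pixel_box(line: str, img_w: int, img_h: int) -> Tuple[int, int, int, int, int]:
    """Hypothetical helper: convert one YOLO label line to (class_id, x1, y1, x2, y2)."""
    class_id, xc, yc, w, h = line.split()[:5]
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    # Center/size in normalized units -> corner coordinates in pixels.
    x1 = round((xc - w / 2) * img_w)
    y1 = round((yc - h / 2) * img_h)
    x2 = round((xc + w / 2) * img_w)
    y2 = round((yc + h / 2) * img_h)
    return int(class_id), x1, y1, x2, y2


print(yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", 512, 512))
# -> (0, 192, 128, 320, 384)
```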

Basic Examples

Quick Quality Check

python
from cvpal.generate import DetectionDataset
# Initialize and generate dataset
detection_dataset = DetectionDataset()
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo"
)
# Quick visual check
detection_dataset.show_samples(num_samples=3)

Comprehensive Review

python
# Generate a larger dataset
detection_dataset.generate(
    prompt="a person walking a dog in a park",
    num_images=10,
    labels=["person", "dog"],
    output_type="yolo",
    overwrite=False
)
# Review up to 10 randomly selected samples
detection_dataset.show_samples(num_samples=10)
# Or review in smaller batches. Each call selects at random,
# so the two batches may overlap.
detection_dataset.show_samples(num_samples=5)
detection_dataset.show_samples(num_samples=5)
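
Because show_samples() picks images at random, two calls can show some of the same images. If you want to walk through every image exactly once, one option is to shuffle the paths yourself and split them into batches. The sketch below illustrates that idea on plain file paths. `review_batches` is a hypothetical helper and does not call into cvpal.

```python
import random
from typing import Iterator, List, Optional


def review_batches(image_paths: List[str], batch_size: int,
                   seed: Optional[int] = None) -> Iterator[List[str]]:
    """Hypothetical helper: yield shuffled, non-overlapping batches of paths."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    shuffled = list(image_paths)  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


paths = [f"image{i:03d}.jpg" for i in range(10)]
for n, batch in enumerate(review_batches(paths, batch_size=5, seed=42), start=1):
    print(f"Batch {n}: {batch}")
```

Every path appears in exactly one batch, and the last batch is shorter when the total is not a multiple of batch_size.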

Quality Assessment Workflow

python
# Complete quality assessment workflow
def assess_dataset_quality(detection_dataset):
    print("=== Dataset Quality Assessment ===")
    # 1. Check for empty images
    empty_images = detection_dataset.isnull()
    print(f"Empty images: {len(empty_images)}")
    # 2. Show samples for visual inspection
    print("\nVisual inspection:")
    detection_dataset.show_samples(num_samples=5)
    # 3. Ask for user feedback (in interactive environment)
    print("\nReview the samples above.")
    print("Check for:")
    print(" - Correct object detection")
    print(" - Accurate bounding boxes")
    print(" - Proper label assignment")
    print(" - Image quality")
    # 4. Clean up if needed
    if len(empty_images) > 0:
        print(f"\nFound {len(empty_images)} empty images. Cleaning up...")
        detection_dataset.dropna()
        print("Cleanup complete.")
    print("\n=== Assessment Complete ===")

# Use the assessment workflow
assess_dataset_quality(detection_dataset)

Advanced Usage

Batch Quality Monitoring

Monitor quality across multiple generation batches:

python
# Monitor quality across batches
# Each batch pairs a prompt with the label to detect for it
batches = [
    ("a cat sitting on a chair", ["cat"]),
    ("a dog running in a park", ["dog"]),
    ("a person riding a bicycle", ["person"]),
]
num_images = 3
for i, (prompt, labels) in enumerate(batches, start=1):
    print(f"\n=== Batch {i}: {prompt} ===")
    # Generate batch
    detection_dataset.generate(
        prompt=prompt,
        num_images=num_images,
        labels=labels,
        output_type="yolo",
        overwrite=False
    )
    # Check quality
    empty_images = detection_dataset.isnull()
    print(f"Generated {num_images} images, {len(empty_images)} empty")
    # Visual inspection
    print("Sample review:")
    detection_dataset.show_samples(num_samples=2)
    # Clean up if needed
    if len(empty_images) > 0:
        print("Cleaning up empty images...")
        detection_dataset.dropna()
    print(f"Batch {i} complete.")

Comparative Analysis

Compare samples before and after cleanup:

python
# Compare before and after cleanup
def compare_before_after(detection_dataset):
    print("=== Before Cleanup ===")
    # Show samples before cleanup
    empty_images = detection_dataset.isnull()
    print(f"Empty images: {len(empty_images)}")
    detection_dataset.show_samples(num_samples=3)
    # Clean up
    if len(empty_images) > 0:
        print("\nCleaning up...")
        detection_dataset.dropna()
        print("\n=== After Cleanup ===")
        final_empty = detection_dataset.isnull()
        print(f"Empty images: {len(final_empty)}")
        detection_dataset.show_samples(num_samples=3)
        print(f"\nRemoved {len(empty_images) - len(final_empty)} empty images")
    else:
        print("No cleanup needed - dataset is already clean!")

# Use comparison
compare_before_after(detection_dataset)

What to Look For

✅ Good Signs

  • Bounding boxes tightly fit objects
  • Labels match visible objects
  • High confidence scores
  • Clear, high-quality images
  • Consistent detection across samples
  • Appropriate number of objects per image

⚠️ Warning Signs

  • Loose or incorrect bounding boxes
  • Wrong or missing labels
  • Low confidence scores
  • Blurry or poor quality images
  • Inconsistent detection
  • Too many or too few objects
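
Some of these warning signs can also be flagged programmatically before you look at any image. The sketch below checks a single annotation line. It assumes YOLO-format labels (class index followed by normalized center x, center y, width and height). `check_yolo_line` and its messages are illustrative only and are not part of cvpal.

```python
from typing import List

TOLERANCE = 1e-6  # allow for rounding in stored coordinates


def check_yolo_line(line: str, num_classes: int) -> List[str]:
    """Hypothetical helper: return a list of problems found in one label line."""
    parts = line.split()
    if len(parts) < 5:
        return ["expected 5 fields: class x_center y_center width height"]
    try:
        class_id = int(parts[0])
        xc, yc, w, h = (float(p) for p in parts[1:5])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class index {class_id} outside 0..{num_classes - 1}")
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    # A normalized box must stay inside the unit square.
    if (xc - w / 2 < -TOLERANCE or xc + w / 2 > 1 + TOLERANCE
            or yc - h / 2 < -TOLERANCE or yc + h / 2 > 1 + TOLERANCE):
        problems.append("box extends past the image border")
    return problems


print(check_yolo_line("0 0.5 0.5 0.25 0.5", num_classes=2))
print(check_yolo_line("3 0.9 0.5 0.4 0.2", num_classes=2))
```

A check like this only catches malformed annotations. Loose boxes or wrong labels still need the visual review that show_samples() provides.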

Troubleshooting

No Images Displayed

Issue: show_samples() runs but no images appear.

Solutions: Check that the dataset contains images, verify the file paths, make sure the environment can render images (a headless terminal or a remote shell without a display typically cannot), or try reducing num_samples.

Missing Bounding Boxes

Issue: Images show but no bounding boxes are displayed.

Solutions: Check whether any objects were detected, verify that the label files exist, ensure the detection threshold isn't too high, or check the label format.
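
Checking that label files exist can be automated. The sketch below assumes a YOLO-style layout in which each image has a label file with the same base name and a .txt extension, stored in a separate labels directory. The directory layout, the extension list and the helper `find_label_problems` are all assumptions for illustration, so adjust them to wherever your dataset was actually written.

```python
from pathlib import Path
from typing import Dict, List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_label_problems(images_dir: str, labels_dir: str) -> Dict[str, List[str]]:
    """Hypothetical helper: list images whose label file is missing or empty."""
    report: Dict[str, List[str]] = {"missing": [], "empty": []}
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # ignore non-image files
        label = Path(labels_dir) / f"{image.stem}.txt"
        if not label.exists():
            report["missing"].append(image.name)
        elif not label.read_text().strip():
            report["empty"].append(image.name)
    return report
```

An image listed under "missing" has no annotation file at all, while one listed under "empty" has a file with no boxes in it, which is the situation where an image is displayed without any bounding boxes.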

Incorrect Labels

Issue: Bounding boxes appear but with wrong labels.

Solutions: Check the label mapping, verify the detection model's accuracy, improve the prompts, or adjust the detection parameters.
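
One way to check the label mapping is to count how often each class index appears and translate the indices back to names. The sketch below assumes that class index i refers to the i-th entry of the labels list passed to generate(). That ordering is an assumption the source does not confirm, and `class_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, List


def class_counts(label_lines: Iterable[str], labels: List[str]) -> Counter:
    """Hypothetical helper: count annotations per class name."""
    counts: Counter = Counter()
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_id = int(parts[0])
        # Indices with no matching name are reported as-is so they stand out.
        if 0 <= class_id < len(labels):
            name = labels[class_id]
        else:
            name = f"unknown({class_id})"
        counts[name] += 1
    return counts


lines = [
    "0 0.5 0.5 0.2 0.2",
    "1 0.3 0.6 0.4 0.5",
    "0 0.7 0.2 0.1 0.1",
    "5 0.5 0.5 0.1 0.1",
]
print(class_counts(lines, ["cat", "chair"]))
```

An "unknown" entry, or a class that never appears even though the prompt asks for it, points to a mapping problem rather than a drawing problem.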

Best Practices

✅ Recommended Usage

  • Use after each generation batch
  • Start with a small num_samples (3-5)
  • Review systematically
  • Document quality issues
  • Use before final dataset export

⚠️ Common Mistakes

  • Not reviewing samples regularly
  • Using too many samples at once
  • Ignoring quality issues
  • Not documenting problems
  • Skipping visual inspection

Integration with Other Functions

Complete Quality Control Pipeline

python
def complete_quality_pipeline(detection_dataset):
    """
    Complete pipeline: Generate -> Show -> Check -> Clean -> Verify
    """
    print("=== Complete Quality Control Pipeline ===")
    # 1. Generate dataset
    print("\nStep 1: Generation")
    detection_dataset.generate(
        prompt="a cat sitting on a chair",
        num_images=8,
        labels=["cat", "chair"],
        output_type="yolo"
    )
    # 2. Visual inspection
    print("\nStep 2: Visual Inspection")
    detection_dataset.show_samples(num_samples=4)
    # 3. Check for empty images
    print("\nStep 3: Empty Image Check")
    empty_images = detection_dataset.isnull()
    print(f"Empty images found: {len(empty_images)}")
    # 4. Clean up if needed
    if len(empty_images) > 0:
        print("\nStep 4: Cleanup")
        print(f"Removing {len(empty_images)} empty images...")
        detection_dataset.dropna()
    # 5. Final verification
    print("\nStep 5: Final Verification")
    final_empty = detection_dataset.isnull()
    print(f"Final empty images: {len(final_empty)}")
    # 6. Show final samples
    print("\nStep 6: Final Sample Review")
    detection_dataset.show_samples(num_samples=3)
    print("\n✅ Quality control pipeline complete!")

# Use the complete pipeline
complete_quality_pipeline(detection_dataset)