Synthetic Data Generation
DetectionDataset Class
The main class for generating synthetic datasets with object detection capabilities.
Overview
The DetectionDataset class is the core component of cvPal's synthetic data generation system. It combines image generation with object detection to create labeled datasets automatically.
Key Features
- Image Generation: Create images from text prompts
- Object Detection: Automatically detect and label objects
- Dataset Export: Export in YOLO or COCO format
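The two export formats store bounding boxes differently: YOLO uses a normalized center point plus width and height, while COCO uses the top-left corner plus width and height in pixels. The sketch below illustrates the conversion between them. The helper names `yolo_to_coco` and `coco_to_yolo` are hypothetical and are not part of cvPal; they only show what the two formats encode.

```python
from typing import List, Tuple


def yolo_to_coco(
    box: Tuple[float, float, float, float], img_w: int, img_h: int
) -> List[float]:
    """Convert a normalized YOLO box (cx, cy, w, h) to a COCO box.

    COCO stores [x_min, y_min, width, height] in absolute pixels.
    """
    cx, cy, w, h = box
    box_w = w * img_w
    box_h = h * img_h
    x_min = cx * img_w - box_w / 2
    y_min = cy * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


def coco_to_yolo(
    box: List[float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a COCO pixel box back to a normalized YOLO box."""
    x_min, y_min, box_w, box_h = box
    return (
        (x_min + box_w / 2) / img_w,
        (y_min + box_h / 2) / img_h,
        box_w / img_w,
        box_h / img_h,
    )


# A box centered in a 640x480 image, a quarter as wide and half as tall.
coco_box = yolo_to_coco((0.5, 0.5, 0.25, 0.5), img_w=640, img_h=480)
print(coco_box)  # [240.0, 120.0, 160.0, 240.0]
```

A YOLO label file holds one such box per line, prefixed by the class index, whereas COCO collects all annotations in a single JSON file.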
Initialization
Create a new DetectionDataset instance with your preferred model:
Basic Initialization
    from cvpal.generate import DetectionDataset

    # Initialize with Stable Diffusion (default)
    detection_dataset = DetectionDataset()

    # Initialize with specific model
    detection_dataset = DetectionDataset(model="stable-diffusion")

    # Initialize with DALL-E (requires API key)
    detection_dataset = DetectionDataset(
        model="dall-e",
        openai_api_key="your-api-key-here",
    )
Parameters
- model: "stable-diffusion" or "dall-e"
- openai_api_key: Required for the DALL-E model
Supported Models
- stable-diffusion: Free, local processing
- dall-e: Premium quality, API required
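Because the DALL-E model needs an API key, it is usually safer to read the key from the environment than to hard-code it in a script. The following is a minimal sketch of that idea. The helper `dataset_kwargs` and the environment variable name `OPENAI_API_KEY` are assumptions made for illustration; only the `model` and `openai_api_key` parameter names come from this page.

```python
import os
from typing import Dict, Mapping, Optional


def dataset_kwargs(
    model: str = "stable-diffusion",
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build keyword arguments for DetectionDataset (hypothetical helper).

    The API key is read from OPENAI_API_KEY, an assumed variable name.
    """
    env = os.environ if env is None else env
    if model == "stable-diffusion":
        # Local model: no credentials needed.
        return {"model": model}
    if model == "dall-e":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("Set OPENAI_API_KEY before using the dall-e model")
        return {"model": model, "openai_api_key": api_key}
    raise ValueError(f"Unsupported model: {model!r}")


print(dataset_kwargs("stable-diffusion"))  # {'model': 'stable-diffusion'}

# Intended use, assuming cvPal is installed:
# from cvpal.generate import DetectionDataset
# detection_dataset = DetectionDataset(**dataset_kwargs("dall-e"))
```

Failing early with a clear error keeps a missing key from surfacing later as an opaque API failure.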
Main Methods
generate() - Main Generation Function
Generate synthetic images with automatic object detection and labeling:
    # Generate synthetic dataset
    detection_dataset.generate(
        prompt="a cat looking at the camera",
        num_images=5,
        labels=["cat"],
        output_type="yolo",
        overwrite=False,
    )
add_labels() - Add Labels to Dataset
Add more labels to an existing dataset:
    # Add labels to existing dataset
    detection_dataset.add_labels(["dog", "person", "car"])
show_samples() - Visualize Samples
Display generated samples with bounding boxes:
    # Show sample images
    detection_dataset.show_samples(num_samples=3)
Quality Control Methods
isnull() - Check for Empty Detections
Identify images with no detected objects:
    # Check for empty detections
    empty_images = detection_dataset.isnull()
    print(f"Found {len(empty_images)} images with no detections")
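To make "empty detection" concrete: in a YOLO-format dataset, an image with no detected objects typically has a label file with no lines in it. The sketch below scans a directory for such files. It is not how `isnull()` is implemented (this page does not describe that); the function name `find_empty_label_files` and the one-`.txt`-per-image directory layout are assumptions.

```python
from pathlib import Path
from typing import List


def find_empty_label_files(labels_dir: str) -> List[Path]:
    """Return YOLO label files that contain no annotations.

    Assumes one .txt label file per image inside labels_dir.
    A file counts as empty if it holds only whitespace.
    """
    empty = []
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        if not label_file.read_text().strip():
            empty.append(label_file)
    return empty


# Example call (the path is a placeholder):
# for path in find_empty_label_files("dataset/labels"):
#     print(path.name)
```

Running a check like this on an exported dataset is an independent way to confirm the result of the library's own quality-control methods.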
dropna() - Remove Empty Images
Remove images that have no detected objects:
    # Remove images with no detections
    detection_dataset.dropna()
    print("Removed images with no detections")
Complete Example
A complete workflow using the DetectionDataset class:
    from cvpal.generate import DetectionDataset

    # 1. Initialize the dataset
    detection_dataset = DetectionDataset(model="stable-diffusion")

    # 2. Generate initial dataset
    detection_dataset.generate(
        prompt="a cat sitting on a chair",
        num_images=10,
        labels=["cat", "chair"],
        output_type="yolo",
        overwrite=False,
    )

    # 3. Add more labels
    detection_dataset.add_labels(["person", "dog"])

    # 4. Generate more diverse images
    detection_dataset.generate(
        prompt="a person walking a dog in the park",
        num_images=5,
        labels=["person", "dog"],
        output_type="yolo",
        overwrite=False,
    )

    # 5. Check for quality issues
    empty_images = detection_dataset.isnull()
    if len(empty_images) > 0:
        print(f"Found {len(empty_images)} empty images")
        detection_dataset.dropna()

    # 6. Visualize results
    detection_dataset.show_samples(num_samples=5)
    print("Dataset generation complete!")
Best Practices
Recommended Workflow
- Start with small batches (5-10 images)
- Use descriptive, specific prompts
- Check quality with show_samples()
- Remove empty images with dropna()
- Use consistent label names
- Save progress frequently
Common Issues
- Vague prompts lead to poor detection
- Too many objects in a single image
- Inconsistent label naming
- Not checking for empty images
- Overwriting existing datasets
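Inconsistent label naming is easy to guard against by normalizing labels before passing them to `generate()` or `add_labels()`. The following is a minimal sketch; `normalize_labels` is a hypothetical helper rather than a cvPal function, and lowercasing is an assumed convention that may not suit every label set.

```python
from typing import Iterable, List


def normalize_labels(labels: Iterable[str]) -> List[str]:
    """Lowercase labels, collapse whitespace, and drop duplicates.

    The order of first appearance is preserved.
    """
    seen = set()
    result = []
    for label in labels:
        clean = " ".join(label.strip().lower().split())
        if clean and clean not in seen:
            seen.add(clean)
            result.append(clean)
    return result


print(normalize_labels(["Cat", " cat ", "Dog", "traffic  light"]))
# ['cat', 'dog', 'traffic light']
```

Applying the same normalization everywhere keeps "Cat" and "cat" from ending up as two separate classes in the exported dataset.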