Core Features
Synthetic Generation
Generate synthetic datasets with automatic object detection and labeling using AI models.
Overview
cvPal's synthetic generation feature combines state-of-the-art image generation models (DALL-E 3, Stable Diffusion) with automatic object detection (OWL-ViT) to create fully annotated datasets. Simply provide a text prompt and labels, and cvPal will generate images with precise bounding box annotations.
How It Works
Generate Image
AI model creates image from text prompt
Detect Objects
OWL-ViT identifies objects in image
Create Labels
Generate bounding box annotations
Save Dataset
Export in YOLO or COCO format
Basic Usage
Get started with synthetic dataset generation using the DetectionDataset class:
Simple Example
```python
from cvpal.generate import DetectionDataset

# Initialize with Stable Diffusion (default)
dataset = DetectionDataset(model="stable-diffusion")

# Generate a dataset
dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo"
)
```
Parameters
- `prompt` - Text description for image generation
- `num_images` - Number of images to generate
- `labels` - List of object classes to detect
- `output_type` - "yolo" or "coco" format
- `height`/`width` - Image dimensions (default: 512x512)
Output
- Generated images in the `images/` folder
- Corresponding labels in the `labels/` folder
- Dataset configuration in `data.yaml`
- Optional COCO annotations in `annotations.json`
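Once generated, a YOLO label file from the `labels/` folder can be read back with a few lines of standard Python. This is an illustrative snippet using a temporary file, not a cvPal utility:

```python
import os
import tempfile

# Write a sample label file like those cvPal places under labels/
content = "0 0.5 0.3 0.2 0.4\n1 0.7 0.6 0.15 0.3\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(content)
    path = f.name

# Parse each line into (class_id, x_center, y_center, width, height)
boxes = []
with open(path) as f:
    for line in f:
        parts = line.split()
        boxes.append((int(parts[0]), *map(float, parts[1:])))

os.remove(path)
print(boxes[0])  # → (0, 0.5, 0.3, 0.2, 0.4)
```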
Model Selection
Choose between different image generation models based on your needs:
Stable Diffusion (Recommended)
Free, local processing with high customization options. Best for most use cases.
```python
# Stable Diffusion - No API key required
dataset = DetectionDataset(model="stable-diffusion")

# Customize generation parameters
dataset.generate(
    prompt="a dog playing in the park",
    num_images=10,
    labels=["dog", "park", "tree"],
    height=768,
    width=768,
    seed=42,
    output_type="yolo"
)
```
DALL-E 3
Premium quality images with excellent prompt understanding. Requires OpenAI API key.
```python
# DALL-E 3 - Requires OpenAI API key
dataset = DetectionDataset(
    model="dalle",
    openai_api_key="your-openai-api-key"
)

# Generate high-quality images
dataset.generate(
    prompt="a professional photo of a cat in a business suit",
    num_images=3,
    labels=["cat", "suit"],
    output_type="coco"
)
```
Advanced Features
Dataset Management
Manage and extend your generated datasets with built-in utilities:
Add Labels to Existing Dataset
```python
# Add new labels to existing images
dataset.add_labels(labels=["person", "car"])

# This will:
# - Run detection on all existing images
# - Add new annotations to label files
# - Update data.yaml with new classes
```
Quality Control
```python
# Check for images with no detections
dataset.isnull()

# Remove images with no detections
dataset.dropna()

# Visualize samples
dataset.show_samples(num_samples=5, annotation_type="yolo")
```
Batch Processing
Generate several datasets by looping over a list of prompts:
```python
# Generate multiple datasets
prompts = [
    "a cat sitting on a chair",
    "a dog playing with a ball",
    "a bird flying over trees"
]

for i, prompt in enumerate(prompts):
    dataset = DetectionDataset(model="stable-diffusion")
    dataset.generate(
        prompt=prompt,
        num_images=5,
        labels=["animal", "object"],
        output_type="yolo",
        overwrite=True
    )
    print(f"Generated dataset {i+1}/3")
```
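If you want the generations to overlap in time, the loop can be distributed across worker threads with Python's standard `concurrent.futures`. The sketch below uses a placeholder `generate_one` function standing in for the `DetectionDataset` call; whether cvPal's API is safe to call from multiple threads is an assumption you should verify before adopting this pattern:

```python
from concurrent.futures import ThreadPoolExecutor

prompts = [
    "a cat sitting on a chair",
    "a dog playing with a ball",
    "a bird flying over trees",
]

def generate_one(prompt):
    # Placeholder: in real use this would build a DetectionDataset
    # and call dataset.generate(prompt=prompt, ...).
    return f"dataset for: {prompt}"

# Run up to three generations concurrently
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate_one, prompts))

print(results)
```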
Output Formats
Choose the output format that best fits your training pipeline:
YOLO Format
Each image gets a corresponding .txt file with normalized coordinates.
```
# image_001.txt
0 0.5 0.3 0.2 0.4   # cat: class_id x_center y_center width height
1 0.7 0.6 0.15 0.3  # chair: class_id x_center y_center width height
```
Best for: YOLOv5, YOLOv8, custom detection models
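For reference, here is a minimal helper (not part of cvPal) showing how these normalized values are derived from a pixel-space box `(x_min, y_min, x_max, y_max)`:

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space (x_min, y_min, x_max, y_max) box
    into a normalized YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 256x256-pixel cat box centered in a 512x512 image
line = to_yolo_line(0, (128, 128, 384, 384), 512, 512)
print(line)  # → "0 0.500000 0.500000 0.500000 0.500000"
```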
COCO Format
Single JSON file with comprehensive metadata and annotations.
```json
{
  "images": [
    {"id": 1, "file_name": "image_001.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150], "area": 30000}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}
```
Best for: Detectron2, MMDetection, COCO evaluation
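A YOLO annotation can be translated into a COCO `bbox` (pixel-space `[x, y, width, height]` with a top-left origin) with a small helper like the one below. This is an illustrative function, not a cvPal API:

```python
def yolo_to_coco_bbox(x_center, y_center, width, height, img_w, img_h):
    """Convert normalized YOLO (x_center, y_center, w, h)
    to COCO pixel-space [x_top_left, y_top_left, w, h]."""
    w = width * img_w
    h = height * img_h
    x = x_center * img_w - w / 2
    y = y_center * img_h - h / 2
    return [x, y, w, h]

bbox = yolo_to_coco_bbox(0.5, 0.3, 0.2, 0.4, 512, 512)
print([round(v, 1) for v in bbox])  # → [204.8, 51.2, 102.4, 204.8]

# COCO "area" for box annotations is bbox width * height
area = bbox[2] * bbox[3]
```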
Best Practices
Effective Prompts
- Be specific about object types and positions
- Include environmental context
- Use descriptive adjectives
- Mention lighting and style preferences
- Avoid ambiguous descriptions
Good: "a black cat sitting on a wooden chair in a living room"
Bad: "cat and chair"
Label Strategy
- Use consistent naming conventions
- Include all objects you want to detect
- Consider hierarchical labels
- Test detection threshold (0.1-0.3)
- Validate annotations manually
Example: ["person", "car", "traffic_light", "road"]
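One way to apply hierarchical labels is to detect with fine-grained classes and map them to coarser parent classes before training. The mapping below is an illustrative pattern, not a cvPal feature:

```python
# Fine-grained detection labels mapped to coarser parent classes
hierarchy = {
    "sedan": "car",
    "suv": "car",
    "pickup": "truck",
    "pedestrian": "person",
    "cyclist": "person",
}

def coarsen(label):
    # Fall back to the original label when no parent is defined
    return hierarchy.get(label, label)

print(coarsen("suv"))            # → car
print(coarsen("traffic_light"))  # → traffic_light
```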
Performance Tips
Speed
- Use GPU acceleration
- Reduce inference steps
- Lower detection threshold
- Use parallel processing
- Consider smaller image sizes
Memory
- Process images in batches
- Use torch.float16 precision
- Clear GPU cache regularly
- Monitor memory usage
- Use CPU fallback if needed
Quality
- Use higher resolution images
- Increase inference steps
- Fine-tune detection threshold
- Use diverse prompts
- Validate with manual inspection
Troubleshooting
No Objects Detected
If OWL-ViT doesn't detect objects in generated images:
- Lower the detection threshold (try 0.05-0.1)
- Use more specific labels in your prompt
- Ensure labels match objects in the image
- Try different image generation models
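The effect of the detection threshold can be illustrated with a plain filtering step over (label, confidence) pairs. The scores below are made up for illustration, and the exact parameter name cvPal uses to expose this threshold should be checked against its API:

```python
# Hypothetical detections as (label, confidence) pairs
detections = [("cat", 0.42), ("chair", 0.18), ("plant", 0.07)]

def keep_above(detections, threshold):
    """Keep only detections whose confidence meets the threshold."""
    return [(label, score) for label, score in detections if score >= threshold]

# A strict threshold drops low-confidence objects entirely
print(keep_above(detections, 0.3))   # → [('cat', 0.42)]

# Lowering it recovers objects the detector was less sure about
print(keep_above(detections, 0.05))  # → all three detections
```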
Poor Image Quality
For better image generation results:
- Increase inference steps (50-100)
- Use higher resolution (768x768 or 1024x1024)
- Improve prompt specificity
- Consider using DALL-E 3 for premium quality
Memory Issues
If you encounter GPU memory errors:
- Reduce batch size or image count
- Use smaller image dimensions
- Enable CPU fallback
- Clear GPU cache between generations