Core Features

Synthetic Generation

Generate synthetic datasets with automatic object detection and labeling using AI models.

Overview

cvPal's synthetic generation feature combines state-of-the-art image generation models (DALL-E 3, Stable Diffusion) with automatic object detection (OWL-ViT) to create fully annotated datasets. Simply provide a text prompt and labels, and cvPal will generate images with precise bounding box annotations.

How It Works

1. Generate Image: an AI model creates an image from the text prompt.
2. Detect Objects: OWL-ViT identifies the requested objects in the image.
3. Create Labels: bounding box annotations are generated for each detection.
4. Save Dataset: the annotated images are exported in YOLO or COCO format.
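The four steps above can be sketched end to end. Everything below is a simplified illustration rather than cvPal's actual internals: `generate_image` and `detect_objects` are stand-ins for the diffusion model and OWL-ViT, and `to_yolo_lines` shows how detected pixel boxes become normalized YOLO annotations.

```python
# Simplified sketch of the four-step pipeline. The function names and the
# fake detector below are illustrative stand-ins, not cvPal's real code.

def generate_image(prompt):
    """Stand-in for the diffusion/DALL-E step: returns (width, height) of an image."""
    return (512, 512)

def detect_objects(image_size, labels):
    """Stand-in for OWL-ViT: returns (label, pixel bbox) pairs, bbox as (x, y, w, h)."""
    return [("cat", (100, 50, 200, 150))]

def to_yolo_lines(detections, image_size, class_names):
    """Convert pixel boxes to normalized 'class x_center y_center w h' YOLO lines."""
    img_w, img_h = image_size
    lines = []
    for label, (x, y, w, h) in detections:
        cls = class_names.index(label)
        lines.append(f"{cls} {(x + w / 2) / img_w:.6f} {(y + h / 2) / img_h:.6f} "
                     f"{w / img_w:.6f} {h / img_h:.6f}")
    return lines

def build_sample(prompt, labels):
    size = generate_image(prompt)
    detections = detect_objects(size, labels)
    return to_yolo_lines(detections, size, labels)

print(build_sample("a cat sitting on a chair", ["cat", "chair"]))
# -> ['0 0.390625 0.244141 0.390625 0.292969']
```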

Basic Usage

Get started with synthetic dataset generation using the DetectionDataset class:

Simple Example

```python
from cvpal.generate import DetectionDataset

# Initialize with Stable Diffusion (default)
dataset = DetectionDataset(model="stable-diffusion")

# Generate a dataset
dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo"
)
```

πŸ“ Parameters

  • prompt - Text description for image generation
  • num_images - Number of images to generate
  • labels - List of object classes to detect
  • output_type - "yolo" or "coco" format
  • height/width - Image dimensions (default: 512x512)

🎯 Output

  • Generated images in the images/ folder
  • Corresponding labels in the labels/ folder
  • Dataset configuration in data.yaml
  • Optional COCO annotations in annotations.json

Model Selection

Choose between different image generation models based on your needs:

Stable Diffusion (Recommended)

Free, local processing with high customization options. Best for most use cases.

```python
# Stable Diffusion - No API key required
dataset = DetectionDataset(model="stable-diffusion")

# Customize generation parameters
dataset.generate(
    prompt="a dog playing in the park",
    num_images=10,
    labels=["dog", "park", "tree"],
    height=768,
    width=768,
    seed=42,
    output_type="yolo"
)
```

DALL-E 3

Premium quality images with excellent prompt understanding. Requires OpenAI API key.

```python
# DALL-E 3 - Requires OpenAI API key
dataset = DetectionDataset(
    model="dalle",
    openai_api_key="your-openai-api-key"
)

# Generate high-quality images
dataset.generate(
    prompt="a professional photo of a cat in a business suit",
    num_images=3,
    labels=["cat", "suit"],
    output_type="coco"
)
```

Advanced Features

Dataset Management

Manage and extend your generated datasets with built-in utilities:

Add Labels to Existing Dataset

```python
# Add new labels to existing images
dataset.add_labels(labels=["person", "car"])

# This will:
# - Run detection on all existing images
# - Add new annotations to the label files
# - Update data.yaml with the new classes
```

Quality Control

```python
# Check for images with no detections
dataset.isnull()

# Remove images with no detections
dataset.dropna()

# Visualize samples
dataset.show_samples(num_samples=5, annotation_type="yolo")
```
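Conceptually, `isnull()` and `dropna()` resemble scanning a YOLO dataset for images whose label file is missing or empty and removing those pairs. The sketch below illustrates that idea only; the directory layout and function names are assumptions, not cvPal's implementation.

```python
from pathlib import Path

def find_empty(dataset_dir):
    """Return image stems with no annotations (missing or empty label file)."""
    images = Path(dataset_dir, "images")
    labels = Path(dataset_dir, "labels")
    empty = []
    for img in sorted(images.iterdir()):
        lbl = labels / (img.stem + ".txt")
        if not lbl.exists() or lbl.read_text().strip() == "":
            empty.append(img.stem)
    return empty

def drop_empty(dataset_dir):
    """Delete image/label pairs that have no annotations."""
    for stem in find_empty(dataset_dir):
        for folder, ext in (("images", ".jpg"), ("labels", ".txt")):
            path = Path(dataset_dir, folder, stem + ext)
            if path.exists():
                path.unlink()
```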

Batch Processing

Generate multiple datasets by looping over a list of prompts:

```python
# Generate multiple datasets
prompts = [
    "a cat sitting on a chair",
    "a dog playing with a ball",
    "a bird flying over trees"
]

for i, prompt in enumerate(prompts):
    dataset = DetectionDataset(model="stable-diffusion")
    dataset.generate(
        prompt=prompt,
        num_images=5,
        labels=["animal", "object"],
        output_type="yolo",
        overwrite=True
    )
    print(f"Generated dataset {i+1}/3")
```
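The loop above runs one prompt at a time. If generation is I/O-bound (for example, API-backed models such as DALL-E), a thread pool can overlap requests. This is a generic Python pattern, not a documented cvPal feature; `generate_one` below is a stand-in for the per-prompt `DetectionDataset(...).generate(...)` call.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_one(prompt):
    """Stand-in for DetectionDataset(...).generate(prompt=prompt, ...)."""
    return f"dataset for: {prompt}"

prompts = [
    "a cat sitting on a chair",
    "a dog playing with a ball",
    "a bird flying over trees",
]

# Overlap the per-prompt work across threads
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate_one, prompts))

print(results)
```

Note that this helps for network-bound generation; local Stable Diffusion on a single GPU gains little from threads, since the GPU is already the bottleneck.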

Output Formats

Choose the output format that best fits your training pipeline:

YOLO Format

Each image gets a corresponding .txt file with normalized coordinates.

```text
# image_001.txt
0 0.5 0.3 0.2 0.4     # cat: class_id x_center y_center width height
1 0.7 0.6 0.15 0.3    # chair: class_id x_center y_center width height
```

The inline comments are for illustration only; real YOLO label files contain just the five numbers per line.

Best for: YOLOv5, YOLOv8, custom detection models
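To sanity-check generated labels, a YOLO line can be converted back to pixel coordinates. A minimal sketch (the function name is illustrative, not part of cvPal's API):

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x_min, y_min, width, height) in pixels."""
    cls, xc, yc, w, h = line.split()
    cls = int(cls)
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    pw, ph = w * img_w, h * img_h
    return cls, xc * img_w - pw / 2, yc * img_h - ph / 2, pw, ph

# The "cat" line from the example above, for a 512x512 image
print(yolo_to_pixels("0 0.5 0.3 0.2 0.4", 512, 512))
```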

COCO Format

Single JSON file with comprehensive metadata and annotations.

```json
{
  "images": [
    {"id": 1, "file_name": "image_001.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150], "area": 30000}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}
```

Best for: Detectron2, MMDetection, COCO evaluation
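The COCO structure above can be assembled programmatically. Note that `bbox` is `[x_min, y_min, width, height]` in pixels and `area` is width times height (200 * 150 = 30000 in the example). A minimal sketch, with an assumed helper name:

```python
import json

def make_coco(file_name, img_size, detections, categories):
    """Build a minimal COCO dict from (category_id, [x, y, w, h]) detections."""
    width, height = img_size
    return {
        "images": [{"id": 1, "file_name": file_name, "width": width, "height": height}],
        "annotations": [
            {"id": i + 1, "image_id": 1, "category_id": cat_id,
             "bbox": bbox, "area": bbox[2] * bbox[3]}
            for i, (cat_id, bbox) in enumerate(detections)
        ],
        "categories": [{"id": cid, "name": name} for cid, name in categories],
    }

coco = make_coco("image_001.jpg", (512, 512),
                 [(1, [100, 50, 200, 150])], [(1, "cat")])
print(json.dumps(coco, indent=2))
```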

Best Practices

✅ Effective Prompts

  • Be specific about object types and positions
  • Include environmental context
  • Use descriptive adjectives
  • Mention lighting and style preferences
  • Avoid ambiguous descriptions

Good: "a black cat sitting on a wooden chair in a living room"
Bad: "cat and chair"

🎯 Label Strategy

  • Use consistent naming conventions
  • Include all objects you want to detect
  • Consider hierarchical labels
  • Test detection threshold (0.1-0.3)
  • Validate annotations manually

Example: ["person", "car", "traffic_light", "road"]

Performance Tips

🚀 Speed

  • Use GPU acceleration
  • Reduce inference steps
  • Lower detection threshold
  • Use parallel processing
  • Consider smaller image sizes

💾 Memory

  • Process images in batches
  • Use torch.float16 precision
  • Clear GPU cache regularly
  • Monitor memory usage
  • Use CPU fallback if needed

🎨 Quality

  • Use higher resolution images
  • Increase inference steps
  • Fine-tune detection threshold
  • Use diverse prompts
  • Validate with manual inspection

Troubleshooting

No Objects Detected

If OWL-ViT doesn't detect objects in generated images:

  • Lower the detection threshold (try 0.05-0.1)
  • Use more specific labels in your prompt
  • Ensure labels match objects in the image
  • Try different image generation models

Poor Image Quality

For better image generation results:

  • Increase inference steps (50-100)
  • Use higher resolution (768x768 or 1024x1024)
  • Improve prompt specificity
  • Consider using DALL-E 3 for premium quality

Memory Issues

If you encounter GPU memory errors:

  • Reduce batch size or image count
  • Use smaller image dimensions
  • Enable CPU fallback
  • Clear GPU cache between generations