Synthetic Data Generation

DetectionDataset Class

The main class for generating synthetic datasets with object detection capabilities.

Overview

The DetectionDataset class is the core component of cvPal's synthetic data generation system. It combines image generation with object detection to create labeled datasets automatically.

Key Features

Image Generation

Create images from text prompts

Object Detection

Automatically detect and label objects

Dataset Export

Export in YOLO or COCO format
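
The two export formats store bounding boxes differently. YOLO label files hold a box center and size normalized to the image dimensions, while COCO annotations hold [x_min, y_min, width, height] in pixels. The sketch below converts between the two conventions. The function names are illustrative and are not part of cvPal.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to a YOLO box (x_center, y_center, width, height) normalized to 0..1."""
    x_min, y_min, w, h = bbox
    return (
        (x_min + w / 2) / img_w,
        (y_min + h / 2) / img_h,
        w / img_w,
        h / img_h,
    )


def yolo_to_coco(bbox, img_w, img_h):
    """Convert a normalized YOLO box back to a COCO pixel box."""
    x_c, y_c, w, h = bbox
    box_w = w * img_w
    box_h = h * img_h
    return [x_c * img_w - box_w / 2, y_c * img_h - box_h / 2, box_w, box_h]


# A 200x100 px box at (100, 50) inside a 400x200 px image
print(coco_to_yolo([100, 50, 200, 100], img_w=400, img_h=200))
```

A converter like this can help when a dataset exported in one format has to be checked against tooling that expects the other.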

Initialization

Create a new DetectionDataset instance with your preferred model:

Basic Initialization

python
from cvpal.generate import DetectionDataset
# Initialize with Stable Diffusion (default)
detection_dataset = DetectionDataset()
# Initialize with a specific model
detection_dataset = DetectionDataset(model="stable-diffusion")
# Initialize with DALL-E (requires an API key)
detection_dataset = DetectionDataset(
    model="dall-e",
    openai_api_key="your-api-key-here",
)

πŸ“ Parameters

  • model - "stable-diffusion" (default) or "dall-e"
  • openai_api_key - Required only when model is "dall-e"

🎯 Supported Models

  • β€’ stable-diffusion - Free, local processing
  • β€’ dall-e - Premium quality, API required
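
Hard-coding an API key in source code is easy to leak. One possible alternative is sketched below: a small helper that validates the model name and reads the key from an environment variable before building the constructor arguments. The helper name and the OPENAI_API_KEY variable name are assumptions made for illustration; the source does not say that cvPal reads this variable on its own.

```python
import os

SUPPORTED_MODELS = ("stable-diffusion", "dall-e")


def build_init_kwargs(model="stable-diffusion", env_var="OPENAI_API_KEY"):
    """Return keyword arguments for DetectionDataset(...).

    env_var is a hypothetical choice of variable name, not a cvPal setting.
    """
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"Unsupported model {model!r}; expected one of {SUPPORTED_MODELS}"
        )
    kwargs = {"model": model}
    if model == "dall-e":
        # Only the DALL-E backend needs a key
        api_key = os.environ.get(env_var)
        if not api_key:
            raise RuntimeError(f"Set {env_var} before using the dall-e model")
        kwargs["openai_api_key"] = api_key
    return kwargs


# Usage (requires cvpal to be installed):
# from cvpal.generate import DetectionDataset
# detection_dataset = DetectionDataset(**build_init_kwargs("dall-e"))
print(build_init_kwargs("stable-diffusion"))
```

Failing early with a clear message is usually easier to debug than an authentication error raised partway through generation.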

Main Methods

generate() - Main Generation Function

Generate synthetic images with automatic object detection and labeling:

python
# Generate synthetic dataset
detection_dataset.generate(
    prompt="a cat looking at the camera",
    num_images=5,
    labels=["cat"],
    output_type="yolo",
    overwrite=False,
)

add_labels() - Add Labels to Dataset

Add more labels to an existing dataset:

python
# Add labels to existing dataset
detection_dataset.add_labels(["dog", "person", "car"])

show_samples() - Visualize Samples

Display generated samples with bounding boxes:

python
# Show sample images
detection_dataset.show_samples(num_samples=3)

Quality Control Methods

isnull() - Check for Empty Detections

Identify images with no detected objects:

python
# Check for empty detections
empty_images = detection_dataset.isnull()
print(f"Found {len(empty_images)} images with no detections")
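
In a YOLO-format dataset, an image with no detections typically has an empty label file. The sketch below finds such files directly on disk, which can be useful for double-checking an exported dataset. It is independent of cvPal: the directory layout and the function name are assumptions for illustration, not a description of how isnull() is implemented.

```python
import tempfile
from pathlib import Path


def find_empty_label_files(labels_dir):
    """Return sorted names of .txt label files that contain no annotations."""
    empty = []
    for label_file in sorted(Path(labels_dir).glob("*.txt")):
        # A file holding only whitespace contains no boxes
        if not label_file.read_text().strip():
            empty.append(label_file.name)
    return empty


# Demo on a throwaway directory with one labeled and one empty file
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "img_0.txt").write_text("0 0.5 0.5 0.2 0.3\n")
    Path(tmp, "img_1.txt").write_text("")
    print(find_empty_label_files(tmp))
```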

dropna() - Remove Empty Images

Remove images that have no detected objects:

python
# Remove images with no detections
detection_dataset.dropna()
print("Removed images with no detections")

Complete Example

A complete workflow using the DetectionDataset class:

python
from cvpal.generate import DetectionDataset
# 1. Initialize the dataset
detection_dataset = DetectionDataset(model="stable-diffusion")
# 2. Generate initial dataset
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=10,
    labels=["cat", "chair"],
    output_type="yolo",
    overwrite=False,
)
# 3. Add more labels
detection_dataset.add_labels(["person", "dog"])
# 4. Generate more diverse images
detection_dataset.generate(
    prompt="a person walking a dog in the park",
    num_images=5,
    labels=["person", "dog"],
    output_type="yolo",
    overwrite=False,
)
# 5. Check for quality issues and drop images with no detections
empty_images = detection_dataset.isnull()
if len(empty_images) > 0:
    print(f"Found {len(empty_images)} empty images")
    detection_dataset.dropna()
# 6. Visualize results
detection_dataset.show_samples(num_samples=5)
print("Dataset generation complete!")

Best Practices

✅ Recommended Workflow

  • Start with small batches (5-10 images)
  • Use descriptive, specific prompts
  • Check quality with show_samples()
  • Remove empty images with dropna()
  • Use consistent label names
  • Save progress frequently

⚠️ Common Issues

  • Vague prompts lead to poor detection
  • Too many objects in a single image
  • Inconsistent label naming
  • Not checking for empty images
  • Overwriting existing datasets
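
Inconsistent label naming can be reduced by cleaning label lists before passing them to generate() or add_labels(). The following is a minimal sketch of one possible convention (lowercase, trimmed, de-duplicated). The function is hypothetical, and whether cvPal treats "Cat" and "cat" as different labels is not stated in this guide.

```python
def normalize_labels(labels):
    """Lowercase, trim, collapse inner whitespace, and de-duplicate labels
    while preserving their original order."""
    seen = set()
    result = []
    for label in labels:
        cleaned = " ".join(label.strip().lower().split())
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


print(normalize_labels(["Cat", " cat ", "Dog", "traffic  light"]))
# prints ['cat', 'dog', 'traffic light']
```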