Get Started
Supported Models
Learn about the AI models supported by cvPal for synthetic image generation and object detection.
Overview
cvPal integrates multiple state-of-the-art AI models for synthetic dataset generation. The system combines image generation models with object detection capabilities to create fully annotated datasets automatically.
3 Models
DALL-E, Stable Diffusion, OWL-ViT
Auto Detection
Automatic object detection and labeling
Multiple Formats
YOLO and COCO output formats
Image Generation Models
cvPal supports multiple image generation models, each with unique strengths and capabilities:
Stable Diffusion
CompVis/stable-diffusion-v1-4
Features
- High-quality image generation
- Local processing (no API required)
- Customizable inference steps (50 by default)
- GPU acceleration support
- DDIM scheduler for better quality
Specifications
- Model: CompVis/stable-diffusion-v1-4
- Inference Steps: 50
- Scheduler: DDIMScheduler
- Precision: torch.float16
- Safety Checker: Disabled
Usage Example
```python
from cvpal.generate import DetectionDataset

# Initialize with Stable Diffusion
dataset = DetectionDataset(model="stable-diffusion")

# Generate images with object detection
dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo",
)
```
DALL-E 3
OpenAI's latest image generation model
Features
- State-of-the-art image quality
- Excellent prompt understanding
- High-resolution output
- Cloud-based processing
- Requires OpenAI API key
Requirements
- API Key: OpenAI API key required
- Model: dall-e-3
- Internet: Required for API calls
- Cost: Pay-per-use pricing
- Rate Limits: OpenAI rate limits apply
Usage Example
```python
from cvpal.generate import DetectionDataset

# Initialize with DALL-E (requires API key)
dataset = DetectionDataset(
    model="dalle",
    openai_api_key="your-openai-api-key",
)

# Generate images with object detection
dataset.generate(
    prompt="a dog playing in the park",
    num_images=3,
    labels=["dog", "park"],
    output_type="yolo",
)
```
Object Detection Model
All generated images are automatically processed through OWL-ViT for object detection and labeling:
OWL-ViT
Open-World Localization Vision Transformer
Capabilities
- Zero-shot object detection
- Text-conditioned detection
- High accuracy on diverse objects
- GPU acceleration support
- Configurable detection threshold
Technical Details
- Model: google/owlvit-base-patch32
- Threshold: 0.1 (configurable)
- Architecture: Vision Transformer
- Input: Images + text prompts
- Output: Bounding boxes + labels
Detection Process
```python
# Automatic detection process
inputs = processor(text=[labels], images=image, return_tensors="pt")
outputs = detector(**inputs)
target_sizes = torch.Tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs,
    target_sizes=target_sizes,
    threshold=0.1,
)

# Results contain:
# - Bounding boxes (rescaled to image size via target_sizes)
# - Confidence scores
# - Detected labels
```
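The threshold in the call above simply discards low-confidence detections. A minimal, framework-free sketch of that filtering step (the function name and sample detections are illustrative, not part of cvPal's API):

```python
def filter_detections(boxes, scores, labels, threshold=0.1):
    """Keep only detections whose confidence score clears the threshold."""
    return [
        (box, score, label)
        for box, score, label in zip(boxes, scores, labels)
        if score >= threshold
    ]

# Example with made-up detections
boxes = [(10, 20, 110, 220), (5, 5, 50, 50)]
scores = [0.87, 0.04]
labels = ["cat", "chair"]
kept = filter_detections(boxes, scores, labels, threshold=0.1)
```

Raising the threshold trades recall for precision: fewer spurious boxes survive, at the risk of dropping genuine but low-confidence objects.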
Output Formats
cvPal generates datasets in multiple formats for compatibility with different training frameworks:
YOLO Format
Each image gets a corresponding .txt file with normalized bounding box coordinates.
```text
# image_001.txt
0 0.5 0.3 0.2 0.4    # class_id x_center y_center width height
1 0.7 0.6 0.15 0.3   # another object
```
Best for: YOLOv5, YOLOv8, custom detection models
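Each YOLO line can be derived from a pixel-space box by normalizing the box center and size against the image dimensions. A hedged sketch (`to_yolo_line` is a hypothetical helper, not part of cvPal's public API):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A box centered at (128, 128) spanning half the width of a 512x512 image
line = to_yolo_line(0, (0, 0, 256, 256), 512, 512)
```

All four coordinates are fractions of the image size, which is what makes YOLO labels resolution-independent.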
COCO Format
Single JSON file containing all annotations with detailed metadata.
```json
{
  "images": [
    {"id": 1, "file_name": "image_001.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150]}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}
```
Best for: Detectron2, MMDetection, COCO evaluation
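Note that the COCO `bbox` field stores `[x, y, width, height]` in pixels, not corner pairs. A minimal sketch of assembling annotation records from corner-style detections (all names here are illustrative, not cvPal internals):

```python
def coco_annotations(image_id, detections):
    """detections: list of (category_id, (x_min, y_min, x_max, y_max)) in pixels."""
    anns = []
    for ann_id, (category_id, (x0, y0, x1, y1)) in enumerate(detections, start=1):
        anns.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            # COCO stores [x, y, width, height], not corner coordinates
            "bbox": [x0, y0, x1 - x0, y1 - y0],
            "area": (x1 - x0) * (y1 - y0),
            "iscrowd": 0,
        })
    return anns

anns = coco_annotations(1, [(1, (100, 50, 300, 200))])
```

Forgetting this corner-vs-width convention is one of the most common sources of silently wrong COCO datasets.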
Model Comparison
| Feature | Stable Diffusion | DALL-E 3 | OWL-ViT |
|---|---|---|---|
| Processing | Local | Cloud (API) | Local |
| Internet Required | No | Yes | No |
| API Key Required | No | Yes | No |
| GPU Acceleration | Yes | N/A | Yes |
| Customization | High | Medium | Medium |
| Cost | Free | Pay-per-use | Free |
Performance Tips
π Speed Optimization
- Use GPU acceleration when available
- Reduce inference steps for Stable Diffusion (at some cost in quality)
- Use parallel processing for multiple images
- Raise the detection threshold to discard low-confidence boxes earlier
- Consider batch processing for large datasets
π‘ Quality Tips
- Use descriptive prompts for better results
- Specify object types clearly in labels
- Adjust the detection threshold (0.1-0.3) to balance recall and precision
- Use higher resolution for detailed objects
- Validate generated annotations manually
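The last tip above can be partly automated. A small sanity check for a single YOLO label line (a sketch, assuming the five-field format shown earlier; the function name is illustrative):

```python
def yolo_line_is_valid(line, num_classes):
    """Check a 'class_id x_center y_center width height' line for basic sanity."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    # Class must be a known index; all coordinates must be normalized to [0, 1]
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)
```

Running a check like this over every `.txt` file catches out-of-range classes and unnormalized coordinates before training starts.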