Supported Models

Learn about the AI models supported by cvPal for synthetic image generation and object detection.

Overview

cvPal integrates multiple state-of-the-art AI models for synthetic dataset generation. The system combines image generation models with object detection capabilities to create fully annotated datasets automatically.

  • 3 Models: DALL-E, Stable Diffusion, OWL-ViT
  • Auto Detection: automatic object detection and labeling
  • Multiple Formats: YOLO and COCO output formats

Image Generation Models

cvPal supports multiple image generation models, each with unique strengths and capabilities:

Stable Diffusion

CompVis/stable-diffusion-v1-4

Features

  • High-quality image generation
  • Local processing (no API required)
  • Customizable inference steps (50 default)
  • GPU acceleration support
  • DDIM scheduler for better quality

Specifications

  • Model: CompVis/stable-diffusion-v1-4
  • Inference Steps: 50
  • Scheduler: DDIMScheduler
  • Precision: torch.float16
  • Safety Checker: Disabled
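The specifications above correspond to a standard diffusers setup. The sketch below shows how such a pipeline could be configured; it mirrors the listed settings rather than cvPal's actual internals, and it downloads several GB of weights on first run, so treat it as a configuration sketch:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Configure the pipeline per the listed specifications
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # Precision: torch.float16
    safety_checker=None,        # Safety Checker: Disabled
)
# Scheduler: DDIMScheduler for better quality
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
# GPU acceleration when available
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Inference Steps: 50 (the default noted above)
image = pipe("a cat sitting on a chair", num_inference_steps=50).images[0]
```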

Usage Example

python
from cvpal.generate import DetectionDataset

# Initialize with Stable Diffusion
dataset = DetectionDataset(model="stable-diffusion")

# Generate images with object detection
dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo"
)

DALL-E 3

OpenAI's latest image generation model

Features

  • State-of-the-art image quality
  • Excellent prompt understanding
  • High-resolution output
  • Cloud-based processing
  • Requires OpenAI API key

Requirements

  • API Key: OpenAI API key required
  • Model: dall-e-3
  • Internet: Required for API calls
  • Cost: Pay-per-use pricing
  • Rate Limits: OpenAI rate limits apply

Usage Example

python
from cvpal.generate import DetectionDataset

# Initialize with DALL-E (requires API key)
dataset = DetectionDataset(
    model="dalle",
    openai_api_key="your-openai-api-key"
)

# Generate images with object detection
dataset.generate(
    prompt="a dog playing in the park",
    num_images=3,
    labels=["dog", "park"],
    output_type="yolo"
)

Object Detection Model

All generated images are automatically processed through OWL-ViT for object detection and labeling:

OWL-ViT

Open-World Localization Vision Transformer

Capabilities

  • Zero-shot object detection
  • Text-conditioned detection
  • High accuracy on diverse objects
  • GPU acceleration support
  • Configurable detection threshold

Technical Details

  • Model: google/owlvit-base-patch32
  • Threshold: 0.1 (configurable)
  • Architecture: Vision Transformer
  • Input: Images + text prompts
  • Output: Bounding boxes + labels
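The configurable threshold simply discards low-confidence detections. A minimal, framework-free illustration with made-up scores (`filter_by_threshold` is our name for the idea, not a cvPal API):

```python
# Illustrative detections; real scores come from the OWL-ViT model
detections = [
    {"label": "cat", "score": 0.82, "box": [120, 40, 330, 400]},
    {"label": "chair", "score": 0.07, "box": [10, 200, 500, 510]},
    {"label": "cat", "score": 0.15, "box": [100, 60, 300, 380]},
]

def filter_by_threshold(dets, threshold=0.1):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in dets if d["score"] >= threshold]

print([d["label"] for d in filter_by_threshold(detections, 0.1)])
```

Raising the threshold trades recall for precision: at 0.1 two boxes survive above, at 0.3 only one.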

Detection Process

python
import torch
from transformers import OwlViTProcessor, OwlViTForObjectDetection

# Model and processor (cvPal sets these up internally)
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
detector = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

# Automatic detection process: `image` is a generated PIL image,
# `labels` is the list of target class names
inputs = processor(text=[labels], images=image, return_tensors="pt")
outputs = detector(**inputs)
target_sizes = torch.Tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs,
    target_sizes=target_sizes,
    threshold=0.1
)

# Results contain:
# - Bounding boxes in pixel coordinates (x_min, y_min, x_max, y_max)
# - Confidence scores
# - Detected labels

Output Formats

cvPal generates datasets in multiple formats for compatibility with different training frameworks:

YOLO Format

Each image gets a corresponding .txt file with normalized bounding box coordinates.

text
# image_001.txt
0 0.5 0.3 0.2 0.4 # class_id x_center y_center width height
1 0.7 0.6 0.15 0.3 # another object

Best for: YOLOv5, YOLOv8, custom detection models
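Producing a YOLO label line from a pixel-space box is a small normalization step. A sketch (`to_yolo_line` is an illustrative helper, not part of cvPal):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 512x512 image with one detected box -> one line for image_001.txt
print(to_yolo_line(0, (205, 51, 307, 256), 512, 512))
```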

COCO Format

Single JSON file containing all annotations with detailed metadata.

json
{
  "images": [
    {"id": 1, "file_name": "image_001.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150]}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}

Best for: Detectron2, MMDetection, COCO evaluation
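COCO stores bbox as [x_min, y_min, width, height] in pixels, whereas YOLO stores normalized centers. Converting between the two is simple arithmetic (`yolo_to_coco_bbox` is an illustrative helper, not a cvPal API):

```python
def yolo_to_coco_bbox(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box to a COCO [x_min, y_min, width, height] pixel bbox."""
    w = width * img_w
    h = height * img_h
    x_min = x_center * img_w - w / 2
    y_min = y_center * img_h - h / 2
    return [round(x_min, 1), round(y_min, 1), round(w, 1), round(h, 1)]

# The YOLO example line "0 0.5 0.3 0.2 0.4" on a 512x512 image
print(yolo_to_coco_bbox(0.5, 0.3, 0.2, 0.4, 512, 512))
```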

Model Comparison

Feature             Stable Diffusion   DALL-E 3       OWL-ViT
Processing          Local              Cloud (API)    Local
Internet Required   No                 Yes            No
API Key Required    No                 Yes            No
GPU Acceleration    Yes                N/A            Yes
Customization       High               Medium         Medium
Cost                Free               Pay-per-use    Free

Performance Tips

🚀 Speed Optimization

  • Use GPU acceleration when available
  • Reduce inference steps for Stable Diffusion
  • Use parallel processing for multiple images
  • Lower detection threshold for faster processing
  • Consider batch processing for large datasets
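Parallel processing for multiple images can be sketched with the standard library; `generate_one` below is a stand-in for any per-image generation call (an API request or a pipeline run), not a cvPal function:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_one(prompt, index):
    """Stand-in for a per-image generation call (API request, pipeline run, ...)."""
    return f"{prompt}_{index:03d}.jpg"

# Run up to 4 generation calls concurrently; map preserves input order
prompts = [("a cat sitting on a chair", i) for i in range(5)]
with ThreadPoolExecutor(max_workers=4) as pool:
    filenames = list(pool.map(lambda args: generate_one(*args), prompts))
print(filenames)
```

Threads suit API-bound generation (DALL-E); for local GPU pipelines, batching inside one process usually beats thread-level parallelism.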

💡 Quality Tips

  • Use descriptive prompts for better results
  • Specify object types in labels clearly
  • Adjust detection threshold (0.1-0.3)
  • Use higher resolution for detailed objects
  • Validate generated annotations manually
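Manual validation can be partly automated with basic sanity checks on the label files. A minimal sketch for the YOLO format (`validate_yolo_line` is our helper, not a cvPal API):

```python
def validate_yolo_line(line, num_classes):
    """Check one YOLO label line: 5 fields, valid class id, coordinates in [0, 1]."""
    parts = line.split()
    if len(parts) != 5:
        return False
    class_id, coords = int(parts[0]), [float(p) for p in parts[1:]]
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)

print(validate_yolo_line("0 0.5 0.3 0.2 0.4", num_classes=2))  # valid line
print(validate_yolo_line("5 0.5 0.3 0.2 0.4", num_classes=2))  # class id out of range
```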