Get Started
Supported Models
Learn about the AI models supported by cvPal for synthetic image generation and object detection.
Overview
cvPal integrates multiple state-of-the-art AI models for synthetic dataset generation. The system combines image generation models with object detection capabilities to create fully annotated datasets automatically.
3 Models
DALL-E, Stable Diffusion, OWL-ViT
Auto Detection
Automatic object detection and labeling
Multiple Formats
YOLO and COCO output formats
Image Generation Models
cvPal supports multiple image generation models, each with unique strengths and capabilities:
Stable Diffusion
CompVis/stable-diffusion-v1-4
Features
- High-quality image generation
- Local processing (no API required)
- Customizable inference steps (50 by default)
- GPU acceleration support
- DDIM scheduler for better quality
Specifications
- Model: CompVis/stable-diffusion-v1-4
- Inference Steps: 50
- Scheduler: DDIMScheduler
- Precision: torch.float16
- Safety Checker: Disabled
Usage Example
```python
from cvpal.generate import DetectionDataset

# Initialize with Stable Diffusion
dataset = DetectionDataset(model="stable-diffusion")

# Generate images with object detection
dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo",
)
```
DALL-E 3
OpenAI's latest image generation model
Features
- State-of-the-art image quality
- Excellent prompt understanding
- High-resolution output
- Cloud-based processing
- Requires OpenAI API key
Requirements
- API Key: OpenAI API key required
- Model: dall-e-3
- Internet: Required for API calls
- Cost: Pay-per-use pricing
- Rate Limits: OpenAI rate limits apply
Usage Example
```python
from cvpal.generate import DetectionDataset

# Initialize with DALL-E (requires API key)
dataset = DetectionDataset(
    model="dalle",
    openai_api_key="your-openai-api-key",
)

# Generate images with object detection
dataset.generate(
    prompt="a dog playing in the park",
    num_images=3,
    labels=["dog", "park"],
    output_type="yolo",
)
```
Object Detection Model
All generated images are automatically processed through OWL-ViT for object detection and labeling:
OWL-ViT
Open-World Localization Vision Transformer
Capabilities
- Zero-shot object detection
- Text-conditioned detection
- High accuracy on diverse objects
- GPU acceleration support
- Configurable detection threshold
Technical Details
- Model: google/owlvit-base-patch32
- Threshold: 0.1 (configurable)
- Architecture: Vision Transformer
- Input: Images + text prompts
- Output: Bounding boxes + labels
Detection Process
```python
# Automatic detection process
inputs = processor(text=[labels], images=image, return_tensors="pt")
outputs = detector(**inputs)
target_sizes = torch.Tensor([image.size[::-1]])
results = processor.post_process_object_detection(
    outputs=outputs,
    target_sizes=target_sizes,
    threshold=0.1,
)

# Results contain:
# - Bounding boxes (rescaled to image size via target_sizes)
# - Confidence scores
# - Detected labels
```
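The threshold in the call above simply discards low-confidence detections. A minimal, framework-free sketch of that filtering step (the function name and sample detections are illustrative, not part of cvPal's API):

```python
def filter_detections(boxes, scores, labels, threshold=0.1):
    """Keep only detections whose confidence score clears the threshold."""
    return [
        (box, score, label)
        for box, score, label in zip(boxes, scores, labels)
        if score >= threshold
    ]

# Example with made-up detections
boxes = [(10, 20, 110, 220), (5, 5, 50, 50)]
scores = [0.87, 0.04]
labels = ["cat", "chair"]
kept = filter_detections(boxes, scores, labels, threshold=0.1)
```

Raising the threshold trades recall for precision: fewer spurious boxes survive, at the risk of dropping genuine but low-confidence objects.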
Output Formats
cvPal generates datasets in multiple formats for compatibility with different training frameworks:
YOLO Format
Each image gets a corresponding .txt file with normalized bounding box coordinates.
```text
# image_001.txt
0 0.5 0.3 0.2 0.4    # class_id x_center y_center width height
1 0.7 0.6 0.15 0.3   # another object
```
Best for: YOLOv5, YOLOv8, custom detection models
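Each YOLO line can be derived from a pixel-space box by normalizing the box center and size against the image dimensions. A hedged sketch (`to_yolo_line` is a hypothetical helper, not part of cvPal's public API):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A box centered at (128, 128) spanning half the width of a 512x512 image
line = to_yolo_line(0, (0, 0, 256, 256), 512, 512)
```

All four coordinates are fractions of the image size, which is what makes YOLO labels resolution-independent.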
COCO Format
Single JSON file containing all annotations with detailed metadata.
```json
{
  "images": [
    {"id": 1, "file_name": "image_001.jpg", "width": 512, "height": 512}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 150]}
  ],
  "categories": [
    {"id": 1, "name": "cat"}
  ]
}
```
Best for: Detectron2, MMDetection, COCO evaluation
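Note that the COCO `bbox` field stores `[x, y, width, height]` in pixels, not corner pairs. A minimal sketch of assembling annotation records from corner-style detections (all names here are illustrative, not cvPal internals):

```python
def coco_annotations(image_id, detections):
    """detections: list of (category_id, (x_min, y_min, x_max, y_max)) in pixels."""
    anns = []
    for ann_id, (category_id, (x0, y0, x1, y1)) in enumerate(detections, start=1):
        anns.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": category_id,
            # COCO stores [x, y, width, height], not corner coordinates
            "bbox": [x0, y0, x1 - x0, y1 - y0],
            "area": (x1 - x0) * (y1 - y0),
            "iscrowd": 0,
        })
    return anns

anns = coco_annotations(1, [(1, (100, 50, 300, 200))])
```

Forgetting this corner-vs-width convention is one of the most common sources of silently wrong COCO datasets.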
Model Comparison
| Feature | Stable Diffusion | DALL-E 3 | OWL-ViT |
|---|---|---|---|
| Processing | Local | Cloud (API) | Local |
| Internet Required | No | Yes | No |
| API Key Required | No | Yes | No |
| GPU Acceleration | Yes | N/A | Yes |
| Customization | High | Medium | Medium |
| Cost | Free | Pay-per-use | Free |
Performance Tips
π Speed Optimization
- Use GPU acceleration when available
- Reduce inference steps for Stable Diffusion (at some cost in quality)
- Use parallel processing for multiple images
- Raise the detection threshold to discard low-confidence boxes earlier
- Consider batch processing for large datasets
π‘ Quality Tips
- Use descriptive prompts for better results
- Specify object types clearly in labels
- Adjust the detection threshold (0.1-0.3) to balance recall and precision
- Use higher resolution for detailed objects
- Validate generated annotations manually
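The last tip above can be partly automated. A small sanity check for a single YOLO label line (a sketch, assuming the five-field format shown earlier; the function name is illustrative):

```python
def yolo_line_is_valid(line, num_classes):
    """Check a 'class_id x_center y_center width height' line for basic sanity."""
    parts = line.split()
    if len(parts) != 5:
        return False
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return False
    # Class must be a known index; all coordinates must be normalized to [0, 1]
    return 0 <= class_id < num_classes and all(0.0 <= c <= 1.0 for c in coords)
```

Running a check like this over every `.txt` file catches out-of-range classes and unnormalized coordinates before training starts.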