Features
Synthetic Image Generation
Generate high-quality synthetic images from text prompts for data augmentation and training.
- Text-to-image generation with customizable prompts
- Support for various styles (photorealistic, artistic, etc.)
- Batch generation for efficient processing
- Integration with YOLO dataset formats
generate.synthetic_images("cat sitting", 10)
Dataset Merging
Combine multiple datasets into a single, unified dataset with automatic conflict resolution.
- Merge multiple image datasets seamlessly
- Automatic label conflict resolution
- Duplicate detection and removal
- Preserve original dataset structure
merge_datasets(["dataset1", "dataset2"])
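The page does not say how cvpal resolves label conflicts. One common strategy, sketched below with the standard library only, gives classes with identical names a single unified ID and remaps each dataset's class IDs onto that list. It also flags byte-identical images as duplicates by comparing SHA-256 digests. All function names here are hypothetical illustrations, not cvpal's API, and the sketch assumes YOLO-style label lines whose first field is the class ID.

```python
import hashlib


def merge_class_lists(class_lists):
    """Build a unified class list plus one {old_id: new_id} map per dataset.

    Hypothetical helper: classes with the same name share one unified ID,
    and new names are appended in first-seen order.
    """
    unified = []  # merged class names; list index == new class ID
    index = {}    # class name -> new class ID
    remaps = []   # one {old_id: new_id} dict per input dataset
    for names in class_lists:
        remap = {}
        for old_id, name in enumerate(names):
            if name not in index:
                index[name] = len(unified)
                unified.append(name)
            remap[old_id] = index[name]
        remaps.append(remap)
    return unified, remaps


def remap_label_line(line, remap):
    """Rewrite the class ID (first field) of a single YOLO label line."""
    class_id, *coords = line.split()
    return " ".join([str(remap[int(class_id)]), *coords])


def find_duplicates(images):
    """Return (duplicate, original) name pairs for byte-identical images.

    `images` maps a file name to its raw bytes; identical content is
    detected by comparing SHA-256 digests.
    """
    seen = {}  # digest -> first file name with that content
    duplicates = []
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates
```

For example, merging `["person", "car"]` with `["car", "bicycle"]` yields the unified list `["person", "car", "bicycle"]`. The second dataset's IDs are remapped as `{0: 1, 1: 2}`, so a `car` box keeps its meaning after the merge.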
Label Management
Comprehensive tools for managing and manipulating dataset labels and annotations.
- Replace or remove specific labels
- Remap class names and IDs
- Batch label operations
- Label validation and correction
replace_labels({"old": "new"})
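For YOLO-style labels, renaming a class changes both the class-name list and the numeric IDs in every label file. The sketch below uses hypothetical helper names and is not cvpal's `replace_labels`. It builds an old-ID-to-new-ID map from a rename dictionary and an optional set of classes to remove, then rewrites label lines with that map. If two classes are renamed to the same name, they are merged into one class.

```python
def plan_relabel(names, mapping, remove=()):
    """Return (new_names, {old_id: new_id or None}) for a rename/remove plan.

    Hypothetical helper: classes missing from `mapping` keep their name,
    classes in `remove` map to None, and duplicate targets share one ID.
    """
    new_names = []
    new_index = {}  # new class name -> new class ID
    id_map = {}
    for old_id, name in enumerate(names):
        if name in remove:
            id_map[old_id] = None  # annotations of this class will be dropped
            continue
        target = mapping.get(name, name)
        if target not in new_index:
            new_index[target] = len(new_names)
            new_names.append(target)
        id_map[old_id] = new_index[target]
    return new_names, id_map


def relabel_lines(lines, id_map):
    """Apply an ID map to YOLO label lines, dropping removed classes."""
    out = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        new_id = id_map[int(fields[0])]
        if new_id is not None:
            out.append(" ".join([str(new_id), *fields[1:]]))
    return out
```

For example, `plan_relabel(["person", "car", "truck"], {"person": "pedestrian", "car": "vehicle", "truck": "vehicle"})` returns `(["pedestrian", "vehicle"], {0: 0, 1: 1, 2: 1})`. Any `truck` boxes therefore become `vehicle` boxes with ID 1.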
Dataset Reporting
Generate comprehensive reports and statistics about your datasets.
- Count label occurrences and distributions
- Generate detailed dataset reports
- Export statistics to various formats
- Visualize dataset composition
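In a YOLO dataset, counting label occurrences comes down to reading the first field (the class ID) of every line in the labels/ text files. The following standard-library sketch shows that idea. It is not cvpal's `count_labels`, and the helper names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def iter_label_lines(labels_dir):
    """Lazily yield every line of every .txt label file under `labels_dir`."""
    for label_file in sorted(Path(labels_dir).rglob("*.txt")):
        yield from label_file.read_text().splitlines()


def count_labels_from_lines(lines, names=None):
    """Count annotated instances per class from YOLO label lines.

    The class ID is the first field of each non-blank line; if `names`
    is given, IDs are translated into class names.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[int(fields[0])] += 1
    if names is not None:
        return Counter({names[class_id]: n for class_id, n in counts.items()})
    return counts


def label_distribution(counts):
    """Convert raw counts into fractions of all annotations."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

A typical call chains the helpers: `count_labels_from_lines(iter_label_lines("dataset_path/labels"), names=class_names)`. Here `class_names` is a placeholder for the dataset's class list.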
count_labels("dataset_path")
Additional Features
YOLO Integration
Full compatibility with YOLO-style datasets, including the images/ and labels/ directory structure and data.yaml configuration files.
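In the usual YOLO layout, each image under an images/ directory has a same-named .txt file in the parallel labels/ directory. A data.yaml file lists the split directories and the class names. The sketch below encodes that convention. The helper names and the default `images/train` and `images/val` split paths are assumptions for illustration, not something this page specifies.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path):
    """Map .../images/<split>/name.jpg to .../labels/<split>/name.txt.

    Hypothetical helper: replaces the last 'images' directory component
    with 'labels' and the file extension with '.txt'.
    """
    parts = list(PurePosixPath(image_path).parts)
    # search the directory components (not the file name) from the end
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "images":
            parts[i] = "labels"
            break
    else:
        raise ValueError(f"no 'images' directory in {image_path!r}")
    return str(PurePosixPath(*parts).with_suffix(".txt"))


def data_yaml_text(root, names, train="images/train", val="images/val"):
    """Render a minimal data.yaml for a YOLO dataset rooted at `root`."""
    lines = [
        f"path: {root}",
        f"train: {train}",
        f"val: {val}",
        f"nc: {len(names)}",  # some tools expect an explicit class count
        "names:",
    ]
    # names are written unquoted; quote any that contain YAML special characters
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]
    return "\n".join(lines) + "\n"
```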
Batch Processing
Efficient batch processing capabilities for handling large datasets with minimal memory usage.
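The page does not describe how cvpal keeps memory usage low. A standard approach, sketched here with hypothetical helper names, streams file paths lazily with generators and processes them in fixed-size batches. Only one batch is held in memory at a time.

```python
from itertools import islice
from pathlib import Path


def batched(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    # islice pulls at most `size` items; an empty list ends the loop
    while batch := list(islice(it, size)):
        yield batch


def iter_images(root, exts=(".jpg", ".jpeg", ".png")):
    """Lazily yield image file paths under `root`, one at a time."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in exts:
            yield path
```

For example, `for batch in batched(iter_images("path/to/street_dataset"), 32): ...` handles 32 image paths per iteration. On Python 3.12 and later, `itertools.batched` provides the same behavior but yields tuples instead of lists.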
Error Handling
Robust error handling and validation to ensure data integrity throughout all operations.
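Validating a YOLO detection label line involves three checks. The line must have five fields (class ID plus normalized x-center, y-center, width, and height), the class ID must be in range, and the box values must lie in [0, 1]. A minimal sketch of these checks follows. The helper names are hypothetical, and segmentation or keypoint label formats, which carry more fields, are out of scope.

```python
def validate_label_line(line, num_classes):
    """Return a list of problems in one YOLO detection label line (empty if valid)."""
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]
    problems = []
    try:
        class_id = int(fields[0])
    except ValueError:
        problems.append(f"class ID {fields[0]!r} is not an integer")
    else:
        if not 0 <= class_id < num_classes:
            problems.append(f"class ID {class_id} outside 0..{num_classes - 1}")
    try:
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        problems.append("box values must be numbers")
    else:
        if any(not 0.0 <= v <= 1.0 for v in coords):
            problems.append("box values must be normalized to [0, 1]")
    return problems


def validate_label_file(text, num_classes):
    """Map 1-based line numbers to their problems for a whole label file."""
    report = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip():  # blank lines are ignored
            problems = validate_label_line(line, num_classes)
            if problems:
                report[lineno] = problems
    return report
```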
Usage Examples
Complete Workflow Example
from cvpal import generate, preprocessing
# 1. Generate synthetic images for data augmentation
synthetic_images = generate.synthetic_images(
    "person walking on street",
    50,
    style="photorealistic"
)
# 2. Merge existing datasets
merged_dataset = preprocessing.merge_datasets([
    "path/to/street_dataset",
    "path/to/pedestrian_dataset"
])
# 3. Replace inconsistent labels
preprocessing.replace_labels(
    merged_dataset,
    {"person": "pedestrian", "car": "vehicle"}
)
# 4. Generate dataset report
report = preprocessing.generate_report(merged_dataset)
print(f"Dataset contains {report['total_images']} images")
print(f"Label distribution: {report['label_counts']}")