Python Package Documentation

The cvpal Python package is an open-source toolkit for computer vision engineers to manage datasets, generate synthetic images, and streamline data preparation for training and evaluation.

Open Source

This page documents the open-source Python package. For the platform API (dataset search, global datasets), see the Platform API Reference.

Installation

Install cvpal using pip:

bash
pip install cvpal
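To confirm that the install succeeded, one option is to ask Python's standard `importlib.metadata` for the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical and not part of cvpal, and it assumes the installed distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    Hypothetical helper for illustration; it is not part of cvpal.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the name passed to pip.
version = installed_version("cvpal")
if version is None:
    print("cvpal is not installed in this environment")
else:
    print(f"cvpal version: {version}")
```

If the check reports that the package is missing, the usual cause is that pip installed into a different environment than the interpreter running the script.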

Quick Start

Get started with cvpal in just a few lines of code:

python
from cvpal import generate, preprocessing

# Generate synthetic images
images = generate.synthetic_images("a cat sitting on a chair", 10)

# Merge datasets
merged_dataset = preprocessing.merge_datasets([
    "path/to/dataset1/images",
    "path/to/dataset2/images",
])

# Generate dataset report
report = preprocessing.generate_report("path/to/dataset")
print(f"Dataset contains {report['total_images']} images")

Modules

cvpal.generate module

Generate synthetic images from text prompts for data augmentation. Create variations of a prompt, batch-process multiple prompts, and extend your datasets with AI-generated images.
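Batch processing several prompts can be sketched as a small loop around a single-prompt generator. The helper `batch_generate` and the stand-in `fake_generate` below are hypothetical and not part of cvpal; the sketch only assumes a generator that is called as `generate_fn(prompt, count)`, the call shape shown in the Quick Start, and that returns an iterable of results.

```python
from typing import Callable, Dict, Iterable, List


def batch_generate(
    prompts: Iterable[str],
    per_prompt: int,
    generate_fn: Callable[[str, int], Iterable],
) -> Dict[str, List]:
    """Call `generate_fn` once per prompt and collect results by prompt.

    Hypothetical helper for illustration; it is not part of cvpal.
    """
    results: Dict[str, List] = {}
    for prompt in prompts:
        results[prompt] = list(generate_fn(prompt, per_prompt))
    return results


def fake_generate(prompt: str, count: int) -> List[str]:
    """Stand-in generator so the sketch runs without cvpal installed."""
    return [f"{prompt} #{i}" for i in range(count)]


batches = batch_generate(
    ["a cat sitting on a chair", "a dog on a sofa"],
    2,
    fake_generate,
)
for prompt, items in batches.items():
    print(prompt, len(items))
```

With cvpal installed, `generate.synthetic_images` could presumably be passed in place of `fake_generate`, provided it accepts the prompt and count positionally and returns an iterable; both points are assumptions based on the Quick Start call.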

cvpal.preprocessing module

Utilities for dataset operations, label management, and data analysis. Merge datasets, validate their structure, generate reports, and prepare data for training.
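To make label management and label reporting concrete, the sketch below renames labels and counts them in plain Python. It is not cvpal's implementation: the functions `remap_labels` and `label_counts` and the in-memory annotation records (dicts with `image` and `label` keys) are assumptions chosen for illustration only.

```python
from collections import Counter
from typing import Dict, List


def remap_labels(
    annotations: List[Dict[str, str]], mapping: Dict[str, str]
) -> List[Dict[str, str]]:
    """Return copies of the annotations with labels renamed per `mapping`.

    Labels that do not appear in `mapping` are kept unchanged.
    """
    return [
        {**ann, "label": mapping.get(ann["label"], ann["label"])}
        for ann in annotations
    ]


def label_counts(annotations: List[Dict[str, str]]) -> Dict[str, int]:
    """Count how many annotations carry each label."""
    return dict(Counter(ann["label"] for ann in annotations))


# Hypothetical annotation records; a real dataset would be read from disk.
annotations = [
    {"image": "img_001.jpg", "label": "person"},
    {"image": "img_001.jpg", "label": "car"},
    {"image": "img_002.jpg", "label": "person"},
]

remapped = remap_labels(annotations, {"person": "pedestrian", "car": "vehicle"})
print(label_counts(remapped))  # {'pedestrian': 2, 'vehicle': 1}
```

Returning new records instead of editing in place keeps the original annotations available for comparison before and after the rename.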

Examples

Complete Workflow

python
from cvpal import generate, preprocessing

# 1. Generate synthetic images for data augmentation
synthetic_images = generate.synthetic_images(
    "person walking on street",
    50,
    style="photorealistic",
)

# 2. Merge existing datasets
merged_dataset = preprocessing.merge_datasets([
    "path/to/street_dataset",
    "path/to/pedestrian_dataset",
])

# 3. Standardize labels
preprocessing.replace_labels(
    merged_dataset,
    {"person": "pedestrian", "car": "vehicle"},
)

# 4. Generate dataset report
report = preprocessing.generate_report(merged_dataset)
print(f"Final dataset contains {report['total_images']} images")
print(f"Label distribution: {report['label_counts']}")