Synthetic Data Generation

add_labels() Function

Add additional labels to your existing dataset for expanded object detection capabilities.

Overview

The add_labels() function allows you to expand your dataset's label vocabulary by adding new object classes. This is useful when you want to detect additional objects in your generated images without starting over.

Use Cases

  • Expand Detection: add new object classes
  • Iterative Building: build the dataset incrementally
  • Flexible Workflow: adapt to changing requirements

Function Signature

python
def add_labels(self, labels: List[str]) -> None:

Parameters

labels (List[str]) - Required

List of new label names to add to the dataset. These labels will be available for future generation calls.

python
# Add single label
detection_dataset.add_labels(["dog"])
# Add multiple labels
detection_dataset.add_labels(["person", "car", "bicycle"])
# Add labels with specific naming
detection_dataset.add_labels(["feline", "canine", "vehicle"])
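To make the effect of growing a label list concrete, here is a minimal stand-in written for this page. It is not cvpal's implementation: the class `LabelVocabulary` and its `class_ids()` method are hypothetical, and the sketch assumes that new names are appended to the end (so class indices already written to YOLO annotation files stay valid) and that repeated names are ignored.

```python
from typing import Dict, List


class LabelVocabulary:
    """Hypothetical stand-in for a dataset's label list (not part of cvpal)."""

    def __init__(self) -> None:
        self._labels: List[str] = []

    def add_labels(self, labels: List[str]) -> None:
        # Append only names that are not present yet, preserving order.
        for name in labels:
            if name not in self._labels:
                self._labels.append(name)

    def class_ids(self) -> Dict[str, int]:
        # Map each label to its position, as a YOLO-style class index.
        return {name: index for index, name in enumerate(self._labels)}


vocab = LabelVocabulary()
vocab.add_labels(["cat", "chair"])
ids_before = vocab.class_ids()
vocab.add_labels(["dog", "cat"])  # "cat" is already present and is skipped
ids_after = vocab.class_ids()
```

In this sketch `cat` and `chair` keep their indices and `dog` receives the next free one. Whether the real `add_labels()` behaves the same way should be checked against the library itself.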

Basic Examples

Adding Single Label

python
from cvpal.generate import DetectionDataset
# Initialize dataset
detection_dataset = DetectionDataset()
# Generate initial dataset with cats
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=5,
    labels=["cat", "chair"],
    output_type="yolo"
)
# Add dog label for future generations
detection_dataset.add_labels(["dog"])
# Now generate images with dogs
detection_dataset.generate(
    prompt="a dog running in a park",
    num_images=3,
    labels=["dog"],
    output_type="yolo",
    overwrite=False
)

Adding Multiple Labels

python
# Add multiple labels at once
detection_dataset.add_labels([
    "person",
    "car",
    "bicycle",
    "tree",
    "building"
])
# Now you can generate images with any of these labels
detection_dataset.generate(
    prompt="a person riding a bicycle on a street",
    num_images=4,
    labels=["person", "bicycle"],
    output_type="yolo",
    overwrite=False
)

Iterative Dataset Building

python
# Start with basic labels
detection_dataset.add_labels(["animal", "furniture"])
# Generate initial images
detection_dataset.generate(
    prompt="animals and furniture in a room",
    num_images=5,
    labels=["animal", "furniture"],
    output_type="yolo"
)
# Add more specific labels
detection_dataset.add_labels(["cat", "dog", "chair", "table"])
# Generate more specific images
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=3,
    labels=["cat", "chair"],
    output_type="yolo",
    overwrite=False
)

Advanced Usage

Dynamic Label Management

Add labels that complement the ones already used in your dataset:

python
# Labels already used in the dataset
existing_labels = ["cat", "chair"]
# Complementary labels to add
new_labels = [
    "dog",     # Similar to cat
    "table",   # Similar to chair
    "person",  # Often appears with pets
    "room"     # Context label
]
detection_dataset.add_labels(new_labels)
# Generate diverse images
prompts = [
    "a cat and dog playing in a room",
    "a person sitting on a chair at a table",
    "furniture in a living room"
]
for prompt in prompts:
    detection_dataset.generate(
        prompt=prompt,
        num_images=2,
        labels=existing_labels + new_labels,
        output_type="yolo",
        overwrite=False
    )

Label Hierarchy Management

Organize labels in hierarchical categories:

python
# Add general categories first
detection_dataset.add_labels([
    "animal",
    "vehicle",
    "furniture",
    "person"
])
# Add specific subcategories
detection_dataset.add_labels([
    "cat", "dog", "bird",        # animals
    "car", "bicycle", "truck",   # vehicles
    "chair", "table", "sofa",    # furniture
    "man", "woman", "child"      # people
])
# Generate with mixed specificity
detection_dataset.generate(
    prompt="a man with a cat in a room with furniture",
    num_images=3,
    labels=["man", "cat", "furniture"],
    output_type="yolo",
    overwrite=False
)

Best Practices

✅ Effective Strategies

  • Add labels before generating new images
  • Use consistent naming conventions
  • Group related labels together
  • Plan your label hierarchy
  • Document your label choices

⚠️ Common Mistakes

  • Adding labels only after the generate() call that needs them
  • Inconsistent label naming
  • Too many similar labels
  • Not planning label structure
  • Forgetting to use new labels

Label Naming Conventions

Recommended Naming

python
# Use lowercase, single words where possible
simple_labels = [
    "cat", "dog", "person", "car",
    "chair", "table", "tree", "house"
]
# Use descriptive but concise names; join multiple words with underscores
descriptive_labels = [
    "pedestrian", "cyclist", "motorcycle",
    "traffic_light", "stop_sign"
]
# Avoid these patterns
bad_labels = [
    "Cat", "DOG", "Person-1",      # Mixed case, numbers
    "a_cat", "the_dog",            # Articles
    "cat_sitting", "dog_running"   # Actions/descriptions
]
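The conventions above can be checked mechanically before labels are added. The following is a sketch only: `label_problems`, `LABEL_PATTERN`, and `ARTICLES` are names made up for this example, and the rule set (lowercase words joined by underscores, no leading article) is one reading of the recommendations, not something cvpal enforces.

```python
import re
from typing import List

# Assumed convention: lowercase words joined by underscores, e.g. "traffic_light".
LABEL_PATTERN = re.compile(r"[a-z]+(_[a-z]+)*")
ARTICLES = {"a", "an", "the"}


def label_problems(label: str) -> List[str]:
    """Return a list of naming problems for one label (empty if none)."""
    problems = []
    if not LABEL_PATTERN.fullmatch(label):
        problems.append("use lowercase letters and underscores only")
    if label.lower().split("_")[0] in ARTICLES:
        problems.append("drop the leading article")
    return problems


candidates = ["cat", "traffic_light", "Cat", "Person-1", "a_cat"]
report = {name: label_problems(name) for name in candidates}
accepted = [name for name, problems in report.items() if not problems]
```

A pattern check like this cannot tell that `cat_sitting` describes an action rather than an object, so names of that kind still need a manual review.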

Label Categories

python
# Organize labels by category
animals = ["cat", "dog", "bird", "fish", "horse"]
vehicles = ["car", "truck", "bicycle", "motorcycle", "bus"]
furniture = ["chair", "table", "sofa", "bed", "desk"]
people = ["person", "man", "woman", "child", "baby"]
objects = ["book", "phone", "laptop", "cup", "bottle"]
# Add categories systematically
detection_dataset.add_labels(animals)
detection_dataset.add_labels(vehicles)
detection_dataset.add_labels(furniture)
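If the categories are kept in a dictionary instead of separate variables, the full label list and a label-to-category lookup can be derived from one place. This is a plain-Python sketch; the names `categories`, `all_labels`, and `category_of` are illustrative and not part of cvpal.

```python
from typing import Dict, List

# Illustrative categories, copied from the lists above.
categories: Dict[str, List[str]] = {
    "animals": ["cat", "dog", "bird", "fish", "horse"],
    "vehicles": ["car", "truck", "bicycle", "motorcycle", "bus"],
    "people": ["person", "man", "woman", "child", "baby"],
}

# Flatten the groups into one list, keeping category order.
all_labels = [label for group in categories.values() for label in group]

# Reverse lookup: which category does a label belong to?
category_of = {
    label: name for name, group in categories.items() for label in group
}
```

Since the signature takes a `List[str]`, `all_labels` could then be passed to `detection_dataset.add_labels(all_labels)` in a single call.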

Integration with Other Functions

Complete Workflow Example

python
from cvpal.generate import DetectionDataset
# Initialize dataset
detection_dataset = DetectionDataset()
# 1. Start with basic labels
detection_dataset.add_labels(["animal", "furniture"])
# 2. Generate initial dataset
detection_dataset.generate(
    prompt="animals and furniture in a room",
    num_images=5,
    labels=["animal", "furniture"],
    output_type="yolo"
)
# 3. Add more specific labels
detection_dataset.add_labels(["cat", "dog", "chair", "table"])
# 4. Generate specific images
detection_dataset.generate(
    prompt="a cat sitting on a chair",
    num_images=3,
    labels=["cat", "chair"],
    output_type="yolo",
    overwrite=False
)
# 5. Check for quality issues
empty_images = detection_dataset.isnull()
if len(empty_images) > 0:
    detection_dataset.dropna()
# 6. Visualize results
detection_dataset.show_samples(num_samples=3)
# 7. Add even more labels for expansion
detection_dataset.add_labels(["person", "room", "window"])
print("Dataset expansion complete!")

Troubleshooting

Labels Not Detected

Issue: Added labels are not being detected in generated images.

Solution: Call add_labels() before the generate() call that uses the new labels, write prompts that name the objects you want detected, and check that each label is spelled the same way everywhere.
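One quick sanity check, sketched below, is to confirm that every label passed to a generation call is actually named in the prompt. The helper `labels_missing_from_prompt` is hypothetical, and it rests on the assumption that an object is more likely to appear (and be detected) when the prompt mentions it. It matches whole words only, so plurals such as "dogs" do not match the label `dog`.

```python
import re
from typing import List


def labels_missing_from_prompt(prompt: str, labels: List[str]) -> List[str]:
    """Return labels whose words do not all appear in the prompt."""
    prompt_words = set(re.findall(r"[a-z]+", prompt.lower()))
    missing = []
    for label in labels:
        # Split "traffic_light" into ["traffic", "light"] before comparing.
        label_words = re.findall(r"[a-z]+", label.lower())
        if not all(word in prompt_words for word in label_words):
            missing.append(label)
    return missing


missing = labels_missing_from_prompt("a dog running in a park", ["dog", "cat"])
```

Here `missing` contains `cat`, which suggests either rewording the prompt or dropping that label from the call.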

Duplicate Labels

Issue: Accidentally adding the same label multiple times.

Solution: Keep track of added labels, use consistent naming, and check existing labels before adding.
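A small helper can filter out names that were already added. This page does not show a way to read the current labels back from the dataset, so the sketch assumes you track them in your own list; `labels_to_add` and `known` are illustrative names.

```python
from typing import Iterable, List


def labels_to_add(existing: Iterable[str], candidates: Iterable[str]) -> List[str]:
    """Return candidates not in `existing`, without repeats, in original order."""
    seen = set(existing)
    fresh = []
    for name in candidates:
        if name not in seen:
            seen.add(name)
            fresh.append(name)
    return fresh


known = ["cat", "chair"]  # labels already added, tracked by hand
fresh = labels_to_add(known, ["dog", "cat", "table", "dog"])
known.extend(fresh)
```

Only `fresh` would then be passed to `detection_dataset.add_labels(fresh)`.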

Label Conflicts

Issue: Similar labels causing confusion in detection.

Solution: Use distinct, specific labels and avoid overlapping categories.
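Overlap between a general category and one of its members can be flagged before a generation call. The sketch below uses a hand-written child-to-parent mapping; `PARENT_OF` and `overlapping_pairs` are hypothetical names, and the hierarchy is only an illustration based on the labels used on this page.

```python
from typing import Dict, List, Tuple

# Illustrative hierarchy; adjust it to your own label plan.
PARENT_OF: Dict[str, str] = {
    "cat": "animal", "dog": "animal", "bird": "animal",
    "car": "vehicle", "bicycle": "vehicle", "truck": "vehicle",
    "chair": "furniture", "table": "furniture", "sofa": "furniture",
}


def overlapping_pairs(labels: List[str]) -> List[Tuple[str, str]]:
    """Return (child, parent) pairs where both appear in the same label list."""
    requested = set(labels)
    return [
        (label, PARENT_OF[label])
        for label in labels
        if PARENT_OF.get(label) in requested
    ]


conflicts = overlapping_pairs(["cat", "animal", "chair"])
no_conflicts = overlapping_pairs(["man", "cat", "furniture"])
```

A non-empty result means the same object could reasonably be annotated under two labels, so one of the pair is a candidate for removal from that call.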