# Generative Media Guide

VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the `Collection` object.
## Prerequisites

You need a connection and a collection reference before calling any generation method:

```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
```
## Image Generation

Generate images from text prompts:

```python
image = coll.generate_image(
    prompt="a futuristic cityscape at sunset with flying cars",
    aspect_ratio="16:9",
)

# Access the generated image
print(image.id)
print(image.generate_url())  # returns a signed download URL
```
### `generate_image` Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Text description of the image to generate |
| `aspect_ratio` | `str` | `"1:1"` | Aspect ratio: `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"` |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
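An unsupported `aspect_ratio` value would only fail once the request reaches the API, so a client-side check can fail faster. The helper below is a sketch of ours, not part of the VideoDB SDK; only the set of accepted values comes from the table above.

```python
# Hypothetical client-side helper (not part of the VideoDB SDK): checks
# that an aspect ratio string is one of the values generate_image accepts.
VALID_ASPECT_RATIOS = {"1:1", "9:16", "16:9", "4:3", "3:4"}

def check_aspect_ratio(aspect_ratio):
    """Raise ValueError early instead of sending a bad value to the API."""
    if aspect_ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(
            f"aspect_ratio must be one of {sorted(VALID_ASPECT_RATIOS)}, "
            f"got {aspect_ratio!r}"
        )
    return aspect_ratio
```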
Returns an `Image` object with `.id`, `.name`, and `.collection_id`. The `.url` property may be `None` for generated images; always use `image.generate_url()` to get a reliable signed download URL.
> **Note:** Unlike `Video` objects (which use `.generate_stream()`), `Image` objects use `.generate_url()` to retrieve the image URL. The `.url` property is only populated for some image types (e.g. thumbnails).
## Video Generation

Generate short video clips from text prompts:

```python
video = coll.generate_video(
    prompt="a timelapse of a flower blooming in a garden",
    duration=5,
)

stream_url = video.generate_stream()
video.play()
```
### `generate_video` Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Text description of the video to generate |
| `duration` | `int` | `5` | Duration in seconds (must be an integer, 5-8) |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
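The `duration` constraint (an integer between 5 and 8) is easy to violate by passing a float or an out-of-range value. A hypothetical pre-flight check, not part of the SDK, could enforce it before the API call:

```python
def check_video_duration(duration):
    """Validate the documented generate_video constraint:
    duration must be an integer between 5 and 8 seconds inclusive."""
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError("duration must be an integer")
    if not 5 <= duration <= 8:
        raise ValueError("duration must be between 5 and 8 seconds")
    return duration
```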
Returns a `Video` object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.
## Audio Generation

VideoDB provides three separate methods for different audio types.

### Music

Generate background music from text descriptions:

```python
music = coll.generate_music(
    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
    duration=30,
)

print(music.id)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Text description of the music |
| `duration` | `int` | `5` | Duration in seconds |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
### Sound Effects

Generate specific sound effects:

```python
sfx = coll.generate_sound_effect(
    prompt="thunderstorm with heavy rain and distant thunder",
    duration=10,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Text description of the sound effect |
| `duration` | `int` | `2` | Duration in seconds |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
### Voice (Text-to-Speech)

Generate speech from text:

```python
voice = coll.generate_voice(
    text="Welcome to our product demo. Today we'll walk through the key features.",
    voice_name="Default",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `str` | required | Text to convert to speech |
| `voice_name` | `str` | `"Default"` | Voice to use |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
All three audio methods return an `Audio` object with `.id`, `.name`, `.length`, and `.collection_id`.
## Text Generation (LLM Integration)

Use `coll.generate_text()` to run LLM analysis. This is a Collection-level method: pass any context (transcripts, descriptions) directly in the prompt string.

```python
# Get transcript from a video first
transcript_text = video.get_transcript_text()

# Generate analysis using the collection LLM
result = coll.generate_text(
    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
    model_name="pro",
)
print(result["output"])
```
### `generate_text` Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `prompt` | `str` | required | Prompt with context for the LLM |
| `model_name` | `str` | `"basic"` | Model tier: `"basic"`, `"pro"`, or `"ultra"` |
| `response_type` | `str` | `"text"` | Response format: `"text"` or `"json"` |
Returns a `dict` with an `output` key. When `response_type="text"`, `output` is a `str`. When `response_type="json"`, `output` is a `dict`.
```python
result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"])  # access the actual text/dict
```
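If you call `generate_text` with both response types in one codebase, a small accessor keeps the unwrapping in one place. This helper is ours, not part of the SDK; the fallback that parses a JSON string is a defensive assumption, not documented behavior.

```python
import json

def extract_output(result, expect_json=False):
    """Return result["output"]. If JSON was requested but a string came
    back (defensive assumption, not documented SDK behavior), parse it."""
    output = result["output"]
    if expect_json and isinstance(output, str):
        output = json.loads(output)
    return output
```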
### Analyze Scenes with LLM

Combine scene extraction with text generation:

```python
from videodb import SceneExtractionType

# First index scenes
scenes = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10},
    prompt="Describe the visual content in this scene.",
)

# Get transcript for spoken context
transcript_text = video.get_transcript_text()

scene_descriptions = []
for scene in scenes:
    if isinstance(scene, dict):
        description = scene.get("description") or scene.get("summary")
    else:
        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
    scene_descriptions.append(description or str(scene))

scenes_text = "\n".join(scene_descriptions)

# Analyze with the collection LLM
result = coll.generate_text(
    prompt=(
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    ),
    model_name="pro",
)
print(result["output"])
```
## Dubbing and Translation

### Dub a Video

Dub a video into another language using the collection method:

```python
dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",  # Spanish
)

dubbed_video.play()
```
### `dub_video` Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `video_id` | `str` | required | ID of the video to dub |
| `language_code` | `str` | required | Target language code (e.g., `"es"`, `"fr"`, `"de"`) |
| `callback_url` | `str \| None` | `None` | URL to receive async callback |
Returns a `Video` object with the dubbed content.
### Translate Transcript

Translate a video's transcript without dubbing:

```python
translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Use formal tone",
)

for entry in translated:
    print(entry)
```
Supported languages include `en`, `es`, `fr`, `de`, `it`, `pt`, `ja`, `ko`, `zh`, `hi`, `ar`, and more.
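If you accept language codes from user input, a quick membership check avoids a round trip for obviously unknown codes. The allow-list below is built only from the codes named above; since the list "and more" is open-ended, treat a miss as "unknown to this doc", not "unsupported by the API".

```python
# Partial allow-list taken from the codes listed above; the real
# supported set is larger, so this check is illustrative only.
KNOWN_TRANSLATION_CODES = {"en", "es", "fr", "de", "it", "pt",
                           "ja", "ko", "zh", "hi", "ar"}

def is_known_language(code):
    """True if the code appears in this guide's (partial) list."""
    return code.lower() in KNOWN_TRANSLATION_CODES
```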
## Complete Workflow Examples

### Generate Narration for a Video

```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Get transcript
transcript_text = video.get_transcript_text()

# Generate a narration script using the collection LLM
result = coll.generate_text(
    prompt=(
        f"Write a professional narration script for this video content:\n"
        f"{transcript_text[:2000]}"
    ),
    model_name="pro",
)
script = result["output"]

# Convert the script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
```
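The workflow above simply truncates the transcript at 2,000 characters, which drops everything after the cut. For long transcripts you might instead chunk the text and summarize chunk by chunk; the splitter below is our sketch, not an SDK feature.

```python
def chunk_text(text, max_chars=2000):
    """Split text into chunks of at most max_chars, preferring
    whitespace boundaries so words are not cut in half."""
    chunks = []
    while len(text) > max_chars:
        split_at = text.rfind(" ", 0, max_chars)
        if split_at <= 0:
            split_at = max_chars  # no space found: hard split
        chunks.append(text[:split_at])
        text = text[split_at:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Each chunk can then be passed to `coll.generate_text()` in turn, with a final prompt combining the per-chunk summaries.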
### Generate Thumbnail from Prompt

```python
thumbnail = coll.generate_image(
    prompt="professional video thumbnail showing data analytics dashboard, modern design",
    aspect_ratio="16:9",
)

print(f"Thumbnail URL: {thumbnail.generate_url()}")
```
### Add Generated Music to Video

```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Generate background music
music = coll.generate_music(
    prompt="calm ambient background music for a tutorial video",
    duration=60,
)

# Build a timeline with the video plus a music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
```
### Structured JSON Output

```python
transcript_text = video.get_transcript_text()

result = coll.generate_text(
    prompt=(
        f"Given this transcript:\n{transcript_text}\n\n"
        "Return a JSON object with keys: summary, topics (array), action_items (array)."
    ),
    model_name="pro",
    response_type="json",
)

# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
```
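An LLM is not guaranteed to return exactly the keys the prompt asked for, so indexing `result["output"]["summary"]` directly can raise an opaque `KeyError`. A hypothetical guard of ours (not an SDK feature) makes the failure explicit:

```python
def require_keys(payload, keys):
    """Fail fast, with a clear message, if expected keys are missing
    from an LLM's JSON output."""
    missing = [k for k in keys if k not in payload]
    if missing:
        raise KeyError(f"missing keys in LLM output: {missing}")
    return payload
```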
## Tips

- **Generated media is persistent:** All generated content is stored in your collection and can be reused.
- **Three audio methods:** Use `generate_music()` for background music, `generate_sound_effect()` for SFX, and `generate_voice()` for text-to-speech. There is no unified `generate_audio()` method.
- **Text generation is collection-level:** `coll.generate_text()` does not have access to video content automatically. Fetch the transcript with `video.get_transcript_text()` and pass it in the prompt.
- **Model tiers:** `"basic"` is fastest, `"pro"` is balanced, `"ultra"` is highest quality. Use `"pro"` for most analysis tasks.
- **Combine generation types:** Generate images for overlays, music for backgrounds, and voice for narration, then compose using timelines (see editor.md).
- **Prompt quality matters:** Descriptive, specific prompts produce better results across all generation types.
- **Aspect ratios for images:** Choose from `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"`.
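Since descriptive, specific prompts tend to generate better media, one way to keep prompts consistent is to assemble them from named parts. The builder below is purely illustrative (its name and signature are ours):

```python
def build_prompt(subject, style=None, details=None):
    """Join subject/style/details into one comma-separated prompt;
    specific, descriptive parts tend to generate better results."""
    parts = [subject] + [p for p in (style, details) if p]
    return ", ".join(parts)
```

For example, `build_prompt("a red fox", "watercolor", "soft morning light")` yields a single descriptive prompt string suitable for `generate_image`.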