
Generative Media Guide

VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the Collection object.

Prerequisites

You need a connection and a collection reference before calling any generation method:

import videodb

conn = videodb.connect()
coll = conn.get_collection()

Image Generation

Generate images from text prompts:

image = coll.generate_image(
    prompt="a futuristic cityscape at sunset with flying cars",
    aspect_ratio="16:9",
)

# Access the generated image
print(image.id)
print(image.generate_url())  # returns a signed download URL

generate_image Parameters

Parameter Type Default Description
prompt str required Text description of the image to generate
aspect_ratio str "1:1" Aspect ratio: "1:1", "9:16", "16:9", "4:3", or "3:4"
callback_url str|None None URL to receive async callback

Returns an Image object with .id, .name, and .collection_id. The .url property may be None for generated images — always use image.generate_url() to get a reliable signed download URL.

Note: Unlike Video objects (which use .generate_stream()), Image objects use .generate_url() to retrieve the image URL. The .url property is only populated for some image types (e.g. thumbnails).
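Code that handles both generated Images and Videos can resolve the right accessor with a small duck-typed helper (a sketch in plain Python, not an SDK function; the attribute names follow the conventions described above):

```python
def resolve_media_url(obj):
    """Return a usable URL for a VideoDB media object.

    Prefers generate_url() (Image) or generate_stream() (Video),
    then falls back to a populated .url attribute (e.g. thumbnails).
    """
    for method_name in ("generate_url", "generate_stream"):
        method = getattr(obj, method_name, None)
        if callable(method):
            return method()
    url = getattr(obj, "url", None)
    if url:
        return url
    raise ValueError("object exposes no URL accessor")
```

With this in place, resolve_media_url(image) and resolve_media_url(video) both return a downloadable or streamable URL without the caller remembering which accessor each type uses.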

Video Generation

Generate short video clips from text prompts:

video = coll.generate_video(
    prompt="a timelapse of a flower blooming in a garden",
    duration=5,
)

stream_url = video.generate_stream()
video.play()

generate_video Parameters

Parameter Type Default Description
prompt str required Text description of the video to generate
duration int 5 Duration in seconds (integer, 5 to 8 inclusive)
callback_url str|None None URL to receive async callback

Returns a Video object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.
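Because the duration constraint is easy to violate at call time, a small pre-flight check can fail fast before any API call is made (a sketch; the 5-8 range comes from the parameter table above):

```python
def validated_video_duration(duration):
    """Check a generate_video duration: an integer from 5 to 8 seconds,
    per the parameter table above. Raises before the API is called."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError(f"duration must be an int, got {type(duration).__name__}")
    if not 5 <= duration <= 8:
        raise ValueError(f"duration must be 5-8 seconds, got {duration}")
    return duration
```

Usage: video = coll.generate_video(prompt="...", duration=validated_video_duration(8)).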

Audio Generation

VideoDB provides three separate methods for different audio types.

Music

Generate background music from text descriptions:

music = coll.generate_music(
    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
    duration=30,
)

print(music.id)

generate_music Parameters

Parameter Type Default Description
prompt str required Text description of the music
duration int 5 Duration in seconds
callback_url str|None None URL to receive async callback

Sound Effects

Generate specific sound effects:

sfx = coll.generate_sound_effect(
    prompt="thunderstorm with heavy rain and distant thunder",
    duration=10,
)

generate_sound_effect Parameters

Parameter Type Default Description
prompt str required Text description of the sound effect
duration int 2 Duration in seconds
config dict {} Additional configuration
callback_url str|None None URL to receive async callback

Voice (Text-to-Speech)

Generate speech from text:

voice = coll.generate_voice(
    text="Welcome to our product demo. Today we'll walk through the key features.",
    voice_name="Default",
)

generate_voice Parameters

Parameter Type Default Description
text str required Text to convert to speech
voice_name str "Default" Voice to use
config dict {} Additional configuration
callback_url str|None None URL to receive async callback

All three audio methods return an Audio object with .id, .name, .length, and .collection_id.

Text Generation (LLM Integration)

Use coll.generate_text() to run LLM analysis. This is a Collection-level method: pass any context (transcripts, descriptions) directly in the prompt string.

# Get transcript from a video first
transcript_text = video.get_transcript_text()

# Generate analysis using collection LLM
result = coll.generate_text(
    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
    model_name="pro",
)

print(result["output"])

generate_text Parameters

Parameter Type Default Description
prompt str required Prompt with context for the LLM
model_name str "basic" Model tier: "basic", "pro", or "ultra"
response_type str "text" Response format: "text" or "json"

Returns a dict with an output key. When response_type="text", output is a str. When response_type="json", output is a dict.

result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"])  # access the actual text/dict
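If you normalize results in one place, a small wrapper (a hypothetical convenience, not part of the SDK) can also tolerate a response where the JSON payload arrives as a string rather than a parsed dict:

```python
import json

def extract_output(result, response_type="text"):
    """Pull the payload out of a generate_text result dict.

    When response_type is "json" but the output arrives as a raw JSON
    string, parse it; otherwise pass the value through unchanged.
    """
    output = result["output"]
    if response_type == "json" and isinstance(output, str):
        output = json.loads(output)
    return output
```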

Analyze Scenes with LLM

Combine scene extraction with text generation:

from videodb import SceneExtractionType

# First index scenes
scenes = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10},
    prompt="Describe the visual content in this scene.",
)

# Get transcript for spoken context
transcript_text = video.get_transcript_text()
scene_descriptions = []
for scene in scenes:
    if isinstance(scene, dict):
        description = scene.get("description") or scene.get("summary")
    else:
        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
    scene_descriptions.append(description or str(scene))

scenes_text = "\n".join(scene_descriptions)

# Analyze with collection LLM
result = coll.generate_text(
    prompt=(
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    ),
    model_name="pro",
)
print(result["output"])

Dubbing and Translation

Dub a Video

Dub a video into another language using the collection method:

dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",  # Spanish
)

dubbed_video.play()

dub_video Parameters

Parameter Type Default Description
video_id str required ID of the video to dub
language_code str required Target language code (e.g., "es", "fr", "de")
callback_url str|None None URL to receive async callback

Returns a Video object with the dubbed content.

Translate Transcript

Translate a video's transcript without dubbing:

translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Use formal tone",
)

for entry in translated:
    print(entry)

Supported languages include: en, es, fr, de, it, pt, ja, ko, zh, hi, ar, and more.
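Note that dub_video takes a short language code while translate_transcript takes a language name, so if you store translation targets in configuration it helps to keep one mapping for both spellings (an illustrative subset of the codes listed above, not an SDK constant):

```python
# Illustrative subset of the supported languages listed above.
# dub_video expects the short code; translate_transcript the English name.
LANGUAGE_NAMES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "ja": "Japanese", "ko": "Korean",
    "zh": "Chinese", "hi": "Hindi", "ar": "Arabic",
}

def language_name(code):
    """Map a dub_video-style code to a translate_transcript-style name."""
    try:
        return LANGUAGE_NAMES[code]
    except KeyError:
        raise ValueError(f"unsupported language code: {code!r}") from None
```

Usage: coll.dub_video(video_id=video.id, language_code="es") and video.translate_transcript(language=language_name("es")) then target the same language.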

Complete Workflow Examples

Generate Narration for a Video

import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Get transcript
transcript_text = video.get_transcript_text()

# Generate narration script using collection LLM
result = coll.generate_text(
    prompt=(
        f"Write a professional narration script for this video content:\n"
        f"{transcript_text[:2000]}"
    ),
    model_name="pro",
)
script = result["output"]

# Convert script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
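The transcript_text[:2000] slice above can cut mid-word; a word-boundary truncation helper (plain Python, not an SDK utility) keeps long transcripts bounded more cleanly:

```python
def truncate_at_word(text, limit=2000):
    """Trim text to at most `limit` characters without cutting a word.
    Falls back to a hard cut if no space exists within the limit."""
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit)
    if cut == -1:  # one enormous token: hard cut
        cut = limit
    return text[:cut].rstrip()
```

Replace the slice with truncate_at_word(transcript_text, 2000) when building the prompt.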

Generate Thumbnail from Prompt

thumbnail = coll.generate_image(
    prompt="professional video thumbnail showing data analytics dashboard, modern design",
    aspect_ratio="16:9",
)
print(f"Thumbnail URL: {thumbnail.generate_url()}")

Add Generated Music to Video

import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Generate background music
music = coll.generate_music(
    prompt="calm ambient background music for a tutorial video",
    duration=60,
)

# Build timeline with video + music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
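Rather than hardcoding duration=60, you could derive the music length from the video's .length attribute so the track covers the full timeline (a sketch; the 300-second cap is an assumed safety limit, not a documented API constraint):

```python
import math

def music_duration_for(video_length_seconds, max_duration=300):
    """Round a video's length up to whole seconds so generated music
    spans the entire timeline, clamped to an assumed safety cap."""
    duration = math.ceil(video_length_seconds)
    return max(1, min(duration, max_duration))
```

Usage: music = coll.generate_music(prompt="...", duration=music_duration_for(video.length)).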

Structured JSON Output

transcript_text = video.get_transcript_text()

result = coll.generate_text(
    prompt=(
        f"Given this transcript:\n{transcript_text}\n\n"
        "Return a JSON object with keys: summary, topics (array), action_items (array)."
    ),
    model_name="pro",
    response_type="json",
)

# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
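LLM JSON output is best-effort, so it is worth validating the shape before indexing into it. A generic guard (not an SDK feature) might look like:

```python
def require_keys(payload, keys):
    """Fail fast when an LLM JSON response is missing expected keys."""
    if not isinstance(payload, dict):
        raise TypeError(f"expected a dict, got {type(payload).__name__}")
    missing = [k for k in keys if k not in payload]
    if missing:
        raise KeyError(f"missing keys in LLM response: {missing}")
    return payload
```

Usage: data = require_keys(result["output"], ["summary", "topics", "action_items"]) before touching any field.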

Tips

  • Generated media is persistent: All generated content is stored in your collection and can be reused.
  • Three audio methods: Use generate_music() for background music, generate_sound_effect() for SFX, and generate_voice() for text-to-speech. There is no unified generate_audio() method.
  • Text generation is collection-level: coll.generate_text() does not have access to video content automatically. Fetch the transcript with video.get_transcript_text() and pass it in the prompt.
  • Model tiers: "basic" is fastest, "pro" is balanced, "ultra" is highest quality. Use "pro" for most analysis tasks.
  • Combine generation types: Generate images for overlays, music for backgrounds, and voice for narration, then compose using timelines (see editor.md).
  • Prompt quality matters: Descriptive, specific prompts produce better results across all generation types.
  • Aspect ratios for images: Choose from "1:1", "9:16", "16:9", "4:3", or "3:4".
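If aspect ratios come from user input, you can validate against this set before calling generate_image (a sketch using the values listed in this guide):

```python
VALID_ASPECT_RATIOS = {"1:1", "9:16", "16:9", "4:3", "3:4"}

def check_aspect_ratio(ratio):
    """Reject aspect ratios generate_image does not document."""
    if ratio not in VALID_ASPECT_RATIOS:
        raise ValueError(
            f"unsupported aspect ratio {ratio!r}; "
            f"choose one of {sorted(VALID_ASPECT_RATIOS)}"
        )
    return ratio
```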