mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-03-30 21:53:28 +08:00
# Generative Media Guide

VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the **Collection** object.

## Prerequisites

You need a connection and a collection reference before calling any generation method:

```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
```

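
`videodb.connect()` needs an API key to authenticate. The sketch below is a small pre-flight check that fails early with a readable message. It assumes the key is supplied through an environment variable named `VIDEO_DB_API_KEY`; confirm that name against your SDK version. `require_api_key` is a hypothetical helper, not part of the VideoDB SDK.

```python
import os


def require_api_key(env=None, name="VIDEO_DB_API_KEY"):
    """Return the API key found in the environment, or raise a clear error.

    `name` is an assumption about how the key is supplied; change it if
    your setup uses a different variable.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before connecting")
    return key
```

Calling `require_api_key()` just before `videodb.connect()` turns a missing key into an immediate, explicit failure.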
## Image Generation

Generate images from text prompts:

```python
image = coll.generate_image(
    prompt="a futuristic cityscape at sunset with flying cars",
    aspect_ratio="16:9",
)

# Access the generated image
print(image.id)
print(image.generate_url())  # returns a signed download URL
```

### generate_image Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the image to generate |
| `aspect_ratio` | `str` | `"1:1"` | Aspect ratio: `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"` |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

Returns an `Image` object with `.id`, `.name`, and `.collection_id`. The `.url` property may be `None` for generated images, so always use `image.generate_url()` to get a reliable signed download URL.

> **Note:** Unlike `Video` objects (which use `.generate_stream()`), `Image` objects use `.generate_url()` to retrieve the image URL. The `.url` property is only populated for some image types (e.g. thumbnails).

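
Because `aspect_ratio` accepts only the five values listed in the parameter table, it can be checked locally before a generation request is sent. This is a minimal sketch; `checked_aspect_ratio` and `ALLOWED_ASPECT_RATIOS` are hypothetical names, not part of the VideoDB SDK.

```python
# The five ratios documented for generate_image.
ALLOWED_ASPECT_RATIOS = ("1:1", "9:16", "16:9", "4:3", "3:4")


def checked_aspect_ratio(value="1:1"):
    """Return `value` if it is a documented ratio, otherwise raise ValueError."""
    if value not in ALLOWED_ASPECT_RATIOS:
        allowed = ", ".join(ALLOWED_ASPECT_RATIOS)
        raise ValueError(
            f"unsupported aspect_ratio {value!r}; expected one of: {allowed}"
        )
    return value
```

It can wrap the argument directly, for example `coll.generate_image(prompt=..., aspect_ratio=checked_aspect_ratio("16:9"))`.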
## Video Generation

Generate short video clips from text prompts:

```python
video = coll.generate_video(
    prompt="a timelapse of a flower blooming in a garden",
    duration=5,
)

stream_url = video.generate_stream()
video.play()
```

### generate_video Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the video to generate |
| `duration` | `int` | `5` | Duration in seconds (must be an integer, 5-8) |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

Returns a `Video` object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.

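
The `duration` constraint (an integer between 5 and 8 seconds) can be enforced before the request is made. This is a sketch only: `checked_video_duration` is a hypothetical helper, and it assumes both ends of the 5-8 range are allowed.

```python
def checked_video_duration(seconds=5, minimum=5, maximum=8):
    """Validate a generate_video duration against the documented range.

    Assumes the 5-8 second range is inclusive at both ends.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("duration must be an int")
    if not minimum <= seconds <= maximum:
        raise ValueError(
            f"duration must be between {minimum} and {maximum} seconds, got {seconds}"
        )
    return seconds
```

Passing `duration=checked_video_duration(6)` keeps the call site unchanged while catching values such as `5.5` or `10` locally.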
## Audio Generation

VideoDB provides three separate methods for different audio types.

### Music

Generate background music from text descriptions:

```python
music = coll.generate_music(
    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
    duration=30,
)

print(music.id)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the music |
| `duration` | `int` | `5` | Duration in seconds |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

### Sound Effects

Generate specific sound effects:

```python
sfx = coll.generate_sound_effect(
    prompt="thunderstorm with heavy rain and distant thunder",
    duration=10,
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the sound effect |
| `duration` | `int` | `2` | Duration in seconds |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

### Voice (Text-to-Speech)

Generate speech from text:

```python
voice = coll.generate_voice(
    text="Welcome to our product demo. Today we'll walk through the key features.",
    voice_name="Default",
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | required | Text to convert to speech |
| `voice_name` | `str` | `"Default"` | Voice to use |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

All three audio methods return an `Audio` object with `.id`, `.name`, `.length`, and `.collection_id`.

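
Since the three audio methods are separate, code that chooses the audio type at runtime needs its own dispatch. One possible sketch follows; the `kind` labels (`"music"`, `"sfx"`, `"voice"`) and the helper name `generate_audio_by_kind` are hypothetical, and only the three method names come from this guide.

```python
# Maps an illustrative label to the Collection method documented above.
AUDIO_METHODS = {
    "music": "generate_music",
    "sfx": "generate_sound_effect",
    "voice": "generate_voice",
}


def generate_audio_by_kind(coll, kind, **kwargs):
    """Call the Collection method that matches `kind`, forwarding kwargs as-is."""
    try:
        method_name = AUDIO_METHODS[kind]
    except KeyError:
        raise ValueError(f"unknown audio kind {kind!r}") from None
    return getattr(coll, method_name)(**kwargs)
```

Keyword arguments are passed through untouched, so `generate_audio_by_kind(coll, "voice", text="Hello")` behaves like `coll.generate_voice(text="Hello")`.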
## Text Generation (LLM Integration)

Use `coll.generate_text()` to run LLM analysis. This is a **Collection-level** method, so pass any context (transcripts, descriptions) directly in the prompt string.

```python
# Get transcript from a video first
transcript_text = video.get_transcript_text()

# Generate analysis using collection LLM
result = coll.generate_text(
    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
    model_name="pro",
)

print(result["output"])
```

### generate_text Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Prompt with context for the LLM |
| `model_name` | `str` | `"basic"` | Model tier: `"basic"`, `"pro"`, or `"ultra"` |
| `response_type` | `str` | `"text"` | Response format: `"text"` or `"json"` |

Returns a `dict` with an `output` key. When `response_type="text"`, `output` is a `str`. When `response_type="json"`, `output` is a `dict`.

```python
result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"])  # access the actual text/dict
```

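
Because the type of `output` depends on `response_type`, a small accessor can make that contract explicit and catch mismatches early. This is a sketch under the behaviour described above; `read_output` and `EXPECTED_TYPES` are hypothetical names, not SDK functions.

```python
# Output type documented for each response_type.
EXPECTED_TYPES = {"text": str, "json": dict}


def read_output(result, response_type="text"):
    """Return result["output"] after checking it has the documented type."""
    if response_type not in EXPECTED_TYPES:
        raise ValueError(f"unknown response_type {response_type!r}")
    output = result["output"]
    expected = EXPECTED_TYPES[response_type]
    if not isinstance(output, expected):
        raise TypeError(
            f"expected {expected.__name__} for response_type={response_type!r}, "
            f"got {type(output).__name__}"
        )
    return output
```

For example, `summary = read_output(result)` for plain text, or `data = read_output(result, "json")` when a dictionary is expected.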
### Analyze Scenes with LLM

Combine scene extraction with text generation:

```python
from videodb import SceneExtractionType

# First index scenes
scenes = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10},
    prompt="Describe the visual content in this scene.",
)

# Get transcript for spoken context
transcript_text = video.get_transcript_text()
scene_descriptions = []
for scene in scenes:
    if isinstance(scene, dict):
        description = scene.get("description") or scene.get("summary")
    else:
        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
    scene_descriptions.append(description or str(scene))

scenes_text = "\n".join(scene_descriptions)

# Analyze with collection LLM
result = coll.generate_text(
    prompt=(
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    ),
    model_name="pro",
)
print(result["output"])
```

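
The prompt assembly in the example above can be factored into a reusable function. The sketch below also drops empty descriptions and numbers the scenes so the model can refer to them; that numbering is an illustrative choice, not something VideoDB requires, and `build_analysis_prompt` is a hypothetical helper.

```python
def build_analysis_prompt(transcript_text, scene_descriptions):
    """Combine a transcript and scene descriptions into one analysis prompt."""
    # Skip None and blank descriptions, then number what remains from 1.
    kept = (d.strip() for d in scene_descriptions if d and d.strip())
    numbered = [f"Scene {i}: {text}" for i, text in enumerate(kept, start=1)]
    scenes_text = "\n".join(numbered)
    return (
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    )
```

The returned string can be passed as `prompt` to `coll.generate_text()`.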
## Dubbing and Translation

### Dub a Video

Dub a video into another language using the collection method:

```python
dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",  # Spanish
)

dubbed_video.play()
```

### dub_video Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_id` | `str` | required | ID of the video to dub |
| `language_code` | `str` | required | Target language code (e.g., `"es"`, `"fr"`, `"de"`) |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |

Returns a `Video` object with the dubbed content.

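
To dub one video into several languages, `dub_video` can be called once per language code. This is a sketch; `dub_into_languages` is a hypothetical helper, and it makes no assumption about how long each dubbing request takes or what it costs.

```python
def dub_into_languages(coll, video_id, language_codes):
    """Return {language_code: dubbed_video}, with one dub_video call per unique code."""
    dubbed = {}
    # dict.fromkeys removes duplicate codes while keeping their original order.
    for code in dict.fromkeys(language_codes):
        dubbed[code] = coll.dub_video(video_id=video_id, language_code=code)
    return dubbed
```

For example, `dub_into_languages(coll, video.id, ["es", "fr", "de"])` returns a dictionary of three `Video` objects keyed by language code.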
### Translate Transcript

Translate a video's transcript without dubbing:

```python
translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Use formal tone",
)

for entry in translated:
    print(entry)
```

**Supported languages** include: `en`, `es`, `fr`, `de`, `it`, `pt`, `ja`, `ko`, `zh`, `hi`, `ar`, and more.

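
The examples in this guide pass a language code (`"es"`) to `dub_video` but a language name (`"Spanish"`) to `translate_transcript`. If your code stores only codes, a lookup table can bridge the two. This is a sketch; `LANGUAGE_NAMES` and `language_name` are hypothetical, and whether `translate_transcript` also accepts codes is not stated here.

```python
# English names for the language codes listed above.
LANGUAGE_NAMES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "ja": "Japanese",
    "ko": "Korean",
    "zh": "Chinese",
    "hi": "Hindi",
    "ar": "Arabic",
}


def language_name(code):
    """Return the English language name for a two-letter code."""
    try:
        return LANGUAGE_NAMES[code.strip().lower()]
    except KeyError:
        raise ValueError(f"no language name known for code {code!r}") from None
```

With this, `video.translate_transcript(language=language_name("es"))` matches the call shown in the example.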
## Complete Workflow Examples

### Generate Narration for a Video

```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Get transcript
transcript_text = video.get_transcript_text()

# Generate narration script using collection LLM
result = coll.generate_text(
    prompt=(
        "Write a professional narration script for this video content:\n"
        f"{transcript_text[:2000]}"
    ),
    model_name="pro",
)
script = result["output"]

# Convert script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
```

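
The slice `transcript_text[:2000]` in the example can cut the transcript in the middle of a word. A minimal sketch of a word-boundary truncation follows; `truncate_at_word` is a hypothetical helper, and the 2000-character limit simply mirrors the example rather than a documented API limit.

```python
def truncate_at_word(text, limit=2000):
    """Shorten `text` to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the next character is whitespace, the slice already ends on a whole word.
    if not text[limit].isspace():
        head, sep, _ = cut.rpartition(" ")
        if sep:  # drop the partial last word only when a space exists
            cut = head
    return cut.rstrip()
```

In the workflow it would replace the slice, as in `f"{truncate_at_word(transcript_text)}"`.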
### Generate Thumbnail from Prompt

```python
thumbnail = coll.generate_image(
    prompt="professional video thumbnail showing data analytics dashboard, modern design",
    aspect_ratio="16:9",
)
print(f"Thumbnail URL: {thumbnail.generate_url()}")
```

### Add Generated Music to Video

```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Generate background music
music = coll.generate_music(
    prompt="calm ambient background music for a tutorial video",
    duration=60,
)

# Build timeline with video + music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
```

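
The example hard-codes `duration=60`. To match the music to the video instead, the duration can be derived from the video's length. This sketch assumes the `Video` object exposes its length in seconds as `.length` (possibly as a string) and that `generate_music` accepts the resulting value; no upper limit is documented in this guide. `music_duration_for` is a hypothetical helper.

```python
import math


def music_duration_for(video_length, minimum=1):
    """Round a video length in seconds up to a whole-second music duration."""
    # float() accepts both numbers and numeric strings such as "59.2".
    seconds = math.ceil(float(video_length))
    return max(minimum, seconds)
```

For example, `coll.generate_music(prompt=..., duration=music_duration_for(video.length))`.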
### Structured JSON Output

```python
transcript_text = video.get_transcript_text()

result = coll.generate_text(
    prompt=(
        f"Given this transcript:\n{transcript_text}\n\n"
        "Return a JSON object with keys: summary, topics (array), action_items (array)."
    ),
    model_name="pro",
    response_type="json",
)

# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
```

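
The prompt asks for specific keys, but an LLM may omit one or return a different shape. A minimal sketch of a check to run before indexing into the result follows; `check_summary_output` is a hypothetical helper, and the key names are the ones requested in the prompt above.

```python
REQUIRED_KEYS = ("summary", "topics", "action_items")
LIST_KEYS = ("topics", "action_items")


def check_summary_output(output):
    """Verify the JSON output has the requested keys and array fields."""
    missing = [key for key in REQUIRED_KEYS if key not in output]
    if missing:
        raise ValueError(f"missing keys in LLM output: {', '.join(missing)}")
    for key in LIST_KEYS:
        if not isinstance(output[key], list):
            raise TypeError(f"{key!r} should be a list, got {type(output[key]).__name__}")
    return output
```

Used as `data = check_summary_output(result["output"])`, it fails with a descriptive error instead of a bare `KeyError` further down.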
## Tips

- **Generated media is persistent**: All generated content is stored in your collection and can be reused.
- **Three audio methods**: Use `generate_music()` for background music, `generate_sound_effect()` for SFX, and `generate_voice()` for text-to-speech. There is no unified `generate_audio()` method.
- **Text generation is collection-level**: `coll.generate_text()` does not have access to video content automatically. Fetch the transcript with `video.get_transcript_text()` and pass it in the prompt.
- **Model tiers**: `"basic"` is fastest, `"pro"` is balanced, `"ultra"` is highest quality. Use `"pro"` for most analysis tasks.
- **Combine generation types**: Generate images for overlays, music for backgrounds, and voice for narration, then compose using timelines (see [editor.md](editor.md)).
- **Prompt quality matters**: Descriptive, specific prompts produce better results across all generation types.
- **Aspect ratios for images**: Choose from `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"`.