videodb skills update: add reference files for videodb skills

This commit is contained in:
Rohit Garg
2026-03-03 18:16:39 +05:30
parent c26ba60003
commit cff0308568
12 changed files with 3625 additions and 69 deletions

---
name: videodb-skills
description: "See, Understand, Act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live-record the desktop; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture."
origin: ECC
allowed-tools: Read Grep Glob Bash(python:*)
argument-hint: "[task description]"
---
# VideoDB Skill
**Perception + memory + actions for video, live streams, and desktop sessions.**
## When to Activate
Use this skill to:
- Upload or ingest videos from YouTube URLs, web URLs, or local files
- Search spoken words or visual scenes across video content
- Generate transcripts or auto-style subtitles
- Edit clips: trim, combine, compose multi-track timelines
- Add overlays: text, images, audio, music
- Generate AI media: images, video, music, sound effects, voiceovers
- Transcode: resolution, codec, bitrate, FPS changes
- Reframe video for social platforms (vertical, square, etc.)
- Capture screen or audio in real time with AI transcription
- Get playable HLS streaming links for any output
## 1) Desktop Perception
- Start/stop a **desktop session** capturing **screen, mic, and system audio**
- Stream **live context** and store **episodic session memory**
- Run **real-time alerts/triggers** on what's spoken and what's happening on screen
- Produce **session summaries**, a searchable timeline, and **playable evidence links**
## 2) Video ingest + stream
- Ingest a **file or URL** and return a **playable web stream link**
- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**
## 3) Index + search (timestamps + evidence)
- Build **visual**, **spoken**, and **keyword** indexes
- Search and return exact moments with **timestamps** and **playable evidence**
- Auto-create **clips** from search results
## 4) Timeline editing + generation
- Subtitles: **generate**, **translate**, **burn-in**
- Overlays: **text/image/branding**, motion captions
- Audio: **background music**, **voiceover**, **dubbing**
- Programmatic composition and exports via **timeline operations**
## 5) Live streams (RTSP) + monitoring
- Connect **RTSP/live feeds**
- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows
---
## Common inputs
- Local **file path**, public **URL**, or **RTSP URL**
- Desktop capture request: **start / stop / summarize session**
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
## Common outputs
- **Stream URL**
- Search results with **timestamps** and **evidence links**
- Generated assets: subtitles, audio, images, clips
- **Event/alert payloads** for live streams
- Desktop **session summaries** and memory entries
---
## Canonical prompts (examples)
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."
## Running Python code
Before running any VideoDB code, change to the project directory and load environment variables:
```python
from dotenv import load_dotenv
load_dotenv(".env")
import videodb
conn = videodb.connect()
```
This reads `VIDEO_DB_API_KEY` from:
1. Environment (if already exported)
2. Project's `.env` file in current directory
If the key is missing, `videodb.connect()` raises `AuthenticationError`.
Do NOT write a script file when a short inline command works.
When writing inline Python (`python -c "..."`), always use properly formatted code — use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:
```bash
python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")
import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF
```
## Setup
When the user asks to "setup videodb" or similar:
### 1. Install SDK
```bash
# Install the skill
npx skills add video-db/skills
# Or setup manually
pip install "videodb[capture]" python-dotenv
export VIDEO_DB_API_KEY=sk-xxx
```
Run `/videodb setup` inside your agent for guided setup ($20 free credits, no credit card).
If `videodb[capture]` fails on Linux, install without the capture extra:
```bash
pip install videodb python-dotenv
```
### 2. Configure API key
The user must set `VIDEO_DB_API_KEY` using **either** method:
- **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
- **Project `.env` file**: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file
Get a free API key at https://console.videodb.io (50 free uploads, no credit card).
**Do NOT** read, write, or handle the API key yourself. Always let the user set it.
## Quick Reference
### Upload media
```python
# URL
video = coll.upload(url="https://example.com/video.mp4")
# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")
# Local file
video = coll.upload(file_path="/path/to/video.mp4")
```
### Transcript + subtitle
```python
# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()
```
### Search inside videos
```python
from videodb.exceptions import InvalidRequestError
video.index_spoken_words(force=True)
# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
results = video.search("product demo")
shots = results.get_shots()
stream_url = results.compile()
except InvalidRequestError as e:
if "No results found" in str(e):
shots = []
else:
raise
```
### Scene search
```python
import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError
# index_scenes() has no force parameter — it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
scene_index_id = video.index_scenes(
extraction_type=SceneExtractionType.shot_based,
prompt="Describe the visual content in this scene.",
)
except Exception as e:
match = re.search(r"id\s+([a-f0-9]+)", str(e))
if match:
scene_index_id = match.group(1)
else:
raise
# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
results = video.search(
query="person writing on a whiteboard",
search_type=SearchType.semantic,
index_type=IndexType.scene,
scene_index_id=scene_index_id,
score_threshold=0.3,
)
shots = results.get_shots()
stream_url = results.compile()
except InvalidRequestError as e:
if "No results found" in str(e):
shots = []
else:
raise
```
### Timeline editing
**Important:** Always validate timestamps before building a timeline:
- `start` must be >= 0 (negative values are silently accepted but produce broken output)
- `start` must be < `end`
- `end` must be <= `video.length`
```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
```
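Those three checks can be wrapped in a small guard run before any asset is created (a sketch; `validate_clip` is not an SDK function):

```python
def validate_clip(start: float, end: float, video_length: float) -> None:
    """Raise ValueError for (start, end) pairs that would produce a broken stream."""
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) exceeds video length ({video_length})")

# validate_clip(10, 30, video.length)  # before VideoAsset(asset_id=video.id, start=10, end=30)
```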
### Transcode video (resolution / quality change)
```python
from videodb import TranscodeMode, VideoConfig, AudioConfig
# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
source="https://example.com/video.mp4",
callback_url="https://example.com/webhook",
mode=TranscodeMode.economy,
video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
audio_config=AudioConfig(mute=False),
)
```
### Reframe aspect ratio (for social platforms)
**Warning:** `reframe()` is a slow server-side operation. For long videos it can take
several minutes and may time out. Best practices:
- Always limit to a short segment using `start`/`end` when possible
- For full-length videos, use `callback_url` for async processing
- Trim the video on a `Timeline` first, then reframe the shorter result
```python
from videodb import ReframeMode
# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")
# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")
# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
```
### Generative media
```python
image = coll.generate_image(
prompt="a sunset over mountains",
aspect_ratio="16:9",
)
```
## Error handling
```python
from videodb.exceptions import AuthenticationError, InvalidRequestError
try:
conn = videodb.connect()
except AuthenticationError:
print("Check your VIDEO_DB_API_KEY")
try:
video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
print(f"Upload failed: {e}")
```
### Common pitfalls
| Scenario | Error message | Solution |
|----------|--------------|----------|
| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
| Reframe times out | Blocks indefinitely on long videos | Use `start`/`end` to limit segment, or pass `callback_url` for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits |
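Because the no-results case recurs in every search call, the pitfall-table pattern can be factored into one helper (a sketch, not an SDK utility; pass the SDK's `InvalidRequestError` as `not_found_exc` in real use):

```python
def safe_get_shots(search_fn, not_found_exc=Exception):
    """Run a search callable; treat an error containing 'No results found' as empty."""
    try:
        return search_fn().get_shots()
    except not_found_exc as e:
        if "No results found" in str(e):
            return []
        raise

# Typical use with the SDK:
# from videodb.exceptions import InvalidRequestError
# shots = safe_get_shots(lambda: video.search("product demo"),
#                        not_found_exc=InvalidRequestError)
```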
## Additional docs
Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
- [reference/api-reference.md](reference/api-reference.md) - Complete VideoDB Python SDK API reference
- [reference/search.md](reference/search.md) - In-depth guide to video search (spoken word and scene-based)
- [reference/editor.md](reference/editor.md) - Timeline editing, assets, and composition
- [reference/streaming.md](reference/streaming.md) - HLS streaming and instant playback
- [reference/generative.md](reference/generative.md) - AI-powered media generation (images, video, audio)
- [reference/rtstream.md](reference/rtstream.md) - Live stream ingestion workflow (RTSP/RTMP)
- [reference/rtstream-reference.md](reference/rtstream-reference.md) - RTStream SDK methods and AI pipelines
- [reference/capture.md](reference/capture.md) - Desktop capture workflow
- [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
- [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples
## Screen Recording (Desktop Capture)
Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
### Quick Start
1. **Start listener**: `python scripts/ws_listener.py &`
2. **Get WebSocket ID**: `cat /tmp/videodb_ws_id`
3. **Run capture code** (see reference/capture.md for full workflow)
4. **Events written to**: `/tmp/videodb_events.jsonl`
### Query Events
```python
import json
events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")]
# Get all transcripts
transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
# Get visual descriptions from last 5 minutes
import time
cutoff = time.time() - 300
recent_visual = [e for e in events
if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
```
### Utility Scripts
- [scripts/ws_listener.py](scripts/ws_listener.py) - WebSocket event listener (dumps to JSONL)
For complete capture workflow, see [reference/capture.md](reference/capture.md).
**Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).
### When to use what
| Problem | VideoDB solution |
|---------|-----------------|
| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` |
## Repository
https://github.com/video-db/skills
**Maintained By:** [VideoDB](https://github.com/video-db)

# Complete API Reference
## Connection
```python
import videodb
conn = videodb.connect(
api_key="your-api-key", # or set VIDEO_DB_API_KEY env var
base_url=None, # custom API endpoint (optional)
)
```
**Returns:** `Connection` object
### Connection Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `conn.get_collection(collection_id="default")` | `Collection` | Get collection (default if no ID) |
| `conn.get_collections()` | `list[Collection]` | List all collections |
| `conn.create_collection(name, description, is_public=False)` | `Collection` | Create new collection |
| `conn.update_collection(id, name, description)` | `Collection` | Update a collection |
| `conn.check_usage()` | `dict` | Get account usage stats |
| `conn.upload(source, media_type, name, ...)` | `Video\|Audio\|Image` | Upload to default collection |
| `conn.record_meeting(meeting_url, bot_name, ...)` | `Meeting` | Record a meeting |
| `conn.create_capture_session(...)` | `CaptureSession` | Create a capture session (see [capture-reference.md](capture-reference.md)) |
| `conn.youtube_search(query, result_threshold, duration)` | `list[dict]` | Search YouTube |
| `conn.transcode(source, callback_url, mode, ...)` | `str` | Transcode video (returns job ID) |
| `conn.get_transcode_details(job_id)` | `dict` | Get transcode job status and details |
| `conn.connect_websocket(collection_id)` | `WebSocketConnection` | Connect to WebSocket (see [capture-reference.md](capture-reference.md)) |
### Transcode
Transcode a video from a URL with custom resolution, quality, and audio settings. Processing happens server-side — no local ffmpeg required.
```python
from videodb import TranscodeMode, VideoConfig, AudioConfig
job_id = conn.transcode(
source="https://example.com/video.mp4",
callback_url="https://example.com/webhook",
mode=TranscodeMode.economy,
video_config=VideoConfig(resolution=720, quality=23),
audio_config=AudioConfig(mute=False),
)
```
#### transcode Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `source` | `str` | required | URL of the video to transcode (preferably a downloadable URL) |
| `callback_url` | `str` | required | URL to receive the callback when transcoding completes |
| `mode` | `TranscodeMode` | `TranscodeMode.economy` | Transcoding speed: `economy` or `lightning` |
| `video_config` | `VideoConfig` | `VideoConfig()` | Video encoding settings |
| `audio_config` | `AudioConfig` | `AudioConfig()` | Audio encoding settings |
Returns a job ID (`str`). Use `conn.get_transcode_details(job_id)` to check job status.
```python
details = conn.get_transcode_details(job_id)
```
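Without a `callback_url` webhook handler, the job has to be polled. A minimal polling sketch; the `status` key and its pending values here are assumptions, so inspect a real `get_transcode_details` payload before relying on them:

```python
import time

def wait_for_transcode(conn, job_id, timeout=600, interval=10):
    """Poll conn.get_transcode_details(job_id) until the job leaves a pending state."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        details = conn.get_transcode_details(job_id)
        # "status" and its pending values are assumed, not documented here
        if details.get("status") not in (None, "queued", "processing"):
            return details
        time.sleep(interval)
    raise TimeoutError(f"transcode job {job_id} did not finish within {timeout}s")
```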
#### VideoConfig
```python
from videodb import VideoConfig, ResizeMode
config = VideoConfig(
resolution=720, # Target resolution height (e.g. 480, 720, 1080)
quality=23, # Encoding quality (lower = better, default 23)
framerate=30, # Target framerate
aspect_ratio="16:9", # Target aspect ratio
resize_mode=ResizeMode.crop, # How to fit: crop, fit, or pad
)
```
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `resolution` | `int\|None` | `None` | Target resolution height in pixels |
| `quality` | `int` | `23` | Encoding quality (lower = higher quality) |
| `framerate` | `int\|None` | `None` | Target framerate |
| `aspect_ratio` | `str\|None` | `None` | Target aspect ratio (e.g. `"16:9"`, `"9:16"`) |
| `resize_mode` | `str` | `ResizeMode.crop` | Resize strategy: `crop`, `fit`, or `pad` |
#### AudioConfig
```python
from videodb import AudioConfig
config = AudioConfig(mute=False)
```
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `mute` | `bool` | `False` | Mute the audio track |
## Collections
```python
coll = conn.get_collection()
```
### Collection Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `coll.get_videos()` | `list[Video]` | List all videos |
| `coll.get_video(video_id)` | `Video` | Get specific video |
| `coll.get_audios()` | `list[Audio]` | List all audios |
| `coll.get_audio(audio_id)` | `Audio` | Get specific audio |
| `coll.get_images()` | `list[Image]` | List all images |
| `coll.get_image(image_id)` | `Image` | Get specific image |
| `coll.upload(url=None, file_path=None, media_type=None, name=None)` | `Video\|Audio\|Image` | Upload media |
| `coll.search(query, search_type, index_type, score_threshold, namespace, scene_index_id, ...)` | `SearchResult` | Search across collection (semantic only; keyword and scene search raise `NotImplementedError`) |
| `coll.generate_image(prompt, aspect_ratio="1:1")` | `Image` | Generate image with AI |
| `coll.generate_video(prompt, duration=5)` | `Video` | Generate video with AI |
| `coll.generate_music(prompt, duration=5)` | `Audio` | Generate music with AI |
| `coll.generate_sound_effect(prompt, duration=2)` | `Audio` | Generate sound effect |
| `coll.generate_voice(text, voice_name="Default")` | `Audio` | Generate speech from text |
| `coll.generate_text(prompt, model_name="basic", response_type="text")` | `dict` | LLM text generation — access result via `["output"]` |
| `coll.dub_video(video_id, language_code)` | `Video` | Dub video into another language |
| `coll.record_meeting(meeting_url, bot_name, ...)` | `Meeting` | Record a live meeting |
| `coll.create_capture_session(...)` | `CaptureSession` | Create a capture session (see [capture-reference.md](capture-reference.md)) |
| `coll.get_capture_session(...)` | `CaptureSession` | Retrieve capture session (see [capture-reference.md](capture-reference.md)) |
| `coll.connect_rtstream(url, name, ...)` | `RTStream` | Connect to a live stream (see [rtstream-reference.md](rtstream-reference.md)) |
| `coll.make_public()` | `None` | Make collection public |
| `coll.make_private()` | `None` | Make collection private |
| `coll.delete_video(video_id)` | `None` | Delete a video |
| `coll.delete_audio(audio_id)` | `None` | Delete an audio |
| `coll.delete_image(image_id)` | `None` | Delete an image |
| `coll.delete()` | `None` | Delete the collection |
### Upload Parameters
```python
video = coll.upload(
url=None, # Remote URL (HTTP, YouTube)
file_path=None, # Local file path
media_type=None, # "video", "audio", or "image" (auto-detected if omitted)
name=None, # Custom name for the media
description=None, # Description
callback_url=None, # Webhook URL for async notification
)
```
## Video Object
```python
video = coll.get_video(video_id)
```
### Video Properties
| Property | Type | Description |
|----------|------|-------------|
| `video.id` | `str` | Unique video ID |
| `video.collection_id` | `str` | Parent collection ID |
| `video.name` | `str` | Video name |
| `video.description` | `str` | Video description |
| `video.length` | `float` | Duration in seconds |
| `video.stream_url` | `str` | Default stream URL |
| `video.player_url` | `str` | Player embed URL |
| `video.thumbnail_url` | `str` | Thumbnail URL |
### Video Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `video.generate_stream(timeline=None)` | `str` | Generate stream URL (optional timeline of `[(start, end)]` tuples) |
| `video.play()` | `str` | Open stream in browser, returns player URL |
| `video.index_spoken_words(language_code=None, force=False)` | `None` | Index speech for search. Use `force=True` to skip if already indexed. |
| `video.index_scenes(extraction_type, prompt, extraction_config, metadata, model_name, name, scenes, callback_url)` | `str` | Index visual scenes (returns scene_index_id) |
| `video.index_visuals(prompt, batch_config, ...)` | `str` | Index visuals (returns scene_index_id) |
| `video.index_audio(prompt, model_name, ...)` | `str` | Index audio with LLM (returns scene_index_id) |
| `video.get_transcript(start=None, end=None)` | `list[dict]` | Get timestamped transcript |
| `video.get_transcript_text(start=None, end=None)` | `str` | Get full transcript text |
| `video.generate_transcript(force=None)` | `dict` | Generate transcript |
| `video.translate_transcript(language, additional_notes)` | `list[dict]` | Translate transcript |
| `video.search(query, search_type, index_type, filter, **kwargs)` | `SearchResult` | Search within video |
| `video.add_subtitle(style=SubtitleStyle())` | `str` | Add subtitles (returns stream URL) |
| `video.generate_thumbnail(time=None)` | `str\|Image` | Generate thumbnail |
| `video.get_thumbnails()` | `list[Image]` | Get all thumbnails |
| `video.extract_scenes(extraction_type, extraction_config)` | `SceneCollection` | Extract scenes |
| `video.reframe(start, end, target, mode, callback_url)` | `Video\|None` | Reframe video aspect ratio |
| `video.clip(prompt, content_type, model_name)` | `str` | Generate clip from prompt (returns stream URL) |
| `video.insert_video(video, timestamp)` | `str` | Insert video at timestamp |
| `video.download(name=None)` | `dict` | Download the video |
| `video.delete()` | `None` | Delete the video |
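Per the table above, `generate_stream()` accepts an optional timeline of `(start, end)` tuples, which stitches a few segments without building a full `Timeline` object. A sketch with a small sanity check (the `check_segments` helper is illustrative, not part of the SDK):

```python
def check_segments(segments, video_length):
    """Verify each (start, end) tuple is ordered and inside the video."""
    for start, end in segments:
        if not (0 <= start < end <= video_length):
            raise ValueError(f"bad segment: ({start}, {end})")
    return segments

# Stitch 0-10s and 30-45s into one stream:
# segments = check_segments([(0, 10), (30, 45)], video.length)
# stream_url = video.generate_stream(timeline=segments)
```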
### Reframe
Convert a video to a different aspect ratio with optional smart object tracking. Processing is server-side.
> **Warning:** Reframe is a slow server-side operation. It can take several minutes for long videos and may time out. Always use `start`/`end` to limit the segment, or pass `callback_url` for async processing.
```python
from videodb import ReframeMode
# Always prefer short segments to avoid timeouts:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")
# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1080, "height": 1080})
```
#### reframe Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `start` | `float\|None` | `None` | Start time in seconds (None = beginning) |
| `end` | `float\|None` | `None` | End time in seconds (None = end of video) |
| `target` | `str\|dict` | `"vertical"` | Preset string (`"vertical"`, `"square"`, `"landscape"`) or `{"width": int, "height": int}` |
| `mode` | `str` | `ReframeMode.smart` | `"simple"` (centre crop) or `"smart"` (object tracking) |
| `callback_url` | `str\|None` | `None` | Webhook URL for async notification |
Returns a `Video` object when no `callback_url` is provided, `None` otherwise.
## Audio Object
```python
audio = coll.get_audio(audio_id)
```
### Audio Properties
| Property | Type | Description |
|----------|------|-------------|
| `audio.id` | `str` | Unique audio ID |
| `audio.collection_id` | `str` | Parent collection ID |
| `audio.name` | `str` | Audio name |
| `audio.length` | `float` | Duration in seconds |
### Audio Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `audio.generate_url()` | `str` | Generate signed URL for playback |
| `audio.get_transcript(start=None, end=None)` | `list[dict]` | Get timestamped transcript |
| `audio.get_transcript_text(start=None, end=None)` | `str` | Get full transcript text |
| `audio.generate_transcript(force=None)` | `dict` | Generate transcript |
| `audio.delete()` | `None` | Delete the audio |
## Image Object
```python
image = coll.get_image(image_id)
```
### Image Properties
| Property | Type | Description |
|----------|------|-------------|
| `image.id` | `str` | Unique image ID |
| `image.collection_id` | `str` | Parent collection ID |
| `image.name` | `str` | Image name |
| `image.url` | `str\|None` | Image URL (may be `None` for generated images — use `generate_url()` instead) |
### Image Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `image.generate_url()` | `str` | Generate signed URL |
| `image.delete()` | `None` | Delete the image |
## Timeline & Editor
### Timeline
```python
from videodb.timeline import Timeline
timeline = Timeline(conn)
```
| Method | Returns | Description |
|--------|---------|-------------|
| `timeline.add_inline(asset)` | `None` | Add `VideoAsset` sequentially on main track |
| `timeline.add_overlay(start, asset)` | `None` | Overlay `AudioAsset`, `ImageAsset`, or `TextAsset` at timestamp |
| `timeline.generate_stream()` | `str` | Compile and get stream URL |
### Asset Types
#### VideoAsset
```python
from videodb.asset import VideoAsset
asset = VideoAsset(
asset_id=video.id,
start=0, # trim start (seconds)
end=None, # trim end (seconds, None = full)
)
```
#### AudioAsset
```python
from videodb.asset import AudioAsset
asset = AudioAsset(
asset_id=audio.id,
start=0,
end=None,
disable_other_tracks=True, # mute original audio when True
fade_in_duration=0, # seconds (max 5)
fade_out_duration=0, # seconds (max 5)
)
```
#### ImageAsset
```python
from videodb.asset import ImageAsset
asset = ImageAsset(
asset_id=image.id,
duration=None, # display duration (seconds)
width=100, # display width
height=100, # display height
x=80, # horizontal position (px from left)
y=20, # vertical position (px from top)
)
```
#### TextAsset
```python
from videodb.asset import TextAsset, TextStyle
asset = TextAsset(
text="Hello World",
duration=5,
style=TextStyle(
fontsize=24,
fontcolor="black",
boxcolor="white", # background box colour
alpha=1.0,
font="Sans",
text_align="T", # text alignment within box
),
)
```
#### CaptionAsset (Editor API)
CaptionAsset belongs to the Editor API, which has its own Timeline, Track, and Clip system:
```python
from videodb.editor import CaptionAsset, FontStyling
asset = CaptionAsset(
src="auto", # "auto" or base64 ASS string
font=FontStyling(name="Clear Sans", size=30),
primary_color="&H00FFFFFF",
)
```
See [editor.md](editor.md#caption-overlays) for full CaptionAsset usage with the Editor API.
## Video Search Parameters
```python
results = video.search(
query="your query",
search_type=SearchType.semantic, # semantic, keyword, or scene
index_type=IndexType.spoken_word, # spoken_word or scene
result_threshold=None, # max number of results
score_threshold=None, # minimum relevance score
dynamic_score_percentage=None, # percentage of dynamic score
scene_index_id=None, # target a specific scene index (pass via **kwargs)
filter=[], # metadata filters for scene search
)
```
> **Note:** `filter` is an explicit named parameter in `video.search()`. `scene_index_id` is passed through `**kwargs` to the API.
> **Important:** `video.search()` raises `InvalidRequestError` with message `"No results found"` when there are no matches. Always wrap search calls in try/except. For scene search, use `score_threshold=0.3` or higher to filter low-relevance noise.
For scene search, use `search_type=SearchType.semantic` with `index_type=IndexType.scene`. Pass `scene_index_id` when targeting a specific scene index. See [search.md](search.md) for details.
## SearchResult Object
```python
results = video.search("query", search_type=SearchType.semantic)
```
| Method | Returns | Description |
|--------|---------|-------------|
| `results.get_shots()` | `list[Shot]` | Get list of matching segments |
| `results.compile()` | `str` | Compile all shots into a stream URL |
| `results.play()` | `str` | Open compiled stream in browser |
### Shot Properties
| Property | Type | Description |
|----------|------|-------------|
| `shot.video_id` | `str` | Source video ID |
| `shot.video_length` | `float` | Source video duration |
| `shot.video_title` | `str` | Source video title |
| `shot.start` | `float` | Start time (seconds) |
| `shot.end` | `float` | End time (seconds) |
| `shot.text` | `str` | Matched text content |
| `shot.search_score` | `float` | Search relevance score |
| Method | Returns | Description |
|--------|---------|-------------|
| `shot.generate_stream()` | `str` | Stream this specific shot |
| `shot.play()` | `str` | Open shot stream in browser |
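Because `search_score` is a plain attribute on each shot, ranking and filtering results is ordinary Python. A hedged sketch (the threshold and limit are illustrative; any objects exposing a `search_score` attribute work):

```python
def top_shots(shots, score_threshold=0.3, limit=5):
    """Keep shots at or above the threshold, best-scoring first."""
    kept = [s for s in shots if s.search_score >= score_threshold]
    return sorted(kept, key=lambda s: s.search_score, reverse=True)[:limit]
```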
## Meeting Object
```python
meeting = coll.record_meeting(
meeting_url="https://meet.google.com/...",
bot_name="Bot",
callback_url=None, # Webhook URL for status updates
callback_data=None, # Optional dict passed through to callbacks
time_zone="UTC", # Time zone for the meeting
)
```
### Meeting Properties
| Property | Type | Description |
|----------|------|-------------|
| `meeting.id` | `str` | Unique meeting ID |
| `meeting.collection_id` | `str` | Parent collection ID |
| `meeting.status` | `str` | Current status |
| `meeting.video_id` | `str` | Recorded video ID (after completion) |
| `meeting.bot_name` | `str` | Bot name |
| `meeting.meeting_title` | `str` | Meeting title |
| `meeting.meeting_url` | `str` | Meeting URL |
| `meeting.speaker_timeline` | `dict` | Speaker timeline data |
| `meeting.is_active` | `bool` | True if initializing or processing |
| `meeting.is_completed` | `bool` | True if done |
### Meeting Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `meeting.refresh()` | `Meeting` | Refresh data from server |
| `meeting.wait_for_status(target_status, timeout=14400, interval=120)` | `bool` | Poll until status reached |
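When `wait_for_status()` is not convenient (for example, you want to poll on `is_completed` directly), the documented `refresh()`/`is_completed` pair supports a simple loop. A sketch only; the default timeout and interval mirror `wait_for_status`'s documented defaults.

```python
import time

def wait_until_completed(meeting, timeout=14400, interval=120):
    """Poll meeting.refresh() until meeting.is_completed, or give up.

    Returns True when the meeting completed within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if meeting.is_completed:
            return True
        time.sleep(interval)
        meeting.refresh()  # pull fresh status from the server
    return False
```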
## RTStream & Capture
For RTStream (live ingestion, indexing, transcription), see [rtstream-reference.md](rtstream-reference.md).
For capture sessions (desktop recording, CaptureClient, channels), see [capture-reference.md](capture-reference.md).
## Enums & Constants
### SearchType
```python
from videodb import SearchType
SearchType.semantic # Natural language semantic search
SearchType.keyword # Exact keyword matching
SearchType.scene # Visual scene search (may require paid plan)
SearchType.llm # LLM-powered search
```
### SceneExtractionType
```python
from videodb import SceneExtractionType
SceneExtractionType.shot_based # Automatic shot boundary detection
SceneExtractionType.time_based # Fixed time interval extraction
SceneExtractionType.transcript # Transcript-based scene extraction
```
### SubtitleStyle
```python
from videodb import SubtitleStyle
style = SubtitleStyle(
font_name="Arial",
font_size=18,
primary_colour="&H00FFFFFF",
bold=False,
# ... see SubtitleStyle for all options
)
video.add_subtitle(style=style)
```
### SubtitleAlignment & SubtitleBorderStyle
```python
from videodb import SubtitleAlignment, SubtitleBorderStyle
```
### TextStyle
```python
from videodb import TextStyle
# or: from videodb.asset import TextStyle
style = TextStyle(
fontsize=24,
fontcolor="black",
boxcolor="white",
font="Sans",
text_align="T",
alpha=1.0,
)
```
### Other Constants
```python
from videodb import (
IndexType, # spoken_word, scene
MediaType, # video, audio, image
Segmenter, # word, sentence, time
SegmentationType, # sentence, llm
TranscodeMode, # economy, lightning
ResizeMode, # crop, fit, pad
ReframeMode, # simple, smart
RTStreamChannelType,
)
```
## Exceptions
```python
from videodb.exceptions import (
AuthenticationError, # Invalid or missing API key
InvalidRequestError, # Bad parameters or malformed request
RequestTimeoutError, # Request timed out
SearchError, # Search operation failure (e.g. not indexed)
VideodbError, # Base exception for all VideoDB errors
)
```
| Exception | Common Cause |
|-----------|-------------|
| `AuthenticationError` | Missing or invalid `VIDEO_DB_API_KEY` |
| `InvalidRequestError` | Invalid URL, unsupported format, bad parameters |
| `RequestTimeoutError` | Server took too long to respond |
| `SearchError` | Searching before indexing, invalid search type |
| `VideodbError` | Server errors, network issues, generic failures |

# Capture Reference
Code-level details for VideoDB capture sessions. For the workflow guide, see [capture.md](capture.md).
---
## WebSocket Events
Real-time events from capture sessions and AI pipelines. No webhooks or polling required.
Use [scripts/ws_listener.py](../scripts/ws_listener.py) to connect and dump events to `/tmp/videodb_events.jsonl`.
### Event Channels
| Channel | Source | Content |
|---------|--------|---------|
| `capture_session` | Session lifecycle | Status changes |
| `transcript` | `start_transcript()` | Speech-to-text |
| `visual_index` / `scene_index` | `index_visuals()` | Visual analysis |
| `audio_index` | `index_audio()` | Audio analysis |
| `alert` | `create_alert()` | Alert notifications |
### Session Lifecycle Events
| Event | Status | Key Data |
|-------|--------|----------|
| `capture_session.created` | `created` | — |
| `capture_session.starting` | `starting` | — |
| `capture_session.active` | `active` | `rtstreams[]` |
| `capture_session.stopping` | `stopping` | — |
| `capture_session.stopped` | `stopped` | — |
| `capture_session.exported` | `exported` | `exported_video_id`, `stream_url`, `player_url` |
| `capture_session.failed` | `failed` | `error` |
### Event Structures
**Transcript event:**
```json
{
"channel": "transcript",
"rtstream_id": "rts-xxx",
"rtstream_name": "mic:default",
"data": {
"text": "Let's schedule the meeting for Thursday",
"is_final": true,
"start": 1710000001234,
"end": 1710000002345
}
}
```
**Visual index event:**
```json
{
"channel": "visual_index",
"rtstream_id": "rts-xxx",
"rtstream_name": "display:1",
"data": {
"text": "User is viewing a Slack conversation with 3 unread messages",
"start": 1710000012340,
"end": 1710000018900
}
}
```
**Audio index event:**
```json
{
"channel": "audio_index",
"rtstream_id": "rts-xxx",
"rtstream_name": "mic:default",
"data": {
"text": "Discussion about scheduling a team meeting",
"start": 1710000021500,
"end": 1710000029200
}
}
```
**Session active event:**
```json
{
"event": "capture_session.active",
"capture_session_id": "cap-xxx",
"status": "active",
"data": {
"rtstreams": [
{ "rtstream_id": "rts-1", "name": "mic:default", "media_types": ["audio"] },
{ "rtstream_id": "rts-2", "name": "system_audio:default", "media_types": ["audio"] },
{ "rtstream_id": "rts-3", "name": "display:1", "media_types": ["video"] }
]
}
}
```
**Session exported event:**
```json
{
"event": "capture_session.exported",
"capture_session_id": "cap-xxx",
"status": "exported",
"data": {
"exported_video_id": "v_xyz789",
"stream_url": "https://stream.videodb.io/...",
"player_url": "https://console.videodb.io/player?url=..."
}
}
```
> For the latest details, see https://docs.videodb.io/pages/ingest/capture-sdks/realtime-context.md
---
## Event Persistence
Use `ws_listener.py` to dump all WebSocket events to a JSONL file for later analysis.
### Start Listener and Get WebSocket ID
```bash
# Start with --clear to clear old events (recommended for new sessions)
python scripts/ws_listener.py --clear &
# Append to existing events (for reconnects)
python scripts/ws_listener.py &
```
Or specify a custom output directory:
```bash
python scripts/ws_listener.py --clear /path/to/output &
# Or via environment variable:
VIDEODB_EVENTS_DIR=/path/to/output python scripts/ws_listener.py --clear &
```
The script outputs `WS_ID=<connection_id>` on the first line, then listens indefinitely.
**Get the ws_id:**
```bash
cat /tmp/videodb_ws_id
```
**Stop the listener:**
```bash
kill $(cat /tmp/videodb_ws_pid)
```
**Functions that accept `ws_connection_id`:**
| Function | Purpose |
|----------|---------|
| `conn.create_capture_session()` | Session lifecycle events |
| RTStream methods | See [rtstream-reference.md](rtstream-reference.md) |
**Output files** (in output directory, default `/tmp`):
- `videodb_ws_id` - WebSocket connection ID
- `videodb_events.jsonl` - All events
- `videodb_ws_pid` - Process ID for easy termination
**Features:**
- `--clear` flag to clear events file on start (use for new sessions)
- Auto-reconnect with exponential backoff on connection drops
- Graceful shutdown on SIGINT/SIGTERM
- Connection status logging
### JSONL Format
Each line is a JSON object with added timestamps:
```json
{"ts": "2026-03-02T10:15:30.123Z", "unix_ts": 1709374530.12, "channel": "visual_index", "data": {"text": "..."}}
{"ts": "2026-03-02T10:15:31.456Z", "unix_ts": 1709374531.45, "event": "capture_session.active", "capture_session_id": "cap-xxx"}
```
### Reading Events
```python
import json
events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")]
# Filter by channel
transcripts = [e for e in events if e.get("channel") == "transcript"]
# Filter by time (last 10 minutes)
import time
cutoff = time.time() - 600
recent = [e for e in events if e["unix_ts"] > cutoff]
# Filter visual events containing keyword
visual = [e for e in events
if e.get("channel") == "visual_index"
and "code" in e.get("data", {}).get("text", "").lower()]
```
---
## WebSocket Connection
Connect to receive real-time AI results from transcription and indexing pipelines.
```python
ws_wrapper = conn.connect_websocket()
ws = await ws_wrapper.connect()
ws_id = ws.connection_id
```
| Property / Method | Type | Description |
|-------------------|------|-------------|
| `ws.connection_id` | `str` | Unique connection ID (pass to AI pipeline methods) |
| `ws.receive()` | `AsyncIterator[dict]` | Async iterator yielding real-time messages |
---
## CaptureSession
### Connection Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `conn.create_capture_session(end_user_id, collection_id, ws_connection_id, metadata)` | `CaptureSession` | Create a new capture session |
| `conn.get_capture_session(capture_session_id)` | `CaptureSession` | Retrieve an existing capture session |
| `conn.generate_client_token()` | `str` | Generate a client-side authentication token |
### Create a Capture Session
```python
ws_id = open("/tmp/videodb_ws_id").read().strip()
session = conn.create_capture_session(
end_user_id="user-123", # required
collection_id="default",
ws_connection_id=ws_id,
metadata={"app": "my-app"},
)
print(f"Session ID: {session.id}")
```
> **Note:** `end_user_id` is required and identifies the user initiating the capture. For testing or demo purposes, any unique string identifier works (e.g., `"demo-user"`, `"test-123"`).
### CaptureSession Properties
| Property | Type | Description |
|----------|------|-------------|
| `session.id` | `str` | Unique capture session ID |
### CaptureSession Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `session.get_rtstream(type)` | `list[RTStream]` | Get RTStreams by type: `"mic"`, `"screen"`, or `"system_audio"` |
### Generate a Client Token
```python
token = conn.generate_client_token()
```
---
## CaptureClient
The client runs on the user's machine and handles permissions, channel discovery, and streaming.
```python
from videodb.capture import CaptureClient
client = CaptureClient(client_token=token)
```
### CaptureClient Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `await client.request_permission(type)` | `None` | Request device permission (`"microphone"`, `"screen_capture"`) |
| `await client.list_channels()` | `Channels` | Discover available audio/video channels |
| `await client.start_capture_session(capture_session_id, channels, primary_video_channel_id)` | `None` | Start streaming selected channels |
| `await client.stop_capture()` | `None` | Gracefully stop the capture session |
| `await client.shutdown()` | `None` | Clean up client resources |
### Request Permissions
```python
await client.request_permission("microphone")
await client.request_permission("screen_capture")
```
### Start a Session
```python
selected_channels = [c for c in [mic, display, system_audio] if c]
await client.start_capture_session(
capture_session_id=session.id,
channels=selected_channels,
primary_video_channel_id=display.id if display else None,
)
```
### Stop a Session
```python
await client.stop_capture()
await client.shutdown()
```
---
## Channels
Returned by `client.list_channels()`. Groups available devices by type.
```python
channels = await client.list_channels()
for ch in channels.all():
print(f" {ch.id} ({ch.type}): {ch.name}")
mic = channels.mics.default
display = channels.displays.default
system_audio = channels.system_audio.default
```
### Channel Groups
| Property | Type | Description |
|----------|------|-------------|
| `channels.mics` | `ChannelGroup` | Available microphones |
| `channels.displays` | `ChannelGroup` | Available screen displays |
| `channels.system_audio` | `ChannelGroup` | Available system audio sources |
### ChannelGroup Methods & Properties
| Member | Type | Description |
|--------|------|-------------|
| `group.default` | `Channel` | Default channel in the group (or `None`) |
| `group.all()` | `list[Channel]` | All channels in the group |
### Channel Properties
| Property | Type | Description |
|----------|------|-------------|
| `ch.id` | `str` | Unique channel ID |
| `ch.type` | `str` | Channel type (`"mic"`, `"display"`, `"system_audio"`) |
| `ch.name` | `str` | Human-readable channel name |
| `ch.store` | `bool` | Whether to persist the recording (set to `True` to save) |
Without `store = True`, streams are processed in real-time but not saved.
---
## RTStreams and AI Pipelines
After the session is active, retrieve RTStream objects with `session.get_rtstream()`.
For RTStream methods (indexing, transcription, alerts, batch config), see [rtstream-reference.md](rtstream-reference.md).
---
## Session Lifecycle
```
create_capture_session()
v
┌───────────────┐
│ created │
└───────┬───────┘
│ client.start_capture_session()
v
┌───────────────┐ WebSocket: capture_session.active
│ active │ ──> Start AI pipelines
└───────┬───────┘
│ client.stop_capture()
v
┌───────────────┐ WebSocket: capture_session.stopping
│ stopping │ ──> Finalize streams
└───────┬───────┘
v
┌───────────────┐ WebSocket: capture_session.stopped
│ stopped │ ──> All streams finalized
└───────┬───────┘
│ (if store=True)
v
┌───────────────┐ WebSocket: capture_session.exported
│ exported │ ──> Access video_id, stream_url, player_url
└───────────────┘
```
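The lifecycle above is a linear state machine, so incoming status events can be sanity-checked against a small transition table. This is an illustrative sketch derived from the diagram and event list above, not part of the SDK; the `starting` state comes from the `capture_session.starting` event.

```python
# Allowed forward transitions between capture-session statuses (sketch)
NEXT = {
    "created": {"starting", "active", "failed"},
    "starting": {"active", "failed"},
    "active": {"stopping", "failed"},
    "stopping": {"stopped", "failed"},
    "stopped": {"exported"},  # only reached when store=True on a channel
    "exported": set(),
    "failed": set(),
}

def is_valid_transition(old, new):
    """True when `new` can legally follow `old`."""
    return new in NEXT.get(old, set())
```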

# Capture Guide
## Overview
VideoDB Capture enables real-time screen and audio recording with AI processing. Desktop capture currently supports **macOS** only.
For code-level details (SDK methods, event structures, AI pipelines), see [capture-reference.md](capture-reference.md).
## Quick Start
1. **Start WebSocket listener**: `python scripts/ws_listener.py --clear &`
2. **Run capture code** (see Complete Capture Workflow below)
3. **Events written to**: `/tmp/videodb_events.jsonl`
---
## Complete Capture Workflow
No webhooks or polling required. WebSocket delivers all events including session lifecycle.
> **CRITICAL:** The `CaptureClient` must remain running for the entire duration of the capture. It runs the local recorder binary that streams screen/audio data to VideoDB. If the Python process that created the `CaptureClient` exits, the recorder binary is killed and capture stops silently. Always run the capture code as a **long-lived background process** (e.g. `nohup python capture_script.py &`) and use signal handling (`asyncio.Event` + `SIGINT`/`SIGTERM`) to keep it alive until you explicitly stop it.
1. **Start WebSocket listener** in background with `--clear` flag to clear old events. Wait for it to create the WebSocket ID file.
2. **Read the WebSocket ID**. This ID is required for capture session and AI pipelines.
3. **Create a capture session** and generate a client token for the desktop client.
4. **Initialize CaptureClient** with the token. Request permissions for microphone and screen capture.
5. **List and select channels** (mic, display, system_audio). Set `store = True` on channels you want to persist as a video.
6. **Start the session** with selected channels.
7. **Wait for session active** by reading events until you see `capture_session.active`. This event contains the `rtstreams` array. Save session info (session ID, RTStream IDs) to a file (e.g. `/tmp/videodb_capture_info.json`) so other scripts can read it.
8. **Keep the process alive.** Use `asyncio.Event` with signal handlers for `SIGINT`/`SIGTERM` to block until explicitly stopped. Write a PID file (e.g. `/tmp/videodb_capture_pid`) so the process can be stopped later with `kill $(cat /tmp/videodb_capture_pid)`. The PID file should be overwritten on every run so reruns always have the correct PID.
9. **Start AI pipelines** (in a separate command/script) on each RTStream for audio indexing and visual indexing. Read the RTStream IDs from the saved session info file.
10. **Write custom event processing logic** (in a separate command/script) to read real-time events based on your use case. Examples:
- Log Slack activity when `visual_index` mentions "Slack"
- Summarize discussions when `audio_index` events arrive
- Trigger alerts when specific keywords appear in `transcript`
- Track application usage from screen descriptions
11. **Stop capture** when done — send SIGTERM to the capture process. It should call `client.stop_capture()` and `client.shutdown()` in its signal handler.
12. **Wait for export** by reading events until you see `capture_session.exported`. This event contains `exported_video_id`, `stream_url`, and `player_url`. This may take several seconds after stopping capture.
13. **Stop WebSocket listener** after receiving the export event. Use `kill $(cat /tmp/videodb_ws_pid)` to cleanly terminate it.
---
## Shutdown Sequence
Proper shutdown order is important to ensure all events are captured:
1. **Stop the capture session**: `client.stop_capture()` then `client.shutdown()`
2. **Wait for the export event**: poll `/tmp/videodb_events.jsonl` for `capture_session.exported`
3. **Stop the WebSocket listener**: `kill $(cat /tmp/videodb_ws_pid)`
Do NOT kill the WebSocket listener before receiving the export event, or you will miss the final video URLs.
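The "wait for export" step is just a repeated scan of the JSONL file. A sketch; the path and event name match the defaults documented above:

```python
import json
import time

def wait_for_event(path, event_name, timeout=300, interval=2):
    """Scan a JSONL event file until a matching `event` appears.

    Returns the event dict, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                for line in f:
                    evt = json.loads(line)
                    if evt.get("event") == event_name:
                        return evt
        except FileNotFoundError:
            pass  # listener may not have created the file yet
        time.sleep(interval)
    return None
```

Usage: `wait_for_event("/tmp/videodb_events.jsonl", "capture_session.exported")`, then read `stream_url` and `player_url` from the returned event's `data`.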
---
## Scripts
| Script | Description |
|--------|-------------|
| `scripts/ws_listener.py` | WebSocket event listener (dumps to JSONL) |
### ws_listener.py Usage
```bash
# Start listener in background (append to existing events)
python scripts/ws_listener.py &
# Start listener with clear (new session, clears old events)
python scripts/ws_listener.py --clear &
# Custom output directory
python scripts/ws_listener.py --clear /path/to/events &
# Stop the listener
kill $(cat /tmp/videodb_ws_pid)
```
**Options:**
- `--clear`: Clear the events file before starting. Use when starting a new capture session.
**Output files:**
- `videodb_events.jsonl` - All WebSocket events
- `videodb_ws_id` - WebSocket connection ID (for `ws_connection_id` parameter)
- `videodb_ws_pid` - Process ID (for stopping the listener)
**Features:**
- Auto-reconnect with exponential backoff on connection drops
- Graceful shutdown on SIGINT/SIGTERM
- PID file for easy process management
- Connection status logging

# Timeline Editing Guide
VideoDB provides a non-destructive timeline editor for composing videos from multiple assets, adding text and image overlays, mixing audio tracks, and trimming clips — all server-side without re-encoding or local tools. Use this for trimming, combining clips, overlaying audio/music on video, adding subtitles, and layering text or images.
## Prerequisites
Videos, audio, and images **must be uploaded** to a collection before they can be used as timeline assets. For caption overlays, the video must also be **indexed for spoken words**.
## Core Concepts
### Timeline
A `Timeline` is a virtual composition layer. Assets are placed on it either **inline** (sequentially on the main track) or as **overlays** (layered at a specific timestamp). Nothing modifies the original media; the final stream is compiled on demand.
```python
from videodb.timeline import Timeline
timeline = Timeline(conn)
```
### Assets
Every element on a timeline is an **asset**. VideoDB provides five asset types:
| Asset | Import | Primary Use |
|-------|--------|-------------|
| `VideoAsset` | `from videodb.asset import VideoAsset` | Video clips (trim, sequencing) |
| `AudioAsset` | `from videodb.asset import AudioAsset` | Music, SFX, narration |
| `ImageAsset` | `from videodb.asset import ImageAsset` | Logos, thumbnails, overlays |
| `TextAsset` | `from videodb.asset import TextAsset, TextStyle` | Titles, captions, lower-thirds |
| `CaptionAsset` | `from videodb.editor import CaptionAsset` | Auto-rendered subtitles (Editor API) |
## Building a Timeline
### Add Video Clips Inline
Inline assets play one after another on the main video track. The `add_inline` method only accepts `VideoAsset`:
```python
from videodb.asset import VideoAsset
video_a = coll.get_video(video_id_a)
video_b = coll.get_video(video_id_b)
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video_a.id))
timeline.add_inline(VideoAsset(asset_id=video_b.id))
stream_url = timeline.generate_stream()
```
### Trim / Sub-clip
Use `start` and `end` on a `VideoAsset` to extract a portion:
```python
# Take only seconds 10 to 30 from the source video
clip = VideoAsset(asset_id=video.id, start=10, end=30)
timeline.add_inline(clip)
```
### VideoAsset Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `asset_id` | `str` | required | Video media ID |
| `start` | `float` | `0` | Trim start (seconds) |
| `end` | `float\|None` | `None` | Trim end (`None` = full) |
> **Warning:** The SDK does not validate negative timestamps. Passing `start=-5` is silently accepted but produces broken or unexpected output. Always ensure `start >= 0`, `start < end`, and `end <= video.length` before creating a `VideoAsset`.
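Given the warning above, a pre-flight check before constructing a `VideoAsset` is cheap insurance. A sketch; in real usage `length` would come from `video.length`:

```python
def validate_trim(start, end, length):
    """Raise ValueError unless 0 <= start < end <= length.

    `end=None` means "to the end of the video" and is allowed.
    """
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if end is not None:
        if end <= start:
            raise ValueError(f"end ({end}) must be greater than start ({start})")
        if end > length:
            raise ValueError(f"end ({end}) exceeds video length ({length})")
    elif start >= length:
        raise ValueError(f"start ({start}) is beyond video length ({length})")
    return start, end
```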
## Text Overlays
Add titles, lower-thirds, or captions at any point on the timeline:
```python
from videodb.asset import TextAsset, TextStyle
title = TextAsset(
text="Welcome to the Demo",
duration=5,
style=TextStyle(
fontsize=36,
fontcolor="white",
boxcolor="black",
alpha=0.8,
font="Sans",
),
)
# Overlay the title at the very start (t=0)
timeline.add_overlay(0, title)
```
### TextStyle Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `fontsize` | `int` | `24` | Font size in pixels |
| `fontcolor` | `str` | `"black"` | CSS colour name or hex |
| `fontcolor_expr` | `str` | `""` | Dynamic font colour expression |
| `alpha` | `float` | `1.0` | Text opacity (0.0 to 1.0) |
| `font` | `str` | `"Sans"` | Font family |
| `box` | `bool` | `True` | Enable background box |
| `boxcolor` | `str` | `"white"` | Background box colour |
| `boxborderw` | `str` | `"10"` | Box border width |
| `boxw` | `int` | `0` | Box width override |
| `boxh` | `int` | `0` | Box height override |
| `line_spacing` | `int` | `0` | Line spacing |
| `text_align` | `str` | `"T"` | Text alignment within the box |
| `y_align` | `str` | `"text"` | Vertical alignment reference |
| `borderw` | `int` | `0` | Text border width |
| `bordercolor` | `str` | `"black"` | Text border colour |
| `expansion` | `str` | `"normal"` | Text expansion mode |
| `basetime` | `int` | `0` | Base time for time-based expressions |
| `fix_bounds` | `bool` | `False` | Fix text bounds |
| `text_shaping` | `bool` | `True` | Enable text shaping |
| `shadowcolor` | `str` | `"black"` | Shadow colour |
| `shadowx` | `int` | `0` | Shadow X offset |
| `shadowy` | `int` | `0` | Shadow Y offset |
| `tabsize` | `int` | `4` | Tab size in spaces |
| `x` | `str` | `"(main_w-text_w)/2"` | Horizontal position expression |
| `y` | `str` | `"(main_h-text_h)/2"` | Vertical position expression |
## Audio Overlays
Layer background music, sound effects, or voiceover on top of the video track:
```python
from videodb.asset import AudioAsset
music = coll.get_audio(music_id)
audio_layer = AudioAsset(
asset_id=music.id,
disable_other_tracks=False,
fade_in_duration=2,
fade_out_duration=2,
)
# Start the music at t=0, overlaid on the video track
timeline.add_overlay(0, audio_layer)
```
### AudioAsset Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `asset_id` | `str` | required | Audio media ID |
| `start` | `float` | `0` | Trim start (seconds) |
| `end` | `float\|None` | `None` | Trim end (`None` = full) |
| `disable_other_tracks` | `bool` | `True` | When True, mutes other audio tracks |
| `fade_in_duration` | `float` | `0` | Fade-in seconds (max 5) |
| `fade_out_duration` | `float` | `0` | Fade-out seconds (max 5) |
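Because fades are capped at 5 seconds, it is worth clamping user-supplied values before building the asset. A trivial sketch; the cap matches the table above:

```python
MAX_FADE = 5.0  # documented maximum for fade_in_duration / fade_out_duration

def clamp_fade(seconds):
    """Clamp a fade duration to the documented 0-5 second range."""
    return max(0.0, min(float(seconds), MAX_FADE))
```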
## Image Overlays
Add logos, watermarks, or generated images as overlays:
```python
from videodb.asset import ImageAsset
logo = coll.get_image(logo_id)
logo_overlay = ImageAsset(
asset_id=logo.id,
duration=10,
width=120,
height=60,
x=20,
y=20,
)
timeline.add_overlay(0, logo_overlay)
```
### ImageAsset Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `asset_id` | `str` | required | Image media ID |
| `width` | `int\|str` | `100` | Display width |
| `height` | `int\|str` | `100` | Display height |
| `x` | `int` | `80` | Horizontal position (px from left) |
| `y` | `int` | `20` | Vertical position (px from top) |
| `duration` | `float\|None` | `None` | Display duration (seconds) |
## Caption Overlays
There are two ways to add captions to a video.
### Method 1: Subtitle Workflow (simplest)
Use `video.add_subtitle()` to burn subtitles directly onto a video stream. This uses the `videodb.timeline.Timeline` internally:
```python
from videodb import SubtitleStyle
# Video must have spoken words indexed first (force=True skips if already done)
video.index_spoken_words(force=True)
# Add subtitles with default styling
stream_url = video.add_subtitle()
# Or customise the subtitle style
stream_url = video.add_subtitle(style=SubtitleStyle(
font_name="Arial",
font_size=22,
primary_colour="&H00FFFFFF",
bold=True,
))
```
### Method 2: Editor API (advanced)
The Editor API (`videodb.editor`) provides a track-based composition system with `CaptionAsset`, `Clip`, `Track`, and its own `Timeline`. This is a separate API from the `videodb.timeline.Timeline` used above.
```python
from videodb.editor import (
CaptionAsset,
Clip,
Track,
Timeline as EditorTimeline,
FontStyling,
BorderAndShadow,
Positioning,
CaptionAnimation,
)
# Video must have spoken words indexed first (force=True skips if already done)
video.index_spoken_words(force=True)
# Create a caption asset
caption = CaptionAsset(
src="auto",
font=FontStyling(name="Clear Sans", size=30),
primary_color="&H00FFFFFF",
back_color="&H00000000",
border=BorderAndShadow(outline=1),
position=Positioning(margin_v=30),
animation=CaptionAnimation.box_highlight,
)
# Build an editor timeline with tracks and clips
editor_tl = EditorTimeline(conn)
track = Track()
track.add_clip(start=0, clip=Clip(asset=caption, duration=video.length))
editor_tl.add_track(track)
stream_url = editor_tl.generate_stream()
```
### CaptionAsset Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `src` | `str` | `"auto"` | Caption source (`"auto"` or base64 ASS string) |
| `font` | `FontStyling\|None` | `FontStyling()` | Font styling (name, size, bold, italic, etc.) |
| `primary_color` | `str` | `"&H00FFFFFF"` | Primary text colour (ASS format) |
| `secondary_color` | `str` | `"&H000000FF"` | Secondary text colour (ASS format) |
| `back_color` | `str` | `"&H00000000"` | Background colour (ASS format) |
| `border` | `BorderAndShadow\|None` | `BorderAndShadow()` | Border and shadow styling |
| `position` | `Positioning\|None` | `Positioning()` | Caption alignment and margins |
| `animation` | `CaptionAnimation\|None` | `None` | Animation effect (e.g., `box_highlight`, `reveal`, `karaoke`) |
## Compiling & Streaming
After assembling a timeline, compile it into a streamable URL. Streams are generated instantly - no render wait times.
```python
stream_url = timeline.generate_stream()
print(f"Stream: {stream_url}")
```
For more streaming options (segment streams, search-to-stream, audio playback), see [streaming.md](streaming.md).
## Complete Workflow Examples
### Highlight Reel with Title Card
```python
import videodb
from videodb import SearchType
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
# 1. Search for key moments
video.index_spoken_words(force=True)
results = video.search("product announcement", search_type=SearchType.semantic)
shots = results.get_shots() # may be empty if no results
# 2. Build timeline
timeline = Timeline(conn)
# Title card
title = TextAsset(
text="Product Launch Highlights",
duration=4,
style=TextStyle(fontsize=48, fontcolor="white", boxcolor="#1a1a2e", alpha=0.95),
)
timeline.add_overlay(0, title)
# Append each matching clip
for shot in shots:
asset = VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
timeline.add_inline(asset)
# 3. Generate stream
stream_url = timeline.generate_stream()
print(f"Highlight reel: {stream_url}")
```
### Picture-in-Picture with Background Music
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset, ImageAsset
conn = videodb.connect()
coll = conn.get_collection()
main_video = coll.get_video(main_video_id)
music = coll.get_audio(music_id)
logo = coll.get_image(logo_id)
timeline = Timeline(conn)
# Main video track
timeline.add_inline(VideoAsset(asset_id=main_video.id))
# Background music — disable_other_tracks=False to mix with video audio
timeline.add_overlay(
0,
AudioAsset(asset_id=music.id, disable_other_tracks=False, fade_in_duration=3),
)
# Logo in top-right corner for first 10 seconds
timeline.add_overlay(
0,
ImageAsset(asset_id=logo.id, duration=10, x=1140, y=20, width=120, height=60),
)
stream_url = timeline.generate_stream()
print(f"Final video: {stream_url}")
```
### Multi-Clip Montage from Multiple Videos
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
clips = [
{"video_id": "vid_001", "start": 5, "end": 15, "label": "Scene 1"},
{"video_id": "vid_002", "start": 0, "end": 20, "label": "Scene 2"},
{"video_id": "vid_003", "start": 30, "end": 45, "label": "Scene 3"},
]
timeline = Timeline(conn)
elapsed = 0  # absolute start time of the current clip on the timeline
for clip in clips:
    # Label shown for the first 2 seconds of its clip
    label = TextAsset(
        text=clip["label"],
        duration=2,
        style=TextStyle(fontsize=32, fontcolor="white", boxcolor="#333333"),
    )
    timeline.add_inline(
        VideoAsset(asset_id=clip["video_id"], start=clip["start"], end=clip["end"])
    )
    # Overlays use absolute timestamps, so offset each label by the
    # total duration of the clips before it
    timeline.add_overlay(elapsed, label)
    elapsed += clip["end"] - clip["start"]
stream_url = timeline.generate_stream()
print(f"Montage: {stream_url}")
```
## Two Timeline APIs
VideoDB has two separate timeline systems. They are **not interchangeable**:
| | `videodb.timeline.Timeline` | `videodb.editor.Timeline` (Editor API) |
|---|---|---|
| **Import** | `from videodb.timeline import Timeline` | `from videodb.editor import Timeline as EditorTimeline` |
| **Assets** | `VideoAsset`, `AudioAsset`, `ImageAsset`, `TextAsset` | `CaptionAsset`, `Clip`, `Track` |
| **Methods** | `add_inline()`, `add_overlay()` | `add_track()` with `Track` / `Clip` |
| **Best for** | Video composition, overlays, multi-clip editing | Caption/subtitle styling with animations |
Do not mix assets from one API into the other. `CaptionAsset` only works with the Editor API. `VideoAsset` / `AudioAsset` / `ImageAsset` / `TextAsset` only work with `videodb.timeline.Timeline`.
## Limitations & Constraints
The timeline editor is designed for **non-destructive linear composition**. The following operations are **not supported**:
### Not Possible
| Limitation | Detail |
|---|---|
| **No transitions or effects** | No crossfades, wipes, dissolves, or transitions between clips. All cuts are hard cuts. |
| **No video-on-video (picture-in-picture)** | `add_inline()` only accepts `VideoAsset`. You cannot overlay one video stream on top of another. Image overlays can approximate static PiP but not live video. |
| **No speed or playback control** | No slow-motion, fast-forward, reverse playback, or time remapping. `VideoAsset` has no `speed` parameter. |
| **No crop, zoom, or pan** | Cannot crop a region of a video frame, apply zoom effects, or pan across a frame. `video.reframe()` is for aspect-ratio conversion only. |
| **No video filters or color grading** | No brightness, contrast, saturation, hue, or color correction adjustments. |
| **No animated text** | `TextAsset` is static for its full duration. No fade-in/out, movement, or animation. For animated captions, use `CaptionAsset` with the Editor API. |
| **No mixed text styling** | A single `TextAsset` has one `TextStyle`. Cannot mix bold, italic, or colors within a single text block. |
| **No blank or solid-color clips** | Cannot create a solid color frame, black screen, or standalone title card. Text and image overlays require a `VideoAsset` beneath them on the inline track. |
| **No audio volume control** | `AudioAsset` has no `volume` parameter. Audio is either full volume or muted via `disable_other_tracks`. Cannot mix at a reduced level. |
| **No keyframe animation** | Cannot change overlay properties over time (e.g., move an image from position A to B). |
### Constraints
| Constraint | Detail |
|---|---|
| **Audio fade max 5 seconds** | `fade_in_duration` and `fade_out_duration` are capped at 5 seconds each. |
| **Overlay positioning is absolute** | Overlays use absolute timestamps from the timeline start. Rearranging inline clips does not move their overlays. |
| **Inline track is video only** | `add_inline()` only accepts `VideoAsset`. Audio, image, and text must use `add_overlay()`. |
| **No overlay-to-clip binding** | Overlays are placed at a fixed timeline timestamp. There is no way to attach an overlay to a specific inline clip so it moves with it. |
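Because overlays use absolute timestamps and never move with their clips, placing an overlay "on" a clip means computing where that clip starts on the timeline. A minimal local helper (our own utility, not part of the SDK) makes this explicit:

```python
# Local helper (not part of the VideoDB SDK): compute the absolute
# timeline timestamp at which each inline clip begins, so overlays
# can be placed to line up with specific clips.
def overlay_offsets(clips):
    offsets, t = [], 0.0
    for clip in clips:
        offsets.append(t)
        t += clip["end"] - clip["start"]  # trimmed clip length in seconds
    return offsets

clips = [
    {"start": 5, "end": 15},   # occupies 10 s on the timeline
    {"start": 0, "end": 20},   # 20 s
    {"start": 30, "end": 45},  # 15 s
]
print(overlay_offsets(clips))  # [0.0, 10.0, 30.0]
```

If you later reorder or re-trim the inline clips, recompute the offsets and re-place the overlays; nothing in the timeline does this for you.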
## Tips
- **Non-destructive**: Timelines never modify source media. You can create multiple timelines from the same assets.
- **Overlay stacking**: Multiple overlays can start at the same timestamp. Audio overlays mix together; image/text overlays layer in add-order.
- **Inline is VideoAsset only**: `add_inline()` only accepts `VideoAsset`. Use `add_overlay()` for `AudioAsset`, `ImageAsset`, and `TextAsset`.
- **Trim precision**: `start`/`end` on `VideoAsset` and `AudioAsset` are in seconds.
- **Muting video audio**: Set `disable_other_tracks=True` on `AudioAsset` to mute the original video audio when overlaying music or narration.
- **Fade limits**: `fade_in_duration` and `fade_out_duration` on `AudioAsset` have a maximum of 5 seconds.
- **Generated media**: Use `coll.generate_music()`, `coll.generate_sound_effect()`, `coll.generate_voice()`, and `coll.generate_image()` to create media that can be used as timeline assets immediately.
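Since `fade_in_duration` and `fade_out_duration` are capped at 5 seconds, it can help to clamp user-supplied values before constructing an `AudioAsset`. A small sketch (the helper is our own, not an SDK function):

```python
# Local helper (an assumption, not part of the SDK): clamp a requested
# fade duration to the documented 5-second maximum.
def clamp_fade(seconds, max_fade=5.0):
    return max(0.0, min(float(seconds), max_fade))

print(clamp_fade(8))    # 5.0 — capped at the maximum
print(clamp_fade(2.5))  # 2.5 — already within range
print(clamp_fade(-1))   # 0.0 — negative input treated as no fade
```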

# Generative Media Guide
VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the **Collection** object.
## Prerequisites
You need a connection and a collection reference before calling any generation method:
```python
import videodb
conn = videodb.connect()
coll = conn.get_collection()
```
## Image Generation
Generate images from text prompts:
```python
image = coll.generate_image(
prompt="a futuristic cityscape at sunset with flying cars",
aspect_ratio="16:9",
)
# Access the generated image
print(image.id)
print(image.generate_url()) # returns a signed download URL
```
### generate_image Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the image to generate |
| `aspect_ratio` | `str` | `"1:1"` | Aspect ratio: `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"` |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
Returns an `Image` object with `.id`, `.name`, and `.collection_id`. The `.url` property may be `None` for generated images — always use `image.generate_url()` to get a reliable signed download URL.
> **Note:** Unlike `Video` objects (which use `.generate_stream()`), `Image` objects use `.generate_url()` to retrieve the image URL. The `.url` property is only populated for some image types (e.g. thumbnails).
## Video Generation
Generate short video clips from text prompts:
```python
video = coll.generate_video(
prompt="a timelapse of a flower blooming in a garden",
duration=5,
)
stream_url = video.generate_stream()
video.play()
```
### generate_video Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the video to generate |
| `duration` | `float` | `5` | Duration in seconds; whole-number values from 5 to 8 |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
Returns a `Video` object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.
## Audio Generation
VideoDB provides three separate methods for different audio types.
### Music
Generate background music from text descriptions:
```python
music = coll.generate_music(
prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
duration=30,
)
print(music.id)
```
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the music |
| `duration` | `int` | `5` | Duration in seconds |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
### Sound Effects
Generate specific sound effects:
```python
sfx = coll.generate_sound_effect(
prompt="thunderstorm with heavy rain and distant thunder",
duration=10,
)
```
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Text description of the sound effect |
| `duration` | `int` | `2` | Duration in seconds |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
### Voice (Text-to-Speech)
Generate speech from text:
```python
voice = coll.generate_voice(
text="Welcome to our product demo. Today we'll walk through the key features.",
voice_name="Default",
)
```
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | required | Text to convert to speech |
| `voice_name` | `str` | `"Default"` | Voice to use |
| `config` | `dict` | `{}` | Additional configuration |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
All three audio methods return an `Audio` object with `.id`, `.name`, `.length`, and `.collection_id`.
## Text Generation (LLM Integration)
Use `coll.generate_text()` to run LLM analysis. This is a **Collection-level** method: pass any context (transcripts, descriptions) directly in the prompt string.
```python
# Get transcript from a video first
transcript_text = video.get_transcript_text()
# Generate analysis using collection LLM
result = coll.generate_text(
prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
model_name="pro",
)
print(result["output"])
```
### generate_text Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prompt` | `str` | required | Prompt with context for the LLM |
| `model_name` | `str` | `"basic"` | Model tier: `"basic"`, `"pro"`, or `"ultra"` |
| `response_type` | `str` | `"text"` | Response format: `"text"` or `"json"` |
Returns a `dict` with an `output` key. When `response_type="text"`, `output` is a `str`. When `response_type="json"`, `output` is a `dict`.
```python
result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"]) # access the actual text/dict
```
### Analyze Scenes with LLM
Combine scene extraction with text generation:
```python
from videodb import SceneExtractionType
# First index scenes
video.index_scenes(
extraction_type=SceneExtractionType.time_based,
extraction_config={"time": 10},
prompt="Describe the visual content in this scene.",
)
# Get transcript for spoken context
transcript_text = video.get_transcript_text()
# Analyze with collection LLM
result = coll.generate_text(
prompt=(
f"Given this video transcript:\n{transcript_text}\n\n"
"Based on the spoken and visual content, describe the main topics covered."
),
model_name="pro",
)
print(result["output"])
```
## Dubbing and Translation
### Dub a Video
Dub a video into another language using the collection method:
```python
dubbed_video = coll.dub_video(
video_id=video.id,
language_code="es", # Spanish
)
dubbed_video.play()
```
### dub_video Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `video_id` | `str` | required | ID of the video to dub |
| `language_code` | `str` | required | Target language code (e.g., `"es"`, `"fr"`, `"de"`) |
| `callback_url` | `str\|None` | `None` | URL to receive async callback |
Returns a `Video` object with the dubbed content.
### Translate Transcript
Translate a video's transcript without dubbing:
```python
translated = video.translate_transcript(
language="Spanish",
additional_notes="Use formal tone",
)
for entry in translated:
print(entry)
```
**Supported languages** include: `en`, `es`, `fr`, `de`, `it`, `pt`, `ja`, `ko`, `zh`, `hi`, `ar`, and more.
## Complete Workflow Examples
### Generate Narration for a Video
```python
import videodb
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
# Get transcript
transcript_text = video.get_transcript_text()
# Generate narration script using collection LLM
result = coll.generate_text(
prompt=(
f"Write a professional narration script for this video content:\n"
f"{transcript_text[:2000]}"
),
model_name="pro",
)
script = result["output"]
# Convert script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
```
### Generate Thumbnail from Prompt
```python
thumbnail = coll.generate_image(
prompt="professional video thumbnail showing data analytics dashboard, modern design",
aspect_ratio="16:9",
)
print(f"Thumbnail URL: {thumbnail.generate_url()}")
```
### Add Generated Music to Video
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
# Generate background music
music = coll.generate_music(
prompt="calm ambient background music for a tutorial video",
duration=60,
)
# Build timeline with video + music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))
stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
```
### Structured JSON Output
```python
transcript_text = video.get_transcript_text()
result = coll.generate_text(
prompt=(
f"Given this transcript:\n{transcript_text}\n\n"
"Return a JSON object with keys: summary, topics (array), action_items (array)."
),
model_name="pro",
response_type="json",
)
# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
```
## Tips
- **Generated media is persistent**: All generated content is stored in your collection and can be reused.
- **Three audio methods**: Use `generate_music()` for background music, `generate_sound_effect()` for SFX, and `generate_voice()` for text-to-speech. There is no unified `generate_audio()` method.
- **Text generation is collection-level**: `coll.generate_text()` does not have access to video content automatically. Fetch the transcript with `video.get_transcript_text()` and pass it in the prompt.
- **Model tiers**: `"basic"` is fastest, `"pro"` is balanced, `"ultra"` is highest quality. Use `"pro"` for most analysis tasks.
- **Combine generation types**: Generate images for overlays, music for backgrounds, and voice for narration, then compose using timelines (see [editor.md](editor.md)).
- **Prompt quality matters**: Descriptive, specific prompts produce better results across all generation types.
- **Aspect ratios for images**: Choose from `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"`.

# RTStream Reference
Code-level details for RTStream operations. For workflow guide, see [rtstream.md](rtstream.md).
Based on [docs.videodb.io](https://docs.videodb.io/pages/ingest/live-streams/realtime-apis.md).
---
## Collection RTStream Methods
Methods on `Collection` for managing RTStreams:
| Method | Returns | Description |
|--------|---------|-------------|
| `coll.connect_rtstream(url, name, ...)` | `RTStream` | Create new RTStream from RTSP/RTMP URL |
| `coll.get_rtstream(id)` | `RTStream` | Get existing RTStream by ID |
| `coll.list_rtstreams(limit, offset, status, name, ordering)` | `List[RTStream]` | List all RTStreams in collection |
| `coll.search(query, namespace="rtstream")` | `RTStreamSearchResult` | Search across all RTStreams |
### Connect RTStream
```python
import videodb
conn = videodb.connect()
coll = conn.get_collection()
rtstream = coll.connect_rtstream(
url="rtmp://your-stream-server/live/stream-key",
name="My Live Stream",
media_types=["video"], # or ["audio", "video"]
sample_rate=30, # optional
store=True, # enable recording storage for export
enable_transcript=True, # optional
ws_connection_id=ws_id, # optional, for real-time events
)
```
### Get Existing RTStream
```python
rtstream = coll.get_rtstream("rts-xxx")
```
### List RTStreams
```python
rtstreams = coll.list_rtstreams(
limit=10,
offset=0,
status="connected", # optional filter
name="meeting", # optional filter
ordering="-created_at",
)
for rts in rtstreams:
print(f"{rts.id}: {rts.name} - {rts.status}")
```
### From Capture Session
After a capture session is active, retrieve RTStream objects:
```python
session = conn.get_capture_session(session_id)
mics = session.get_rtstream("mic")
displays = session.get_rtstream("screen")
system_audios = session.get_rtstream("system_audio")
```
Or use the `rtstreams` data from the `capture_session.active` WebSocket event:
```python
for rts in rtstreams:
rtstream = coll.get_rtstream(rts["rtstream_id"])
```
---
## RTStream Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `rtstream.start()` | `None` | Begin ingestion |
| `rtstream.stop()` | `None` | Stop ingestion |
| `rtstream.generate_stream(start, end)` | `str` | Stream recorded segment (Unix timestamps) |
| `rtstream.export(name=None)` | `RTStreamExportResult` | Export to permanent video |
| `rtstream.index_visuals(prompt, ...)` | `RTStreamSceneIndex` | Create visual index with AI analysis |
| `rtstream.index_audio(prompt, ...)` | `RTStreamSceneIndex` | Create audio index with LLM summarization |
| `rtstream.list_scene_indexes()` | `List[RTStreamSceneIndex]` | List all scene indexes on the stream |
| `rtstream.get_scene_index(index_id)` | `RTStreamSceneIndex` | Get a specific scene index |
| `rtstream.search(query, ...)` | `RTStreamSearchResult` | Search indexed content |
| `rtstream.start_transcript(ws_connection_id, engine)` | `dict` | Start live transcription |
| `rtstream.get_transcript(page, page_size, start, end, since)` | `dict` | Get transcript pages |
| `rtstream.stop_transcript(engine)` | `dict` | Stop transcription |
---
## Starting and Stopping
```python
# Begin ingestion
rtstream.start()
# ... stream is being recorded ...
# Stop ingestion
rtstream.stop()
```
---
## Generating Streams
Use Unix timestamps (not seconds offsets) to generate a playback stream from recorded content:
```python
import time
start_ts = time.time()
rtstream.start()
# Let it record for a while...
time.sleep(60)
end_ts = time.time()
rtstream.stop()
# Generate a stream URL for the recorded segment
stream_url = rtstream.generate_stream(start=start_ts, end=end_ts)
print(f"Recorded stream: {stream_url}")
```
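When replaying a known moment from a recording, the offsets you usually have ("30 seconds in") must be converted into the Unix timestamps `generate_stream()` expects. A tiny local helper (our own, not part of the SDK) captures the arithmetic:

```python
# Local helper (not part of the SDK): convert "seconds into the
# recording" offsets into the Unix timestamps generate_stream() expects.
def segment_bounds(recording_start_ts, offset_s, length_s):
    start = recording_start_ts + offset_s
    return start, start + length_s

# A 10-second segment beginning 30 s after recording started:
print(segment_bounds(1_700_000_000, 30, 10))  # (1700000030, 1700000040)
```

You would then pass the pair to `rtstream.generate_stream(start=..., end=...)`.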
---
## Exporting to Video
Export the recorded stream to a permanent video in the collection:
```python
export_result = rtstream.export(name="Meeting Recording 2024-01-15")
print(f"Video ID: {export_result.video_id}")
print(f"Stream URL: {export_result.stream_url}")
print(f"Player URL: {export_result.player_url}")
print(f"Duration: {export_result.duration}s")
```
### RTStreamExportResult Properties
| Property | Type | Description |
|----------|------|-------------|
| `video_id` | `str` | ID of the exported video |
| `stream_url` | `str` | HLS stream URL |
| `player_url` | `str` | Web player URL |
| `name` | `str` | Video name |
| `duration` | `float` | Duration in seconds |
---
## AI Pipelines
AI pipelines process live streams and send results via WebSocket.
### RTStream AI Pipeline Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `rtstream.index_audio(prompt, batch_config, ...)` | `RTStreamSceneIndex` | Start audio indexing with LLM summarization |
| `rtstream.index_visuals(prompt, batch_config, ...)` | `RTStreamSceneIndex` | Start visual indexing of screen content |
### Audio Indexing
Generate LLM summaries of audio content at intervals:
```python
audio_index = rtstream.index_audio(
prompt="Summarize what is being discussed",
batch_config={"type": "word", "value": 50},
model_name=None, # optional
name="meeting_audio", # optional
ws_connection_id=ws_id,
)
```
**Audio batch_config options:**
| Type | Value | Description |
|------|-------|-------------|
| `"word"` | count | Segment every N words |
| `"sentence"` | count | Segment every N sentences |
| `"time"` | seconds | Segment every N seconds |
Examples:
```python
{"type": "word", "value": 50} # every 50 words
{"type": "sentence", "value": 5} # every 5 sentences
{"type": "time", "value": 30} # every 30 seconds
```
Results arrive on the `audio_index` WebSocket channel.
### Visual Indexing
Generate AI descriptions of visual content:
```python
scene_index = rtstream.index_visuals(
prompt="Describe what is happening on screen",
batch_config={"type": "time", "value": 2, "frame_count": 5},
model_name="basic",
name="screen_monitor", # optional
ws_connection_id=ws_id,
)
```
**Parameters:**
| Parameter | Type | Description |
|-----------|------|-------------|
| `prompt` | `str` | Instructions for the AI model (supports structured JSON output) |
| `batch_config` | `dict` | Controls frame sampling (see below) |
| `model_name` | `str` | Model tier: `"mini"`, `"basic"`, `"pro"`, `"ultra"` |
| `name` | `str` | Name for the index (optional) |
| `ws_connection_id` | `str` | WebSocket connection ID for receiving results |
**Visual batch_config:**
| Key | Type | Description |
|-----|------|-------------|
| `type` | `str` | Only `"time"` is supported for visuals |
| `value` | `int` | Window size in seconds |
| `frame_count` | `int` | Number of frames to extract per window |
Example: `{"type": "time", "value": 2, "frame_count": 5}` samples 5 frames every 2 seconds and sends them to the model.
**Structured JSON output:**
Use a prompt that requests JSON format for structured responses:
```python
scene_index = rtstream.index_visuals(
prompt="""Analyze the screen and return a JSON object with:
{
"app_name": "name of the active application",
"activity": "what the user is doing",
"ui_elements": ["list of visible UI elements"],
"contains_text": true/false,
"dominant_colors": ["list of main colors"]
}
Return only valid JSON.""",
batch_config={"type": "time", "value": 3, "frame_count": 3},
model_name="pro",
ws_connection_id=ws_id,
)
```
Results arrive on the `scene_index` WebSocket channel.
---
## Batch Config Summary
| Indexing Type | `type` Options | `value` | Extra Keys |
|---------------|----------------|---------|------------|
| **Audio** | `"word"`, `"sentence"`, `"time"` | words/sentences/seconds | - |
| **Visual** | `"time"` only | seconds | `frame_count` |
Examples:
```python
# Audio: every 50 words
{"type": "word", "value": 50}
# Audio: every 30 seconds
{"type": "time", "value": 30}
# Visual: 5 frames every 2 seconds
{"type": "time", "value": 2, "frame_count": 5}
```
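The table's rules can be enforced locally before a config is sent to `index_audio()` / `index_visuals()`. A validation sketch (our own helper, mirroring the table above, not an SDK check):

```python
# Local validation sketch enforcing the batch_config rules in the
# summary table above.
def validate_batch_config(kind, cfg):
    if kind == "audio":
        if cfg.get("type") not in ("word", "sentence", "time"):
            raise ValueError("audio type must be 'word', 'sentence', or 'time'")
    elif kind == "visual":
        if cfg.get("type") != "time":
            raise ValueError("visual indexing supports type='time' only")
        if "frame_count" not in cfg:
            raise ValueError("visual config requires frame_count")
    else:
        raise ValueError("kind must be 'audio' or 'visual'")
    if not isinstance(cfg.get("value"), int) or cfg["value"] <= 0:
        raise ValueError("value must be a positive integer")
    return cfg

validate_batch_config("visual", {"type": "time", "value": 2, "frame_count": 5})  # passes
```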
---
## Transcription
Real-time transcription via WebSocket:
```python
# Start live transcription
rtstream.start_transcript(
ws_connection_id=ws_id,
engine=None, # optional, defaults to "assemblyai"
)
# Get transcript pages (with optional filters)
transcript = rtstream.get_transcript(
page=1,
page_size=100,
start=None, # optional: start timestamp filter
end=None, # optional: end timestamp filter
since=None, # optional: for polling, get transcripts after this timestamp
engine=None,
)
# Stop transcription
rtstream.stop_transcript(engine=None)
```
Transcript results arrive on the `transcript` WebSocket channel.
---
## RTStreamSceneIndex
When you call `index_audio()` or `index_visuals()`, the method returns an `RTStreamSceneIndex` object. This object represents the running index and provides methods for managing scenes and alerts.
```python
# index_visuals returns an RTStreamSceneIndex
scene_index = rtstream.index_visuals(
prompt="Describe what is on screen",
ws_connection_id=ws_id,
)
# index_audio also returns an RTStreamSceneIndex
audio_index = rtstream.index_audio(
prompt="Summarize the discussion",
ws_connection_id=ws_id,
)
```
### RTStreamSceneIndex Properties
| Property | Type | Description |
|----------|------|-------------|
| `rtstream_index_id` | `str` | Unique ID of the index |
| `rtstream_id` | `str` | ID of the parent RTStream |
| `extraction_type` | `str` | Type of extraction (`time` or `transcript`) |
| `extraction_config` | `dict` | Extraction configuration |
| `prompt` | `str` | The prompt used for analysis |
| `name` | `str` | Name of the index |
| `status` | `str` | Status (`connected`, `stopped`) |
### RTStreamSceneIndex Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `index.get_scenes(start, end, page, page_size)` | `dict` | Get indexed scenes |
| `index.start()` | `None` | Start/resume the index |
| `index.stop()` | `None` | Stop the index |
| `index.create_alert(event_id, callback_url, ws_connection_id)` | `str` | Create alert for event detection |
| `index.list_alerts()` | `list` | List all alerts on this index |
| `index.enable_alert(alert_id)` | `None` | Enable an alert |
| `index.disable_alert(alert_id)` | `None` | Disable an alert |
### Getting Scenes
Poll indexed scenes from the index:
```python
page = 1
while True:
    result = scene_index.get_scenes(
        start=None,  # optional: start timestamp
        end=None,    # optional: end timestamp
        page=page,
        page_size=100,
    )
    for scene in result["scenes"]:
        print(f"[{scene['start']}-{scene['end']}] {scene['text']}")
    if not result["next_page"]:
        break
    page += 1
```
### Managing Scene Indexes
```python
# List all indexes on the stream
indexes = rtstream.list_scene_indexes()
# Get a specific index by ID
scene_index = rtstream.get_scene_index(index_id)
# Stop an index
scene_index.stop()
# Restart an index
scene_index.start()
```
---
## Events
Events are reusable detection rules. Create them once, attach to any index via alerts.
### Connection Event Methods
| Method | Returns | Description |
|--------|---------|-------------|
| `conn.create_event(event_prompt, label)` | `str` (event_id) | Create detection event |
| `conn.list_events()` | `list` | List all events |
### Creating an Event
```python
event_id = conn.create_event(
event_prompt="User opened Slack application",
label="slack_opened",
)
```
### Listing Events
```python
events = conn.list_events()
for event in events:
print(f"{event['event_id']}: {event['label']}")
```
---
## Alerts
Alerts wire events to indexes for real-time notifications. When the AI detects content matching the event description, an alert is sent.
### Creating an Alert
```python
# Get the RTStreamSceneIndex from index_visuals
scene_index = rtstream.index_visuals(
prompt="Describe what application is open on screen",
ws_connection_id=ws_id,
)
# Create an alert on the index
alert_id = scene_index.create_alert(
event_id=event_id,
callback_url="https://your-backend.com/alerts", # for webhook delivery
ws_connection_id=ws_id, # for WebSocket delivery (optional)
)
```
**Note:** `callback_url` is required. Pass an empty string `""` if only using WebSocket delivery.
### Managing Alerts
```python
# List all alerts on an index
alerts = scene_index.list_alerts()
# Enable/disable alerts
scene_index.disable_alert(alert_id)
scene_index.enable_alert(alert_id)
```
### Alert Delivery
| Method | Latency | Use Case |
|--------|---------|----------|
| WebSocket | Real-time | Dashboards, live UI |
| Webhook | < 1 second | Server-to-server, automation |
### WebSocket Alert Event
```json
{
"channel": "alert",
"rtstream_id": "rts-xxx",
"data": {
"event_label": "slack_opened",
"timestamp": 1710000012340,
"text": "User opened Slack application"
}
}
```
### Webhook Payload
```json
{
"event_id": "event-xxx",
"label": "slack_opened",
"confidence": 0.95,
"explanation": "User opened the Slack application",
"timestamp": "2024-01-15T10:30:45Z",
"start_time": 1234.5,
"end_time": 1238.0,
"stream_url": "https://stream.videodb.io/v3/...",
"player_url": "https://console.videodb.io/player?url=..."
}
```
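A webhook consumer typically parses the payload and decides whether the detection is actionable. A minimal sketch, assuming the field names shown in the sample payload above (the confidence threshold is our own choice, not an SDK default):

```python
import json

# Sketch of a webhook consumer: act only on confident detections.
# Field names are taken from the sample payload above.
def should_act(raw_body, min_confidence=0.8):
    payload = json.loads(raw_body)
    return payload.get("confidence", 0.0) >= min_confidence

body = '{"label": "slack_opened", "confidence": 0.95}'
print(should_act(body))  # True
```

Detections below the threshold can be logged for review instead of triggering automation.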
---
## WebSocket Integration
All real-time AI results are delivered via WebSocket. Pass `ws_connection_id` to:
- `rtstream.start_transcript()`
- `rtstream.index_audio()`
- `rtstream.index_visuals()`
- `scene_index.create_alert()`
### WebSocket Channels
| Channel | Source | Content |
|---------|--------|---------|
| `transcript` | `start_transcript()` | Real-time speech-to-text |
| `scene_index` | `index_visuals()` | Visual analysis results |
| `audio_index` | `index_audio()` | Audio analysis results |
| `alert` | `create_alert()` | Alert notifications |
For WebSocket event structures and ws_listener usage, see [capture-reference.md](capture-reference.md).
---
## Complete Workflow
```python
import time
import videodb
conn = videodb.connect()
coll = conn.get_collection()
# 1. Connect and start recording
rtstream = coll.connect_rtstream(
url="rtmp://your-stream-server/live/stream-key",
name="Weekly Standup",
)
rtstream.start()
# 2. Record for the duration of the meeting
start_ts = time.time()
time.sleep(1800) # 30 minutes
end_ts = time.time()
rtstream.stop()
# 3. Export to a permanent video
export_result = rtstream.export(name="Weekly Standup Recording")
print(f"Exported video: {export_result.video_id}")
# 4. Index the exported video for search
video = coll.get_video(export_result.video_id)
video.index_spoken_words(force=True)
# 5. Search for action items
results = video.search("action items and next steps")
stream_url = results.compile()
print(f"Action items clip: {stream_url}")
```

# RTStream Guide
## Overview
RTStream enables real-time ingestion of live video streams (RTSP/RTMP) and desktop capture sessions. Once connected, you can record, index, search, and export content from live sources.
For code-level details (SDK methods, parameters, examples), see [rtstream-reference.md](rtstream-reference.md).
## Use Cases
- **Security & Monitoring**: Connect RTSP cameras, detect events, trigger alerts
- **Live Broadcasts**: Ingest RTMP streams, index in real-time, enable instant search
- **Meeting Recording**: Capture desktop screen and audio, transcribe live, export recordings
- **Event Processing**: Monitor live feeds, run AI analysis, respond to detected content
## Quick Start
1. **Connect to a live stream** (RTSP/RTMP URL) or get RTStream from a capture session
2. **Start ingestion** to begin recording the live content
3. **Start AI pipelines** for real-time indexing (audio, visual, transcription)
4. **Monitor events** via WebSocket for live AI results and alerts
5. **Stop ingestion** when done
6. **Export to video** for permanent storage and further processing
7. **Search the recording** to find specific moments
## RTStream Sources
### From RTSP/RTMP Streams
Connect directly to a live video source:
```python
rtstream = coll.connect_rtstream(
url="rtmp://your-stream-server/live/stream-key",
name="My Live Stream",
)
```
### From Capture Sessions
Get RTStreams from desktop capture (mic, screen, system audio):
```python
session = conn.get_capture_session(session_id)
mics = session.get_rtstream("mic")
displays = session.get_rtstream("screen")
system_audios = session.get_rtstream("system_audio")
```
For capture session workflow, see [capture.md](capture.md).
---
## Scripts
| Script | Description |
|--------|-------------|
| `scripts/ws_listener.py` | WebSocket event listener for real-time AI results |

# Search & Indexing Guide
Search allows you to find specific moments inside videos using natural language queries, exact keywords, or visual scene descriptions.
## Prerequisites
Videos **must be indexed** before they can be searched. Indexing is a one-time operation per video per index type.
## Indexing
### Spoken Word Index
Index the transcribed speech content of a video for semantic and keyword search:
```python
video = coll.get_video(video_id)
# force=True makes indexing idempotent — skips if already indexed
video.index_spoken_words(force=True)
```
This transcribes the audio track and builds a searchable index over the spoken content. Required for semantic search and keyword search.
**Parameters:**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `language_code` | `str\|None` | `None` | Language code of the video |
| `segmentation_type` | `SegmentationType` | `SegmentationType.sentence` | Segmentation type (`sentence` or `llm`) |
| `force` | `bool` | `False` | Set to `True` to skip if already indexed (avoids "already exists" error) |
| `callback_url` | `str\|None` | `None` | Webhook URL for async notification |
### Scene Index
Index visual content by generating AI descriptions of scenes. Like spoken word indexing, this raises an error if a scene index already exists. Extract the existing `scene_index_id` from the error message.
```python
import re
from videodb import SceneExtractionType
try:
scene_index_id = video.index_scenes(
extraction_type=SceneExtractionType.shot_based,
prompt="Describe the visual content, objects, actions, and setting in this scene.",
)
except Exception as e:
match = re.search(r"id\s+([a-f0-9]+)", str(e))
if match:
scene_index_id = match.group(1)
else:
raise
```
**Extraction types:**
| Type | Description | Best For |
|------|-------------|----------|
| `SceneExtractionType.shot_based` | Splits on visual shot boundaries | General purpose, action content |
| `SceneExtractionType.time_based` | Splits at fixed intervals | Uniform sampling, long static content |
| `SceneExtractionType.transcript` | Splits based on transcript segments | Speech-driven scene boundaries |
**Parameters for `time_based`:**
```python
video.index_scenes(
extraction_type=SceneExtractionType.time_based,
extraction_config={"time": 5, "select_frames": ["first", "last"]},
prompt="Describe what is happening in this scene.",
)
```
## Search Types
### Semantic Search
Natural language queries matched against spoken content:
```python
from videodb import SearchType
results = video.search(
query="explaining the benefits of machine learning",
search_type=SearchType.semantic,
)
```
Returns ranked segments where the spoken content semantically matches the query.
### Keyword Search
Exact term matching in transcribed speech:
```python
results = video.search(
query="artificial intelligence",
search_type=SearchType.keyword,
)
```
Returns segments containing the exact keyword or phrase.
### Scene Search
Visual content queries matched against indexed scene descriptions. Requires a prior `index_scenes()` call.
`index_scenes()` returns a `scene_index_id`. Pass it to `video.search()` to target a specific scene index (especially important when a video has multiple scene indexes):
```python
from videodb import SearchType, IndexType
from videodb.exceptions import InvalidRequestError
# Run a semantic search against the scene index.
# Use score_threshold to filter low-relevance noise (recommended: 0.3+).
try:
results = video.search(
query="person writing on a whiteboard",
search_type=SearchType.semantic,
index_type=IndexType.scene,
scene_index_id=scene_index_id,
score_threshold=0.3,
)
shots = results.get_shots()
except InvalidRequestError as e:
if "No results found" in str(e):
shots = []
else:
raise
```
**Important notes:**
- Use `SearchType.semantic` with `index_type=IndexType.scene` — this is the most reliable combination and works on all plans.
- `SearchType.scene` exists but may not be available on all plans (e.g. Free tier). Prefer `SearchType.semantic` with `IndexType.scene`.
- The `scene_index_id` parameter is optional. If omitted, the search runs against all scene indexes on the video. Pass it to target a specific index.
- You can create multiple scene indexes per video (with different prompts or extraction types) and search them independently using `scene_index_id`.
### Scene Search with Metadata Filtering
When indexing scenes with custom metadata, you can combine semantic search with metadata filters:
```python
from videodb import SearchType, IndexType
results = video.search(
query="a skillful chasing scene",
search_type=SearchType.semantic,
index_type=IndexType.scene,
scene_index_id=scene_index_id,
filter=[{"camera_view": "road_ahead"}, {"action_type": "chasing"}],
)
```
See the [scene_level_metadata_indexing cookbook](https://github.com/video-db/videodb-cookbook/blob/main/quickstart/scene_level_metadata_indexing.ipynb) for a full example of custom metadata indexing and filtered search.
## Working with Results
### Get Shots
Access individual result segments:
```python
results = video.search("your query")
for shot in results.get_shots():
print(f"Video: {shot.video_id}")
print(f"Start: {shot.start:.2f}s")
print(f"End: {shot.end:.2f}s")
print(f"Text: {shot.text}")
print("---")
```
### Play Compiled Results
Stream all matching segments as a single compiled video:
```python
results = video.search("your query")
stream_url = results.compile()
results.play() # opens compiled stream in browser
```
### Extract Clips
Download or stream specific result segments:
```python
for shot in results.get_shots():
stream_url = shot.generate_stream()
print(f"Clip: {stream_url}")
```
## Cross-Collection Search
Search across all videos in a collection:
```python
coll = conn.get_collection()
# Search across all videos in the collection
results = coll.search(
query="product demo",
search_type=SearchType.semantic,
)
for shot in results.get_shots():
print(f"Video: {shot.video_id} [{shot.start:.1f}s - {shot.end:.1f}s]")
```
> **Note:** Collection-level search only supports `SearchType.semantic`. Using `SearchType.keyword` or `SearchType.scene` with `coll.search()` will raise `NotImplementedError`. For keyword or scene search, use `video.search()` on individual videos instead.
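Because collection-level search is semantic-only, a collection-wide keyword search can be approximated by looping over the collection's videos (a sketch; the helper name `collect_keyword_hits` is ours, and any "No results found" error is treated as an empty result):

```python
def collect_keyword_hits(videos, query, search_type="keyword"):
    """Run keyword search on each video; treat 'No results found' as empty."""
    hits = []
    for video in videos:
        try:
            results = video.search(query=query, search_type=search_type)
            hits.extend(results.get_shots())
        except Exception as e:
            if "No results found" not in str(e):
                raise
    return hits

# Usage (pass SearchType.keyword explicitly if you prefer the enum):
# hits = collect_keyword_hits(coll.get_videos(), "artificial intelligence")
```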
## Search + Compile
Index, search, and compile matching segments into a single playable stream:
```python
video.index_spoken_words(force=True)
results = video.search(query="your query", search_type=SearchType.semantic)
stream_url = results.compile()
print(stream_url)
```
## Tips
- **Index once, search many times**: Indexing is the expensive operation. Once indexed, searches are fast.
- **Combine index types**: Index both spoken words and scenes to enable all search types on the same video.
- **Refine queries**: Semantic search works best with descriptive, natural language phrases rather than single keywords.
- **Use keyword search for precision**: When you need exact term matches, keyword search avoids semantic drift.
- **Handle "No results found"**: `video.search()` raises `InvalidRequestError` when no results match. Always wrap search calls in try/except and treat `"No results found"` as an empty result set.
- **Filter scene search noise**: Semantic scene search can return low-relevance results for vague queries. Use `score_threshold=0.3` (or higher) to filter noise.
- **Idempotent indexing**: Use `index_spoken_words(force=True)` to safely re-index. `index_scenes()` has no `force` parameter — wrap it in try/except and extract the existing `scene_index_id` from the error message with `re.search(r"id\s+([a-f0-9]+)", str(e))`.
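The last tip can be packaged as a small helper (a sketch; the names `ensure_scene_index` and `extract_index_id` are ours, and the id-extraction regex mirrors the one shown above):

```python
import re

def extract_index_id(message):
    """Pull an existing scene index id out of an 'already exists' error message."""
    match = re.search(r"id\s+([a-f0-9]+)", message)
    return match.group(1) if match else None

def ensure_scene_index(video, **index_kwargs):
    """Create a scene index, or recover the existing id from the error message."""
    try:
        return video.index_scenes(**index_kwargs)
    except Exception as e:
        existing = extract_index_id(str(e))
        if existing is not None:
            return existing
        raise
```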

# Streaming & Playback
VideoDB generates streams on-demand, returning HLS-compatible URLs that play instantly in any standard video player. There are no render times or export waits: edits, searches, and compositions stream immediately.
## Prerequisites
Videos **must be uploaded** to a collection before streams can be generated. For search-based streams, the video must also be **indexed** (spoken words and/or scenes). See [search.md](search.md) for indexing details.
## Core Concepts
### Stream Generation
Every video, search result, and timeline in VideoDB can produce a **stream URL**. This URL points to an HLS (HTTP Live Streaming) manifest that is compiled on demand.
```python
# From a video
stream_url = video.generate_stream()
# From a timeline
stream_url = timeline.generate_stream()
# From search results
stream_url = results.compile()
```
## Streaming a Single Video
### Basic Playback
```python
import videodb
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
# Generate stream URL
stream_url = video.generate_stream()
print(f"Stream: {stream_url}")
# Open in default browser
video.play()
```
### With Subtitles
```python
# Index and add subtitles first
video.index_spoken_words(force=True)
video.add_subtitle()
# Stream now includes subtitles
stream_url = video.generate_stream()
```
### Specific Segments
Stream only a portion of a video by passing a timeline of timestamp ranges:
```python
# Stream seconds 10-30 and 60-90
stream_url = video.generate_stream(timeline=[(10, 30), (60, 90)])
print(f"Segment stream: {stream_url}")
```
## Streaming Timeline Compositions
Build a multi-asset composition and stream it in real time:
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset, ImageAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video(video_id)
music = coll.get_audio(music_id)
timeline = Timeline(conn)
# Main video content
timeline.add_inline(VideoAsset(asset_id=video.id))
# Background music overlay (starts at second 0)
timeline.add_overlay(0, AudioAsset(asset_id=music.id))
# Text overlay at the beginning
timeline.add_overlay(0, TextAsset(
text="Live Demo",
duration=3,
style=TextStyle(fontsize=48, fontcolor="white", boxcolor="#000000"),
))
# Generate the composed stream
stream_url = timeline.generate_stream()
print(f"Composed stream: {stream_url}")
```
**Important:** `add_inline()` only accepts `VideoAsset`. Use `add_overlay()` for `AudioAsset`, `ImageAsset`, and `TextAsset`.
For detailed timeline editing, see [editor.md](editor.md).
## Streaming Search Results
Compile search results into a single stream of all matching segments:
```python
from videodb import SearchType
video.index_spoken_words(force=True)
results = video.search("key announcement", search_type=SearchType.semantic)
# Compile all matching shots into one stream
stream_url = results.compile()
print(f"Search results stream: {stream_url}")
# Or play directly
results.play()
```
### Stream Individual Search Hits
```python
results = video.search("product demo", search_type=SearchType.semantic)
for i, shot in enumerate(results.get_shots()):
stream_url = shot.generate_stream()
print(f"Hit {i+1} [{shot.start:.1f}s-{shot.end:.1f}s]: {stream_url}")
```
## Audio Playback
Get a signed playback URL for audio content:
```python
audio = coll.get_audio(audio_id)
playback_url = audio.generate_url()
print(f"Audio URL: {playback_url}")
```
## Complete Workflow Examples
### Search-to-Stream Pipeline
Combine search, timeline composition, and streaming in one workflow:
```python
import videodb
from videodb import SearchType
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
video.index_spoken_words(force=True)
# Search for key moments
queries = ["introduction", "main demo", "Q&A"]
timeline = Timeline(conn)
offset = 0  # running position on the main track, in seconds
for query in queries:
    # Find matching segments ("No results found" raises, so guard each query)
    try:
        shots = video.search(query, search_type=SearchType.semantic).get_shots()
    except Exception as e:
        if "No results found" in str(e):
            continue
        raise
    # Label the section at the point where its first clip begins
    timeline.add_overlay(offset, TextAsset(
        text=query.title(),
        duration=2,
        style=TextStyle(fontsize=36, fontcolor="white", boxcolor="#222222"),
    ))
    for shot in shots:
        timeline.add_inline(
            VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
        )
        offset += shot.end - shot.start
stream_url = timeline.generate_stream()
print(f"Dynamic compilation: {stream_url}")
```
### Multi-Video Stream
Combine clips from different videos into a single stream:
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset
conn = videodb.connect()
coll = conn.get_collection()
video_clips = [
{"id": "vid_001", "start": 0, "end": 15},
{"id": "vid_002", "start": 10, "end": 30},
{"id": "vid_003", "start": 5, "end": 25},
]
timeline = Timeline(conn)
for clip in video_clips:
timeline.add_inline(
VideoAsset(asset_id=clip["id"], start=clip["start"], end=clip["end"])
)
stream_url = timeline.generate_stream()
print(f"Multi-video stream: {stream_url}")
```
### Conditional Stream Assembly
Build a stream dynamically based on search availability:
```python
import videodb
from videodb import SearchType
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")
video.index_spoken_words(force=True)
timeline = Timeline(conn)
# Try to find specific content; fall back to full video
topics = ["opening remarks", "technical deep dive", "closing"]
found_any = False
offset = 0  # running position on the main track, in seconds
for topic in topics:
    # "No results found" raises, so guard each query to keep the fallback reachable
    try:
        shots = video.search(topic, search_type=SearchType.semantic).get_shots()
    except Exception as e:
        if "No results found" in str(e):
            continue
        raise
    if shots:
        found_any = True
        # Label the section where its first clip begins
        timeline.add_overlay(offset, TextAsset(
            text=topic.title(),
            duration=2,
            style=TextStyle(fontsize=32, fontcolor="white", boxcolor="#1a1a2e"),
        ))
        for shot in shots:
            timeline.add_inline(
                VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
            )
            offset += shot.end - shot.start
if found_any:
stream_url = timeline.generate_stream()
print(f"Curated stream: {stream_url}")
else:
# Fall back to full video stream
stream_url = video.generate_stream()
print(f"Full video stream: {stream_url}")
```
### Live Event Recap
Process an event recording into a streamable recap with multiple sections:
```python
import videodb
from videodb import SearchType
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset, ImageAsset, TextAsset, TextStyle
conn = videodb.connect()
coll = conn.get_collection()
# Upload event recording
event = coll.upload(url="https://example.com/event-recording.mp4")
event.index_spoken_words(force=True)
# Generate background music
music = coll.generate_music(
prompt="upbeat corporate background music",
duration=120,
)
# Generate title image
title_img = coll.generate_image(
prompt="modern event recap title card, dark background, professional",
aspect_ratio="16:9",
)
# Build the recap timeline
timeline = Timeline(conn)
# Main video segments from search ("No results found" raises, so guard each query)
def shots_for(query, limit=5):
    try:
        return event.search(query, search_type=SearchType.semantic).get_shots()[:limit]
    except Exception as e:
        if "No results found" in str(e):
            return []
        raise

for shot in shots_for("keynote announcement"):
    timeline.add_inline(
        VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
    )
for shot in shots_for("product demo"):
    timeline.add_inline(
        VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
    )
# Overlay title card image
timeline.add_overlay(0, ImageAsset(
asset_id=title_img.id, width=100, height=100, x=80, y=20, duration=5
))
# Overlay section labels
timeline.add_overlay(5, TextAsset(
text="Keynote Highlights",
duration=3,
style=TextStyle(fontsize=40, fontcolor="white", boxcolor="#0d1117"),
))
# Overlay background music
timeline.add_overlay(0, AudioAsset(
asset_id=music.id, fade_in_duration=3
))
# Stream the final recap
stream_url = timeline.generate_stream()
print(f"Event recap: {stream_url}")
```
---
## Tips
- **HLS compatibility**: Stream URLs return HLS manifests (`.m3u8`). They work in Safari natively, and in other browsers via hls.js or similar libraries.
- **On-demand compilation**: Streams are compiled server-side when requested. The first play may have a brief compilation delay; subsequent plays of the same composition are cached.
- **Caching**: Calling `video.generate_stream()` a second time without arguments returns the cached stream URL rather than recompiling.
- **Segment streams**: `video.generate_stream(timeline=[(start, end)])` is the fastest way to stream a specific clip without building a full `Timeline` object.
- **Inline vs overlay**: `add_inline()` only accepts `VideoAsset` and places assets sequentially on the main track. `add_overlay()` accepts `AudioAsset`, `ImageAsset`, and `TextAsset` and layers them on top at a given start time.
- **TextStyle defaults**: `TextStyle` defaults to `font='Sans'`, `fontcolor='black'`. Use `boxcolor` (not `bgcolor`) for background color on text.
- **Combine with generation**: Use `coll.generate_music(prompt, duration)` and `coll.generate_image(prompt, aspect_ratio)` to create assets for timeline compositions.
- **Playback**: `.play()` opens the stream URL in the default system browser. For programmatic use, work with the URL string directly.
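As a convenience for the hls.js tip above, a stream URL can be dropped into a minimal self-hosted player page (a sketch; the helper name `hls_player_html` and the CDN path for hls.js are assumptions):

```python
def hls_player_html(stream_url, title="VideoDB Stream"):
    """Return a minimal HTML page that plays an HLS stream via hls.js."""
    return f"""<!DOCTYPE html>
<html>
<head>
  <title>{title}</title>
  <!-- CDN path is an assumption; pin a version you trust -->
  <script src="https://cdn.jsdelivr.net/npm/hls.js@1"></script>
</head>
<body>
  <video id="player" controls></video>
  <script>
    const url = {stream_url!r};
    const video = document.getElementById("player");
    if (video.canPlayType("application/vnd.apple.mpegurl")) {{
      video.src = url;  // Safari plays HLS natively
    }} else if (Hls.isSupported()) {{
      const hls = new Hls();
      hls.loadSource(url);
      hls.attachMedia(video);
    }}
  </script>
</body>
</html>"""

# open("player.html", "w").write(hls_player_html(stream_url))
```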

# Use Cases
Common workflows and what VideoDB enables. For code details, see [api-reference.md](api-reference.md), [capture.md](capture.md), [editor.md](editor.md), and [search.md](search.md).
---
## Video Search & Highlights
### Create Highlight Reels
Upload a long video (conference talk, lecture, meeting recording), search for key moments by topic ("product announcement", "Q&A session", "demo"), and automatically compile matching segments into a shareable highlight reel.
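Topic searches often return overlapping hits; merging them into clean spans before compiling keeps the reel tight (a sketch; the helper name `merge_shot_ranges` is ours):

```python
def merge_shot_ranges(shots, gap=1.0):
    """Merge overlapping or near-adjacent (start, end) ranges into clip spans."""
    spans = []
    for start, end in sorted(shots):
        if spans and start <= spans[-1][1] + gap:
            # Extend the previous span instead of emitting a near-duplicate clip
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans

# spans = merge_shot_ranges([(s.start, s.end) for s in results.get_shots()])
# stream_url = video.generate_stream(timeline=spans)
```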
### Build Searchable Video Libraries
Batch upload videos to a collection, index them for spoken word search, then query across the entire library. Find specific topics across hundreds of hours of content instantly.
### Extract Specific Clips
Search for moments matching a query ("budget discussion", "action items") and extract each matching segment as an individual clip with its own stream URL.
---
## Video Enhancement
### Add Professional Polish
Take raw footage and enhance it with:
- Auto-generated subtitles from speech
- Custom thumbnails at specific timestamps
- Background music overlays
- Intro/outro sequences with generated images
### AI-Enhanced Content
Combine existing video with generative AI:
- Generate text summaries from transcript
- Create background music matching video duration
- Generate title cards and overlay images
- Mix all elements into a polished final output
---
## Real-Time Capture (Desktop/Meeting)
### Screen + Audio Recording with AI
Capture screen, microphone, and system audio simultaneously. Get real-time:
- **Live transcription** - Speech to text as it happens
- **Audio summaries** - Periodic AI-generated summaries of discussions
- **Visual indexing** - AI descriptions of screen activity
### Meeting Capture with Summarization
Record meetings with live transcription of all participants. Get periodic summaries with key discussion points, decisions, and action items delivered in real time.
### Screen Activity Tracking
Track what's happening on screen with AI-generated descriptions:
- "User is browsing a spreadsheet in Google Sheets"
- "User switched to a code editor with a Python file"
- "Video call with screen sharing enabled"
### Post-Session Processing
After capture ends, the recording is exported as a permanent video. Then:
- Generate searchable transcript
- Search for specific topics within the recording
- Extract clips of important moments
- Share via stream URL or player link
---
## Live Stream Intelligence (RTSP/RTMP)
### Connect External Streams
Ingest live video from RTSP/RTMP sources (security cameras, encoders, broadcasts). Process and index content in real time.
### Real-Time Event Detection
Define events to detect in live streams:
- "Person entering restricted area"
- "Traffic violation at intersection"
- "Product visible on shelf"
Get alerts via WebSocket or webhook when events occur.
### Live Stream Search
Search across recorded live stream content. Find specific moments and generate clips from hours of continuous footage.
---
## Content Moderation & Safety
### Automated Content Review
Index video scenes with AI and search for problematic content. Flag videos containing violence, inappropriate content, or policy violations.
### Profanity Detection
Detect and locate profanity in audio. Optionally overlay beep sounds at detected timestamps.
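For the beep-overlay step, detected hits often cluster; collapsing nearby timestamps avoids stacking overlapping beeps (a sketch; the helper name `beep_overlay_times` is ours, and `beep_audio_id` in the usage comment is an assumed asset):

```python
def beep_overlay_times(shots, min_gap=0.5):
    """Collapse detected (start, end) hits into distinct beep start times."""
    times = []
    for start in sorted(s[0] for s in shots):
        if not times or start - times[-1] >= min_gap:
            times.append(start)
    return times

# for t in beep_overlay_times(hits):
#     timeline.add_overlay(t, AudioAsset(asset_id=beep_audio_id))
```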
---
## Platform Integration
### Social Media Formatting
Reframe videos for different platforms:
- Vertical (9:16) for TikTok, Reels, Shorts
- Square (1:1) for Instagram feed
- Landscape (16:9) for YouTube
### Transcode for Delivery
Change resolution, bitrate, or quality for different delivery targets. Output optimized streams for web, mobile, or broadcast.
### Generate Shareable Links
Every operation produces playable stream URLs. Embed in web players, share directly, or integrate with existing platforms.
---
## Workflow Summary
| Goal | VideoDB Approach |
|------|------------------|
| Find moments in video | Index spoken words/scenes → Search → Compile clips |
| Create highlights | Search multiple topics → Build timeline → Generate stream |
| Add subtitles | Index spoken words → Add subtitle overlay |
| Record screen + AI | Start capture → Run AI pipelines → Export video |
| Monitor live streams | Connect RTSP → Index scenes → Create alerts |
| Reformat for social | Reframe to target aspect ratio |
| Combine clips | Build timeline with multiple assets → Generate stream |

#!/usr/bin/env python3
"""
WebSocket event listener for VideoDB with auto-reconnect and graceful shutdown.
Usage:
python scripts/ws_listener.py [OPTIONS] [output_dir]
Arguments:
output_dir Directory for output files (default: /tmp or VIDEODB_EVENTS_DIR env var)
Options:
--clear Clear the events file before starting (use when starting a new session)
Output files:
<output_dir>/videodb_events.jsonl - All WebSocket events (JSONL format)
<output_dir>/videodb_ws_id - WebSocket connection ID
<output_dir>/videodb_ws_pid - Process ID for easy termination
Output (first line, for parsing):
WS_ID=<connection_id>
Examples:
python scripts/ws_listener.py & # Run in background
python scripts/ws_listener.py --clear # Clear events and start fresh
python scripts/ws_listener.py --clear /tmp/mydir # Custom dir with clear
kill $(cat /tmp/videodb_ws_pid) # Stop the listener
"""
import os
import sys
import json
import signal
import asyncio
from datetime import datetime, timezone
from pathlib import Path
from dotenv import load_dotenv
load_dotenv()
import videodb
# Retry config
MAX_RETRIES = 10
INITIAL_BACKOFF = 1 # seconds
MAX_BACKOFF = 60 # seconds
# Parse arguments
def parse_args():
clear = False
output_dir = None
args = sys.argv[1:]
for arg in args:
if arg == "--clear":
clear = True
elif not arg.startswith("-"):
output_dir = arg
if output_dir is None:
output_dir = os.environ.get("VIDEODB_EVENTS_DIR", "/tmp")
return clear, Path(output_dir)
CLEAR_EVENTS, OUTPUT_DIR = parse_args()
EVENTS_FILE = OUTPUT_DIR / "videodb_events.jsonl"
WS_ID_FILE = OUTPUT_DIR / "videodb_ws_id"
PID_FILE = OUTPUT_DIR / "videodb_ws_pid"
# Track if this is the first connection (for clearing events)
_first_connection = True
def log(msg: str):
"""Log with timestamp."""
ts = datetime.now().strftime("%H:%M:%S")
print(f"[{ts}] {msg}", flush=True)
def append_event(event: dict):
"""Append event to JSONL file with timestamps."""
event["ts"] = datetime.now(timezone.utc).isoformat()
event["unix_ts"] = datetime.now(timezone.utc).timestamp()
with open(EVENTS_FILE, "a") as f:
f.write(json.dumps(event) + "\n")
def write_pid():
"""Write PID file for easy process management."""
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
PID_FILE.write_text(str(os.getpid()))
def cleanup_pid():
"""Remove PID file on exit."""
try:
PID_FILE.unlink(missing_ok=True)
except Exception:
pass
async def listen_with_retry():
"""Main listen loop with auto-reconnect and exponential backoff."""
global _first_connection
retry_count = 0
backoff = INITIAL_BACKOFF
while retry_count < MAX_RETRIES:
try:
conn = videodb.connect()
ws_wrapper = conn.connect_websocket()
ws = await ws_wrapper.connect()
ws_id = ws.connection_id
# Ensure output directory exists
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
# Clear events file only on first connection if --clear flag is set
if _first_connection and CLEAR_EVENTS:
EVENTS_FILE.unlink(missing_ok=True)
log("Cleared events file")
_first_connection = False
# Write ws_id to file for easy retrieval
WS_ID_FILE.write_text(ws_id)
# Print ws_id (parseable format for LLM)
if retry_count == 0:
print(f"WS_ID={ws_id}", flush=True)
log(f"Connected (ws_id={ws_id})")
# Reset retry state on successful connection
retry_count = 0
backoff = INITIAL_BACKOFF
# Listen for messages
async for msg in ws.receive():
append_event(msg)
channel = msg.get("channel", msg.get("event", "unknown"))
text = msg.get("data", {}).get("text", "")
if text:
print(f"[{channel}] {text[:80]}", flush=True)
# If we exit the loop normally, connection was closed
log("Connection closed by server")
            # Avoid a tight reconnect loop when the server closes cleanly
            await asyncio.sleep(INITIAL_BACKOFF)
except asyncio.CancelledError:
log("Shutdown requested")
raise
except Exception as e:
retry_count += 1
log(f"Connection error: {e}")
if retry_count >= MAX_RETRIES:
log(f"Max retries ({MAX_RETRIES}) exceeded, exiting")
break
log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...")
await asyncio.sleep(backoff)
backoff = min(backoff * 2, MAX_BACKOFF)
async def main_async():
"""Async main with signal handling."""
loop = asyncio.get_running_loop()
shutdown_event = asyncio.Event()
def handle_signal():
log("Received shutdown signal")
shutdown_event.set()
# Register signal handlers
for sig in (signal.SIGINT, signal.SIGTERM):
loop.add_signal_handler(sig, handle_signal)
# Run listener with cancellation support
listen_task = asyncio.create_task(listen_with_retry())
shutdown_task = asyncio.create_task(shutdown_event.wait())
done, pending = await asyncio.wait(
[listen_task, shutdown_task],
return_when=asyncio.FIRST_COMPLETED,
)
# Cancel remaining tasks
for task in pending:
task.cancel()
try:
await task
except asyncio.CancelledError:
pass
log("Shutdown complete")
def main():
write_pid()
try:
asyncio.run(main_async())
finally:
cleanup_pid()
if __name__ == "__main__":
main()