---
name: videodb
description: See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.
origin: ECC
allowed-tools: Read Grep Glob Bash(python:*)
argument-hint: "[task description]"
---

# VideoDB Skill

**Perception + memory + actions for video, live streams, and desktop sessions.**

## When to use

### Desktop Perception
- Start/stop a **desktop session** capturing **screen, mic, and system audio**
- Stream **live context** and store **episodic session memory**
- Run **real-time alerts/triggers** on what's spoken and what's happening on screen
- Produce **session summaries**, a searchable timeline, and **playable evidence links**

### Video ingest + stream
- Ingest a **file or URL** and return a **playable web stream link**
- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**

### Index + search (timestamps + evidence)
- Build **visual**, **spoken**, and **keyword** indexes
- Search and return exact moments with **timestamps** and **playable evidence**
- Auto-create **clips** from search results

### Timeline editing + generation
- Subtitles: **generate**, **translate**, **burn-in**
- Overlays: **text/image/branding**, motion captions
- Audio: **background music**, **voiceover**, **dubbing**
- Programmatic composition and exports via **timeline operations**

### Live streams (RTSP) + monitoring
- Connect **RTSP/live feeds**
- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows

## How it works

### Common inputs
- Local **file path**, public **URL**, or **RTSP URL**
- Desktop capture request: **start / stop / summarize session**
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

### Common outputs
- **Stream URL**
- Search results with **timestamps** and **evidence links**
- Generated assets: subtitles, audio, images, clips
- **Event/alert payloads** for live streams
- Desktop **session summaries** and memory entries

### Running Python code

Before running any VideoDB code, change to the project directory and load environment variables:

```python
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
```

This reads `VIDEO_DB_API_KEY` from:
1. Environment (if already exported)
2. Project's `.env` file in the current directory

If the key is missing, `videodb.connect()` raises `AuthenticationError` automatically.

Do NOT write a script file when a short inline command works.

When writing inline Python (`python -c "..."`), always use properly formatted code: use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

```bash
python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF
```

### Setup

When the user asks to "setup videodb" or similar:

### 1. Install SDK

```bash
pip install "videodb[capture]" python-dotenv
```

If `videodb[capture]` fails on Linux, install without the capture extra:

```bash
pip install videodb python-dotenv
```

### 2. Configure API key

The user must set `VIDEO_DB_API_KEY` using **either** method:

- **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
- **Project `.env` file**: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file

Get a free API key at (50 free uploads, no credit card).
**Do NOT** read, write, or handle the API key yourself. Always let the user set it.

### Quick Reference

### Upload media

```python
# URL
video = coll.upload(url="https://example.com/video.mp4")

# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file
video = coll.upload(file_path="/path/to/video.mp4")
```

### Transcript + subtitle

```python
# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()
```

### Search inside videos

```python
from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
    results = video.search("product demo")
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```

### Scene search

```python
import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter; it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content in this scene.",
    )
except Exception as e:
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```

### Timeline editing

**Important:** Always validate timestamps before building a timeline:
- `start` must be >= 0 (negative values are silently accepted but produce broken output)
- `start` must be < `end`
- `end` must be <= `video.length`

```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle

timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
```

### Transcode video (resolution / quality change)

```python
from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/webhook",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
    audio_config=AudioConfig(mute=False),
)
```

### Reframe aspect ratio (for social platforms)

**Warning:** `reframe()` is a slow server-side operation. For long videos it can take several minutes and may time out.
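Both `VideoAsset` and `reframe()` take a `start`/`end` range, so the three timestamp rules listed under Timeline editing can be checked locally before any server call. The following is a minimal sketch: `validate_clip_range` is a hypothetical helper, not part of the VideoDB SDK, and `video_length` is assumed to be the value of `video.length` in the same unit as `start` and `end`. Applying the same checks to `reframe()` is also an assumption.

```python
def validate_clip_range(start: float, end: float, video_length: float) -> None:
    """Raise ValueError if (start, end) breaks any of the three timestamp rules.

    Hypothetical helper for illustration; not part of the VideoDB SDK.
    """
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) must be <= video length ({video_length})")
```

A call such as `validate_clip_range(10, 30, video.length)` before constructing `VideoAsset(asset_id=video.id, start=10, end=30)` turns a silently broken stream into an immediate local error.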
Best practices:
- Always limit to a short segment using `start`/`end` when possible
- For full-length videos, use `callback_url` for async processing
- Trim the video on a `Timeline` first, then reframe the shorter result

```python
from videodb import ReframeMode

# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
```

### Generative media

```python
image = coll.generate_image(
    prompt="a sunset over mountains",
    aspect_ratio="16:9",
)
```

## Error handling

```python
import videodb
from videodb.exceptions import AuthenticationError, InvalidRequestError

try:
    conn = videodb.connect()
except AuthenticationError:
    print("Check your VIDEO_DB_API_KEY")
    raise  # nothing below can run without a connection

coll = conn.get_collection()

try:
    video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
    print(f"Upload failed: {e}")
```

### Common pitfalls

| Scenario | Error message | Solution |
|----------|--------------|----------|
| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
| Reframe times out | Blocks indefinitely on long videos | Use `start`/`end` to limit segment, or pass `callback_url` for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features; inform the user about plan limits |

## Examples

### Canonical prompts
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."

### Screen Recording (Desktop Capture)

Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.

#### Quick Start

1. **Choose state dir**: `STATE_DIR="${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}"`
2. **Start listener**: `VIDEODB_EVENTS_DIR="$STATE_DIR" python scripts/ws_listener.py --clear "$STATE_DIR" &`
3. **Get WebSocket ID**: `cat "$STATE_DIR/videodb_ws_id"`
4. **Run capture code** (see reference/capture.md for the full workflow)
5. **Events written to**: `$STATE_DIR/videodb_events.jsonl`

Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.
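Because the listener in step 2 runs in the background, the `videodb_ws_id` file read in step 3 may not exist yet when the capture code starts. The sketch below polls for it from Python. It assumes the listener writes the WebSocket ID as plain text to that file; `wait_for_ws_id` and its `timeout` and `poll` parameters are hypothetical names introduced for illustration, not part of `ws_listener.py` or the VideoDB SDK.

```python
import os
import time
from pathlib import Path


def wait_for_ws_id(state_dir: Path, timeout: float = 10.0, poll: float = 0.2) -> str:
    """Poll state_dir/videodb_ws_id until it holds a non-empty value.

    Hypothetical helper; assumes the listener writes the ID as plain text.
    """
    ws_id_file = state_dir / "videodb_ws_id"
    deadline = time.monotonic() + timeout
    while True:
        if ws_id_file.exists():
            ws_id = ws_id_file.read_text(encoding="utf-8").strip()
            if ws_id:
                return ws_id
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no WebSocket ID in {ws_id_file} after {timeout}s")
        time.sleep(poll)


# Same default location as the shell snippet in step 1
state_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb"))
```

Calling `wait_for_ws_id(state_dir)` after starting the listener returns the ID once it is available and raises `TimeoutError` otherwise, which usually means the listener did not start.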
#### Query Events

```python
import json
import os
import time
from pathlib import Path

events_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb"))
events_file = events_dir / "videodb_events.jsonl"
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
cutoff = time.time() - 300
recent_visual = [
    e for e in events
    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
]
```

## Additional docs

Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.

- [reference/api-reference.md](reference/api-reference.md) - Complete VideoDB Python SDK API reference
- [reference/search.md](reference/search.md) - In-depth guide to video search (spoken word and scene-based)
- [reference/editor.md](reference/editor.md) - Timeline editing, assets, and composition
- [reference/streaming.md](reference/streaming.md) - HLS streaming and instant playback
- [reference/generative.md](reference/generative.md) - AI-powered media generation (images, video, audio)
- [reference/rtstream.md](reference/rtstream.md) - Live stream ingestion workflow (RTSP/RTMP)
- [reference/rtstream-reference.md](reference/rtstream-reference.md) - RTStream SDK methods and AI pipelines
- [reference/capture.md](reference/capture.md) - Desktop capture workflow
- [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
- [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples

**Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation.
The following are all handled server-side by VideoDB: trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).

### When to use what

| Problem | VideoDB solution |
|---------|-----------------|
| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` |

## Provenance

Reference material for this skill is vendored locally under `skills/videodb/reference/`. Use the local copies above instead of following external repository links at runtime.

**Maintained By:** [VideoDB](https://www.videodb.io/)