videodb skills update: add reference files for videodb skills

2026-04-20 00:53:29 +08:00 · 2026-03-03 18:16:39 +05:30
parent c26ba60003
commit cff0308568
12 changed files with 3625 additions and 69 deletions
--- a/skills/videodb-skills/SKILL.md
+++ b/skills/videodb-skills/SKILL.md
@@ -1,109 +1,368 @@
 ---
 name: videodb-skills
-description: The only video skill your agent needs — upload any video, connect real-time streams, search inside by what was said or shown, build complex editing workflows with overlays, generate AI media, add subtitles, and get instant streaming links.
+description: See, understand, and act on video and audio. See by ingesting from local files, URLs, RTSP/live feeds, or live desktop recording, returning real-time context and playable stream links. Understand by extracting frames, building visual/semantic/temporal indexes, and searching for moments with timestamps and auto-clips. Act by transcoding and normalizing (codec, fps, resolution, aspect ratio), performing timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generating media assets (image, audio, video), and creating real-time alerts for events from live streams or desktop capture.
 origin: ECC
+allowed-tools: Read Grep Glob Bash(python:*)
+argument-hint: "[task description]"
 ---

-# VideoDB Skills
+# VideoDB Skill

-The only video skill your agent needs. Upload any video, connect real-time streams, search inside by what was said or shown, build complex editing workflows with overlays, generate AI media, add subtitles, and get instant streaming links — all via the VideoDB Python SDK.
+**Perception + memory + actions for video, live streams, and desktop sessions.**

-## When to Activate
+Use this skill when you need to:

- Uploading or ingesting videos from YouTube URLs, web URLs, or local files
- Searching spoken words or visual scenes across video content
- Generating transcripts or auto-styling subtitles
- Editing clips — trim, combine, multi-timeline composition
- Adding overlays — text, images, audio, music
- Generating AI media — images, video, music, sound effects, voiceovers
- Transcoding — resolution, codec, bitrate, FPS changes
- Reframing video for social platforms (vertical, square, etc.)
- Real-time screen or audio capture with AI transcription
- Getting playable HLS streaming links for any output
+## 1) Desktop perception
+- Start/stop a **desktop session** capturing **screen, mic, and system audio**
+- Stream **live context** and store **episodic session memory**
+- Run **real-time alerts/triggers** on what's spoken and what's happening on screen
+- Produce **session summaries**, a searchable timeline, and **playable evidence links**
+
+## 2) Video ingest + stream
+- Ingest a **file or URL** and return a **playable web stream link**
+- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**
+
+## 3) Index + search (timestamps + evidence)
+- Build **visual**, **spoken**, and **keyword** indexes
+- Search and return exact moments with **timestamps** and **playable evidence**
+- Auto-create **clips** from search results
+
+## 4) Timeline editing + generation
+- Subtitles: **generate**, **translate**, **burn-in**
+- Overlays: **text/image/branding**, motion captions
+- Audio: **background music**, **voiceover**, **dubbing**
+- Programmatic composition and exports via **timeline operations**
+
+## 5) Live streams (RTSP) + monitoring
+- Connect **RTSP/live feeds**
+- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows
+
+---
+
+## Common inputs
+- Local **file path**, public **URL**, or **RTSP URL**
+- Desktop capture request: **start / stop / summarize session**
+- Desired operations: context retrieval, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
+
+## Common outputs
+- **Stream URL**
+- Search results with **timestamps** and **evidence links**
+- Generated assets: subtitles, audio, images, clips
+- **Event/alert payloads** for live streams
+- Desktop **session summaries** and memory entries
+
+---
+
+## Canonical prompts (examples)
+- "Start desktop capture and alert when a password field appears."
+- "Record my session and produce an actionable summary when it ends."
+- "Ingest this file and return a playable stream link."
+- "Index this folder and find every scene with people, return timestamps."
+- "Generate subtitles, burn them in, and add light background music."
+- "Connect this RTSP URL and alert when a person enters the zone."
+
+
+## Running Python code
+
+Before running any VideoDB code, change to the project directory and load environment variables:
+
+```python
+from dotenv import load_dotenv
+load_dotenv(".env")
+
+import videodb
+conn = videodb.connect()
+```
+
+This reads `VIDEO_DB_API_KEY` from, in order of precedence:
+1. The environment (if already exported)
+2. The project's `.env` file in the current directory (`load_dotenv()` does not override variables that are already set)
+
+If the key is missing, `videodb.connect()` raises `AuthenticationError`.
+
+Do NOT write a script file when a short inline command works.
+
+When writing inline Python (`python -c "..."`), separate statements with semicolons and keep the command readable. For anything longer than about three statements, use a heredoc instead:
+
+```bash
+python << 'EOF'
+from dotenv import load_dotenv
+load_dotenv(".env")
+
+import videodb
+conn = videodb.connect()
+coll = conn.get_collection()
+print(f"Videos: {len(coll.get_videos())}")
+EOF
+```

 ## Setup

+When the user asks to "setup videodb" or similar:
+
+### 1. Install SDK
+
 ```bash
-# Install the skill
-npx skills add video-db/skills
-
-# Or setup manually
 pip install "videodb[capture]" python-dotenv
-export VIDEO_DB_API_KEY=sk-xxx
 ```

-Run `/videodb setup` inside your agent for guided setup ($20 free credits, no credit card).
+If `videodb[capture]` fails on Linux, install without the capture extra:

-## Core Patterns
-
-### Upload and Process
-
-```python
-import videodb
-
-conn = videodb.connect()
-video = conn.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")
-
-transcript = video.get_transcript()
-for entry in transcript:
-    print(f"[{entry['start']:.1f}s] {entry['text']}")
+```bash
+pip install videodb python-dotenv
 ```

-### Search Across Videos
+### 2. Configure API key
+
+The user must set `VIDEO_DB_API_KEY` using **either** method:
+
+- **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
+- **Project `.env` file**: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file
+
+Get a free API key at https://console.videodb.io (50 free uploads, no credit card).
+
+**Do NOT** read, write, or handle the API key yourself. Always let the user set it.
+
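+## Quick Reference
+
+The snippets below assume `conn` and `coll` created as in "Running Python code" (`conn = videodb.connect()`, `coll = conn.get_collection()`), and a `video` returned by `coll.upload()`.
+
+### Upload media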

 ```python
-# Index for semantic search
-video.index_spoken_words()
+# URL
+video = coll.upload(url="https://example.com/video.mp4")

-# Search by what was said
-results = video.search("product demo")
-for r in results:
-    print(f"{r.start:.1f}s - {r.end:.1f}s: {r.text}")
+# YouTube
+video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")
+
+# Local file
+video = coll.upload(file_path="/path/to/video.mp4")
 ```

-### Timeline Editing
+### Transcript + subtitle

 ```python
-from videodb import Timeline, VideoAsset, AudioAsset
+# force=True skips the error if the video is already indexed
+video.index_spoken_words(force=True)
+text = video.get_transcript_text()
+stream_url = video.add_subtitle()
+```
+
+### Search inside videos
+
+```python
+from videodb.exceptions import InvalidRequestError
+
+video.index_spoken_words(force=True)
+
+# search() raises InvalidRequestError when no results are found.
+# Always wrap in try/except and treat "No results found" as empty.
+try:
+    results = video.search("product demo")
+    shots = results.get_shots()
+    stream_url = results.compile()
+except InvalidRequestError as e:
+    if "No results found" in str(e):
+        shots = []
+    else:
+        raise
+```
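The no-results handling repeats at every search call site, so it can be factored into a small helper. `shots_or_empty` is a hypothetical name, not an SDK function; the exception type is a parameter (in practice `videodb.exceptions.InvalidRequestError`) so the sketch runs without the SDK installed, and it relies on the same "No results found" message check as the snippet above.

```python
from typing import Callable, List, Type, TypeVar

T = TypeVar("T")


def shots_or_empty(
    run_search: Callable[[], List[T]],
    error_type: Type[Exception],
) -> List[T]:
    """Run a search callable and return its shots.

    A "No results found" error of error_type becomes an empty list;
    any other error is re-raised unchanged.
    """
    try:
        return run_search()
    except error_type as e:
        if "No results found" in str(e):
            return []
        raise
```

For example: `shots = shots_or_empty(lambda: video.search("product demo").get_shots(), InvalidRequestError)`. If you also need `results.compile()`, keep the inline try/except, since the helper only returns shots.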
+
+### Scene search
+
+```python
+import re
+from videodb import SearchType, IndexType, SceneExtractionType
+from videodb.exceptions import InvalidRequestError
+
+# index_scenes() has no force parameter — it raises an error if a scene
+# index already exists. Extract the existing index ID from the error.
+try:
+    scene_index_id = video.index_scenes(
+        extraction_type=SceneExtractionType.shot_based,
+        prompt="Describe the visual content in this scene.",
+    )
+except Exception as e:
+    match = re.search(r"id\s+([a-f0-9]+)", str(e))
+    if match:
+        scene_index_id = match.group(1)
+    else:
+        raise
+
+# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
+try:
+    results = video.search(
+        query="person writing on a whiteboard",
+        search_type=SearchType.semantic,
+        index_type=IndexType.scene,
+        scene_index_id=scene_index_id,
+        score_threshold=0.3,
+    )
+    shots = results.get_shots()
+    stream_url = results.compile()
+except InvalidRequestError as e:
+    if "No results found" in str(e):
+        shots = []
+    else:
+        raise
+```
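The scene-index recovery can be isolated the same way. This is a sketch assuming the error text follows the `Scene index with id XXXX already exists` format listed under Common pitfalls; `existing_scene_index_id` is a hypothetical helper. It adds a leading word boundary (`\b`) to the pattern used above so that words ending in "id" (such as "invalid") are not mistaken for the ID.

```python
import re
from typing import Optional

# Hex ID that follows the standalone word "id" in the error text
_SCENE_ID_RE = re.compile(r"\bid\s+([a-f0-9]+)")


def existing_scene_index_id(error: Exception) -> Optional[str]:
    """Return the scene index ID embedded in an "already exists" error, or None."""
    match = _SCENE_ID_RE.search(str(error))
    return match.group(1) if match else None
```

Inside the `except` block: `scene_index_id = existing_scene_index_id(e)`, then re-raise if it is `None`.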
+
+### Timeline editing
+
+**Important:** Always validate timestamps before building a timeline:
+- `start` must be >= 0 (negative values are silently accepted but produce broken output)
+- `start` must be < `end`
+- `end` must be <= `video.length`
+
+```python
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, TextAsset, TextStyle

 timeline = Timeline(conn)
-asset = VideoAsset(asset_id=video.id, start=10, end=30)
-timeline.add_inline(asset)
-
-stream = timeline.generate_stream()
-print(stream)  # Playable HLS link
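+# Main track: seconds 10-30 of the source, i.e. a 20-second clip
+timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
+# Show "The End" over the last 3 seconds (timeline time 17-20)
+timeline.add_overlay(17, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))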
+stream_url = timeline.generate_stream()
 ```
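The timestamp rules above can be enforced with a check run before each `VideoAsset` is created. `validate_clip` is a hypothetical helper that raises instead of letting a bad range reach the timeline, since negative starts are otherwise accepted silently; pass `video.length` as `length`.

```python
def validate_clip(start: float, end: float, length: float) -> None:
    """Raise ValueError unless 0 <= start < end <= length (all in seconds)."""
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > length:
        raise ValueError(f"end ({end}) exceeds video length ({length})")
```

For the timeline above: call `validate_clip(10, 30, video.length)` before `timeline.add_inline(...)`.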

-### AI Media Generation
+### Transcode video (resolution / quality change)

 ```python
-audio = conn.generate_audio(text="Upbeat background music", duration=30)
-image = conn.generate_image(prompt="Title card: Welcome to the Demo")
+from videodb import TranscodeMode, VideoConfig, AudioConfig
+
+# Change resolution, quality, or aspect ratio server-side
+job_id = conn.transcode(
+    source="https://example.com/video.mp4",
+    callback_url="https://example.com/webhook",
+    mode=TranscodeMode.economy,
+    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
+    audio_config=AudioConfig(mute=False),
+)
 ```

-## Capabilities
+### Reframe aspect ratio (for social platforms)

-| Capability | What It Does |
-|---|---|
-| Upload | YouTube, URLs, local files |
-| Search | Speech-based and scene-based |
-| Transcripts | Timestamped, multi-language |
-| Edit | Trim, combine, multi-timeline |
-| Subtitles | Auto-generate, custom styling |
-| AI Generate | Images, video, music, SFX, voiceover |
-| Capture | Screen + audio, real-time |
-| Transcode | Resolution, codec, aspect ratio |
-| Stream | HLS playable links |
+**Warning:** `reframe()` is a slow server-side operation. For long videos it can take
+several minutes and may time out. Best practices:
+- Limit reframing to a short segment with `start`/`end` whenever possible
+- For full-length videos, use `callback_url` for async processing
+- Trim the video on a `Timeline` first, then reframe the shorter result

-## Best Practices
+```python
+from videodb import ReframeMode

- Always verify SDK connection before operations: `conn.get_collection()`
- Use `video.index_spoken_words()` before searching — indexing is required once per video
- For scene search, use `video.index_scenes()` — this processes visual frames
- Timeline edits produce new streams; the original video is never modified
- AI generation is async — poll status or use callbacks for long operations
- Store `VIDEO_DB_API_KEY` in `.env`, not hardcoded
+# Always prefer reframing a short segment:
+reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
+
+# Async reframe for full-length videos (returns None, result via webhook):
+video.reframe(target="vertical", callback_url="https://example.com/webhook")
+
+# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
+reframed = video.reframe(start=0, end=60, target="square")
+
+# Custom dimensions
+reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
+```
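When a long video must be reframed synchronously, one option is to reframe it in short windows. The helper below (hypothetical, not part of the SDK) splits a duration into consecutive `(start, end)` ranges of at most 60 seconds, each of which could be passed to `video.reframe(start=..., end=...)`. Whether reframing per segment and rejoining the pieces on a `Timeline` suits your content is a judgment call, since smart reframing may frame subjects differently in each window; for one full-length output, the `callback_url` route above avoids the split entirely.

```python
from typing import List, Tuple


def segment_ranges(length: float, max_segment: float = 60.0) -> List[Tuple[float, float]]:
    """Split [0, length) into consecutive (start, end) windows of at most max_segment seconds."""
    if length <= 0 or max_segment <= 0:
        raise ValueError("length and max_segment must be positive")
    ranges = []
    start = 0.0
    while start < length:
        # The last window is clipped to the video length
        end = min(start + max_segment, length)
        ranges.append((start, end))
        start = end
    return ranges
```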
+
+### Generative media
+
+```python
+image = coll.generate_image(
+    prompt="a sunset over mountains",
+    aspect_ratio="16:9",
+)
+```
+
+## Error handling
+
+```python
+import videodb
+from videodb.exceptions import AuthenticationError, InvalidRequestError
+
+try:
+    conn = videodb.connect()
+except AuthenticationError:
+    print("Check your VIDEO_DB_API_KEY")
+    raise
+
+coll = conn.get_collection()
+try:
+    video = coll.upload(url="https://example.com/video.mp4")
+except InvalidRequestError as e:
+    print(f"Upload failed: {e}")
+```
+
+### Common pitfalls
+
+| Scenario | Error message or symptom | Solution |
+|----------|--------------------------|----------|
+| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
+| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
+| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
+| Reframe is slow or times out | Blocks for minutes or times out on long videos | Use `start`/`end` to limit the segment, or pass `callback_url` for async processing |
+| Negative timestamps on Timeline | Silently produces a broken stream | Validate `0 <= start < end <= video.length` before creating a `VideoAsset` |
+| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features; inform the user about plan limits |
+
+## Additional docs
+
+Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
+
+- [reference/api-reference.md](reference/api-reference.md) - Complete VideoDB Python SDK API reference
+- [reference/search.md](reference/search.md) - In-depth guide to video search (spoken word and scene-based)
+- [reference/editor.md](reference/editor.md) - Timeline editing, assets, and composition
+- [reference/streaming.md](reference/streaming.md) - HLS streaming and instant playback
+- [reference/generative.md](reference/generative.md) - AI-powered media generation (images, video, audio)
+- [reference/rtstream.md](reference/rtstream.md) - Live stream ingestion workflow (RTSP/RTMP)
+- [reference/rtstream-reference.md](reference/rtstream-reference.md) - RTStream SDK methods and AI pipelines
+- [reference/capture.md](reference/capture.md) - Desktop capture workflow
+- [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
+- [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples
+
+## Screen Recording (Desktop Capture)
+
+Use `scripts/ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture is supported on **macOS** only.
+
+### Quick Start
+
+1. **Start listener**: `python scripts/ws_listener.py &`
+2. **Get WebSocket ID**: `cat /tmp/videodb_ws_id`
+3. **Run capture code** (see reference/capture.md for full workflow)
+4. **Events written to**: `/tmp/videodb_events.jsonl`
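Because the listener is started in the background in step 1, `/tmp/videodb_ws_id` may not exist yet when step 2 runs. A minimal polling sketch, assuming the listener writes the ID as the file's only content (as the `cat` in step 2 implies); `wait_for_ws_id` is a hypothetical helper:

```python
import time
from pathlib import Path


def wait_for_ws_id(path: str = "/tmp/videodb_ws_id", timeout: float = 10.0,
                   interval: float = 0.2) -> str:
    """Poll until the listener has written a non-empty WebSocket ID, then return it."""
    deadline = time.monotonic() + timeout
    target = Path(path)
    while True:
        if target.is_file():
            ws_id = target.read_text().strip()
            if ws_id:
                return ws_id
        # Give up once the deadline has passed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"No WebSocket ID in {path} after {timeout}s")
        time.sleep(interval)
```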
+
+### Query Events
+
+```python
+import json
+import time
+
+with open("/tmp/videodb_events.jsonl") as f:
+    events = [json.loads(line) for line in f if line.strip()]
+
+# Get all transcripts
+transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
+
+# Get visual descriptions from the last 5 minutes
+cutoff = time.time() - 300
+recent_visual = [
+    e for e in events
+    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
+]
+```
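Because the listener keeps appending to `/tmp/videodb_events.jsonl` while capture runs, a read can catch a record that is only partly written. The generator below is a sketch (hypothetical name `iter_events`) that assumes the same record shape as the query snippet (`channel`, `unix_ts`, `data`) and skips blank or unparsable lines instead of failing the whole read; it also filters by channel and timestamp.

```python
import json
from typing import Iterator, Optional


def iter_events(path: str = "/tmp/videodb_events.jsonl",
                channel: Optional[str] = None,
                since: Optional[float] = None) -> Iterator[dict]:
    """Yield events from the listener's JSONL file, optionally filtered.

    Blank lines and lines that fail to parse (e.g. a record still being
    written) are skipped rather than aborting the read.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if channel is not None and event.get("channel") != channel:
                continue
            # Keep only events strictly newer than `since`
            if since is not None and event.get("unix_ts", 0) <= since:
                continue
            yield event
```

For example, `[e["data"]["text"] for e in iter_events(channel="transcript")]` or `list(iter_events(channel="visual_index", since=time.time() - 300))`.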
+
+### Utility Scripts
+
+- [scripts/ws_listener.py](scripts/ws_listener.py) - WebSocket event listener (dumps to JSONL)
+
+For complete capture workflow, see [reference/capture.md](reference/capture.md).
+
+## Prefer server-side processing
+
+**Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. VideoDB handles all of the following server-side: trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, color grading, volume mixing).
+
+### When to use what
+
+| Problem | VideoDB solution |
+|---------|-----------------|
+| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
+| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
+| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
+| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
+| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
+| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
+| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `coll.generate_music()`, `coll.generate_sound_effect()` |

 ## Repository

 https://github.com/video-db/skills
+
+**Maintained by:** [VideoDB](https://github.com/video-db)