docs: resolve videodb review findings

2026-04-15 14:33:33 +08:00 · 2026-03-10 21:18:33 -07:00
parent 2581bebfd9
commit db2bf16427
5 changed files with 188 additions and 134 deletions
--- a/skills/videodb/SKILL.md
+++ b/skills/videodb/SKILL.md
@@ -10,59 +10,48 @@ argument-hint: "[task description]"

 **Perception + memory + actions for video, live streams, and desktop sessions.**

-Use this skill when you need to:
+## When to Use

-## 1) Desktop Perception
+### Desktop Perception
 - Start/stop a **desktop session** capturing **screen, mic, and system audio**
 - Stream **live context** and store **episodic session memory**
 - Run **real-time alerts/triggers** on what's spoken and what's happening on screen
 - Produce **session summaries**, a searchable timeline, and **playable evidence links**

-## 2) Video ingest + stream
+### Video ingest + stream
 - Ingest a **file or URL** and return a **playable web stream link**
 - Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**

-## 3) Index + search (timestamps + evidence)
+### Index + search (timestamps + evidence)
 - Build **visual**, **spoken**, and **keyword** indexes
 - Search and return exact moments with **timestamps** and **playable evidence**
 - Auto-create **clips** from search results

-## 4) Timeline editing + generation
+### Timeline editing + generation
 - Subtitles: **generate**, **translate**, **burn-in**
 - Overlays: **text/image/branding**, motion captions
 - Audio: **background music**, **voiceover**, **dubbing**
 - Programmatic composition and exports via **timeline operations**

-## 5) Live streams (RTSP) + monitoring
+### Live streams (RTSP) + monitoring
 - Connect **RTSP/live feeds**
 - Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows

---
+## How It Works

-## Common inputs
+### Common inputs
 - Local **file path**, public **URL**, or **RTSP URL**
 - Desktop capture request: **start / stop / summarize session**
 - Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

-## Common outputs
+### Common outputs
 - **Stream URL**
 - Search results with **timestamps** and **evidence links**
 - Generated assets: subtitles, audio, images, clips
 - **Event/alert payloads** for live streams
 - Desktop **session summaries** and memory entries

---
-
-## Canonical prompts (examples)
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."
-
-
-## Running Python code
+### Running Python code

 Before running any VideoDB code, change to the project directory and load environment variables:

@@ -96,7 +85,7 @@ print(f"Videos: {len(coll.get_videos())}")
 EOF
 ```

-## Setup
+### Setup

 When the user asks to "setup videodb" or similar:

@@ -123,7 +112,7 @@ Get a free API key at https://console.videodb.io (50 free uploads, no credit car

 **Do NOT** read, write, or handle the API key yourself. Always let the user set it.

-## Quick Reference
+### Quick Reference

 ### Upload media

@@ -298,6 +287,55 @@ except InvalidRequestError as e:
 | Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
 | `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits |

+## Examples
+
+### Canonical prompts
+- "Start desktop capture and alert when a password field appears."
+- "Record my session and produce an actionable summary when it ends."
+- "Ingest this file and return a playable stream link."
+- "Index this folder and find every scene with people, return timestamps."
+- "Generate subtitles, burn them in, and add light background music."
+- "Connect this RTSP URL and alert when a person enters the zone."
+
+### Screen Recording (Desktop Capture)
+
+Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
+
+#### Quick Start
+
+1. **Start listener**: `python scripts/ws_listener.py --clear &`
+2. **Get WebSocket ID**: `cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"`
+3. **Run capture code** (see reference/capture.md for the full workflow)
+4. **Events written to**: `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`
+
+Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.
+
+#### Query Events
+
+```python
+import json
+import time
+from pathlib import Path
+
+events_file = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
+events = []
+
+if events_file.exists():
+    with events_file.open(encoding="utf-8") as handle:
+        for line in handle:
+            try:
+                events.append(json.loads(line))
+            except json.JSONDecodeError:
+                continue
+
+transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
+cutoff = time.time() - 300
+recent_visual = [
+    e for e in events
+    if e.get("channel") == "visual_index" and e.get("unix_ts", 0) > cutoff
+]
+```
+
 ## Additional docs

 Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
@@ -313,50 +351,6 @@ Reference documentation is in the `reference/` directory adjacent to this SKILL.
 - [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
 - [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples

-## Screen Recording (Desktop Capture)
-
-Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
-
-### Quick Start
-
-1. **Start listener**: `python scripts/ws_listener.py &`
-2. **Get WebSocket ID**: `cat /tmp/videodb_ws_id`
-3. **Run capture code** (see reference/capture.md for full workflow)
-4. **Events written to**: `/tmp/videodb_events.jsonl`
-
-### Query Events
-
-```python
-import json
-from pathlib import Path
-
-events_file = Path("/tmp/videodb_events.jsonl")
-events = []
-
-if events_file.exists():
-    with events_file.open(encoding="utf-8") as handle:
-        for line in handle:
-            try:
-                events.append(json.loads(line))
-            except json.JSONDecodeError:
-                continue
-
-# Get all transcripts
-transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
-
-# Get visual descriptions from last 5 minutes
-import time
-cutoff = time.time() - 300
-recent_visual = [e for e in events 
-                 if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
-```
-
-### Utility Scripts
-
- [scripts/ws_listener.py](scripts/ws_listener.py) - WebSocket event listener (dumps to JSONL)
-
-For complete capture workflow, see [reference/capture.md](reference/capture.md).
-

 **Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).
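
The Quick Start commands added in this diff resolve the events directory through `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}`, while the Query Events snippet reads from `~/.local/state/videodb` directly. The following is a minimal sketch, not part of the commit, of a loader that resolves the path the same way the shell expansion does. The helper names `resolve_events_file`, `load_events`, and `recent_events` are hypothetical, and the event fields `channel` and `unix_ts` are assumed from the snippet in the diff.

```python
import json
import os
import time
from pathlib import Path


def resolve_events_file(env=None):
    """Mirror ${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb} (hypothetical helper)."""
    env = os.environ if env is None else env
    # The shell `:-` form falls back when the variable is unset OR empty, hence `or`.
    base = env.get("VIDEODB_EVENTS_DIR") or (Path.home() / ".local" / "state" / "videodb")
    return Path(base) / "videodb_events.jsonl"


def load_events(events_file):
    """Parse a JSONL event log, skipping blank, malformed, and non-object lines."""
    events = []
    if not events_file.exists():
        return events
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue
            if isinstance(event, dict):
                events.append(event)
    return events


def recent_events(events, channel, window_seconds, now=None):
    """Return events on `channel` whose assumed `unix_ts` field is inside the window."""
    now = time.time() if now is None else now
    cutoff = now - window_seconds
    return [
        e for e in events
        if e.get("channel") == channel and e.get("unix_ts", 0) > cutoff
    ]
```

With these helpers, `recent_events(load_events(resolve_events_file()), "visual_index", 300)` would correspond to the "last 5 minutes" query in the snippet, while following a custom `VIDEODB_EVENTS_DIR` if one is set.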