docs: resolve videodb review findings

Affaan Mustafa
2026-03-10 21:18:33 -07:00
parent 2581bebfd9
commit db2bf16427
5 changed files with 188 additions and 134 deletions

View File

@@ -10,59 +10,48 @@ argument-hint: "[task description]"
**Perception + memory + actions for video, live streams, and desktop sessions.** **Perception + memory + actions for video, live streams, and desktop sessions.**
Use this skill when you need to: ## When to Use
## 1) Desktop Perception ### Desktop Perception
- Start/stop a **desktop session** capturing **screen, mic, and system audio** - Start/stop a **desktop session** capturing **screen, mic, and system audio**
- Stream **live context** and store **episodic session memory** - Stream **live context** and store **episodic session memory**
- Run **real-time alerts/triggers** on what's spoken and what's happening on screen - Run **real-time alerts/triggers** on what's spoken and what's happening on screen
- Produce **session summaries**, a searchable timeline, and **playable evidence links** - Produce **session summaries**, a searchable timeline, and **playable evidence links**
## 2) Video ingest + stream ### Video ingest + stream
- Ingest a **file or URL** and return a **playable web stream link** - Ingest a **file or URL** and return a **playable web stream link**
- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio** - Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**
## 3) Index + search (timestamps + evidence) ### Index + search (timestamps + evidence)
- Build **visual**, **spoken**, and **keyword** indexes - Build **visual**, **spoken**, and **keyword** indexes
- Search and return exact moments with **timestamps** and **playable evidence** - Search and return exact moments with **timestamps** and **playable evidence**
- Auto-create **clips** from search results - Auto-create **clips** from search results
## 4) Timeline editing + generation ### Timeline editing + generation
- Subtitles: **generate**, **translate**, **burn-in** - Subtitles: **generate**, **translate**, **burn-in**
- Overlays: **text/image/branding**, motion captions - Overlays: **text/image/branding**, motion captions
- Audio: **background music**, **voiceover**, **dubbing** - Audio: **background music**, **voiceover**, **dubbing**
- Programmatic composition and exports via **timeline operations** - Programmatic composition and exports via **timeline operations**
## 5) Live streams (RTSP) + monitoring ### Live streams (RTSP) + monitoring
- Connect **RTSP/live feeds** - Connect **RTSP/live feeds**
- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows - Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows
--- ## How It Works
## Common inputs ### Common inputs
- Local **file path**, public **URL**, or **RTSP URL** - Local **file path**, public **URL**, or **RTSP URL**
- Desktop capture request: **start / stop / summarize session** - Desktop capture request: **start / stop / summarize session**
- Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules - Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
## Common outputs ### Common outputs
- **Stream URL** - **Stream URL**
- Search results with **timestamps** and **evidence links** - Search results with **timestamps** and **evidence links**
- Generated assets: subtitles, audio, images, clips - Generated assets: subtitles, audio, images, clips
- **Event/alert payloads** for live streams - **Event/alert payloads** for live streams
- Desktop **session summaries** and memory entries - Desktop **session summaries** and memory entries
--- ### Running Python code
## Canonical prompts (examples)
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."
## Running Python code
Before running any VideoDB code, change to the project directory and load environment variables: Before running any VideoDB code, change to the project directory and load environment variables:
@@ -96,7 +85,7 @@ print(f"Videos: {len(coll.get_videos())}")
EOF EOF
``` ```
## Setup ### Setup
When the user asks to "setup videodb" or similar: When the user asks to "setup videodb" or similar:
@@ -123,7 +112,7 @@ Get a free API key at https://console.videodb.io (50 free uploads, no credit car
**Do NOT** read, write, or handle the API key yourself. Always let the user set it. **Do NOT** read, write, or handle the API key yourself. Always let the user set it.
## Quick Reference ### Quick Reference
### Upload media ### Upload media
@@ -298,6 +287,55 @@ except InvalidRequestError as e:
| Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` | | Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits | | `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits |
## Examples
### Canonical prompts
- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."
### Screen Recording (Desktop Capture)
Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
#### Quick Start
1. **Start listener**: `python scripts/ws_listener.py --clear &`
2. **Get WebSocket ID**: `cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"`
3. **Run capture code** (see reference/capture.md for the full workflow)
4. **Events written to**: `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`
Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.
#### Query Events
```python
import json
import time
from pathlib import Path

events_file = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
cutoff = time.time() - 300
recent_visual = [
    e for e in events
    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
]
```
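The shell snippets in this section honor `VIDEODB_EVENTS_DIR`, while the Python example hardcodes `~/.local/state/videodb`. A minimal sketch of resolving the same directory from Python follows. `resolve_events_dir` is a hypothetical helper, not part of the VideoDB SDK or `ws_listener.py`, and the precedence (`VIDEODB_EVENTS_DIR`, then `XDG_STATE_HOME`, then the home-directory fallback) is an assumption pieced together from the paths mentioned in these docs.

```python
import os
from pathlib import Path


def resolve_events_dir(env=None) -> Path:
    """Hypothetical helper: choose the listener state directory from the environment."""
    env = os.environ if env is None else env

    # Assumed precedence: an explicit override wins over the XDG location.
    explicit = env.get("VIDEODB_EVENTS_DIR")
    if explicit:
        return Path(explicit)

    xdg_state_home = env.get("XDG_STATE_HOME")
    if xdg_state_home:
        return Path(xdg_state_home) / "videodb"

    # Fallback used throughout these docs.
    return Path.home() / ".local" / "state" / "videodb"


events_file = resolve_events_dir() / "videodb_events.jsonl"
```

If the listener was started with an explicit `output_dir` argument, pass that same directory to the reader; this sketch only covers the environment-based defaults.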
## Additional docs ## Additional docs
Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed. Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
@@ -313,50 +351,6 @@ Reference documentation is in the `reference/` directory adjacent to this SKILL.
- [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events - [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
- [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples - [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples
## Screen Recording (Desktop Capture)
Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
### Quick Start
1. **Start listener**: `python scripts/ws_listener.py &`
2. **Get WebSocket ID**: `cat /tmp/videodb_ws_id`
3. **Run capture code** (see reference/capture.md for full workflow)
4. **Events written to**: `/tmp/videodb_events.jsonl`
### Query Events
```python
import json
from pathlib import Path

events_file = Path("/tmp/videodb_events.jsonl")
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

# Get all transcripts
transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]

# Get visual descriptions from last 5 minutes
import time
cutoff = time.time() - 300
recent_visual = [e for e in events
                 if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
```
### Utility Scripts
- [scripts/ws_listener.py](scripts/ws_listener.py) - WebSocket event listener (dumps to JSONL)
For complete capture workflow, see [reference/capture.md](reference/capture.md).
**Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing). **Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).

View File

@@ -8,7 +8,7 @@ Code-level details for VideoDB capture sessions. For workflow guide, see [captur
Real-time events from capture sessions and AI pipelines. No webhooks or polling required. Real-time events from capture sessions and AI pipelines. No webhooks or polling required.
Use [scripts/ws_listener.py](../scripts/ws_listener.py) to connect and dump events to `/tmp/videodb_events.jsonl`. Use [scripts/ws_listener.py](../scripts/ws_listener.py) to connect and dump events to `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`.
### Event Channels ### Event Channels
@@ -137,12 +137,12 @@ The script outputs `WS_ID=<connection_id>` on the first line, then listens indef
**Get the ws_id:** **Get the ws_id:**
```bash ```bash
cat /tmp/videodb_ws_id cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"
``` ```
**Stop the listener:** **Stop the listener:**
```bash ```bash
kill $(cat /tmp/videodb_ws_pid) kill "$(cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_pid")"
``` ```
**Functions that accept `ws_connection_id`:** **Functions that accept `ws_connection_id`:**
@@ -152,7 +152,7 @@ kill $(cat /tmp/videodb_ws_pid)
| `conn.create_capture_session()` | Session lifecycle events | | `conn.create_capture_session()` | Session lifecycle events |
| RTStream methods | See [rtstream-reference.md](rtstream-reference.md) | | RTStream methods | See [rtstream-reference.md](rtstream-reference.md) |
**Output files** (in output directory, default `/tmp`): **Output files** (in output directory, default `${XDG_STATE_HOME:-$HOME/.local/state}/videodb`):
- `videodb_ws_id` - WebSocket connection ID - `videodb_ws_id` - WebSocket connection ID
- `videodb_events.jsonl` - All events - `videodb_events.jsonl` - All events
- `videodb_ws_pid` - Process ID for easy termination - `videodb_ws_pid` - Process ID for easy termination
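These files now live in a per-user state directory instead of the shared `/tmp`. As a rough illustration of why the script both passes `mode=0o700` and calls `chmod`, here is a sketch for POSIX systems. `make_private_dir` and `is_private` are hypothetical names used only for this example, and a temporary directory stands in for the real state directory.

```python
import stat
import tempfile
from pathlib import Path


def make_private_dir(path: Path) -> Path:
    """Create a directory that only the owning user can read, write, or enter."""
    # The mode passed to mkdir is filtered by the process umask, applies only
    # to the leaf directory, and is ignored when the directory already exists.
    path.mkdir(parents=True, exist_ok=True, mode=0o700)
    try:
        # An explicit chmod tightens permissions on pre-existing directories too.
        path.chmod(0o700)
    except OSError:
        pass  # best effort, e.g. on filesystems without POSIX permissions
    return path


def is_private(path: Path) -> bool:
    """True when neither group nor other has any permission bit set."""
    return stat.S_IMODE(path.stat().st_mode) & 0o077 == 0


with tempfile.TemporaryDirectory() as tmp:
    state_dir = make_private_dir(Path(tmp) / "videodb")
    private = is_private(state_dir)
```

A check like `is_private` can be useful before trusting a `videodb_ws_id` or `videodb_ws_pid` file that another process might otherwise have been able to replace.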
@@ -176,20 +176,27 @@ Each line is a JSON object with added timestamps:
```python ```python
import json import json
events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")]
# Filter by channel
transcripts = [e for e in events if e.get("channel") == "transcript"]
# Filter by time (last 10 minutes)
import time import time
cutoff = time.time() - 600 from pathlib import Path
recent = [e for e in events if e["unix_ts"] > cutoff]
# Filter visual events containing keyword events_path = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
visual = [e for e in events transcripts = []
if e.get("channel") == "visual_index" recent = []
and "code" in e.get("data", {}).get("text", "").lower()] visual = []
cutoff = time.time() - 600
with events_path.open(encoding="utf-8") as handle:
for line in handle:
event = json.loads(line)
if event.get("channel") == "transcript":
transcripts.append(event)
if event.get("unix_ts", 0) > cutoff:
recent.append(event)
if (
event.get("channel") == "visual_index"
and "code" in event.get("data", {}).get("text", "").lower()
):
visual.append(event)
``` ```
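The listener appends to the JSONL file while it runs, so a reader may encounter a partially written final line. A hedged sketch of a tolerant reader follows: `iter_events` is a hypothetical helper (not part of the VideoDB SDK) that streams events one at a time and skips lines that do not parse. The sample events are invented purely for the demonstration.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_events(path: Path) -> Iterator[dict]:
    """Yield parsed events from a JSONL file, skipping blank or malformed lines."""
    if not path.exists():
        return
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a line the listener has not finished writing
            if isinstance(event, dict):
                yield event


# Demonstration with made-up events; the last line is deliberately truncated.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "videodb_events.jsonl"
    sample.write_text(
        '{"channel": "transcript", "unix_ts": 1.0, "data": {"text": "hello"}}\n'
        '{"channel": "visual_index", "unix_ts": 2.0, "data": {"text": "Editing code"}}\n'
        '{"channel": "transc',
        encoding="utf-8",
    )
    events = list(iter_events(sample))
```

Because it is a generator, the same helper can feed the single-pass channel, time, and keyword filters shown above without loading the whole file into memory first.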
--- ---
@@ -224,7 +231,9 @@ ws_id = ws.connection_id
### Create a Capture Session ### Create a Capture Session
```python ```python
ws_id = open("/tmp/videodb_ws_id").read().strip() from pathlib import Path
ws_id = (Path.home() / ".local" / "state" / "videodb" / "videodb_ws_id").read_text().strip()
session = conn.create_capture_session( session = conn.create_capture_session(
end_user_id="user-123", # required end_user_id="user-123", # required
@@ -391,6 +400,7 @@ For RTStream methods (indexing, transcription, alerts, batch config), see [rtstr
│ exported │ ──> Access video_id, stream_url, player_url │ exported │ ──> Access video_id, stream_url, player_url
└───────────────┘ └───────────────┘
unrecoverable capture error
v v
┌───────────────┐ WebSocket: capture_session.failed ┌───────────────┐ WebSocket: capture_session.failed

View File

@@ -280,6 +280,7 @@ For more streaming options (segment streams, search-to-stream, audio playback),
```python ```python
import videodb import videodb
from videodb import SearchType from videodb import SearchType
from videodb.exceptions import InvalidRequestError
from videodb.timeline import Timeline from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle from videodb.asset import VideoAsset, TextAsset, TextStyle
@@ -289,8 +290,14 @@ video = coll.get_video("your-video-id")
# 1. Search for key moments # 1. Search for key moments
video.index_spoken_words(force=True) video.index_spoken_words(force=True)
results = video.search("product announcement", search_type=SearchType.semantic) try:
shots = results.get_shots() # may be empty if no results results = video.search("product announcement", search_type=SearchType.semantic)
shots = results.get_shots()
except InvalidRequestError as exc:
if "No results found" in str(exc):
shots = []
else:
raise
# 2. Build timeline # 2. Build timeline
timeline = Timeline(conn) timeline = Timeline(conn)

View File

@@ -47,10 +47,10 @@ video.play()
```python ```python
# Index and add subtitles first # Index and add subtitles first
video.index_spoken_words(force=True) video.index_spoken_words(force=True)
video.add_subtitle() stream_url = video.add_subtitle()
# Stream now includes subtitles # Returned URL already includes subtitles
stream_url = video.generate_stream() print(f"Subtitled stream: {stream_url}")
``` ```
### Specific Segments ### Specific Segments

View File

@@ -6,7 +6,7 @@ Usage:
python scripts/ws_listener.py [OPTIONS] [output_dir] python scripts/ws_listener.py [OPTIONS] [output_dir]
Arguments: Arguments:
output_dir Directory for output files (default: /tmp or VIDEODB_EVENTS_DIR env var) output_dir Directory for output files (default: XDG_STATE_HOME/videodb or ~/.local/state/videodb)
Options: Options:
--clear Clear the events file before starting (use when starting a new session) --clear Clear the events file before starting (use when starting a new session)
@@ -20,10 +20,10 @@ Output (first line, for parsing):
WS_ID=<connection_id> WS_ID=<connection_id>
Examples: Examples:
python scripts/ws_listener.py & # Run in background python scripts/ws_listener.py & # Run in background
python scripts/ws_listener.py --clear # Clear events and start fresh python scripts/ws_listener.py --clear # Clear events and start fresh
python scripts/ws_listener.py --clear /tmp/mydir # Custom dir with clear python scripts/ws_listener.py --clear /tmp/mydir # Custom dir with clear
kill $(cat /tmp/videodb_ws_pid) # Stop the listener kill "$(cat ~/.local/state/videodb/videodb_ws_pid)" # Stop the listener
""" """
import os import os
import sys import sys
@@ -31,6 +31,7 @@ import json
import signal import signal
import asyncio import asyncio
import logging import logging
import contextlib
from datetime import datetime, timezone from datetime import datetime, timezone
from pathlib import Path from pathlib import Path
@@ -52,6 +53,27 @@ logging.basicConfig(
LOGGER = logging.getLogger(__name__) LOGGER = logging.getLogger(__name__)
# Parse arguments # Parse arguments
RETRYABLE_ERRORS = (ConnectionError, TimeoutError)


def default_output_dir() -> Path:
    """Return a private per-user state directory for listener artifacts."""
    xdg_state_home = os.environ.get("XDG_STATE_HOME")
    if xdg_state_home:
        return Path(xdg_state_home) / "videodb"
    return Path.home() / ".local" / "state" / "videodb"


def ensure_private_dir(path: Path) -> Path:
    """Create the listener state directory with private permissions."""
    path.mkdir(parents=True, exist_ok=True, mode=0o700)
    try:
        path.chmod(0o700)
    except OSError:
        pass
    return path
def parse_args() -> tuple[bool, Path]: def parse_args() -> tuple[bool, Path]:
clear = False clear = False
output_dir: str | None = None output_dir: str | None = None
@@ -64,9 +86,9 @@ def parse_args() -> tuple[bool, Path]:
output_dir = arg output_dir = arg
if output_dir is None: if output_dir is None:
output_dir = os.environ.get("VIDEODB_EVENTS_DIR", "/tmp") return clear, ensure_private_dir(default_output_dir())
return clear, Path(output_dir) return clear, ensure_private_dir(Path(output_dir))
CLEAR_EVENTS, OUTPUT_DIR = parse_args() CLEAR_EVENTS, OUTPUT_DIR = parse_args()
EVENTS_FILE = OUTPUT_DIR / "videodb_events.jsonl" EVENTS_FILE = OUTPUT_DIR / "videodb_events.jsonl"
@@ -93,7 +115,7 @@ def append_event(event: dict):
def write_pid(): def write_pid():
"""Write PID file for easy process management.""" """Write PID file for easy process management."""
OUTPUT_DIR.mkdir(parents=True, exist_ok=True) OUTPUT_DIR.mkdir(parents=True, exist_ok=True, mode=0o700)
PID_FILE.write_text(str(os.getpid())) PID_FILE.write_text(str(os.getpid()))
@@ -118,43 +140,10 @@ async def listen_with_retry():
ws_wrapper = conn.connect_websocket() ws_wrapper = conn.connect_websocket()
ws = await ws_wrapper.connect() ws = await ws_wrapper.connect()
ws_id = ws.connection_id ws_id = ws.connection_id
# Ensure output directory exists
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
# Clear events file only on first connection if --clear flag is set
if _first_connection and CLEAR_EVENTS:
EVENTS_FILE.unlink(missing_ok=True)
log("Cleared events file")
_first_connection = False
# Write ws_id to file for easy retrieval
WS_ID_FILE.write_text(ws_id)
# Print ws_id (parseable format for LLM)
if retry_count == 0:
print(f"WS_ID={ws_id}", flush=True)
log(f"Connected (ws_id={ws_id})")
# Reset retry state on successful connection
retry_count = 0
backoff = INITIAL_BACKOFF
# Listen for messages
async for msg in ws.receive():
append_event(msg)
channel = msg.get("channel", msg.get("event", "unknown"))
text = msg.get("data", {}).get("text", "")
if text:
print(f"[{channel}] {text[:80]}", flush=True)
# If we exit the loop normally, connection was closed
log("Connection closed by server")
except asyncio.CancelledError: except asyncio.CancelledError:
log("Shutdown requested") log("Shutdown requested")
raise raise
except Exception as e: except RETRYABLE_ERRORS as e:
retry_count += 1 retry_count += 1
log(f"Connection error: {e}") log(f"Connection error: {e}")
@@ -165,6 +154,52 @@ async def listen_with_retry():
log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...") log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...")
await asyncio.sleep(backoff) await asyncio.sleep(backoff)
backoff = min(backoff * 2, MAX_BACKOFF) backoff = min(backoff * 2, MAX_BACKOFF)
continue
OUTPUT_DIR.mkdir(parents=True, exist_ok=True, mode=0o700)

if _first_connection and CLEAR_EVENTS:
    EVENTS_FILE.unlink(missing_ok=True)
    log("Cleared events file")
_first_connection = False

WS_ID_FILE.write_text(ws_id)

if retry_count == 0:
    print(f"WS_ID={ws_id}", flush=True)
log(f"Connected (ws_id={ws_id})")

retry_count = 0
backoff = INITIAL_BACKOFF

receiver = ws.receive().__aiter__()
while True:
    try:
        msg = await anext(receiver)
    except StopAsyncIteration:
        log("Connection closed by server")
        break
    except asyncio.CancelledError:
        log("Shutdown requested")
        raise
    except RETRYABLE_ERRORS as e:
        retry_count += 1
        log(f"Connection error: {e}")

        if retry_count >= MAX_RETRIES:
            log(f"Max retries ({MAX_RETRIES}) exceeded, exiting")
            return

        log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...")
        await asyncio.sleep(backoff)
        backoff = min(backoff * 2, MAX_BACKOFF)
        break

    append_event(msg)
    channel = msg.get("channel", msg.get("event", "unknown"))
    text = msg.get("data", {}).get("text", "")
    if text:
        print(f"[{channel}] {text[:80]}", flush=True)
async def main_async(): async def main_async():
@@ -178,7 +213,8 @@ async def main_async():
# Register signal handlers # Register signal handlers
for sig in (signal.SIGINT, signal.SIGTERM): for sig in (signal.SIGINT, signal.SIGTERM):
loop.add_signal_handler(sig, handle_signal) with contextlib.suppress(NotImplementedError):
loop.add_signal_handler(sig, handle_signal)
# Run listener with cancellation support # Run listener with cancellation support
listen_task = asyncio.create_task(listen_with_retry()) listen_task = asyncio.create_task(listen_with_retry())
@@ -189,6 +225,9 @@ async def main_async():
return_when=asyncio.FIRST_COMPLETED, return_when=asyncio.FIRST_COMPLETED,
) )
if listen_task.done():
    await listen_task
# Cancel remaining tasks # Cancel remaining tasks
for task in pending: for task in pending:
task.cancel() task.cancel()
@@ -197,6 +236,10 @@ async def main_async():
except asyncio.CancelledError: except asyncio.CancelledError:
pass pass
for sig in (signal.SIGINT, signal.SIGTERM):
    with contextlib.suppress(NotImplementedError):
        loop.remove_signal_handler(sig)
log("Shutdown complete") log("Shutdown complete")
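
The reconnect behavior changed in this file (retry only on `RETRYABLE_ERRORS`, double the delay up to a cap, let everything else propagate) can be sketched in isolation as follows. This is an illustrative reduction, not the listener's actual code: `run_with_backoff`, `flaky_connect`, and the numeric defaults are hypothetical, and the real values of `MAX_RETRIES`, `INITIAL_BACKOFF`, and `MAX_BACKOFF` are not shown in the diff.

```python
import asyncio

RETRYABLE_ERRORS = (ConnectionError, TimeoutError)


async def run_with_backoff(connect, *, max_retries=3, initial_backoff=0.01, max_backoff=0.05):
    """Call `connect` until it succeeds, sleeping with capped exponential backoff."""
    retry_count = 0
    backoff = initial_backoff
    delays = []  # recorded so the schedule can be inspected afterwards
    while retry_count < max_retries:
        try:
            result = await connect()
        except RETRYABLE_ERRORS:
            retry_count += 1
            if retry_count >= max_retries:
                break
            delays.append(backoff)
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
            continue
        # Any exception outside RETRYABLE_ERRORS propagates to the caller.
        return result, delays
    return None, delays


attempts = {"count": 0}


async def flaky_connect():
    """Simulated connection that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated drop")
    return "connected"


result, delays = asyncio.run(run_with_backoff(flaky_connect))
```

Narrowing the handler from `Exception` to a tuple of retryable errors means programming mistakes such as a `KeyError` surface immediately instead of being retried until the limit is reached.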