docs: resolve videodb review findings

2026-06-14 12:11:27 +08:00 · 2026-03-10 21:18:33 -07:00
parent 2581bebfd9
commit db2bf16427
5 changed files with 188 additions and 134 deletions
@@ -10,59 +10,48 @@ argument-hint: "[task description]"
 **Perception + memory + actions for video, live streams, and desktop sessions.**
-Use this skill when you need to:
+## When to Use
-## 1) Desktop Perception
+### Desktop Perception
 - Start/stop a **desktop session** capturing **screen, mic, and system audio**
 - Stream **live context** and store **episodic session memory**
 - Run **real-time alerts/triggers** on what's spoken and what's happening on screen
 - Produce **session summaries**, a searchable timeline, and **playable evidence links**
-## 2) Video ingest + stream
+### Video ingest + stream
 - Ingest a **file or URL** and return a **playable web stream link**
 - Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**
-## 3) Index + search (timestamps + evidence)
+### Index + search (timestamps + evidence)
 - Build **visual**, **spoken**, and **keyword** indexes
 - Search and return exact moments with **timestamps** and **playable evidence**
 - Auto-create **clips** from search results
-## 4) Timeline editing + generation
+### Timeline editing + generation
 - Subtitles: **generate**, **translate**, **burn-in**
 - Overlays: **text/image/branding**, motion captions
 - Audio: **background music**, **voiceover**, **dubbing**
 - Programmatic composition and exports via **timeline operations**
-## 5) Live streams (RTSP) + monitoring
+### Live streams (RTSP) + monitoring
 - Connect **RTSP/live feeds**
 - Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows
---
+## How It Works
-## Common inputs
+### Common inputs
 - Local **file path**, public **URL**, or **RTSP URL**
 - Desktop capture request: **start / stop / summarize session**
 - Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules
-## Common outputs
+### Common outputs
 - **Stream URL**
 - Search results with **timestamps** and **evidence links**
 - Generated assets: subtitles, audio, images, clips
 - **Event/alert payloads** for live streams
 - Desktop **session summaries** and memory entries
---
+### Running Python code
-## Canonical prompts (examples)
-- "Start desktop capture and alert when a password field appears."
-- "Record my session and produce an actionable summary when it ends."
-- "Ingest this file and return a playable stream link."
-- "Index this folder and find every scene with people, return timestamps."
-- "Generate subtitles, burn them in, and add light background music."
-- "Connect this RTSP URL and alert when a person enters the zone."
-## Running Python code
 Before running any VideoDB code, change to the project directory and load environment variables:
@@ -96,7 +85,7 @@ print(f"Videos: {len(coll.get_videos())}")
 EOF
 ```
-## Setup
+### Setup
 When the user asks to "setup videodb" or similar:
@@ -123,7 +112,7 @@ Get a free API key at https://console.videodb.io (50 free uploads, no credit car
 **Do NOT** read, write, or handle the API key yourself. Always let the user set it.
-## Quick Reference
+### Quick Reference
 ### Upload media
@@ -298,6 +287,55 @@ except InvalidRequestError as e:
 | Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
 | `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits |
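The negative-timestamp row in the table above can be guarded with a small pre-flight check. The following is a minimal sketch, not part of the VideoDB SDK: `validate_clip_range` is a hypothetical helper, it assumes `start` and `end` are expressed in seconds, and the `end > start` rule is an extra assumption beyond what the table states.

```python
from __future__ import annotations


def validate_clip_range(start: float, end: float | None = None) -> tuple[float, float | None]:
    """Reject ranges that could silently produce a broken stream.

    Hypothetical helper: call it before passing start/end to a timeline asset.
    """
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    # Assumption: an explicit end must lie strictly after start.
    if end is not None and end <= start:
        raise ValueError(f"end ({end}) must be greater than start ({start})")
    return start, end
```

Raising early keeps the failure local to the caller instead of surfacing later as an unplayable stream.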
+## Examples
+### Canonical prompts
+- "Start desktop capture and alert when a password field appears."
+- "Record my session and produce an actionable summary when it ends."
+- "Ingest this file and return a playable stream link."
+- "Index this folder and find every scene with people, return timestamps."
+- "Generate subtitles, burn them in, and add light background music."
+- "Connect this RTSP URL and alert when a person enters the zone."
+### Screen Recording (Desktop Capture)
+Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
+#### Quick Start
+1. **Start listener**: `python scripts/ws_listener.py --clear &`
+2. **Get WebSocket ID**: `cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"`
+3. **Run capture code** (see reference/capture.md for the full workflow)
+4. **Events written to**: `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`
+Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.
+#### Query Events
+```python
+import json
+import time
+from pathlib import Path
+events_file = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
+events = []
+if events_file.exists():
+    with events_file.open(encoding="utf-8") as handle:
+        for line in handle:
+            try:
+                events.append(json.loads(line))
+            except json.JSONDecodeError:
+                continue
+transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
+cutoff = time.time() - 300
+recent_visual = [
+    e for e in events
+    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
+]
+```
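The query example hard-codes the default location, whereas the Quick Start shell commands fall back from `VIDEODB_EVENTS_DIR`. As a rough sketch, the same fallback can be written in Python. `resolve_events_dir` is a hypothetical helper name, and whether the listener itself still honours `VIDEODB_EVENTS_DIR` is an assumption that should be checked against `scripts/ws_listener.py`.

```python
import os
from pathlib import Path


def resolve_events_dir() -> Path:
    """Mirror `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}` in Python."""
    override = os.environ.get("VIDEODB_EVENTS_DIR")
    if override:  # unset and empty both fall through, like the `:-` expansion
        return Path(override)
    return Path.home() / ".local" / "state" / "videodb"


# The events file name is taken from the Quick Start steps.
events_file = resolve_events_dir() / "videodb_events.jsonl"
```

Keeping the lookup in one function means the shell snippets and the Python snippets cannot drift apart on where events are read from.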
 ## Additional docs
 Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
@@ -313,50 +351,6 @@ Reference documentation is in the `reference/` directory adjacent to this SKILL.
 - [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
 - [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples
-## Screen Recording (Desktop Capture)
-Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports **macOS** only.
-### Quick Start
-1. **Start listener**: `python scripts/ws_listener.py &`
-2. **Get WebSocket ID**: `cat /tmp/videodb_ws_id`
-3. **Run capture code** (see reference/capture.md for full workflow)
-4. **Events written to**: `/tmp/videodb_events.jsonl`
-### Query Events
-```python
-import json
-from pathlib import Path
-events_file = Path("/tmp/videodb_events.jsonl")
-events = []
-if events_file.exists():
-    with events_file.open(encoding="utf-8") as handle:
-        for line in handle:
-            try:
-                events.append(json.loads(line))
-            except json.JSONDecodeError:
-                continue
-# Get all transcripts
-transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
-# Get visual descriptions from last 5 minutes
-import time
-cutoff = time.time() - 300
-recent_visual = [e for e in events
-                 if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff]
-```
-### Utility Scripts
-- [scripts/ws_listener.py](scripts/ws_listener.py) - WebSocket event listener (dumps to JSONL)
-For complete capture workflow, see [reference/capture.md](reference/capture.md).
 **Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).
@@ -8,7 +8,7 @@ Code-level details for VideoDB capture sessions. For workflow guide, see [captur
 Real-time events from capture sessions and AI pipelines. No webhooks or polling required.
-Use [scripts/ws_listener.py](../scripts/ws_listener.py) to connect and dump events to `/tmp/videodb_events.jsonl`.
+Use [scripts/ws_listener.py](../scripts/ws_listener.py) to connect and dump events to `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`.
 ### Event Channels
@@ -137,12 +137,12 @@ The script outputs `WS_ID=<connection_id>` on the first line, then listens indef
 **Get the ws_id:**
 ```bash
-cat /tmp/videodb_ws_id
+cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"
 ```
 **Stop the listener:**
 ```bash
-kill $(cat /tmp/videodb_ws_pid)
+kill "$(cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_pid")"
 ```
 **Functions that accept `ws_connection_id`:**
@@ -152,7 +152,7 @@ kill $(cat /tmp/videodb_ws_pid)
 | `conn.create_capture_session()` | Session lifecycle events |
 | RTStream methods | See [rtstream-reference.md](rtstream-reference.md) |
-**Output files** (in output directory, default `/tmp`):
+**Output files** (in output directory, default `${XDG_STATE_HOME:-$HOME/.local/state}/videodb`):
 - `videodb_ws_id` - WebSocket connection ID
 - `videodb_events.jsonl` - All events
 - `videodb_ws_pid` - Process ID for easy termination
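The three output files listed above can be read from Python as well as from the shell. The sketch below is illustrative only: `read_listener_state` is a hypothetical helper, and `output_dir` is assumed to be whichever directory the listener was started with.

```python
from __future__ import annotations

from pathlib import Path


def read_listener_state(output_dir: Path) -> dict[str, str | None]:
    """Return the listener's connection ID and PID, or None where a file is absent."""
    state: dict[str, str | None] = {}
    for key, name in (("ws_id", "videodb_ws_id"), ("pid", "videodb_ws_pid")):
        path = output_dir / name
        # Strip the trailing newline that shell tools often leave behind.
        state[key] = path.read_text(encoding="utf-8").strip() if path.exists() else None
    return state
```

A `None` value for `ws_id` usually means the listener has not connected yet, so callers can poll this helper before creating a capture session.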
@@ -176,20 +176,27 @@ Each line is a JSON object with added timestamps:
 ```python
 import json
-events = [json.loads(l) for l in open("/tmp/videodb_events.jsonl")]
-# Filter by channel
-transcripts = [e for e in events if e.get("channel") == "transcript"]
-# Filter by time (last 10 minutes)
 import time
-cutoff = time.time() - 600
-recent = [e for e in events if e["unix_ts"] > cutoff]
-# Filter visual events containing keyword
-visual = [e for e in events
-          if e.get("channel") == "visual_index"
-          and "code" in e.get("data", {}).get("text", "").lower()]
+from pathlib import Path
+events_path = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
+transcripts = []
+recent = []
+visual = []
+cutoff = time.time() - 600
+with events_path.open(encoding="utf-8") as handle:
+    for line in handle:
+        event = json.loads(line)
+        if event.get("channel") == "transcript":
+            transcripts.append(event)
+        if event.get("unix_ts", 0) > cutoff:
+            recent.append(event)
+        if (
+            event.get("channel") == "visual_index"
+            and "code" in event.get("data", {}).get("text", "").lower()
+        ):
+            visual.append(event)
 ```
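The example above calls `json.loads` on every line, so a single truncated line (for instance one the listener is still writing) would raise. A tolerant reader could look like the sketch below. `iter_events` is a hypothetical helper name, not something shipped with the skill.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_events(path: Path) -> Iterator[dict]:
    """Yield parsed event objects, skipping blank or malformed lines."""
    if not path.exists():
        return
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written trailing line
            # Only JSON objects are treated as events.
            if isinstance(event, dict):
                yield event
```

Because it is a generator, the channel and time filters can consume events one at a time without loading the whole file into memory.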
 ---
@@ -224,7 +231,9 @@ ws_id = ws.connection_id
 ### Create a Capture Session
 ```python
-ws_id = open("/tmp/videodb_ws_id").read().strip()
+from pathlib import Path
+ws_id = (Path.home() / ".local" / "state" / "videodb" / "videodb_ws_id").read_text().strip()
 session = conn.create_capture_session(
    end_user_id="user-123",  # required
@@ -391,6 +400,7 @@ For RTStream methods (indexing, transcription, alerts, batch config), see [rtstr
  │   exported     │ ──> Access video_id, stream_url, player_url
  └───────────────┘
  unrecoverable capture error
          │
          v
  ┌───────────────┐     WebSocket: capture_session.failed
@@ -280,6 +280,7 @@ For more streaming options (segment streams, search-to-stream, audio playback),
 ```python
 import videodb
 from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
 from videodb.timeline import Timeline
 from videodb.asset import VideoAsset, TextAsset, TextStyle
@@ -289,8 +290,14 @@ video = coll.get_video("your-video-id")
 # 1. Search for key moments
 video.index_spoken_words(force=True)
-results = video.search("product announcement", search_type=SearchType.semantic)
+try:
-shots = results.get_shots()  # may be empty if no results
+    results = video.search("product announcement", search_type=SearchType.semantic)
+    shots = results.get_shots()
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        shots = []
+    else:
+        raise
 # 2. Build timeline
 timeline = Timeline(conn)
@@ -47,10 +47,10 @@ video.play()
 ```python
 # Index and add subtitles first
 video.index_spoken_words(force=True)
-video.add_subtitle()
+stream_url = video.add_subtitle()
-# Stream now includes subtitles
+# Returned URL already includes subtitles
-stream_url = video.generate_stream()
+print(f"Subtitled stream: {stream_url}")
 ```
 ### Specific Segments
@@ -6,7 +6,7 @@ Usage:
  python scripts/ws_listener.py [OPTIONS] [output_dir]
 Arguments:
-  output_dir  Directory for output files (default: /tmp or VIDEODB_EVENTS_DIR env var)
+  output_dir  Directory for output files (default: XDG_STATE_HOME/videodb or ~/.local/state/videodb)
 Options:
  --clear     Clear the events file before starting (use when starting a new session)
@@ -20,10 +20,10 @@ Output (first line, for parsing):
  WS_ID=<connection_id>
 Examples:
-  python scripts/ws_listener.py &                    # Run in background
+  python scripts/ws_listener.py &                                 # Run in background
-  python scripts/ws_listener.py --clear              # Clear events and start fresh
+  python scripts/ws_listener.py --clear                           # Clear events and start fresh
-  python scripts/ws_listener.py --clear /tmp/mydir   # Custom dir with clear
+  python scripts/ws_listener.py --clear /tmp/mydir                # Custom dir with clear
-  kill $(cat /tmp/videodb_ws_pid)                    # Stop the listener
+  kill "$(cat ~/.local/state/videodb/videodb_ws_pid)"             # Stop the listener
 """
 import os
 import sys
@@ -31,6 +31,7 @@ import json
 import signal
 import asyncio
 import logging
+import contextlib
 from datetime import datetime, timezone
 from pathlib import Path
@@ -52,6 +53,27 @@ logging.basicConfig(
 LOGGER = logging.getLogger(__name__)
 # Parse arguments
+RETRYABLE_ERRORS = (ConnectionError, TimeoutError)
+def default_output_dir() -> Path:
+    """Return a private per-user state directory for listener artifacts."""
+    xdg_state_home = os.environ.get("XDG_STATE_HOME")
+    if xdg_state_home:
+        return Path(xdg_state_home) / "videodb"
+    return Path.home() / ".local" / "state" / "videodb"
+def ensure_private_dir(path: Path) -> Path:
+    """Create the listener state directory with private permissions."""
+    path.mkdir(parents=True, exist_ok=True, mode=0o700)
+    try:
+        path.chmod(0o700)
+    except OSError:
+        pass
+    return path
 def parse_args() -> tuple[bool, Path]:
    clear = False
    output_dir: str | None = None
@@ -64,9 +86,9 @@ def parse_args() -> tuple[bool, Path]:
            output_dir = arg
    if output_dir is None:
-        output_dir = os.environ.get("VIDEODB_EVENTS_DIR", "/tmp")
+        return clear, ensure_private_dir(default_output_dir())
-    return clear, Path(output_dir)
+    return clear, ensure_private_dir(Path(output_dir))
 CLEAR_EVENTS, OUTPUT_DIR = parse_args()
 EVENTS_FILE = OUTPUT_DIR / "videodb_events.jsonl"
@@ -93,7 +115,7 @@ def append_event(event: dict):
 def write_pid():
    """Write PID file for easy process management."""
-    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+    OUTPUT_DIR.mkdir(parents=True, exist_ok=True, mode=0o700)
    PID_FILE.write_text(str(os.getpid()))
@@ -118,43 +140,10 @@ async def listen_with_retry():
            ws_wrapper = conn.connect_websocket()
            ws = await ws_wrapper.connect()
            ws_id = ws.connection_id
-            # Ensure output directory exists
-            OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
-            # Clear events file only on first connection if --clear flag is set
-            if _first_connection and CLEAR_EVENTS:
-                EVENTS_FILE.unlink(missing_ok=True)
-                log("Cleared events file")
-            _first_connection = False
-            # Write ws_id to file for easy retrieval
-            WS_ID_FILE.write_text(ws_id)
-            # Print ws_id (parseable format for LLM)
-            if retry_count == 0:
-                print(f"WS_ID={ws_id}", flush=True)
-            log(f"Connected (ws_id={ws_id})")
-            # Reset retry state on successful connection
-            retry_count = 0
-            backoff = INITIAL_BACKOFF
-            # Listen for messages
-            async for msg in ws.receive():
-                append_event(msg)
-                channel = msg.get("channel", msg.get("event", "unknown"))
-                text = msg.get("data", {}).get("text", "")
-                if text:
-                    print(f"[{channel}] {text[:80]}", flush=True)
-            # If we exit the loop normally, connection was closed
-            log("Connection closed by server")
        except asyncio.CancelledError:
            log("Shutdown requested")
            raise
-        except Exception as e:
+        except RETRYABLE_ERRORS as e:
            retry_count += 1
            log(f"Connection error: {e}")
@@ -165,6 +154,52 @@ async def listen_with_retry():
            log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...")
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, MAX_BACKOFF)
+            continue
+        OUTPUT_DIR.mkdir(parents=True, exist_ok=True, mode=0o700)
+        if _first_connection and CLEAR_EVENTS:
+            EVENTS_FILE.unlink(missing_ok=True)
+            log("Cleared events file")
+        _first_connection = False
+        WS_ID_FILE.write_text(ws_id)
+        if retry_count == 0:
+            print(f"WS_ID={ws_id}", flush=True)
+        log(f"Connected (ws_id={ws_id})")
+        retry_count = 0
+        backoff = INITIAL_BACKOFF
+        receiver = ws.receive().__aiter__()
+        while True:
+            try:
+                msg = await anext(receiver)
+            except StopAsyncIteration:
+                log("Connection closed by server")
+                break
+            except asyncio.CancelledError:
+                log("Shutdown requested")
+                raise
+            except RETRYABLE_ERRORS as e:
+                retry_count += 1
+                log(f"Connection error: {e}")
+                if retry_count >= MAX_RETRIES:
+                    log(f"Max retries ({MAX_RETRIES}) exceeded, exiting")
+                    return
+                log(f"Reconnecting in {backoff}s (attempt {retry_count}/{MAX_RETRIES})...")
+                await asyncio.sleep(backoff)
+                backoff = min(backoff * 2, MAX_BACKOFF)
+                break
+            append_event(msg)
+            channel = msg.get("channel", msg.get("event", "unknown"))
+            text = msg.get("data", {}).get("text", "")
+            if text:
+                print(f"[{channel}] {text[:80]}", flush=True)
 async def main_async():
@@ -178,7 +213,8 @@ async def main_async():
    # Register signal handlers
    for sig in (signal.SIGINT, signal.SIGTERM):
-        loop.add_signal_handler(sig, handle_signal)
+        with contextlib.suppress(NotImplementedError):
+            loop.add_signal_handler(sig, handle_signal)
    # Run listener with cancellation support
    listen_task = asyncio.create_task(listen_with_retry())
@@ -189,6 +225,9 @@ async def main_async():
        return_when=asyncio.FIRST_COMPLETED,
    )
+    if listen_task.done():
+        await listen_task
    # Cancel remaining tasks
    for task in pending:
        task.cancel()
@@ -197,6 +236,10 @@ async def main_async():
        except asyncio.CancelledError:
            pass
+    for sig in (signal.SIGINT, signal.SIGTERM):
+        with contextlib.suppress(NotImplementedError):
+            loop.remove_signal_handler(sig)
    log("Shutdown complete")
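The retry handling in `listen_with_retry()` doubles the delay after each failure and caps it at `MAX_BACKOFF`. The sketch below reproduces only that arithmetic for a run of consecutive failures. `backoff_schedule` is a hypothetical helper, and the numbers in the usage line are placeholders, because the real values of `INITIAL_BACKOFF`, `MAX_BACKOFF`, and `MAX_RETRIES` are not shown in this diff.

```python
def backoff_schedule(initial: float, maximum: float, max_retries: int) -> list[float]:
    """Return the sleep durations used before giving up on consecutive failures."""
    delays: list[float] = []
    backoff = initial
    retry_count = 0
    while True:
        retry_count += 1
        # Same exit condition as the listener: stop once the retry budget is spent.
        if retry_count >= max_retries:
            return delays
        delays.append(backoff)
        backoff = min(backoff * 2, maximum)


print(backoff_schedule(1, 8, 6))  # [1, 2, 4, 8, 8]
```

In the listener, a successful connection resets both `retry_count` and `backoff`, so this schedule applies only to an unbroken run of failures.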