
Capture Reference

Code-level details for VideoDB capture sessions. For the workflow guide, see capture.md.


WebSocket Events

Real-time events from capture sessions and AI pipelines. No webhooks or polling required.

Use scripts/ws_listener.py to connect and dump events to /tmp/videodb_events.jsonl.

Event Channels

| Channel | Source | Content |
|---|---|---|
| capture_session | Session lifecycle | Status changes |
| transcript | start_transcript() | Speech-to-text |
| visual_index / scene_index | index_visuals() | Visual analysis |
| audio_index | index_audio() | Audio analysis |
| alert | create_alert() | Alert notifications |

Session Lifecycle Events

| Event | Status | Key Data |
|---|---|---|
| capture_session.created | created | |
| capture_session.starting | starting | |
| capture_session.active | active | rtstreams[] |
| capture_session.stopping | stopping | |
| capture_session.stopped | stopped | |
| capture_session.exported | exported | exported_video_id, stream_url, player_url |
| capture_session.failed | failed | error |
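A script can block on a lifecycle transition by scanning the JSONL file that ws_listener.py writes (see Event Persistence below). A minimal sketch; the helper name is illustrative:

```python
import json

def find_lifecycle_event(lines, target):
    """Return the first session event matching target (e.g. 'capture_session.active')."""
    for line in lines:
        event = json.loads(line)
        if event.get("event") == target:
            return event
    return None
```

In practice you would re-run the scan (or tail /tmp/videodb_events.jsonl) as new lines arrive until the target event shows up.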

Event Structures

Transcript event:

{
  "channel": "transcript",
  "rtstream_id": "rts-xxx",
  "rtstream_name": "mic:default",
  "data": {
    "text": "Let's schedule the meeting for Thursday",
    "is_final": true,
    "start": 1710000001234,
    "end": 1710000002345
  }
}

Visual index event:

{
  "channel": "visual_index",
  "rtstream_id": "rts-xxx",
  "rtstream_name": "display:1",
  "data": {
    "text": "User is viewing a Slack conversation with 3 unread messages",
    "start": 1710000012340,
    "end": 1710000018900
  }
}

Audio index event:

{
  "channel": "audio_index",
  "rtstream_id": "rts-xxx",
  "rtstream_name": "mic:default",
  "data": {
    "text": "Discussion about scheduling a team meeting",
    "start": 1710000021500,
    "end": 1710000029200
  }
}

Session active event:

{
  "event": "capture_session.active",
  "capture_session_id": "cap-xxx",
  "status": "active",
  "data": {
    "rtstreams": [
      { "rtstream_id": "rts-1", "name": "mic:default", "media_types": ["audio"] },
      { "rtstream_id": "rts-2", "name": "system_audio:default", "media_types": ["audio"] },
      { "rtstream_id": "rts-3", "name": "display:1", "media_types": ["video"] }
    ]
  }
}
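Since the active payload lists every RTStream, a small helper can split them by media type before starting pipelines. A sketch against the event shape shown above; the helper name is illustrative:

```python
def rtstream_ids_by_media(active_event, media_type):
    """Return rtstream_ids from a capture_session.active event matching a media type."""
    streams = active_event.get("data", {}).get("rtstreams", [])
    return [s["rtstream_id"] for s in streams if media_type in s.get("media_types", [])]
```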

Session exported event:

{
  "event": "capture_session.exported",
  "capture_session_id": "cap-xxx",
  "status": "exported",
  "data": {
    "exported_video_id": "v_xyz789",
    "stream_url": "https://stream.videodb.io/...",
    "player_url": "https://console.videodb.io/player?url=..."
  }
}

For the latest details, see https://docs.videodb.io/pages/ingest/capture-sdks/realtime-context.md


Event Persistence

Use ws_listener.py to dump all WebSocket events to a JSONL file for later analysis.

Start Listener and Get WebSocket ID

# Start with --clear to clear old events (recommended for new sessions)
python scripts/ws_listener.py --clear &

# Append to existing events (for reconnects)
python scripts/ws_listener.py &

Or specify a custom output directory:

python scripts/ws_listener.py --clear /path/to/output &
# Or via environment variable:
VIDEODB_EVENTS_DIR=/path/to/output python scripts/ws_listener.py --clear &

The script outputs WS_ID=<connection_id> on the first line, then listens indefinitely.

Get the ws_id:

cat /tmp/videodb_ws_id

Stop the listener:

kill $(cat /tmp/videodb_ws_pid)

Functions that accept ws_connection_id:

| Function | Purpose |
|---|---|
| conn.create_capture_session() | Session lifecycle events |
| RTStream methods | See rtstream-reference.md |

Output files (in output directory, default /tmp):

  • videodb_ws_id - WebSocket connection ID
  • videodb_events.jsonl - All events
  • videodb_ws_pid - Process ID for easy termination

Features:

  • --clear flag to clear events file on start (use for new sessions)
  • Auto-reconnect with exponential backoff on connection drops
  • Graceful shutdown on SIGINT/SIGTERM
  • Connection status logging

JSONL Format

Each line is a JSON object with added timestamps:

{"ts": "2026-03-02T10:15:30.123Z", "unix_ts": 1772446530.123, "channel": "visual_index", "data": {"text": "..."}}
{"ts": "2026-03-02T10:15:31.456Z", "unix_ts": 1772446531.456, "event": "capture_session.active", "capture_session_id": "cap-xxx"}

Reading Events

import json

with open("/tmp/videodb_events.jsonl") as f:
    events = [json.loads(line) for line in f]

# Filter by channel
transcripts = [e for e in events if e.get("channel") == "transcript"]

# Filter by time (last 10 minutes)
import time
cutoff = time.time() - 600
recent = [e for e in events if e["unix_ts"] > cutoff]

# Filter visual events containing keyword
visual = [e for e in events 
          if e.get("channel") == "visual_index" 
          and "code" in e.get("data", {}).get("text", "").lower()]
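Building on those filters, final transcript segments can be stitched back into running text. A sketch that assumes the transcript event shape shown earlier:

```python
def transcript_text(events):
    """Join final transcript segments in time order into a single string."""
    finals = [e for e in events
              if e.get("channel") == "transcript" and e.get("data", {}).get("is_final")]
    finals.sort(key=lambda e: e["data"]["start"])
    return " ".join(e["data"]["text"] for e in finals)
```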

WebSocket Connection

Connect to receive real-time AI results from transcription and indexing pipelines.

ws_wrapper = conn.connect_websocket()
ws = await ws_wrapper.connect()
ws_id = ws.connection_id

| Property / Method | Type | Description |
|---|---|---|
| ws.connection_id | str | Unique connection ID (pass to AI pipeline methods) |
| ws.receive() | AsyncIterator[dict] | Async iterator yielding real-time messages |
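Put together, a consumer loop might look like the sketch below. The label() helper is illustrative, and the loop assumes an already-connected ws object as returned above:

```python
def label(message: dict) -> str:
    """Pick a display label: channel for pipeline events, event for lifecycle events."""
    return message.get("channel") or message.get("event") or "unknown"

async def consume(ws):
    # Print each real-time message as it arrives on the WebSocket.
    async for message in ws.receive():
        print(label(message), message.get("data", {}))
```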

CaptureSession

Connection Methods

| Method | Returns | Description |
|---|---|---|
| conn.create_capture_session(end_user_id, collection_id, ws_connection_id, metadata) | CaptureSession | Create a new capture session |
| conn.get_capture_session(capture_session_id) | CaptureSession | Retrieve an existing capture session |
| conn.generate_client_token() | str | Generate a client-side authentication token |

Create a Capture Session

ws_id = open("/tmp/videodb_ws_id").read().strip()

session = conn.create_capture_session(
    end_user_id="user-123",  # required
    collection_id="default",
    ws_connection_id=ws_id,
    metadata={"app": "my-app"},
)
print(f"Session ID: {session.id}")

Note: end_user_id is required and identifies the user initiating the capture. For testing or demo purposes, any unique string identifier works (e.g., "demo-user", "test-123").

CaptureSession Properties

| Property | Type | Description |
|---|---|---|
| session.id | str | Unique capture session ID |

CaptureSession Methods

| Method | Returns | Description |
|---|---|---|
| session.get_rtstream(type) | list[RTStream] | Get RTStreams by type: "mic", "screen", or "system_audio" |
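For example, once the session is active, the default stream for each pipeline can be picked out like this. A sketch; the helper is illustrative and assumes get_rtstream() returns a possibly empty list:

```python
def first_stream(session, stream_type):
    """Return the first RTStream of the given type ('mic', 'screen', 'system_audio'), or None."""
    streams = session.get_rtstream(stream_type)
    return streams[0] if streams else None
```

e.g. mic = first_stream(session, "mic") before calling the methods in rtstream-reference.md.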

Generate a Client Token

token = conn.generate_client_token()

CaptureClient

The client runs on the user's machine and handles permissions, channel discovery, and streaming.

from videodb.capture import CaptureClient

client = CaptureClient(client_token=token)

CaptureClient Methods

| Method | Returns | Description |
|---|---|---|
| await client.request_permission(type) | None | Request device permission ("microphone", "screen_capture") |
| await client.list_channels() | Channels | Discover available audio/video channels |
| await client.start_capture_session(capture_session_id, channels, primary_video_channel_id) | None | Start streaming selected channels |
| await client.stop_capture() | None | Gracefully stop the capture session |
| await client.shutdown() | None | Clean up client resources |

Request Permissions

await client.request_permission("microphone")
await client.request_permission("screen_capture")

Start a Session

selected_channels = [c for c in [mic, display, system_audio] if c]
await client.start_capture_session(
    capture_session_id=session.id,
    channels=selected_channels,
    primary_video_channel_id=display.id if display else None,
)

Stop a Session

await client.stop_capture()
await client.shutdown()

Channels

Returned by client.list_channels(). Groups available devices by type.

channels = await client.list_channels()
for ch in channels.all():
    print(f"  {ch.id} ({ch.type}): {ch.name}")

mic = channels.mics.default
display = channels.displays.default
system_audio = channels.system_audio.default

Channel Groups

| Property | Type | Description |
|---|---|---|
| channels.mics | ChannelGroup | Available microphones |
| channels.displays | ChannelGroup | Available screen displays |
| channels.system_audio | ChannelGroup | Available system audio sources |

ChannelGroup Methods & Properties

| Member | Type | Description |
|---|---|---|
| group.default | Channel | Default channel in the group (or None) |
| group.all() | list[Channel] | All channels in the group |

Channel Properties

| Property | Type | Description |
|---|---|---|
| ch.id | str | Unique channel ID |
| ch.type | str | Channel type ("mic", "display", "system_audio") |
| ch.name | str | Human-readable channel name |
| ch.store | bool | Whether to persist the recording (set to True to save) |

Without store=True, streams are processed in real time but not saved.
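So to persist the capture, flag the chosen channels before starting the session. A sketch, assuming channel objects with a writable store attribute as described above:

```python
def mark_for_storage(channels):
    """Set store=True on each selected channel so its recording is saved."""
    for ch in channels:
        if ch is not None:  # tolerate missing devices (e.g. no system audio)
            ch.store = True
    return channels
```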


RTStreams and AI Pipelines

After the session is active, retrieve RTStream objects with session.get_rtstream().

For RTStream methods (indexing, transcription, alerts, batch config), see rtstream-reference.md.


Session Lifecycle

  create_capture_session()
          │
          v
  ┌───────────────┐
  │    created    │
  └───────┬───────┘
          │  client.start_capture_session()
          v
  ┌───────────────┐     WebSocket: capture_session.starting
  │   starting    │ ──> Capture channels connect
  └───────┬───────┘
          │
          v
  ┌───────────────┐     WebSocket: capture_session.active
  │    active     │ ──> Start AI pipelines
  └───────┬───────┘
          │  client.stop_capture()
          v
  ┌───────────────┐     WebSocket: capture_session.stopping
  │   stopping    │ ──> Finalize streams
  └───────┬───────┘
          │
          v
  ┌───────────────┐     WebSocket: capture_session.stopped
  │    stopped    │ ──> All streams finalized
  └───────┬───────┘
          │  (if store=True)
          v
  ┌───────────────┐     WebSocket: capture_session.exported
  │   exported    │ ──> Access exported_video_id, stream_url, player_url
  └───────────────┘

  An unrecoverable capture error moves the session to failed:

  ┌───────────────┐     WebSocket: capture_session.failed
  │    failed     │ ──> Inspect error payload and retry setup
  └───────────────┘