---
name: videodb
description: See, Understand, Act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live-record the desktop; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.
origin: ECC
allowed-tools: Read Grep Glob Bash(python:*)
argument-hint: "[task description]"
---

# VideoDB Skill

**Perception + memory + actions for video, live streams, and desktop sessions.**

## When to use

### Desktop Perception

- Start/stop a **desktop session** capturing **screen, mic, and system audio**
- Stream **live context** and store **episodic session memory**
- Run **real-time alerts/triggers** on what's spoken and what's happening on screen
- Produce **session summaries**, a searchable timeline, and **playable evidence links**

### Video ingest + stream

- Ingest a **file or URL** and return a **playable web stream link**
- Transcode/normalize: **codec, bitrate, fps, resolution, aspect ratio**

### Index + search (timestamps + evidence)

- Build **visual**, **spoken**, and **keyword** indexes
- Search and return exact moments with **timestamps** and **playable evidence**
- Auto-create **clips** from search results

### Timeline editing + generation

- Subtitles: **generate**, **translate**, **burn-in**
- Overlays: **text/image/branding**, motion captions
- Audio: **background music**, **voiceover**, **dubbing**
- Programmatic composition and exports via **timeline operations**

### Live streams (RTSP) + monitoring

- Connect **RTSP/live feeds**
- Run **real-time visual and spoken understanding** and emit **events/alerts** for monitoring workflows

## How it works

### Common inputs

- Local **file path**, public **URL**, or **RTSP URL**
- Desktop capture request: **start / stop / summarize session**
- Desired operations: context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

### Common outputs

- **Stream URL**
- Search results with **timestamps** and **evidence links**
- Generated assets: subtitles, audio, images, clips
- **Event/alert payloads** for live streams
- Desktop **session summaries** and memory entries

### Running Python code

Before running any VideoDB code, change to the project directory and load environment variables:

```python
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
```

This reads `VIDEO_DB_API_KEY` from:

1. The environment (if already exported)
2. The project's `.env` file in the current directory

If the key is missing, `videodb.connect()` raises `AuthenticationError`.

Do NOT write a script file when a short inline command works.

When writing inline Python (`python -c "..."`), use properly formatted code: separate statements with semicolons and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

```bash
python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF
```

### Setup

When the user asks to "setup videodb" or similar:

### 1. Install SDK

```bash
pip install "videodb[capture]" python-dotenv
```

If `videodb[capture]` fails on Linux, install without the capture extra:

```bash
pip install videodb python-dotenv
```

### 2. Configure API key

The user must set `VIDEO_DB_API_KEY` using **either** method:

- **Export in terminal** (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
- **Project `.env` file**: save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file

Get a free API key at <https://console.videodb.io> (50 free uploads, no credit card).

**Do NOT** read, write, or handle the API key yourself. Always let the user set it.

### Quick Reference

### Upload media

```python
coll = conn.get_collection()

# URL
video = coll.upload(url="https://example.com/video.mp4")

# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file
video = coll.upload(file_path="/path/to/video.mp4")
```

### Transcript + subtitle

```python
# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()
```

### Search inside videos

```python
from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
    results = video.search("product demo")
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```

### Scene search

```python
import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter — it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content in this scene.",
    )
except Exception as e:
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```

### Timeline editing

**Important:** Always validate timestamps before building a timeline:

- `start` must be >= 0 (negative values are silently accepted but produce broken output)
- `start` must be < `end`
- `end` must be <= `video.length`

```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle

timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
```

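The three checks above can be sketched as a small guard function. `validate_clip` is a hypothetical helper for illustration, not part of the VideoDB SDK:

```python
def validate_clip(start: float, end: float, video_length: float) -> None:
    """Raise ValueError for (start, end) ranges that would produce broken output.

    Mirrors the rules above: the SDK accepts bad values silently,
    so validate before constructing VideoAsset objects.
    """
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) exceeds video length ({video_length})")
```

Call it as `validate_clip(10, 30, video.length)` before building each `VideoAsset`.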
### Transcode video (resolution / quality change)

```python
from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/webhook",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
    audio_config=AudioConfig(mute=False),
)
```

### Reframe aspect ratio (for social platforms)

**Warning:** `reframe()` is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices:

- Always limit to a short segment using `start`/`end` when possible
- For full-length videos, use `callback_url` for async processing
- Trim the video on a `Timeline` first, then reframe the shorter result

```python
from videodb import ReframeMode

# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
```

### Generative media

```python
image = coll.generate_image(
    prompt="a sunset over mountains",
    aspect_ratio="16:9",
)
```

## Error handling

```python
import videodb
from videodb.exceptions import AuthenticationError, InvalidRequestError

try:
    conn = videodb.connect()
except AuthenticationError:
    print("Check your VIDEO_DB_API_KEY")

try:
    video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
    print(f"Upload failed: {e}")
```

### Common pitfalls

| Scenario | Error message | Solution |
|----------|---------------|----------|
| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
| Reframe times out | Blocks indefinitely on long videos | Use `start`/`end` to limit the segment, or pass `callback_url` for async |
| Negative timestamps on Timeline | Silently produces a broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features — inform the user about plan limits |
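
The no-results row repeats a try/except pattern used throughout this skill; it can be consolidated into a small wrapper. This is a hypothetical `safe_shots` helper, not part of the SDK; the exception class is a parameter so the sketch stays self-contained, and in real use you would pass `InvalidRequestError`:

```python
def safe_shots(search_fn, not_found_exc):
    """Run a zero-arg search callable; treat "No results found" as empty.

    search_fn: callable returning a SearchResults-like object
    not_found_exc: exception class to catch (InvalidRequestError in VideoDB)
    """
    try:
        return search_fn().get_shots()
    except not_found_exc as e:
        if "No results found" in str(e):
            return []
        raise
```

Usage: `shots = safe_shots(lambda: video.search("product demo"), InvalidRequestError)`.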

## Examples

### Canonical prompts

- "Start desktop capture and alert when a password field appears."
- "Record my session and produce an actionable summary when it ends."
- "Ingest this file and return a playable stream link."
- "Index this folder and find every scene with people, return timestamps."
- "Generate subtitles, burn them in, and add light background music."
- "Connect this RTSP URL and alert when a person enters the zone."

### Screen Recording (Desktop Capture)

Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture is supported on **macOS** only.

#### Quick Start

1. **Choose a state dir**: `STATE_DIR="${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}"`
2. **Start the listener**: `VIDEODB_EVENTS_DIR="$STATE_DIR" python scripts/ws_listener.py --clear "$STATE_DIR" &`
3. **Get the WebSocket ID**: `cat "$STATE_DIR/videodb_ws_id"`
4. **Run capture code** (see reference/capture.md for the full workflow)
5. **Events are written to**: `$STATE_DIR/videodb_events.jsonl`

Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.

#### Query Events

```python
import json
import os
import time
from pathlib import Path

events_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb"))
events_file = events_dir / "videodb_events.jsonl"
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
cutoff = time.time() - 300
recent_visual = [
    e for e in events
    if e.get("channel") == "visual_index" and e.get("unix_ts", 0) > cutoff
]
```

## Additional docs

Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.

- [reference/api-reference.md](reference/api-reference.md) - Complete VideoDB Python SDK API reference
- [reference/search.md](reference/search.md) - In-depth guide to video search (spoken word and scene-based)
- [reference/editor.md](reference/editor.md) - Timeline editing, assets, and composition
- [reference/streaming.md](reference/streaming.md) - HLS streaming and instant playback
- [reference/generative.md](reference/generative.md) - AI-powered media generation (images, video, audio)
- [reference/rtstream.md](reference/rtstream.md) - Live stream ingestion workflow (RTSP/RTMP)
- [reference/rtstream-reference.md](reference/rtstream-reference.md) - RTStream SDK methods and AI pipelines
- [reference/capture.md](reference/capture.md) - Desktop capture workflow
- [reference/capture-reference.md](reference/capture-reference.md) - Capture SDK and WebSocket events
- [reference/use-cases.md](reference/use-cases.md) - Common video processing patterns and examples

**Do not use ffmpeg, moviepy, or local encoding tools** when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).

### When to use what

| Problem | VideoDB solution |
|---------|------------------|
| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` |

## Provenance

Reference material for this skill is vendored locally under `skills/videodb/reference/`. Use the local copies above instead of following external repository links at runtime.

**Maintained By:** [VideoDB](https://www.videodb.io/)