docs(zh-CN): update

2026-04-14 13:53:29 +08:00 · 2026-03-13 17:45:44 +08:00
parent f548ca3e19
commit 4c0107a322
88 changed files with 16872 additions and 280 deletions
--- a/docs/zh-CN/skills/videodb/reference/api-reference.md
+++ b/docs/zh-CN/skills/videodb/reference/api-reference.md
@@ -0,0 +1,550 @@
+# Complete API Reference
+
+Reference material for the VideoDB skill. For usage guidance and workflow selection, start with [../SKILL.md](../SKILL.md).
+
+## Connection
+
+```python
+import videodb
+
+conn = videodb.connect(
+    api_key="your-api-key",      # or set VIDEO_DB_API_KEY env var
+    base_url=None,                # custom API endpoint (optional)
+)
+```
+
+**Returns:** a `Connection` object
+
+### Connection Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `conn.get_collection(collection_id="default")` | `Collection` | Get a collection (the default collection if no ID is given) |
+| `conn.get_collections()` | `list[Collection]` | List all collections |
+| `conn.create_collection(name, description, is_public=False)` | `Collection` | Create a new collection |
+| `conn.update_collection(id, name, description)` | `Collection` | Update a collection |
+| `conn.check_usage()` | `dict` | Get account usage statistics |
+| `conn.upload(source, media_type, name, ...)` | `Video\|Audio\|Image` | Upload to the default collection |
+| `conn.record_meeting(meeting_url, bot_name, ...)` | `Meeting` | Record a meeting |
+| `conn.create_capture_session(...)` | `CaptureSession` | Create a capture session (see [capture-reference.md](capture-reference.md)) |
+| `conn.youtube_search(query, result_threshold, duration)` | `list[dict]` | Search YouTube |
+| `conn.transcode(source, callback_url, mode, ...)` | `str` | Transcode a video (returns a job ID) |
+| `conn.get_transcode_details(job_id)` | `dict` | Get transcode job status and details |
+| `conn.connect_websocket(collection_id)` | `WebSocketConnection` | Connect to the WebSocket (see [capture-reference.md](capture-reference.md)) |
+
+### Transcoding
+
+Transcode a video from a URL with custom resolution, quality, and audio settings. Processing happens server-side; no local ffmpeg is required.
+
+```python
+from videodb import TranscodeMode, VideoConfig, AudioConfig
+
+job_id = conn.transcode(
+    source="https://example.com/video.mp4",
+    callback_url="https://example.com/webhook",
+    mode=TranscodeMode.economy,
+    video_config=VideoConfig(resolution=720, quality=23),
+    audio_config=AudioConfig(mute=False),
+)
+```
+
+#### transcode Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `source` | `str` | required | URL of the video to transcode (preferably a downloadable URL) |
+| `callback_url` | `str` | required | URL that receives a callback when transcoding completes |
+| `mode` | `TranscodeMode` | `TranscodeMode.economy` | Transcoding speed: `economy` or `lightning` |
+| `video_config` | `VideoConfig` | `VideoConfig()` | Video encoding settings |
+| `audio_config` | `AudioConfig` | `AudioConfig()` | Audio encoding settings |
+
+Returns a job ID (`str`). Use `conn.get_transcode_details(job_id)` to check the job status.
+
+```python
+details = conn.get_transcode_details(job_id)
+```
+
+#### VideoConfig
+
+```python
+from videodb import VideoConfig, ResizeMode
+
+config = VideoConfig(
+    resolution=720,              # Target resolution height (e.g. 480, 720, 1080)
+    quality=23,                  # Encoding quality (lower = better, default 23)
+    framerate=30,                # Target framerate
+    aspect_ratio="16:9",         # Target aspect ratio
+    resize_mode=ResizeMode.crop, # How to fit: crop, fit, or pad
+)
+```
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `resolution` | `int\|None` | `None` | Target resolution height (pixels) |
+| `quality` | `int` | `23` | Encoding quality (lower value = higher quality) |
+| `framerate` | `int\|None` | `None` | Target framerate |
+| `aspect_ratio` | `str\|None` | `None` | Target aspect ratio (e.g. `"16:9"`, `"9:16"`) |
+| `resize_mode` | `str` | `ResizeMode.crop` | Resize strategy: `crop`, `fit`, or `pad` |
+
+#### AudioConfig
+
+```python
+from videodb import AudioConfig
+
+config = AudioConfig(mute=False)
+```
+
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `mute` | `bool` | `False` | Mute the audio track |
+
+## Collection
+
+```python
+coll = conn.get_collection()
+```
+
+### Collection Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `coll.get_videos()` | `list[Video]` | List all videos |
+| `coll.get_video(video_id)` | `Video` | Get a specific video |
+| `coll.get_audios()` | `list[Audio]` | List all audio files |
+| `coll.get_audio(audio_id)` | `Audio` | Get a specific audio file |
+| `coll.get_images()` | `list[Image]` | List all images |
+| `coll.get_image(image_id)` | `Image` | Get a specific image |
+| `coll.upload(url=None, file_path=None, media_type=None, name=None)` | `Video\|Audio\|Image` | Upload media |
+| `coll.search(query, search_type, index_type, score_threshold, namespace, scene_index_id, ...)` | `SearchResult` | Search across the collection (semantic search only; keyword and scene search raise `NotImplementedError`) |
+| `coll.generate_image(prompt, aspect_ratio="1:1")` | `Image` | Generate an image with AI |
+| `coll.generate_video(prompt, duration=5)` | `Video` | Generate a video with AI |
+| `coll.generate_music(prompt, duration=5)` | `Audio` | Generate music with AI |
+| `coll.generate_sound_effect(prompt, duration=2)` | `Audio` | Generate a sound effect |
+| `coll.generate_voice(text, voice_name="Default")` | `Audio` | Generate speech from text |
+| `coll.generate_text(prompt, model_name="basic", response_type="text")` | `dict` | LLM text generation; access the result via `["output"]` |
+| `coll.dub_video(video_id, language_code)` | `Video` | Dub a video into another language |
+| `coll.record_meeting(meeting_url, bot_name, ...)` | `Meeting` | Record a live meeting |
+| `coll.create_capture_session(...)` | `CaptureSession` | Create a capture session (see [capture-reference.md](capture-reference.md)) |
+| `coll.get_capture_session(...)` | `CaptureSession` | Retrieve a capture session (see [capture-reference.md](capture-reference.md)) |
+| `coll.connect_rtstream(url, name, ...)` | `RTStream` | Connect to a live stream (see [rtstream-reference.md](rtstream-reference.md)) |
+| `coll.make_public()` | `None` | Make the collection public |
+| `coll.make_private()` | `None` | Make the collection private |
+| `coll.delete_video(video_id)` | `None` | Delete a video |
+| `coll.delete_audio(audio_id)` | `None` | Delete an audio file |
+| `coll.delete_image(image_id)` | `None` | Delete an image |
+| `coll.delete()` | `None` | Delete the collection |
+
+### Upload Parameters
+
+```python
+video = coll.upload(
+    url=None,            # Remote URL (HTTP, YouTube)
+    file_path=None,      # Local file path
+    media_type=None,     # "video", "audio", or "image" (auto-detected if omitted)
+    name=None,           # Custom name for the media
+    description=None,    # Description
+    callback_url=None,   # Webhook URL for async notification
+)
+```
+
+## Video Object
+
+```python
+video = coll.get_video(video_id)
+```
+
+### Video Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `video.id` | `str` | Unique video ID |
+| `video.collection_id` | `str` | Parent collection ID |
+| `video.name` | `str` | Video name |
+| `video.description` | `str` | Video description |
+| `video.length` | `float` | Duration (seconds) |
+| `video.stream_url` | `str` | Default stream URL |
+| `video.player_url` | `str` | Player embed URL |
+| `video.thumbnail_url` | `str` | Thumbnail URL |
+
+### Video Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `video.generate_stream(timeline=None)` | `str` | Generate a stream URL (optional timeline of `[(start, end)]` tuples) |
+| `video.play()` | `str` | Open the stream in a browser; returns the player URL |
+| `video.index_spoken_words(language_code=None, force=False)` | `None` | Index speech for search. Use `force=True` to skip if already indexed. |
+| `video.index_scenes(extraction_type, prompt, extraction_config, metadata, model_name, name, scenes, callback_url)` | `str` | Index visual scenes (returns scene\_index\_id) |
+| `video.index_visuals(prompt, batch_config, ...)` | `str` | Index visual content (returns scene\_index\_id) |
+| `video.index_audio(prompt, model_name, ...)` | `str` | Index audio with an LLM (returns scene\_index\_id) |
+| `video.get_transcript(start=None, end=None)` | `list[dict]` | Get the timestamped transcript |
+| `video.get_transcript_text(start=None, end=None)` | `str` | Get the full transcript text |
+| `video.generate_transcript(force=None)` | `dict` | Generate the transcript |
+| `video.translate_transcript(language, additional_notes)` | `list[dict]` | Translate the transcript |
+| `video.search(query, search_type, index_type, filter, **kwargs)` | `SearchResult` | Search within the video |
+| `video.add_subtitle(style=SubtitleStyle())` | `str` | Add subtitles (returns a stream URL) |
+| `video.generate_thumbnail(time=None)` | `str\|Image` | Generate a thumbnail |
+| `video.get_thumbnails()` | `list[Image]` | Get all thumbnails |
+| `video.extract_scenes(extraction_type, extraction_config)` | `SceneCollection` | Extract scenes |
+| `video.reframe(start, end, target, mode, callback_url)` | `Video\|None` | Reframe the video to a different aspect ratio |
+| `video.clip(prompt, content_type, model_name)` | `str` | Generate a clip from a prompt (returns a stream URL) |
+| `video.insert_video(video, timestamp)` | `str` | Insert a video at a timestamp |
+| `video.download(name=None)` | `dict` | Download the video |
+| `video.delete()` | `None` | Delete the video |
+
+### Reframing
+
+Convert a video to a different aspect ratio, with optional smart object tracking. Processing happens server-side.
+
+> **Warning:** Reframing is a slow server-side operation. Long videos can take several minutes and may time out. Always use `start`/`end` to limit the segment, or pass `callback_url` for asynchronous processing.
+
+```python
+from videodb import ReframeMode
+
+# Always prefer short segments to avoid timeouts:
+reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)
+
+# Async reframe for full-length videos (returns None, result via webhook):
+video.reframe(target="vertical", callback_url="https://example.com/webhook")
+
+# Custom dimensions
+reframed = video.reframe(start=0, end=60, target={"width": 1080, "height": 1080})
+```
+
+#### reframe Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `start` | `float\|None` | `None` | Start time in seconds (`None` = beginning of the video) |
+| `end` | `float\|None` | `None` | End time in seconds (`None` = end of the video) |
+| `target` | `str\|dict` | `"vertical"` | Preset string (`"vertical"`, `"square"`, `"landscape"`) or `{"width": int, "height": int}` |
+| `mode` | `str` | `ReframeMode.smart` | `"simple"` (center crop) or `"smart"` (object tracking) |
+| `callback_url` | `str\|None` | `None` | Webhook URL for asynchronous notification |
+
+Returns a `Video` object when `callback_url` is not provided; otherwise returns `None`.
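
The reframe guidance recommends bounding each call with `start`/`end`. A minimal sketch of splitting a long video into short windows is shown here; `segment_ranges` and the 60-second default are illustrative assumptions, not part of the VideoDB SDK.

```python
def segment_ranges(length: float, max_len: float = 60.0) -> list[tuple[float, float]]:
    """Split [0, length] into consecutive (start, end) windows of at most max_len seconds."""
    if length <= 0 or max_len <= 0:
        raise ValueError("length and max_len must be positive")
    ranges = []
    start = 0.0
    while start < length:
        end = min(start + max_len, length)  # the last window may be shorter
        ranges.append((start, end))
        start = end
    return ranges


# Hypothetical usage with a `video` object obtained from coll.get_video(...):
# for start, end in segment_ranges(video.length):
#     video.reframe(start=start, end=end, target="vertical")
print(segment_ranges(150))
```

Each window can then be reframed on its own, which keeps every request short; whether the pieces should be stitched back together afterwards depends on the use case.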
+
+## Audio Object
+
+```python
+audio = coll.get_audio(audio_id)
+```
+
+### Audio Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `audio.id` | `str` | Unique audio ID |
+| `audio.collection_id` | `str` | Parent collection ID |
+| `audio.name` | `str` | Audio name |
+| `audio.length` | `float` | Duration (seconds) |
+
+### Audio Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `audio.generate_url()` | `str` | Generate a signed URL for playback |
+| `audio.get_transcript(start=None, end=None)` | `list[dict]` | Get the timestamped transcript |
+| `audio.get_transcript_text(start=None, end=None)` | `str` | Get the full transcript text |
+| `audio.generate_transcript(force=None)` | `dict` | Generate the transcript |
+| `audio.delete()` | `None` | Delete the audio file |
+
+## Image Object
+
+```python
+image = coll.get_image(image_id)
+```
+
+### Image Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `image.id` | `str` | Unique image ID |
+| `image.collection_id` | `str` | Parent collection ID |
+| `image.name` | `str` | Image name |
+| `image.url` | `str\|None` | Image URL (may be `None` for generated images; use `generate_url()` instead) |
+
+### Image Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `image.generate_url()` | `str` | Generate a signed URL |
+| `image.delete()` | `None` | Delete the image |
+
+## Timeline and Editor
+
+### Timeline
+
+```python
+from videodb.timeline import Timeline
+
+timeline = Timeline(conn)
+```
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `timeline.add_inline(asset)` | `None` | Add a `VideoAsset` sequentially on the main track |
+| `timeline.add_overlay(start, asset)` | `None` | Overlay an `AudioAsset`, `ImageAsset`, or `TextAsset` at a timestamp |
+| `timeline.generate_stream()` | `str` | Compile and get the stream URL |
+
+### Asset Types
+
+#### VideoAsset
+
+```python
+from videodb.asset import VideoAsset
+
+asset = VideoAsset(
+    asset_id=video.id,
+    start=0,              # trim start (seconds)
+    end=None,             # trim end (seconds, None = full)
+)
+```
+
+#### AudioAsset
+
+```python
+from videodb.asset import AudioAsset
+
+asset = AudioAsset(
+    asset_id=audio.id,
+    start=0,
+    end=None,
+    disable_other_tracks=True,   # mute original audio when True
+    fade_in_duration=0,          # seconds (max 5)
+    fade_out_duration=0,         # seconds (max 5)
+)
+```
+
+#### ImageAsset
+
+```python
+from videodb.asset import ImageAsset
+
+asset = ImageAsset(
+    asset_id=image.id,
+    duration=None,        # display duration (seconds)
+    width=100,            # display width
+    height=100,           # display height
+    x=80,                 # horizontal position (px from left)
+    y=20,                 # vertical position (px from top)
+)
+```
+
+#### TextAsset
+
+```python
+from videodb.asset import TextAsset, TextStyle
+
+asset = TextAsset(
+    text="Hello World",
+    duration=5,
+    style=TextStyle(
+        fontsize=24,
+        fontcolor="black",
+        boxcolor="white",       # background box colour
+        alpha=1.0,
+        font="Sans",
+        text_align="T",         # text alignment within box
+    ),
+)
+```
+
+#### CaptionAsset (Editor API)
+
+CaptionAsset belongs to the Editor API, which has its own timeline, track, and clip system:
+
+```python
+from videodb.editor import CaptionAsset, FontStyling
+
+asset = CaptionAsset(
+    src="auto",                    # "auto" or base64 ASS string
+    font=FontStyling(name="Clear Sans", size=30),
+    primary_color="&H00FFFFFF",
+)
+```
+
+For full CaptionAsset usage, see the Editor API in [editor.md](../../../../../skills/videodb/reference/editor.md#caption-overlays).
+
+## Video Search Parameters
+
+```python
+from videodb import SearchType, IndexType
+
+results = video.search(
+    query="your query",
+    search_type=SearchType.semantic,       # semantic, keyword, or scene
+    index_type=IndexType.spoken_word,      # spoken_word or scene
+    result_threshold=None,                 # max number of results
+    score_threshold=None,                  # minimum relevance score
+    dynamic_score_percentage=None,         # percentage of dynamic score
+    scene_index_id=None,                   # target a specific scene index (pass via **kwargs)
+    filter=[],                             # metadata filters for scene search
+)
+```
+
+> **Note:** `filter` is an explicit named parameter of `video.search()`. `scene_index_id` is passed to the API via `**kwargs`.
+>
+> **Important:** `video.search()` raises `InvalidRequestError` with the message `"No results found"` when nothing matches. Always wrap search calls in try/except. For scene search, use `score_threshold=0.3` or higher to filter out low-relevance noise.
+
+For scene search, use `search_type=SearchType.semantic` with `index_type=IndexType.scene`. Pass `scene_index_id` when targeting a specific scene index. See [search.md](search.md) for details.
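
Because an empty result surfaces as an exception rather than an empty list, a thin wrapper can make call sites simpler. The sketch below is one possible shape; `search_shots` is a hypothetical helper name, and the fallback exception class exists only so the snippet runs where the SDK is not installed.

```python
try:
    from videodb.exceptions import InvalidRequestError
except ImportError:  # fallback so the sketch runs without the SDK installed
    class InvalidRequestError(Exception):
        """Stand-in for videodb.exceptions.InvalidRequestError."""


def search_shots(video, query, **search_kwargs):
    """Return matching shots, or an empty list when the search finds nothing."""
    try:
        results = video.search(query, **search_kwargs)
    except InvalidRequestError as exc:
        if "No results found" in str(exc):
            return []
        raise  # a different request problem; let the caller see it
    return results.get_shots()


# Hypothetical usage:
# shots = search_shots(video, "pricing discussion", score_threshold=0.3)
```

Matching on the message text is an assumption based on the message quoted above; any other `InvalidRequestError` is re-raised unchanged.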
+
+## SearchResult Object
+
+```python
+results = video.search("query", search_type=SearchType.semantic)
+```
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `results.get_shots()` | `list[Shot]` | Get the list of matching shots |
+| `results.compile()` | `str` | Compile all shots into a stream URL |
+| `results.play()` | `str` | Open the compiled stream in a browser |
+
+### Shot Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `shot.video_id` | `str` | Source video ID |
+| `shot.video_length` | `float` | Source video duration |
+| `shot.video_title` | `str` | Source video title |
+| `shot.start` | `float` | Start time (seconds) |
+| `shot.end` | `float` | End time (seconds) |
+| `shot.text` | `str` | Matched text content |
+| `shot.search_score` | `float` | Search relevance score |
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `shot.generate_stream()` | `str` | Stream this specific shot |
+| `shot.play()` | `str` | Open the shot stream in a browser |
+
+## Meeting Object
+
+```python
+meeting = coll.record_meeting(
+    meeting_url="https://meet.google.com/...",
+    bot_name="Bot",
+    callback_url=None,          # Webhook URL for status updates
+    callback_data=None,         # Optional dict passed through to callbacks
+    time_zone="UTC",            # Time zone for the meeting
+)
+```
+
+### Meeting Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `meeting.id` | `str` | Unique meeting ID |
+| `meeting.collection_id` | `str` | Parent collection ID |
+| `meeting.status` | `str` | Current status |
+| `meeting.video_id` | `str` | Recorded video ID (once complete) |
+| `meeting.bot_name` | `str` | Bot name |
+| `meeting.meeting_title` | `str` | Meeting title |
+| `meeting.meeting_url` | `str` | Meeting URL |
+| `meeting.speaker_timeline` | `dict` | Speaker timeline data |
+| `meeting.is_active` | `bool` | True while initializing or processing |
+| `meeting.is_completed` | `bool` | True once completed |
+
+### Meeting Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `meeting.refresh()` | `Meeting` | Refresh data from the server |
+| `meeting.wait_for_status(target_status, timeout=14400, interval=120)` | `bool` | Poll until the target status is reached |
+
+## RTStream and Capture
+
+For RTStream (live ingestion, indexing, transcription), see [rtstream-reference.md](rtstream-reference.md).
+
+For capture sessions (desktop recording, CaptureClient, channels), see [capture-reference.md](capture-reference.md).
+
+## Enums and Constants
+
+### SearchType
+
+```python
+from videodb import SearchType
+
+SearchType.semantic    # Natural language semantic search
+SearchType.keyword     # Exact keyword matching
+SearchType.scene       # Visual scene search (may require paid plan)
+SearchType.llm         # LLM-powered search
+```
+
+### SceneExtractionType
+
+```python
+from videodb import SceneExtractionType
+
+SceneExtractionType.shot_based   # Automatic shot boundary detection
+SceneExtractionType.time_based   # Fixed time interval extraction
+SceneExtractionType.transcript   # Transcript-based scene extraction
+```
+
+### SubtitleStyle
+
+```python
+from videodb import SubtitleStyle
+
+style = SubtitleStyle(
+    font_name="Arial",
+    font_size=18,
+    primary_colour="&H00FFFFFF",
+    bold=False,
+    # ... see SubtitleStyle for all options
+)
+video.add_subtitle(style=style)
+```
+
+### SubtitleAlignment and SubtitleBorderStyle
+
+```python
+from videodb import SubtitleAlignment, SubtitleBorderStyle
+```
+
+### TextStyle
+
+```python
+from videodb import TextStyle
+# or: from videodb.asset import TextStyle
+
+style = TextStyle(
+    fontsize=24,
+    fontcolor="black",
+    boxcolor="white",
+    font="Sans",
+    text_align="T",
+    alpha=1.0,
+)
+```
+
+### Other Constants
+
+```python
+from videodb import (
+    IndexType,          # spoken_word, scene
+    MediaType,          # video, audio, image
+    Segmenter,          # word, sentence, time
+    SegmentationType,   # sentence, llm
+    TranscodeMode,      # economy, lightning
+    ResizeMode,         # crop, fit, pad
+    ReframeMode,        # simple, smart
+    RTStreamChannelType,
+)
+```
+
+## Exceptions
+
+```python
+from videodb.exceptions import (
+    AuthenticationError,     # Invalid or missing API key
+    InvalidRequestError,     # Bad parameters or malformed request
+    RequestTimeoutError,     # Request timed out
+    SearchError,             # Search operation failure (e.g. not indexed)
+    VideodbError,            # Base exception for all VideoDB errors
+)
+```
+
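+| Exception | Common causes |
+|-----------|-------------|
+| `AuthenticationError` | Missing or invalid `VIDEO_DB_API_KEY` |
+| `InvalidRequestError` | Invalid URL, unsupported format, bad parameters |
+| `RequestTimeoutError` | The server took too long to respond |
+| `SearchError` | Searching before indexing, invalid search type |
+| `VideodbError` | Server errors, network problems, generic failures |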
--- a/docs/zh-CN/skills/videodb/reference/capture-reference.md
+++ b/docs/zh-CN/skills/videodb/reference/capture-reference.md
@@ -0,0 +1,416 @@
+# Capture Reference
+
+Code-level details for VideoDB capture sessions. For the workflow guide, see [capture.md](capture.md).
+
+***
+
+## WebSocket Events
+
+Real-time events from capture sessions and AI pipelines. No webhooks or polling required.
+
+Use [scripts/ws\_listener.py](../../../../../skills/videodb/scripts/ws_listener.py) to connect and dump events to `${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_events.jsonl`.
+
+### Event Channels
+
+| Channel | Source | Content |
+|---------|--------|---------|
+| `capture_session` | Session lifecycle | Status changes |
+| `transcript` | `start_transcript()` | Speech-to-text |
+| `visual_index` / `scene_index` | `index_visuals()` | Visual analysis |
+| `audio_index` | `index_audio()` | Audio analysis |
+| `alert` | `create_alert()` | Alert notifications |
+
+### Session Lifecycle Events
+
+| Event | Status | Key data |
+|-------|--------|----------|
+| `capture_session.created` | `created` | — |
+| `capture_session.starting` | `starting` | — |
+| `capture_session.active` | `active` | `rtstreams[]` |
+| `capture_session.stopping` | `stopping` | — |
+| `capture_session.stopped` | `stopped` | — |
+| `capture_session.exported` | `exported` | `exported_video_id`, `stream_url`, `player_url` |
+| `capture_session.failed` | `failed` | `error` |
+
+### Event Structure
+
+**Transcript event:**
+
+```json
+{
+  "channel": "transcript",
+  "rtstream_id": "rts-xxx",
+  "rtstream_name": "mic:default",
+  "data": {
+    "text": "Let's schedule the meeting for Thursday",
+    "is_final": true,
+    "start": 1710000001234,
+    "end": 1710000002345
+  }
+}
+```
+
+**Visual index event:**
+
+```json
+{
+  "channel": "visual_index",
+  "rtstream_id": "rts-xxx",
+  "rtstream_name": "display:1",
+  "data": {
+    "text": "User is viewing a Slack conversation with 3 unread messages",
+    "start": 1710000012340,
+    "end": 1710000018900
+  }
+}
+```
+
+**Audio index event:**
+
+```json
+{
+  "channel": "audio_index",
+  "rtstream_id": "rts-xxx",
+  "rtstream_name": "mic:default",
+  "data": {
+    "text": "Discussion about scheduling a team meeting",
+    "start": 1710000021500,
+    "end": 1710000029200
+  }
+}
+```
+
+**Session active event:**
+
+```json
+{
+  "event": "capture_session.active",
+  "capture_session_id": "cap-xxx",
+  "status": "active",
+  "data": {
+    "rtstreams": [
+      { "rtstream_id": "rts-1", "name": "mic:default", "media_types": ["audio"] },
+      { "rtstream_id": "rts-2", "name": "system_audio:default", "media_types": ["audio"] },
+      { "rtstream_id": "rts-3", "name": "display:1", "media_types": ["video"] }
+    ]
+  }
+}
+```
+
+**Session exported event:**
+
+```json
+{
+  "event": "capture_session.exported",
+  "capture_session_id": "cap-xxx",
+  "status": "exported",
+  "data": {
+    "exported_video_id": "v_xyz789",
+    "stream_url": "https://stream.videodb.io/...",
+    "player_url": "https://console.videodb.io/player?url=..."
+  }
+}
+```
+
+> For the latest details, see the [VideoDB real-time context documentation](https://docs.videodb.io/pages/ingest/capture-sdks/realtime-context.md).
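
The `start` and `end` values in the channel events shown earlier look like Unix timestamps in milliseconds. Under that assumption (the source does not state the unit), a small helper can turn an event into a start time and a duration; `event_window` is a hypothetical name.

```python
from datetime import datetime, timezone


def event_window(event: dict) -> tuple[datetime, float]:
    """Return (start time in UTC, duration in seconds) for a channel event.

    Assumes data.start and data.end are epoch milliseconds.
    """
    data = event["data"]
    start_ms, end_ms = data["start"], data["end"]
    started_at = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
    return started_at, (end_ms - start_ms) / 1000


sample = {
    "channel": "transcript",
    "data": {"text": "example", "start": 1710000001234, "end": 1710000002345},
}
started_at, duration = event_window(sample)
print(started_at.isoformat(), duration)
```

If the timestamps turn out to use a different unit, only the two divisions by 1000 need to change.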
+
+***
+
+## Event Persistence
+
+Use `ws_listener.py` to dump all WebSocket events to a JSONL file for later analysis.
+
+### Start the Listener and Get the WebSocket ID
+
+```bash
+# Start with --clear to clear old events (recommended for new sessions)
+python scripts/ws_listener.py --clear &
+
+# Append to existing events (for reconnects)
+python scripts/ws_listener.py &
+```
+
+Or specify a custom output directory:
+
+```bash
+python scripts/ws_listener.py --clear /path/to/output &
+# Or via environment variable:
+VIDEODB_EVENTS_DIR=/path/to/output python scripts/ws_listener.py --clear &
+```
+
+The script prints `WS_ID=<connection_id>` on its first line, then listens indefinitely.
+
+**Get the ws\_id:**
+
+```bash
+cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_id"
+```
+
+**Stop the listener:**
+
+```bash
+kill "$(cat "${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}/videodb_ws_pid")"
+```
+
+**Functions that accept `ws_connection_id`:**
+
+| Function | Purpose |
+|----------|---------|
+| `conn.create_capture_session()` | Session lifecycle events |
+| RTStream methods | See [rtstream-reference.md](rtstream-reference.md) |
+
+**Output files** (in the output directory, default `${XDG_STATE_HOME:-$HOME/.local/state}/videodb`):
+
+* `videodb_ws_id` - WebSocket connection ID
+* `videodb_events.jsonl` - All events
+* `videodb_ws_pid` - Process ID, for easy termination
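
Scripts that read these files need to locate the same directory the listener wrote to. The sketch below resolves it from the environment; the precedence (`VIDEODB_EVENTS_DIR`, then `XDG_STATE_HOME`, then `~/.local/state`) is an assumption that combines the defaults mentioned in this document, and `resolve_events_dir` is a hypothetical helper rather than part of `ws_listener.py`.

```python
import os
from pathlib import Path


def resolve_events_dir(env=None) -> Path:
    """Pick the listener output directory from the environment, with a home-directory fallback."""
    env = os.environ if env is None else env
    if env.get("VIDEODB_EVENTS_DIR"):
        return Path(env["VIDEODB_EVENTS_DIR"])
    # Assumed fallback order: XDG_STATE_HOME, then ~/.local/state
    state_home = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(state_home) / "videodb"


events_dir = resolve_events_dir()
ws_id_file = events_dir / "videodb_ws_id"
events_file = events_dir / "videodb_events.jsonl"
pid_file = events_dir / "videodb_ws_pid"
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.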
+
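+**Features:**
+
+* `--clear` flag to clear the events file on startup (for new sessions)
+* Automatic reconnection with exponential backoff when the connection drops
+* Graceful shutdown on SIGINT/SIGTERM
+* Connection status logging
+
+### JSONL Format
+
+Each line is a JSON object with added timestamps: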
+
+```json
+{"ts": "2026-03-02T10:15:30.123Z", "unix_ts": 1772446530.123, "channel": "visual_index", "data": {"text": "..."}}
+{"ts": "2026-03-02T10:15:31.456Z", "unix_ts": 1772446531.456, "event": "capture_session.active", "capture_session_id": "cap-xxx"}
+```
+
+### Reading Events
+
+```python
+import json
+import time
+from pathlib import Path
+
+events_path = Path.home() / ".local" / "state" / "videodb" / "videodb_events.jsonl"
+transcripts = []
+recent = []
+visual = []
+
+cutoff = time.time() - 600
+with events_path.open(encoding="utf-8") as handle:
+    for line in handle:
+        event = json.loads(line)
+        if event.get("channel") == "transcript":
+            transcripts.append(event)
+        if event.get("unix_ts", 0) > cutoff:
+            recent.append(event)
+        if (
+            event.get("channel") == "visual_index"
+            and "code" in event.get("data", {}).get("text", "").lower()
+        ):
+            visual.append(event)
+```
+
+***
+
+## WebSocket Connection
+
+Connect to receive real-time AI results from the transcription and indexing pipelines.
+
+```python
+# Run inside an async function: connect() is awaited
+ws_wrapper = conn.connect_websocket()
+ws = await ws_wrapper.connect()
+ws_id = ws.connection_id
+```
+
+| Property / Method | Type | Description |
+|-------------------|------|-------------|
+| `ws.connection_id` | `str` | Unique connection ID (pass to AI pipeline methods) |
+| `ws.receive()` | `AsyncIterator[dict]` | Async iterator that yields real-time messages |
+
+***
+
+## CaptureSession
+
+### Connection Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `conn.create_capture_session(end_user_id, collection_id, ws_connection_id, metadata)` | `CaptureSession` | Create a new capture session |
+| `conn.get_capture_session(capture_session_id)` | `CaptureSession` | Retrieve an existing capture session |
+| `conn.generate_client_token()` | `str` | Generate a client authentication token |
+
+### Create a Capture Session
+
+```python
+from pathlib import Path
+
+ws_id = (Path.home() / ".local" / "state" / "videodb" / "videodb_ws_id").read_text().strip()
+
+session = conn.create_capture_session(
+    end_user_id="user-123",  # required
+    collection_id="default",
+    ws_connection_id=ws_id,
+    metadata={"app": "my-app"},
+)
+print(f"Session ID: {session.id}")
+```
+
+> **Note:** `end_user_id` is required and identifies the user who initiates the capture. For testing or demos, any unique string identifier works (e.g. `"demo-user"`, `"test-123"`).
+
+### CaptureSession Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `session.id` | `str` | Unique capture session ID |
+
+### CaptureSession Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `session.get_rtstream(type)` | `list[RTStream]` | Get RTStreams by type: `"mic"`, `"screen"`, or `"system_audio"` |
+
+### Generate a Client Token
+
+```python
+token = conn.generate_client_token()
+```
+
+***
+
+## CaptureClient
+
+The client runs on the user's machine and handles permissions, channel discovery, and streaming.
+
+```python
+from videodb.capture import CaptureClient
+
+client = CaptureClient(client_token=token)
+```
+
+### CaptureClient Methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `await client.request_permission(type)` | `None` | Request a device permission (`"microphone"`, `"screen_capture"`) |
+| `await client.list_channels()` | `Channels` | Discover available audio/video channels |
+| `await client.start_capture_session(capture_session_id, channels, primary_video_channel_id)` | `None` | Start streaming the selected channels |
+| `await client.stop_capture()` | `None` | Stop the capture session gracefully |
+| `await client.shutdown()` | `None` | Clean up client resources |
+
+### Request Permissions
+
+```python
+await client.request_permission("microphone")
+await client.request_permission("screen_capture")
+```
+
+### Start a Session
+
+```python
+# mic, display, and system_audio come from client.list_channels() (see the Channels section)
+selected_channels = [c for c in [mic, display, system_audio] if c]
+await client.start_capture_session(
+    capture_session_id=session.id,
+    channels=selected_channels,
+    primary_video_channel_id=display.id if display else None,
+)
+```
+
+### Stop a Session
+
+```python
+await client.stop_capture()
+await client.shutdown()
+```
+
+***
+
+## Channels
+
+Returned by `client.list_channels()`. Groups the available devices by type.
+
+```python
+channels = await client.list_channels()
+for ch in channels.all():
+    print(f"  {ch.id} ({ch.type}): {ch.name}")
+
+mic = channels.mics.default
+display = channels.displays.default
+system_audio = channels.system_audio.default
+```
+
+### Channel Groups
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `channels.mics` | `ChannelGroup` | Available microphones |
+| `channels.displays` | `ChannelGroup` | Available screen displays |
+| `channels.system_audio` | `ChannelGroup` | Available system audio sources |
+
+### ChannelGroup Methods and Properties
+
+| Member | Type | Description |
+|--------|------|-------------|
+| `group.default` | `Channel` | Default channel in the group (or `None`) |
+| `group.all()` | `list[Channel]` | All channels in the group |
+
+### Channel Properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `ch.id` | `str` | Unique channel ID |
+| `ch.type` | `str` | Channel type (`"mic"`, `"display"`, `"system_audio"`) |
+| `ch.name` | `str` | Human-readable channel name |
+| `ch.store` | `bool` | Whether to persist the recording (set to `True` to save) |
+
+Without `store = True`, the stream is processed in real time but not saved.
+
+***
+
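+## RTStream and AI Pipelines
+
+Once the session is active, use `session.get_rtstream()` to retrieve the RTStream objects.
+
+For RTStream methods (indexing, transcription, alerts, batch configuration), see [rtstream-reference.md](rtstream-reference.md).
+
+***
+
+## Session Lifecycle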
+
+```
+  create_capture_session()
+          │
+          v
+  ┌───────────────┐
+  │    created     │
+  └───────┬───────┘
+          │  client.start_capture_session()
+          v
+  ┌───────────────┐     WebSocket: capture_session.starting
+  │   starting     │ ──> Capture channels connect
+  └───────┬───────┘
+          │
+          v
+  ┌───────────────┐     WebSocket: capture_session.active
+  │    active      │ ──> Start AI pipelines
+  └───────┬──────────────┐
+          │              │
+          │              v
+          │      ┌───────────────┐     WebSocket: capture_session.failed
+          │      │    failed      │ ──> Inspect error payload and retry setup
+          │      └───────────────┘
+          │      unrecoverable capture error
+          │
+          │  client.stop_capture()
+          v
+  ┌───────────────┐     WebSocket: capture_session.stopping
+  │   stopping     │ ──> Finalize streams
+  └───────┬───────┘
+          │
+          v
+  ┌───────────────┐     WebSocket: capture_session.stopped
+  │   stopped      │ ──> All streams finalized
+  └───────┬───────┘
+          │  (if store=True)
+          v
+  ┌───────────────┐     WebSocket: capture_session.exported
+  │   exported     │ ──> Access video_id, stream_url, player_url
+  └───────────────┘
+```
--- a/docs/zh-CN/skills/videodb/reference/capture.md
+++ b/docs/zh-CN/skills/videodb/reference/capture.md
@@ -0,0 +1,104 @@
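+# Capture Guide
+
+## Overview
+
+VideoDB Capture supports real-time screen and audio recording with AI processing. Desktop capture currently supports **macOS** only.
+
+For code-level details (SDK methods, event structures, AI pipelines), see [capture-reference.md](capture-reference.md).
+
+## Quick Start
+
+1. **Start the WebSocket listener**: `python scripts/ws_listener.py --clear &`
+2. **Run the capture code** (see the full capture workflow below)
+3. **Events are written to**: `/tmp/videodb_events.jsonl`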
+
+***
+
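+## Full Capture Workflow
+
+No webhooks or polling required. The WebSocket delivers all events, including session lifecycle events.
+
+> **Critical:** `CaptureClient` must keep running for the entire capture. It runs a local recorder binary that streams screen/audio data to VideoDB. If the Python process that created the `CaptureClient` exits, the recorder binary is killed and capture stops silently. Always run capture code as a **long-running background process** (e.g. `nohup python capture_script.py &`) and use signal handling (`asyncio.Event` + `SIGINT`/`SIGTERM`) to keep it alive until you explicitly stop it.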
+
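
The keep-alive pattern described in the note above (an `asyncio.Event` released by `SIGINT`/`SIGTERM`) can be sketched as follows. This is a minimal illustration for Unix event loops; `run_until_signalled` and the `on_stop` hook are hypothetical names, and the capture-specific cleanup is left as a comment.

```python
import asyncio
import signal


async def run_until_signalled(on_stop=None) -> None:
    """Block until SIGINT or SIGTERM arrives, then run an optional async cleanup."""
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        # add_signal_handler is available on Unix event loops
        loop.add_signal_handler(sig, stop.set)
    await stop.wait()
    if on_stop is not None:
        # e.g. await client.stop_capture() and client.shutdown() inside on_stop
        await on_stop()


# Hypothetical entry point for a capture script:
# asyncio.run(run_until_signalled(on_stop=my_cleanup))
```

Running the cleanup after the event is set, rather than inside the signal handler itself, keeps the handler trivial and lets the cleanup use `await`.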
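+1. **Start the WebSocket listener** in the background with the `--clear` flag to clear old events. Wait for it to create the WebSocket ID file.
+
+2. **Read the WebSocket ID**. This ID is required for the capture session and the AI pipelines.
+
+3. **Create a capture session** and generate a client token for the desktop client.
+
+4. **Initialize the CaptureClient** with the token. Request microphone and screen capture permissions.
+
+5. **List and select channels** (microphone, display, system audio). Set `store = True` on the channels you want persisted as video.
+
+6. **Start the session** with the selected channels.
+
+7. **Wait for the session to become active** by reading events until you see `capture_session.active`. This event contains the `rtstreams` array. Save the session info (session ID, RTStream IDs) to a file (e.g. `/tmp/videodb_capture_info.json`) so other scripts can read it.
+
+8. **Keep the process alive**. Use `asyncio.Event` with signal handlers for `SIGINT`/`SIGTERM` to block until explicitly stopped. Write a PID file (e.g. `/tmp/videodb_capture_pid`) so the process can later be stopped with `kill $(cat /tmp/videodb_capture_pid)`. The PID file should be overwritten on every run so that reruns always have the correct PID.
+
+9. **Start the AI pipelines** (in a separate command/script) for audio indexing and visual indexing on each RTStream. Read the RTStream IDs from the saved session info file.
+
+10. **Write custom event-handling logic** (in a separate command/script) that reads the real-time events for your use case. Examples:
+    * Log Slack activity when `visual_index` mentions "Slack"
+    * Summarize the discussion when `audio_index` events arrive
+    * Trigger an alert when a specific keyword appears in `transcript`
+    * Track application usage from screen descriptions
+
+11. **Stop the capture** - when finished, send SIGTERM to the capture process. Its signal handler should call `client.stop_capture()` and `client.shutdown()`.
+
+12. **Wait for export** - read events until you see `capture_session.exported`. This event contains `exported_video_id`, `stream_url`, and `player_url`. It can take a few seconds after capture stops.
+
+13. **Stop the WebSocket listener** - after the export event arrives, use `kill $(cat /tmp/videodb_ws_pid)` to terminate it cleanly.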
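
Steps 7 and 12 both amount to reading the JSONL events file until a named lifecycle event shows up. A minimal polling sketch is given below; `wait_for_event` is a hypothetical helper, and the timeout and poll interval are arbitrary illustrative defaults.

```python
import json
import time
from pathlib import Path


def wait_for_event(events_path, event_name, timeout=60.0, poll_interval=0.5):
    """Poll a JSONL events file until an event with the given name appears.

    Returns the matching event dict, or None if the timeout expires.
    """
    path = Path(events_path)
    deadline = time.monotonic() + timeout
    while True:
        if path.exists():
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    line = line.strip()
                    if not line:
                        continue
                    try:
                        event = json.loads(line)
                    except json.JSONDecodeError:
                        continue  # skip a line that is still being written
                    if event.get("event") == event_name:
                        return event
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


# Hypothetical usage after stopping capture:
# exported = wait_for_event("/tmp/videodb_events.jsonl", "capture_session.exported")
```

The file is re-read from the start on every poll, which is simple and adequate for short sessions; a long-running consumer might instead remember its file offset.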
+
+***
+
+## Shutdown Order
+
+The correct shutdown order matters for making sure every event is captured:
+
+1. **Stop the capture session**: `client.stop_capture()`, then `client.shutdown()`
+2. **Wait for the export event**: poll `/tmp/videodb_events.jsonl` for `capture_session.exported`
+3. **Stop the WebSocket listener**: `kill $(cat /tmp/videodb_ws_pid)`
+
+Do **not** kill the WebSocket listener before the export event arrives, or you will miss the final video URL.
+
+***
+
+## Scripts
+
+| Script | Description |
+|--------|-------------|
+| `scripts/ws_listener.py` | WebSocket event listener (dumps to JSONL) |
+
+### ws\_listener.py Usage
+
+```bash
+# Start listener in background (append to existing events)
+python scripts/ws_listener.py &
+
+# Start listener with clear (new session, clears old events)
+python scripts/ws_listener.py --clear &
+
+# Custom output directory
+python scripts/ws_listener.py --clear /path/to/events &
+
+# Stop the listener
+kill $(cat /tmp/videodb_ws_pid)
+```
+
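+**Options:**
+
+* `--clear`: clear the events file before starting. Use it when starting a new capture session.
+
+**Output files:**
+
+* `videodb_events.jsonl` - All WebSocket events
+* `videodb_ws_id` - WebSocket connection ID (for the `ws_connection_id` parameter)
+* `videodb_ws_pid` - Process ID (for stopping the listener)
+
+**Features:**
+
+* Automatic reconnection with exponential backoff when the connection drops
+* Graceful shutdown on SIGINT/SIGTERM
+* PID file for easy process management
+* Connection status logging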
--- a/docs/zh-CN/skills/videodb/reference/editor.md
+++ b/docs/zh-CN/skills/videodb/reference/editor.md
@@ -0,0 +1,443 @@
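+# Timeline Editing Guide
+
+VideoDB provides a non-destructive timeline editor for composing videos from multiple assets, adding text and image overlays, mixing audio tracks, and trimming clips, all server-side with no re-encoding or local tools. Use it to trim, merge clips, lay audio or music over video, add subtitles, and overlay text or images.
+
+## Prerequisites
+
+Videos, audio, and images **must be uploaded** to a collection before they can be used as timeline assets. For caption overlays, the video must also be **indexed for spoken words**.
+
+## Core Concepts
+
+### Timeline
+
+A `Timeline` is a virtual composition layer. Assets are placed on it either **inline** (sequentially on the main track) or as **overlays** (layered at a specific timestamp). The original media is never modified; the final stream is compiled on demand.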
+
+```python
+from videodb.timeline import Timeline
+
+timeline = Timeline(conn)
+```
+
+### Assets
+
+Every element on a timeline is an **asset**. VideoDB provides five asset types:
+
+| Asset | Import | Primary use |
+|-------|--------|-------------|
+| `VideoAsset` | `from videodb.asset import VideoAsset` | Video clips (trimming, sequencing) |
+| `AudioAsset` | `from videodb.asset import AudioAsset` | Music, sound effects, narration |
+| `ImageAsset` | `from videodb.asset import ImageAsset` | Logos, thumbnails, overlays |
+| `TextAsset` | `from videodb.asset import TextAsset, TextStyle` | Titles, captions, lower thirds |
+| `CaptionAsset` | `from videodb.editor import CaptionAsset` | Automatically rendered subtitles (Editor API) |
+
+## Building a Timeline
+
+### Adding Video Clips Inline
+
+Inline assets play one after another on the main video track. The `add_inline` method accepts only `VideoAsset`:
+
+```python
+from videodb.asset import VideoAsset
+
+video_a = coll.get_video(video_id_a)
+video_b = coll.get_video(video_id_b)
+
+timeline = Timeline(conn)
+timeline.add_inline(VideoAsset(asset_id=video_a.id))
+timeline.add_inline(VideoAsset(asset_id=video_b.id))
+
+stream_url = timeline.generate_stream()
+```
+
+### Trimming / sub-clips
+
+Use `start` and `end` on a `VideoAsset` to extract a portion of the source:
+
+```python
+# Take only seconds 10–30 from the source video
+clip = VideoAsset(asset_id=video.id, start=10, end=30)
+timeline.add_inline(clip)
+```
+
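+### VideoAsset parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `asset_id` | `str` | required | Video media ID |
+| `start` | `float` | `0` | Trim start time (seconds) |
+| `end` | `float\|None` | `None` | Trim end time (`None` = full video) |
+
+> **Warning:** The SDK does not validate negative timestamps. Passing `start=-5` is accepted silently but produces corrupted or unexpected output. Before creating a `VideoAsset`, always make sure that `start >= 0`, `start < end`, and `end <= video.length`.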
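Since the SDK does not perform these checks, a small guard can run them before the asset is built. The helper below is an illustrative sketch; `validate_trim` is a hypothetical name, not an SDK function.

```python
def validate_trim(start, end, video_length):
    """Check 0 <= start < end <= video_length and return (start, end).

    end=None means "play to the end of the video", so only start is checked.
    Raises ValueError when a bound is violated.
    """
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= video_length:
        raise ValueError(f"start ({start}) must be before the end of the video ({video_length})")
    if end is not None:
        if end <= start:
            raise ValueError(f"end ({end}) must be greater than start ({start})")
        if end > video_length:
            raise ValueError(f"end ({end}) exceeds the video length ({video_length})")
    return start, end
```

For example, `start, end = validate_trim(10, 30, float(video.length))` can precede `VideoAsset(asset_id=video.id, start=start, end=end)`. The `float()` call is a precaution in case `video.length` is not already numeric.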
+
+## Text overlays
+
+Add titles, lower thirds, or captions at any point on the timeline:
+
+```python
+from videodb.asset import TextAsset, TextStyle
+
+title = TextAsset(
+    text="Welcome to the Demo",
+    duration=5,
+    style=TextStyle(
+        fontsize=36,
+        fontcolor="white",
+        boxcolor="black",
+        alpha=0.8,
+        font="Sans",
+    ),
+)
+
+# Overlay the title at the very start (t=0)
+timeline.add_overlay(0, title)
+```
+
+### TextStyle parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `fontsize` | `int` | `24` | Font size (pixels) |
+| `fontcolor` | `str` | `"black"` | CSS color name or hex value |
+| `fontcolor_expr` | `str` | `""` | Dynamic font color expression |
+| `alpha` | `float` | `1.0` | Text opacity (0.0 to 1.0) |
+| `font` | `str` | `"Sans"` | Font family |
+| `box` | `bool` | `True` | Enable the background box |
+| `boxcolor` | `str` | `"white"` | Background box color |
+| `boxborderw` | `str` | `"10"` | Box border width |
+| `boxw` | `int` | `0` | Box width override |
+| `boxh` | `int` | `0` | Box height override |
+| `line_spacing` | `int` | `0` | Line spacing |
+| `text_align` | `str` | `"T"` | Text alignment within the box |
+| `y_align` | `str` | `"text"` | Vertical alignment reference |
+| `borderw` | `int` | `0` | Text border width |
+| `bordercolor` | `str` | `"black"` | Text border color |
+| `expansion` | `str` | `"normal"` | Text expansion mode |
+| `basetime` | `int` | `0` | Base time for time-based expressions |
+| `fix_bounds` | `bool` | `False` | Fix text bounds |
+| `text_shaping` | `bool` | `True` | Enable text shaping |
+| `shadowcolor` | `str` | `"black"` | Shadow color |
+| `shadowx` | `int` | `0` | Shadow X offset |
+| `shadowy` | `int` | `0` | Shadow Y offset |
+| `tabsize` | `int` | `4` | Tab size (in spaces) |
+| `x` | `str` | `"(main_w-text_w)/2"` | Horizontal position expression |
+| `y` | `str` | `"(main_h-text_h)/2"` | Vertical position expression |
+
+## Audio overlays
+
+Layer background music, sound effects, or narration over the main video track:
+
+```python
+from videodb.asset import AudioAsset
+
+music = coll.get_audio(music_id)
+
+audio_layer = AudioAsset(
+    asset_id=music.id,
+    disable_other_tracks=False,
+    fade_in_duration=2,
+    fade_out_duration=2,
+)
+
+# Start the music at t=0, overlaid on the video track
+timeline.add_overlay(0, audio_layer)
+```
+
+### AudioAsset parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `asset_id` | `str` | required | Audio media ID |
+| `start` | `float` | `0` | Trim start time (seconds) |
+| `end` | `float\|None` | `None` | Trim end time (`None` = full audio) |
+| `disable_other_tracks` | `bool` | `True` | When True, mutes the other audio tracks |
+| `fade_in_duration` | `float` | `0` | Fade-in length in seconds (max 5) |
+| `fade_out_duration` | `float` | `0` | Fade-out length in seconds (max 5) |
+
+## Image overlays
+
+Add logos, watermarks, or generated images as overlays:
+
+```python
+from videodb.asset import ImageAsset
+
+logo = coll.get_image(logo_id)
+
+logo_overlay = ImageAsset(
+    asset_id=logo.id,
+    duration=10,
+    width=120,
+    height=60,
+    x=20,
+    y=20,
+)
+
+timeline.add_overlay(0, logo_overlay)
+```
+
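+### ImageAsset parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `asset_id` | `str` | required | Image media ID |
+| `width` | `int\|str` | `100` | Display width |
+| `height` | `int\|str` | `100` | Display height |
+| `x` | `int` | `80` | Horizontal position (pixels from the left) |
+| `y` | `int` | `20` | Vertical position (pixels from the top) |
+| `duration` | `float\|None` | `None` | Display duration (seconds) |
+
+## Caption overlays
+
+There are two ways to add captions to a video.
+
+### Method 1: Subtitle workflow (simplest)
+
+Use `video.add_subtitle()` to burn subtitles directly into the video stream. Internally this uses `videodb.timeline.Timeline`: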
+
+```python
+from videodb import SubtitleStyle
+
+# Video must have spoken words indexed first (force=True skips if already done)
+video.index_spoken_words(force=True)
+
+# Add subtitles with default styling
+stream_url = video.add_subtitle()
+
+# Or customise the subtitle style
+stream_url = video.add_subtitle(style=SubtitleStyle(
+    font_name="Arial",
+    font_size=22,
+    primary_colour="&H00FFFFFF",
+    bold=True,
+))
+```
+
+### Method 2: Editor API (advanced)
+
+The Editor API (`videodb.editor`) provides a track-based composition system with `CaptionAsset`, `Clip`, `Track`, and its own `Timeline`. It is a separate API from the `videodb.timeline.Timeline` used above.
+
+```python
+from videodb.editor import (
+    CaptionAsset,
+    Clip,
+    Track,
+    Timeline as EditorTimeline,
+    FontStyling,
+    BorderAndShadow,
+    Positioning,
+    CaptionAnimation,
+)
+
+# Video must have spoken words indexed first (force=True skips if already done)
+video.index_spoken_words(force=True)
+
+# Create a caption asset
+caption = CaptionAsset(
+    src="auto",
+    font=FontStyling(name="Clear Sans", size=30),
+    primary_color="&H00FFFFFF",
+    back_color="&H00000000",
+    border=BorderAndShadow(outline=1),
+    position=Positioning(margin_v=30),
+    animation=CaptionAnimation.box_highlight,
+)
+
+# Build an editor timeline with tracks and clips
+editor_tl = EditorTimeline(conn)
+track = Track()
+track.add_clip(start=0, clip=Clip(asset=caption, duration=video.length))
+editor_tl.add_track(track)
+stream_url = editor_tl.generate_stream()
+```
+
+### CaptionAsset parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `src` | `str` | `"auto"` | Caption source (`"auto"` or a base64 ASS string) |
+| `font` | `FontStyling\|None` | `FontStyling()` | Font styling (name, size, bold, italic, etc.) |
+| `primary_color` | `str` | `"&H00FFFFFF"` | Primary text color (ASS format) |
+| `secondary_color` | `str` | `"&H000000FF"` | Secondary text color (ASS format) |
+| `back_color` | `str` | `"&H00000000"` | Background color (ASS format) |
+| `border` | `BorderAndShadow\|None` | `BorderAndShadow()` | Border and shadow styling |
+| `position` | `Positioning\|None` | `Positioning()` | Caption alignment and margins |
+| `animation` | `CaptionAnimation\|None` | `None` | Animation effect (for example `box_highlight`, `reveal`, `karaoke`) |
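The `&H00FFFFFF`-style values are ASS subtitle colors. In the ASS format the hex digits are ordered alpha, blue, green, red (`&HAABBGGRR`), and an alpha of `00` is fully opaque. The sketch below converts a CSS-style `#RRGGBB` value into that form; `css_hex_to_ass` is a hypothetical helper, and it assumes the Editor API follows the standard ASS byte order.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def css_hex_to_ass(hex_color, opacity=1.0):
    """Convert '#RRGGBB' into the ASS '&HAABBGGRR' color form.

    opacity runs from 0.0 (transparent) to 1.0 (opaque); ASS stores the
    inverse, so full opacity becomes alpha 00.
    """
    value = hex_color.lstrip("#")
    if len(value) != 6 or any(ch not in HEX_DIGITS for ch in value):
        raise ValueError(f"expected 6 hex digits, got {hex_color!r}")
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    red, green, blue = value[0:2], value[2:4], value[4:6]
    alpha = round((1.0 - opacity) * 255)
    # ASS byte order is alpha, blue, green, red.
    return f"&H{alpha:02X}{blue}{green}{red}".upper()
```

Under this convention, the default `secondary_color` of `&H000000FF` reads as opaque red.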
+
+## Compiling and streaming
+
+Once the timeline is assembled, compile it into a streamable URL. The stream is generated on the fly, with no render wait.
+
+```python
+stream_url = timeline.generate_stream()
+print(f"Stream: {stream_url}")
+```
+
+For more streaming options (segment streams, search-to-stream, audio playback), see [streaming.md](streaming.md).
+
+## Complete workflow examples
+
+### Highlight reel with a title card
+
+```python
+import videodb
+from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+# 1. Search for key moments
+video.index_spoken_words(force=True)
+try:
+    results = video.search("product announcement", search_type=SearchType.semantic)
+    shots = results.get_shots()
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        shots = []
+    else:
+        raise
+
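+# Overlays need at least one inline VideoAsset underneath them, so stop early
+# when the search returned nothing.
+if not shots:
+    raise SystemExit("No matching shots found; nothing to compile.")
+
+# 2. Build timeline
+timeline = Timeline(conn)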
+
+# Title card
+title = TextAsset(
+    text="Product Launch Highlights",
+    duration=4,
+    style=TextStyle(fontsize=48, fontcolor="white", boxcolor="#1a1a2e", alpha=0.95),
+)
+timeline.add_overlay(0, title)
+
+# Append each matching clip
+for shot in shots:
+    asset = VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
+    timeline.add_inline(asset)
+
+# 3. Generate stream
+stream_url = timeline.generate_stream()
+print(f"Highlight reel: {stream_url}")
+```
+
+### Logo overlay with background music
+
+```python
+import videodb
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, AudioAsset, ImageAsset
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+main_video = coll.get_video(main_video_id)
+music = coll.get_audio(music_id)
+logo = coll.get_image(logo_id)
+
+timeline = Timeline(conn)
+
+# Main video track
+timeline.add_inline(VideoAsset(asset_id=main_video.id))
+
+# Background music — disable_other_tracks=False to mix with video audio
+timeline.add_overlay(
+    0,
+    AudioAsset(asset_id=music.id, disable_other_tracks=False, fade_in_duration=3),
+)
+
+# Logo in top-right corner for first 10 seconds
+timeline.add_overlay(
+    0,
+    ImageAsset(asset_id=logo.id, duration=10, x=1140, y=20, width=120, height=60),
+)
+
+stream_url = timeline.generate_stream()
+print(f"Final video: {stream_url}")
+```
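The `x=1140` in the example places a 120-pixel-wide logo 20 pixels from the right edge of a 1280-pixel-wide frame. That frame width is an assumption about the output stream, not something the code above queries. A hypothetical helper for computing such positions might look like this:

```python
def top_right_position(frame_width, overlay_width, margin=20):
    """Return (x, y) placing an overlay `margin` pixels from the top-right corner."""
    x = frame_width - overlay_width - margin
    if x < 0:
        raise ValueError("overlay plus margin is wider than the frame")
    return x, margin
```

With `frame_width=1280` and `overlay_width=120`, the helper yields the `x=1140, y=20` used above. For a different output resolution, only `frame_width` needs to change.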
+
+### Multi-clip montage from several videos
+
+```python
+import videodb
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+clips = [
+    {"video_id": "vid_001", "start": 5, "end": 15, "label": "Scene 1"},
+    {"video_id": "vid_002", "start": 0, "end": 20, "label": "Scene 2"},
+    {"video_id": "vid_003", "start": 30, "end": 45, "label": "Scene 3"},
+]
+
+timeline = Timeline(conn)
+timeline_offset = 0.0
+
+for clip in clips:
+    # Add a label as an overlay on each clip
+    label = TextAsset(
+        text=clip["label"],
+        duration=2,
+        style=TextStyle(fontsize=32, fontcolor="white", boxcolor="#333333"),
+    )
+    timeline.add_inline(
+        VideoAsset(asset_id=clip["video_id"], start=clip["start"], end=clip["end"])
+    )
+    timeline.add_overlay(timeline_offset, label)
+    timeline_offset += clip["end"] - clip["start"]
+
+stream_url = timeline.generate_stream()
+print(f"Montage: {stream_url}")
+```
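Because overlays use absolute timeline timestamps, the running `timeline_offset` in the montage is the sum of the durations of all clips placed so far. The same bookkeeping can be isolated in a small helper, sketched here with a hypothetical name (`inline_offsets`) and the clip dictionaries from the example:

```python
def inline_offsets(clips):
    """Return (offsets, total) for clips that play back to back.

    offsets[i] is the timeline time at which clip i starts; total is the
    combined duration of all clips.
    """
    offsets = []
    position = 0.0
    for clip in clips:
        duration = clip["end"] - clip["start"]
        if duration <= 0:
            raise ValueError(f"clip has non-positive duration: {clip}")
        offsets.append(position)
        position += duration
    return offsets, position
```

Each offset is the timestamp to pass to `timeline.add_overlay()` for the label that belongs to that clip. If the clip order changes, the offsets have to be recomputed, since overlays do not move with their clips.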
+
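+## Two timeline APIs
+
+VideoDB has two separate timeline systems. They are **not interchangeable**:
+
+| | `videodb.timeline.Timeline` | `videodb.editor.Timeline` (Editor API) |
+|---|---|---|
+| **Import** | `from videodb.timeline import Timeline` | `from videodb.editor import Timeline as EditorTimeline` |
+| **Assets** | `VideoAsset`, `AudioAsset`, `ImageAsset`, `TextAsset` | `CaptionAsset`, `Clip`, `Track` |
+| **Methods** | `add_inline()`, `add_overlay()` | `add_track()` with `Track` / `Clip` |
+| **Best for** | Video composition, overlays, multi-clip editing | Animated captions and subtitle styling |
+
+Do not mix assets from one API into the other. `CaptionAsset` works only with the Editor API. `VideoAsset` / `AudioAsset` / `ImageAsset` / `TextAsset` work only with `videodb.timeline.Timeline`.
+
+## Limitations and constraints
+
+The timeline editor is designed for **non-destructive linear composition**. The following operations are **not supported**:
+
+### Unsupported operations
+
+| Limitation | Details |
+|---|---|
+| **No transitions or effects** | There are no crossfades, wipes, dissolves, or other transitions between clips. Every cut is a hard cut. |
+| **No video-on-video (picture-in-picture)** | `VideoAsset` can only be placed with `add_inline()`, so one video stream cannot be layered on top of another. An image overlay can approximate a static picture-in-picture, but not live video. |
+| **No speed or playback control** | No slow motion, fast forward, reverse, or time remapping. `VideoAsset` has no `speed` parameter. |
+| **No crop, zoom, or pan** | You cannot crop a region of the frame, apply zoom effects, or pan across the frame. `video.reframe()` is for aspect-ratio conversion only. |
+| **No video filters or color grading** | No brightness, contrast, saturation, hue, or color-correction adjustments. |
+| **No animated text** | A `TextAsset` is static for its whole duration, with no fade in/out, movement, or animation. For animated captions, use `CaptionAsset` with the Editor API. |
+| **No mixed text styles** | A single `TextAsset` has one `TextStyle`. Bold, italic, or colors cannot be mixed within one text block. |
+| **No blank or solid-color clips** | You cannot create solid-color frames, black screens, or standalone title cards. Text and image overlays need a `VideoAsset` underneath them on the inline track. |
+| **No audio volume control** | `AudioAsset` has no `volume` parameter. Audio is either at full volume or muted via `disable_other_tracks`; it cannot be mixed at reduced volume. |
+| **No keyframe animation** | Overlay properties cannot change over time (for example, moving an image from position A to B). |
+
+### Constraints
+
+| Constraint | Details |
+|---|---|
+| **Audio fades are capped at 5 seconds** | `fade_in_duration` and `fade_out_duration` are each limited to 5 seconds. |
+| **Overlay positioning is absolute** | Overlays use absolute timestamps measured from the start of the timeline. Rearranging inline clips does not move their overlays. |
+| **Inline track is video-only** | `add_inline()` accepts only `VideoAsset`. Audio, image, and text must use `add_overlay()`. |
+| **Overlays are not bound to clips** | Overlays sit at fixed timeline timestamps. An overlay cannot be attached to a specific inline clip so that it moves with it. |
+
+## Tips
+
+* **Non-destructive**: Timelines never modify the source media. You can build multiple timelines from the same assets.
+* **Overlay stacking**: Multiple overlays can start at the same timestamp. Audio overlays are mixed together; image and text overlays are layered in the order they were added.
+* **Inline track is VideoAsset-only**: `add_inline()` accepts only `VideoAsset`. Use `add_overlay()` for `AudioAsset`, `ImageAsset`, and `TextAsset`.
+* **Trim precision**: `start`/`end` on `VideoAsset` and `AudioAsset` are in seconds.
+* **Muting the video's audio**: Set `disable_other_tracks=True` on an `AudioAsset` to mute the original video audio when overlaying music or narration.
+* **Fade limits**: `fade_in_duration` and `fade_out_duration` on `AudioAsset` cannot exceed 5 seconds.
+* **Generated media**: Use `coll.generate_music()`, `coll.generate_sound_effect()`, `coll.generate_voice()`, and `coll.generate_image()` to create media that can be used immediately as timeline assets.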
--- a/docs/zh-CN/skills/videodb/reference/generative.md
+++ b/docs/zh-CN/skills/videodb/reference/generative.md
@@ -0,0 +1,331 @@
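+# Generative Media Guide
+
+VideoDB provides AI-powered generation of images, video, music, sound effects, voice, and text. All generation methods live on the **Collection** object.
+
+## Prerequisites
+
+Before calling any generation method, you need a connection and a collection reference: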
+
+```python
+import videodb
+
+conn = videodb.connect()
+coll = conn.get_collection()
+```
+
+## Image generation
+
+Generate an image from a text prompt:
+
+```python
+image = coll.generate_image(
+    prompt="a futuristic cityscape at sunset with flying cars",
+    aspect_ratio="16:9",
+)
+
+# Access the generated image
+print(image.id)
+print(image.generate_url())  # returns a signed download URL
+```
+
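+### generate\_image parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt` | `str` | required | Text description of the image to generate |
+| `aspect_ratio` | `str` | `"1:1"` | Aspect ratio: `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"` |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+Returns an `Image` object with `.id`, `.name`, and `.collection_id`. The `.url` attribute may be `None` for generated images, so always use `image.generate_url()` to get a reliable signed download URL.
+
+> **Note:** Unlike `Video` objects (which use `.generate_stream()`), `Image` objects use `.generate_url()` to retrieve the image URL. The `.url` attribute is populated only for certain image types (for example, thumbnails).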
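`aspect_ratio` accepts only the five values listed in the table, so a target size in pixels has to be mapped to the closest one. A possible mapping is sketched below; `nearest_aspect_ratio` is a hypothetical helper, and comparing ratios on a logarithmic scale is a design choice of this sketch rather than anything VideoDB prescribes.

```python
import math

SUPPORTED_RATIOS = ("1:1", "9:16", "16:9", "4:3", "3:4")


def nearest_aspect_ratio(width, height):
    """Pick the supported aspect ratio closest to width:height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(label):
        w, h = (int(part) for part in label.split(":"))
        # Log scale treats "twice as wide" and "twice as tall" symmetrically.
        return abs(target - math.log(w / h))

    return min(SUPPORTED_RATIOS, key=distance)
```

The result can be passed directly as `aspect_ratio=` to `coll.generate_image()`.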
+
+## Video generation
+
+Generate a short video clip from a text prompt:
+
+```python
+video = coll.generate_video(
+    prompt="a timelapse of a flower blooming in a garden",
+    duration=5,
+)
+
+stream_url = video.generate_stream()
+video.play()
+```
+
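+### generate\_video parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt` | `str` | required | Text description of the video to generate |
+| `duration` | `int` | `5` | Duration in seconds (must be an integer from 5 to 8) |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+Returns a `Video` object. Generated videos are added to the collection automatically and can be used in timelines, search, and compilations just like any uploaded video.
+
+## Audio generation
+
+VideoDB provides three separate methods for different kinds of audio.
+
+### Music
+
+Generate background music from a text description: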
+
+```python
+music = coll.generate_music(
+    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
+    duration=30,
+)
+
+print(music.id)
+```
+
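+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt` | `str` | required | Text description of the music |
+| `duration` | `int` | `5` | Duration (seconds) |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+### Sound effects
+
+Generate a specific sound effect: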
+
+```python
+sfx = coll.generate_sound_effect(
+    prompt="thunderstorm with heavy rain and distant thunder",
+    duration=10,
+)
+```
+
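+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt` | `str` | required | Text description of the sound effect |
+| `duration` | `int` | `2` | Duration (seconds) |
+| `config` | `dict` | `{}` | Additional configuration |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+### Voice (text-to-speech)
+
+Generate speech from text: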
+
+```python
+voice = coll.generate_voice(
+    text="Welcome to our product demo. Today we'll walk through the key features.",
+    voice_name="Default",
+)
+```
+
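+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `text` | `str` | required | Text to convert to speech |
+| `voice_name` | `str` | `"Default"` | Voice to use |
+| `config` | `dict` | `{}` | Additional configuration |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+All three audio methods return an `Audio` object with `.id`, `.name`, `.length`, and `.collection_id`.
+
+## Text generation (LLM integration)
+
+Use `coll.generate_text()` to run LLM analysis. It is a **collection-level** method, so pass any context (transcripts, descriptions) directly in the prompt string.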
+
+```python
+# Get transcript from a video first
+transcript_text = video.get_transcript_text()
+
+# Generate analysis using collection LLM
+result = coll.generate_text(
+    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
+    model_name="pro",
+)
+
+print(result["output"])
+```
+
+### generate\_text parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `prompt` | `str` | required | Prompt, including any context for the LLM |
+| `model_name` | `str` | `"basic"` | Model tier: `"basic"`, `"pro"`, or `"ultra"` |
+| `response_type` | `str` | `"text"` | Response format: `"text"` or `"json"` |
+
+Returns a `dict` with an `output` key. When `response_type="text"`, `output` is a `str`. When `response_type="json"`, `output` is a `dict`.
+
+```python
+result = coll.generate_text(prompt="Summarize this", model_name="pro")
+print(result["output"])  # access the actual text/dict
+```
+
+### Analyzing scenes with an LLM
+
+Combine scene extraction with text generation:
+
+```python
+from videodb import SceneExtractionType
+
+# First index scenes
+scenes = video.index_scenes(
+    extraction_type=SceneExtractionType.time_based,
+    extraction_config={"time": 10},
+    prompt="Describe the visual content in this scene.",
+)
+
+# Get transcript for spoken context
+transcript_text = video.get_transcript_text()
+scene_descriptions = []
+for scene in scenes:
+    if isinstance(scene, dict):
+        description = scene.get("description") or scene.get("summary")
+    else:
+        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
+    scene_descriptions.append(description or str(scene))
+
+scenes_text = "\n".join(scene_descriptions)
+
+# Analyze with collection LLM
+result = coll.generate_text(
+    prompt=(
+        f"Given this video transcript:\n{transcript_text}\n\n"
+        f"And these visual scene descriptions:\n{scenes_text}\n\n"
+        "Based on the spoken and visual content, describe the main topics covered."
+    ),
+    model_name="pro",
+)
+print(result["output"])
+```
+
+## Dubbing and translation
+
+### Dubbing a video
+
+Use the collection method to dub a video into another language:
+
+```python
+dubbed_video = coll.dub_video(
+    video_id=video.id,
+    language_code="es",  # Spanish
+)
+
+dubbed_video.play()
+```
+
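+### dub\_video parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `video_id` | `str` | required | ID of the video to dub |
+| `language_code` | `str` | required | Target language code (for example `"es"`, `"fr"`, `"de"`) |
+| `callback_url` | `str\|None` | `None` | URL that receives the async callback |
+
+Returns a `Video` object containing the dubbed content.
+
+### Translating a transcript
+
+Translate a video's transcript without dubbing: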
+
+```python
+translated = video.translate_transcript(
+    language="Spanish",
+    additional_notes="Use formal tone",
+)
+
+for entry in translated:
+    print(entry)
+```
+
+**Supported languages** include `en`, `es`, `fr`, `de`, `it`, `pt`, `ja`, `ko`, `zh`, `hi`, `ar`, and more.
+
+## Complete workflow examples
+
+### Generating narration for a video
+
+```python
+import videodb
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+# Get transcript
+transcript_text = video.get_transcript_text()
+
+# Generate narration script using collection LLM
+result = coll.generate_text(
+    prompt=(
+        f"Write a professional narration script for this video content:\n"
+        f"{transcript_text[:2000]}"
+    ),
+    model_name="pro",
+)
+script = result["output"]
+
+# Convert script to speech
+narration = coll.generate_voice(text=script)
+print(f"Narration audio: {narration.id}")
+```
+
+### Generating a thumbnail from a prompt
+
+```python
+thumbnail = coll.generate_image(
+    prompt="professional video thumbnail showing data analytics dashboard, modern design",
+    aspect_ratio="16:9",
+)
+print(f"Thumbnail URL: {thumbnail.generate_url()}")
+```
+
+### Adding generated music to a video
+
+```python
+import videodb
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, AudioAsset
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+# Generate background music
+music = coll.generate_music(
+    prompt="calm ambient background music for a tutorial video",
+    duration=60,
+)
+
+# Build timeline with video + music overlay
+timeline = Timeline(conn)
+timeline.add_inline(VideoAsset(asset_id=video.id))
+timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))
+
+stream_url = timeline.generate_stream()
+print(f"Video with music: {stream_url}")
+```
+
+### Structured JSON output
+
+```python
+transcript_text = video.get_transcript_text()
+
+result = coll.generate_text(
+    prompt=(
+        f"Given this transcript:\n{transcript_text}\n\n"
+        "Return a JSON object with keys: summary, topics (array), action_items (array)."
+    ),
+    model_name="pro",
+    response_type="json",
+)
+
+# result["output"] is a dict when response_type="json"
+print(result["output"]["summary"])
+print(result["output"]["topics"])
+```
+
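+## Tips
+
+* **Generated media is persistent**: All generated content is stored in your collection and can be reused.
+* **Three audio methods**: Use `generate_music()` for background music, `generate_sound_effect()` for sound effects, and `generate_voice()` for text-to-speech. There is no unified `generate_audio()` method.
+* **Text generation is collection-level**: `coll.generate_text()` has no automatic access to video content. Fetch the transcript with `video.get_transcript_text()` and pass it in the prompt.
+* **Model tiers**: `"basic"` is the fastest, `"pro"` is the balanced option, and `"ultra"` gives the highest quality. Use `"pro"` for most analysis tasks.
+* **Combine generation types**: Generate images for overlays, music for backgrounds, and voice for narration, then combine them with a timeline (see [editor.md](editor.md)).
+* **Prompt quality matters**: Descriptive, specific prompts produce better results across all generation types.
+* **Image aspect ratios**: Choose from `"1:1"`, `"9:16"`, `"16:9"`, `"4:3"`, or `"3:4"`.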
--- a/docs/zh-CN/skills/videodb/reference/rtstream-reference.md
+++ b/docs/zh-CN/skills/videodb/reference/rtstream-reference.md
@@ -0,0 +1,567 @@
+# RTStream Reference
+
+Code-level details for RTStream operations. For the workflow guide, see [rtstream.md](rtstream.md).
+For usage guidance and workflow selection, start with [../SKILL.md](../SKILL.md).
+
+Based on [docs.videodb.io](https://docs.videodb.io/pages/ingest/live-streams/realtime-apis.md).
+
+***
+
+## Collection RTStream methods
+
+Methods on `Collection` for managing RTStreams:
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `coll.connect_rtstream(url, name, ...)` | `RTStream` | Create a new RTStream from an RTSP/RTMP URL |
+| `coll.get_rtstream(id)` | `RTStream` | Get an existing RTStream by ID |
+| `coll.list_rtstreams(limit, offset, status, name, ordering)` | `List[RTStream]` | List all RTStreams in the collection |
+| `coll.search(query, namespace="rtstream")` | `RTStreamSearchResult` | Search across all RTStreams |
+
+### Connecting an RTStream
+
+```python
+import videodb
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+rtstream = coll.connect_rtstream(
+    url="rtmp://your-stream-server/live/stream-key",
+    name="My Live Stream",
+    media_types=["video"],  # or ["audio", "video"]
+    sample_rate=30,         # optional
+    store=True,             # enable recording storage for export
+    enable_transcript=True, # optional
+    ws_connection_id=ws_id, # optional, for real-time events
+)
+```
+
+### Getting an existing RTStream
+
+```python
+rtstream = coll.get_rtstream("rts-xxx")
+```
+
+### Listing RTStreams
+
+```python
+rtstreams = coll.list_rtstreams(
+    limit=10,
+    offset=0,
+    status="connected",  # optional filter
+    name="meeting",      # optional filter
+    ordering="-created_at",
+)
+
+for rts in rtstreams:
+    print(f"{rts.id}: {rts.name} - {rts.status}")
+```
+
+### Getting RTStreams from a capture session
+
+Once a capture session is active, retrieve its RTStream objects:
+
+```python
+session = conn.get_capture_session(session_id)
+
+mics = session.get_rtstream("mic")
+displays = session.get_rtstream("screen")
+system_audios = session.get_rtstream("system_audio")
+```
+
+Or use the `rtstreams` data from the `capture_session.active` WebSocket event:
+
+```python
+for rts in rtstreams:
+    rtstream = coll.get_rtstream(rts["rtstream_id"])
+```
+
+***
+
+## RTStream methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `rtstream.start()` | `None` | Begin ingestion |
+| `rtstream.stop()` | `None` | Stop ingestion |
+| `rtstream.generate_stream(start, end)` | `str` | Stream a recorded segment (Unix timestamps) |
+| `rtstream.export(name=None)` | `RTStreamExportResult` | Export to a permanent video |
+| `rtstream.index_visuals(prompt, ...)` | `RTStreamSceneIndex` | Create a visual index with AI analysis |
+| `rtstream.index_audio(prompt, ...)` | `RTStreamSceneIndex` | Create an audio index with LLM summaries |
+| `rtstream.list_scene_indexes()` | `List[RTStreamSceneIndex]` | List all scene indexes on the stream |
+| `rtstream.get_scene_index(index_id)` | `RTStreamSceneIndex` | Get a specific scene index |
+| `rtstream.search(query, ...)` | `RTStreamSearchResult` | Search indexed content |
+| `rtstream.start_transcript(ws_connection_id, engine)` | `dict` | Start live transcription |
+| `rtstream.get_transcript(page, page_size, start, end, since, engine)` | `dict` | Get a page of the transcript |
+| `rtstream.stop_transcript(engine)` | `dict` | Stop transcription |
+
+***
+
+## Starting and stopping
+
+```python
+# Begin ingestion
+rtstream.start()
+
+# ... stream is being recorded ...
+
+# Stop ingestion
+rtstream.stop()
+```
+
+***
+
+## Generating a stream
+
+Generate a playback stream from recorded content using Unix timestamps (not second offsets):
+
+```python
+import time
+
+start_ts = time.time()
+rtstream.start()
+
+# Let it record for a while...
+time.sleep(60)
+
+end_ts = time.time()
+rtstream.stop()
+
+# Generate a stream URL for the recorded segment
+stream_url = rtstream.generate_stream(start=start_ts, end=end_ts)
+print(f"Recorded stream: {stream_url}")
+```
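`generate_stream()` takes absolute Unix timestamps, so a relative window such as "the last five minutes" has to be converted first. A minimal sketch, with `last_seconds_window` as a hypothetical helper name:

```python
import time


def last_seconds_window(seconds, now=None):
    """Return (start, end) Unix timestamps covering the last `seconds` seconds.

    `now` defaults to the current time; pass it explicitly for reproducible results.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    end = time.time() if now is None else now
    return end - seconds, end
```

`start, end = last_seconds_window(300)` followed by `rtstream.generate_stream(start=start, end=end)` would then request the most recent five minutes, provided the stream was recording during that period.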
+
+***
+
+## Exporting to video
+
+Export the recorded stream as a permanent video in the collection:
+
+```python
+export_result = rtstream.export(name="Meeting Recording 2024-01-15")
+
+print(f"Video ID: {export_result.video_id}")
+print(f"Stream URL: {export_result.stream_url}")
+print(f"Player URL: {export_result.player_url}")
+print(f"Duration: {export_result.duration}s")
+```
+
+### RTStreamExportResult properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `video_id` | `str` | ID of the exported video |
+| `stream_url` | `str` | HLS stream URL |
+| `player_url` | `str` | Web player URL |
+| `name` | `str` | Video name |
+| `duration` | `float` | Duration (seconds) |
+
+***
+
+## AI pipelines
+
+AI pipelines process the live stream and deliver results over WebSocket.
+
+### RTStream AI pipeline methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `rtstream.index_audio(prompt, batch_config, ...)` | `RTStreamSceneIndex` | Start audio indexing with LLM summaries |
+| `rtstream.index_visuals(prompt, batch_config, ...)` | `RTStreamSceneIndex` | Start visual indexing of on-screen content |
+
+### Audio indexing
+
+Generate LLM summaries of the audio content at regular intervals:
+
+```python
+audio_index = rtstream.index_audio(
+    prompt="Summarize what is being discussed",
+    batch_config={"type": "word", "value": 50},
+    model_name=None,       # optional
+    name="meeting_audio",  # optional
+    ws_connection_id=ws_id,
+)
+```
+
+**Audio batch\_config options:**
+
+| Type | Value | Description |
+|------|-------|-------------|
+| `"word"` | count | Segment every N words |
+| `"sentence"` | count | Segment every N sentences |
+| `"time"` | seconds | Segment every N seconds |
+
+Examples:
+
+```python
+{"type": "word", "value": 50}      # every 50 words
+{"type": "sentence", "value": 5}   # every 5 sentences
+{"type": "time", "value": 30}      # every 30 seconds
+```
+
+Results are delivered on the `audio_index` WebSocket channel.
+
+### Visual indexing
+
+Generate AI descriptions of the visual content:
+
+```python
+scene_index = rtstream.index_visuals(
+    prompt="Describe what is happening on screen",
+    batch_config={"type": "time", "value": 2, "frame_count": 5},
+    model_name="basic",
+    name="screen_monitor",  # optional
+    ws_connection_id=ws_id,
+)
+```
+
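+**Parameters:**
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `prompt` | `str` | Instructions for the AI model (supports structured JSON output) |
+| `batch_config` | `dict` | Controls frame sampling (see below) |
+| `model_name` | `str` | Model tier: `"mini"`, `"basic"`, `"pro"`, `"ultra"` |
+| `name` | `str` | Index name (optional) |
+| `ws_connection_id` | `str` | WebSocket connection ID that receives the results |
+
+**Visual batch\_config:**
+
+| Key | Type | Description |
+|-----|------|-------------|
+| `type` | `str` | Only `"time"` is supported for visual indexing |
+| `value` | `int` | Window size (seconds) |
+| `frame_count` | `int` | Number of frames extracted per window |
+
+Example: `{"type": "time", "value": 2, "frame_count": 5}` samples 5 frames every 2 seconds and sends them to the model.
+
+**Structured JSON output:**
+
+Use a prompt that asks for JSON to get structured responses: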
+
+```python
+scene_index = rtstream.index_visuals(
+    prompt="""Analyze the screen and return a JSON object with:
+{
+  "app_name": "name of the active application",
+  "activity": "what the user is doing",
+  "ui_elements": ["list of visible UI elements"],
+  "contains_text": true/false,
+  "dominant_colors": ["list of main colors"]
+}
+Return only valid JSON.""",
+    batch_config={"type": "time", "value": 3, "frame_count": 3},
+    model_name="pro",
+    ws_connection_id=ws_id,
+)
+```
+
+Results are delivered on the `scene_index` WebSocket channel.
+
+***
+
+## Batch config summary
+
+| Index type | `type` options | `value` | Extra keys |
+|---------------|----------------|---------|------------|
+| **Audio** | `"word"`, `"sentence"`, `"time"` | words/sentences/seconds | - |
+| **Visual** | `"time"` only | seconds | `frame_count` |
+
+Examples:
+
+```python
+# Audio: every 50 words
+{"type": "word", "value": 50}
+
+# Audio: every 30 seconds
+{"type": "time", "value": 30}
+
+# Visual: 5 frames every 2 seconds
+{"type": "time", "value": 2, "frame_count": 5}
+```
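The rules in the summary table can be enforced before a config is sent. The builders below are an illustrative sketch; `audio_batch_config` and `visual_batch_config` are hypothetical helpers, and the requirement that values be positive integers is an assumption of this sketch rather than a documented SDK rule.

```python
AUDIO_BATCH_TYPES = ("word", "sentence", "time")


def _positive_int(name, value):
    """Reject anything that is not a positive integer (bool is excluded too)."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return value


def audio_batch_config(batch_type, value):
    """Build a batch_config for index_audio()."""
    if batch_type not in AUDIO_BATCH_TYPES:
        raise ValueError(f"audio type must be one of {AUDIO_BATCH_TYPES}, got {batch_type!r}")
    return {"type": batch_type, "value": _positive_int("value", value)}


def visual_batch_config(window_seconds, frame_count):
    """Build a batch_config for index_visuals(); only the "time" type applies."""
    return {
        "type": "time",
        "value": _positive_int("window_seconds", window_seconds),
        "frame_count": _positive_int("frame_count", frame_count),
    }
```

For instance, `rtstream.index_visuals(prompt=..., batch_config=visual_batch_config(2, 5), ...)` would pass the same config as the "5 frames every 2 seconds" example.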
+
+***
+
+## Transcription
+
+Live transcription over WebSocket:
+
+```python
+# Start live transcription
+rtstream.start_transcript(
+    ws_connection_id=ws_id,
+    engine=None,  # optional, defaults to "assemblyai"
+)
+
+# Get transcript pages (with optional filters)
+transcript = rtstream.get_transcript(
+    page=1,
+    page_size=100,
+    start=None,   # optional: start timestamp filter
+    end=None,     # optional: end timestamp filter
+    since=None,   # optional: for polling, get transcripts after this timestamp
+    engine=None,
+)
+
+# Stop transcription
+rtstream.stop_transcript(engine=None)
+```
+
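+Transcript results are delivered on the `transcript` WebSocket channel.
+
+***
+
+## RTStreamSceneIndex
+
+Calling `index_audio()` or `index_visuals()` returns an `RTStreamSceneIndex` object. It represents the running index and provides methods for managing scenes and alerts.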
+
+```python
+# index_visuals returns an RTStreamSceneIndex
+scene_index = rtstream.index_visuals(
+    prompt="Describe what is on screen",
+    ws_connection_id=ws_id,
+)
+
+# index_audio also returns an RTStreamSceneIndex
+audio_index = rtstream.index_audio(
+    prompt="Summarize the discussion",
+    ws_connection_id=ws_id,
+)
+```
+
+### RTStreamSceneIndex properties
+
+| Property | Type | Description |
+|----------|------|-------------|
+| `rtstream_index_id` | `str` | Unique ID of the index |
+| `rtstream_id` | `str` | ID of the parent RTStream |
+| `extraction_type` | `str` | Extraction type (`time` or `transcript`) |
+| `extraction_config` | `dict` | Extraction configuration |
+| `prompt` | `str` | Prompt used for analysis |
+| `name` | `str` | Index name |
+| `status` | `str` | Status (`connected`, `stopped`) |
+
+### RTStreamSceneIndex methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `index.get_scenes(start, end, page, page_size)` | `dict` | Get indexed scenes |
+| `index.start()` | `None` | Start or resume indexing |
+| `index.stop()` | `None` | Stop indexing |
+| `index.create_alert(event_id, callback_url, ws_connection_id)` | `str` | Create an alert for event detection |
+| `index.list_alerts()` | `list` | List all alerts on this index |
+| `index.enable_alert(alert_id)` | `None` | Enable an alert |
+| `index.disable_alert(alert_id)` | `None` | Disable an alert |
+
+### Getting scenes
+
+Poll the index for indexed scenes:
+
+```python
+result = scene_index.get_scenes(
+    start=None,      # optional: start timestamp
+    end=None,        # optional: end timestamp
+    page=1,
+    page_size=100,
+)
+
+for scene in result["scenes"]:
+    print(f"[{scene['start']}-{scene['end']}] {scene['text']}")
+
+if result["next_page"]:
+    # fetch next page
+    pass
+```
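The "fetch next page" step can be wrapped in a generator. The sketch below is not an SDK feature: `iter_scenes` is a hypothetical helper, and it assumes that `next_page` is truthy while more pages remain and that pages are numbered consecutively from 1.

```python
def iter_scenes(fetch_page, page_size=100, max_pages=1000):
    """Yield scenes from every page returned by fetch_page(page=..., page_size=...).

    max_pages is a safety stop in case next_page never becomes falsy.
    """
    page = 1
    while page <= max_pages:
        result = fetch_page(page=page, page_size=page_size)
        yield from result.get("scenes", [])
        if not result.get("next_page"):
            break
        page += 1
```

Usage would look like `for scene in iter_scenes(scene_index.get_scenes): ...`, since `get_scenes` accepts `page` and `page_size` as keyword arguments.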
+
+### Managing scene indexes
+
+```python
+# List all indexes on the stream
+indexes = rtstream.list_scene_indexes()
+
+# Get a specific index by ID
+scene_index = rtstream.get_scene_index(index_id)
+
+# Stop an index
+scene_index.stop()
+
+# Restart an index
+scene_index.start()
+```
+
+***
+
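+## Events
+
+Events are reusable detection rules. Create one once, then attach it to any index through an alert.
+
+### Connection event methods
+
+| Method | Returns | Description |
+|--------|---------|-------------|
+| `conn.create_event(event_prompt, label)` | `str` (event\_id) | Create a detection event |
+| `conn.list_events()` | `list` | List all events |
+
+### Creating an event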
+
+```python
+event_id = conn.create_event(
+    event_prompt="User opened Slack application",
+    label="slack_opened",
+)
+```
+
+### Listing events
+
+```python
+events = conn.list_events()
+for event in events:
+    print(f"{event['event_id']}: {event['label']}")
+```
+
+***
+
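+## Alerts
+
+Alerts connect an event to an index for real-time notifications. When the AI detects content matching the event description, an alert is sent.
+
+### Creating an alert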
+
+```python
+# Get the RTStreamSceneIndex from index_visuals
+scene_index = rtstream.index_visuals(
+    prompt="Describe what application is open on screen",
+    ws_connection_id=ws_id,
+)
+
+# Create an alert on the index
+alert_id = scene_index.create_alert(
+    event_id=event_id,
+    callback_url="https://your-backend.com/alerts",  # for webhook delivery
+    ws_connection_id=ws_id,  # for WebSocket delivery (optional)
+)
+```
+
+**Note:** `callback_url` is required. If you only use WebSocket delivery, pass an empty string `""`.
+
+### Managing alerts
+
+```python
+# List all alerts on an index
+alerts = scene_index.list_alerts()
+
+# Enable/disable alerts
+scene_index.disable_alert(alert_id)
+scene_index.enable_alert(alert_id)
+```
+
+### Alert delivery
+
+| Method | Latency | Use case |
+|--------|---------|----------|
+| WebSocket | Real-time | Dashboards, live UIs |
+| Webhook | < 1 second | Server-to-server, automation |
+
+### WebSocket alert event
+
+```json
+{
+  "channel": "alert",
+  "rtstream_id": "rts-xxx",
+  "data": {
+    "event_label": "slack_opened",
+    "timestamp": 1710000012340,
+    "text": "User opened Slack application"
+  }
+}
+```
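A minimal sketch of reading the alert message above. It assumes `timestamp` is Unix epoch milliseconds (the sample value is consistent with that, but the source does not state the unit); `parse_alert_message` is a hypothetical helper, not part of the SDK.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def parse_alert_message(raw: str) -> Optional[dict]:
    """Return a flat summary of an alert message, or None for other channels."""
    message = json.loads(raw)
    if message.get("channel") != "alert":
        return None
    data = message.get("data", {})
    # Assumption: the timestamp is expressed in epoch milliseconds.
    detected_at = datetime.fromtimestamp(data["timestamp"] / 1000, tz=timezone.utc)
    return {
        "rtstream_id": message.get("rtstream_id"),
        "label": data.get("event_label"),
        "text": data.get("text"),
        "detected_at": detected_at.isoformat(),
    }


sample = json.dumps({
    "channel": "alert",
    "rtstream_id": "rts-xxx",
    "data": {
        "event_label": "slack_opened",
        "timestamp": 1710000012340,
        "text": "User opened Slack application",
    },
})
print(parse_alert_message(sample))
```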
+
+### Webhook payload
+
+```json
+{
+  "event_id": "event-xxx",
+  "label": "slack_opened",
+  "confidence": 0.95,
+  "explanation": "User opened the Slack application",
+  "timestamp": "2024-01-15T10:30:45Z",
+  "start_time": 1234.5,
+  "end_time": 1238.0,
+  "stream_url": "https://stream.videodb.io/v3/...",
+  "player_url": "https://console.videodb.io/player?url=..."
+}
+```
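On the receiving side, the payload can be turned into a typed record before acting on it. This is a sketch under assumptions: `AlertWebhook` and `parse_webhook` are illustrative names, and `start_time`/`end_time` are treated as seconds, which the sample suggests but the source does not confirm.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AlertWebhook:
    event_id: str
    label: str
    confidence: float
    start_time: float
    end_time: float
    stream_url: str
    fired_at: datetime

    @property
    def duration(self) -> float:
        """Length of the detected span (assumed to be in seconds)."""
        return self.end_time - self.start_time


def parse_webhook(body: bytes) -> AlertWebhook:
    """Decode a webhook request body into an AlertWebhook."""
    payload = json.loads(body)
    # Rewrite the trailing "Z" so fromisoformat also works before Python 3.11.
    fired_at = datetime.fromisoformat(payload["timestamp"].replace("Z", "+00:00"))
    return AlertWebhook(
        event_id=payload["event_id"],
        label=payload["label"],
        confidence=float(payload["confidence"]),
        start_time=float(payload["start_time"]),
        end_time=float(payload["end_time"]),
        stream_url=payload["stream_url"],
        fired_at=fired_at,
    )


body = json.dumps({
    "event_id": "event-xxx",
    "label": "slack_opened",
    "confidence": 0.95,
    "explanation": "User opened the Slack application",
    "timestamp": "2024-01-15T10:30:45Z",
    "start_time": 1234.5,
    "end_time": 1238.0,
    "stream_url": "https://stream.videodb.io/v3/...",
    "player_url": "https://console.videodb.io/player?url=...",
}).encode()
alert = parse_webhook(body)
print(alert.label, alert.duration)
```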
+
+***
+
+## WebSocket integration
+
+All real-time AI results are delivered over WebSocket. Pass `ws_connection_id` to:
+
+* `rtstream.start_transcript()`
+* `rtstream.index_audio()`
+* `rtstream.index_visuals()`
+* `scene_index.create_alert()`
+
+### WebSocket channels
+
+| Channel | Source | Content |
+|---------|--------|---------|
+| `transcript` | `start_transcript()` | Live speech-to-text |
+| `scene_index` | `index_visuals()` | Visual analysis results |
+| `audio_index` | `index_audio()` | Audio analysis results |
+| `alert` | `create_alert()` | Alert notifications |
+
+For WebSocket event structures and ws\_listener usage, see [capture-reference.md](capture-reference.md).
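Since several channels share one connection, a small router keeps per-channel handling separate. The sketch assumes every message is JSON with a top-level `channel` field, as in the alert sample; `ChannelRouter` is a hypothetical class and does not open a socket itself.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class ChannelRouter:
    """Route decoded WebSocket messages to handlers registered per channel."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, channel: str, handler: Handler) -> None:
        """Register a handler for one channel name."""
        self._handlers[channel].append(handler)

    def dispatch(self, raw: str) -> int:
        """Decode one message and call its handlers; return how many ran."""
        message = json.loads(raw)
        handlers = self._handlers.get(message.get("channel"), [])
        for handler in handlers:
            handler(message)
        return len(handlers)


alerts: List[dict] = []
transcripts: List[dict] = []
router = ChannelRouter()
router.on("alert", alerts.append)
router.on("transcript", transcripts.append)

router.dispatch('{"channel": "alert", "data": {"event_label": "slack_opened"}}')
router.dispatch('{"channel": "scene_index", "data": {}}')  # no handler registered
print(len(alerts), len(transcripts))
```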
+
+***
+
+## Complete workflow
+
+```python
+import time
+import videodb
+from videodb.exceptions import InvalidRequestError
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+# 1. Connect and start recording
+rtstream = coll.connect_rtstream(
+    url="rtmp://your-stream-server/live/stream-key",
+    name="Weekly Standup",
+    store=True,
+)
+rtstream.start()
+
+# 2. Record for the duration of the meeting
+start_ts = time.time()
+time.sleep(1800)  # 30 minutes
+end_ts = time.time()
+rtstream.stop()
+
+# Generate an immediate playback URL for the captured window
+stream_url = rtstream.generate_stream(start=start_ts, end=end_ts)
+print(f"Recorded stream: {stream_url}")
+
+# 3. Export to a permanent video
+export_result = rtstream.export(name="Weekly Standup Recording")
+print(f"Exported video: {export_result.video_id}")
+
+# 4. Index the exported video for search
+video = coll.get_video(export_result.video_id)
+video.index_spoken_words(force=True)
+
+# 5. Search for action items
+try:
+    results = video.search("action items and next steps")
+    stream_url = results.compile()
+    print(f"Action items clip: {stream_url}")
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        print("No action items were detected in the recording.")
+    else:
+        raise
+```
--- a/docs/zh-CN/skills/videodb/reference/rtstream.md
+++ b/docs/zh-CN/skills/videodb/reference/rtstream.md
@@ -0,0 +1,59 @@
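+# RTStream guide
+
+## Overview
+
+RTStream ingests live video streams (RTSP/RTMP) and desktop capture sessions in real time. Once connected, you can record, index, search, and export content from the live source.
+
+For code-level details (SDK methods, parameters, examples), see [rtstream-reference.md](rtstream-reference.md).
+
+## Use cases
+
+* **Security and surveillance**: connect RTSP cameras, detect events, trigger alerts
+* **Live broadcast**: ingest RTMP streams, index in real time, search instantly
+* **Meeting recording**: capture desktop screen and audio, transcribe live, export the recording
+* **Event processing**: monitor live video, run AI analysis, respond to what is detected
+
+## Quick start
+
+1. **Connect to a live stream** (RTSP/RTMP URL) or get an RTStream from a capture session
+2. **Start ingestion** to begin recording live content
+3. **Start AI pipelines** for real-time indexing (audio, visual, transcript)
+4. **Monitor events over WebSocket** for real-time AI results and alerts
+5. **Stop ingestion** when finished
+6. **Export to a video** for permanent storage and further processing
+7. **Search the recording** to find specific moments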
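The steps above can be sketched as one function. It only uses calls that appear elsewhere in these reference files (`connect_rtstream`, `start`, `index_visuals`, `stop`, `export`, `get_video`, `index_spoken_words`); the function name, its parameters, and the `wait` callback are assumptions made for illustration.

```python
def record_and_export(coll, url, name, ws_connection_id, prompt, wait):
    """Run quick-start steps 1-6 and return the exported, indexed video.

    `coll` is a VideoDB collection; `wait` is any callable that blocks for
    as long as the recording should run (hypothetical parameter).
    """
    rtstream = coll.connect_rtstream(url=url, name=name)  # step 1: connect
    rtstream.start()                                      # step 2: ingest
    rtstream.index_visuals(                               # step 3: AI pipeline
        prompt=prompt,
        ws_connection_id=ws_connection_id,
    )
    try:
        wait()  # step 4: results and alerts arrive over the WebSocket meanwhile
    finally:
        rtstream.stop()                                   # step 5: stop ingest
    export_result = rtstream.export(name=name)            # step 6: export
    video = coll.get_video(export_result.video_id)
    video.index_spoken_words(force=True)                  # ready for step 7
    return video
```

Stopping inside `finally` means ingestion ends even if the wait is interrupted; step 7 is then an ordinary `video.search(...)` call.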
+
+## RTStream sources
+
+### From an RTSP/RTMP stream
+
+Connect directly to a live video source:
+
+```python
+rtstream = coll.connect_rtstream(
+    url="rtmp://your-stream-server/live/stream-key",
+    name="My Live Stream",
+)
+```
+
+### From a capture session
+
+Get RTStreams from a desktop capture (microphone, screen, system audio):
+
+```python
+session = conn.get_capture_session(session_id)
+
+mics = session.get_rtstream("mic")
+displays = session.get_rtstream("screen")
+system_audios = session.get_rtstream("system_audio")
+```
+
+For the capture session workflow, see [capture.md](capture.md).
+
+***
+
+## Scripts
+
+| Script | Description |
+|--------|-------------|
+| `scripts/ws_listener.py` | WebSocket event listener for real-time AI results |
--- a/docs/zh-CN/skills/videodb/reference/search.md
+++ b/docs/zh-CN/skills/videodb/reference/search.md
@@ -0,0 +1,230 @@
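+# Search and indexing guide
+
+Search lets you find specific moments in a video using natural-language queries, exact keywords, or visual scene descriptions.
+
+## Prerequisites
+
+A video **must be indexed** before it can be searched. Each index type only needs to be built once per video.
+
+## Indexing
+
+### Spoken word index
+
+Index the transcribed speech of a video to enable semantic and keyword search: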
+
+```python
+video = coll.get_video(video_id)
+
+# force=True makes indexing idempotent — skips if already indexed
+video.index_spoken_words(force=True)
+```
+
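+This transcribes the audio track and builds a searchable index over the spoken content. It is required for both semantic search and keyword search.
+
+**Parameters:**
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `language_code` | `str\|None` | `None` | Language code of the video |
+| `segmentation_type` | `SegmentationType` | `SegmentationType.sentence` | Segmentation type (`sentence` or `llm`) |
+| `force` | `bool` | `False` | Set to `True` to skip indexing when an index already exists (avoids the "already exists" error) |
+| `callback_url` | `str\|None` | `None` | Webhook URL for asynchronous notification |
+
+### Scene index
+
+Index visual content by generating AI descriptions of scenes. Like the spoken word index, this raises an error if a scene index already exists; unlike it, `index_scenes()` has no `force` parameter, so extract the existing `scene_index_id` from the error message.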
+
+```python
+import re
+from videodb import SceneExtractionType
+
+try:
+    scene_index_id = video.index_scenes(
+        extraction_type=SceneExtractionType.shot_based,
+        prompt="Describe the visual content, objects, actions, and setting in this scene.",
+    )
+except Exception as e:
+    match = re.search(r"id\s+([a-f0-9]+)", str(e))
+    if match:
+        scene_index_id = match.group(1)
+    else:
+        raise
+```
+
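+**Extraction types:**
+
+| Type | Description | Best for |
+|------|-------------|----------|
+| `SceneExtractionType.shot_based` | Splits on visual shot boundaries | General purpose, action content |
+| `SceneExtractionType.time_based` | Splits at fixed intervals | Uniform sampling, long static content |
+| `SceneExtractionType.transcript` | Splits on transcript segments | Speech-driven scene boundaries |
+
+**Parameters for `time_based`:**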
+
+```python
+video.index_scenes(
+    extraction_type=SceneExtractionType.time_based,
+    extraction_config={"time": 5, "select_frames": ["first", "last"]},
+    prompt="Describe what is happening in this scene.",
+)
+```
+
+## Search types
+
+### Semantic search
+
+Match spoken content with a natural-language query:
+
+```python
+from videodb import SearchType
+
+results = video.search(
+    query="explaining the benefits of machine learning",
+    search_type=SearchType.semantic,
+)
+```
+
+Returns ranked segments whose spoken content semantically matches the query.
+
+### Keyword search
+
+Exact term matching in the transcribed speech:
+
+```python
+results = video.search(
+    query="artificial intelligence",
+    search_type=SearchType.keyword,
+)
+```
+
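+Returns segments that contain the exact keyword or phrase.
+
+### Scene search
+
+Visual queries are matched against indexed scene descriptions. This requires a prior call to `index_scenes()`.
+
+`index_scenes()` returns a `scene_index_id`. Pass it to `video.search()` to target a specific scene index (especially important when a video has more than one):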
+
+```python
+from videodb import SearchType, IndexType
+from videodb.exceptions import InvalidRequestError
+
+# Search using semantic search against the scene index.
+# Use score_threshold to filter low-relevance noise (recommended: 0.3+).
+try:
+    results = video.search(
+        query="person writing on a whiteboard",
+        search_type=SearchType.semantic,
+        index_type=IndexType.scene,
+        scene_index_id=scene_index_id,
+        score_threshold=0.3,
+    )
+    shots = results.get_shots()
+except InvalidRequestError as e:
+    if "No results found" in str(e):
+        shots = []
+    else:
+        raise
+```
+
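+**Important notes:**
+
+* Combine `SearchType.semantic` with `index_type=IndexType.scene`; this is the most reliable combination and works on all plans.
+* `SearchType.scene` exists but may not be available on every plan (for example, the free plan). Prefer `SearchType.semantic` with `IndexType.scene`.
+* The `scene_index_id` parameter is optional. If omitted, the search runs against all scene indexes on the video. Pass it to target a specific index.
+* You can create several scene indexes per video (with different prompts or extraction types) and search them independently using `scene_index_id`.
+
+### Scene search with metadata filters
+
+When scenes are indexed with custom metadata, you can combine semantic search with metadata filters: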
+
+```python
+from videodb import SearchType, IndexType
+
+results = video.search(
+    query="a skillful chasing scene",
+    search_type=SearchType.semantic,
+    index_type=IndexType.scene,
+    scene_index_id=scene_index_id,
+    filter=[{"camera_view": "road_ahead"}, {"action_type": "chasing"}],
+)
+```
+
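+For a complete example of custom metadata indexing and filtered search, see the [scene\_level\_metadata\_indexing example](https://github.com/video-db/videodb-cookbook/blob/main/quickstart/scene_level_metadata_indexing.ipynb).
+
+## Working with results
+
+### Getting shots
+
+Access the individual result segments: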
+
+```python
+results = video.search("your query")
+
+for shot in results.get_shots():
+    print(f"Video: {shot.video_id}")
+    print(f"Start: {shot.start:.2f}s")
+    print(f"End: {shot.end:.2f}s")
+    print(f"Text: {shot.text}")
+    print("---")
+```
+
+### Playing compiled results
+
+Stream all matching segments as a single compiled video:
+
+```python
+results = video.search("your query")
+stream_url = results.compile()
+results.play()  # opens compiled stream in browser
+```
+
+### Extracting clips
+
+Download or stream specific result segments:
+
+```python
+for shot in results.get_shots():
+    stream_url = shot.generate_stream()
+    print(f"Clip: {stream_url}")
+```
+
+## Searching across a collection
+
+Search across all videos in a collection:
+
+```python
+coll = conn.get_collection()
+
+# Search across all videos in the collection
+results = coll.search(
+    query="product demo",
+    search_type=SearchType.semantic,
+)
+
+for shot in results.get_shots():
+    print(f"Video: {shot.video_id} [{shot.start:.1f}s - {shot.end:.1f}s]")
+```
+
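+> **Note:** Collection-level search only supports `SearchType.semantic`. Using `SearchType.keyword` or `SearchType.scene` with `coll.search()` raises `NotImplementedError`. For keyword or scene search, call `video.search()` on individual videos instead.
+
+## Search + compile
+
+Index, search, and compile the matching segments into a single playable stream: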
+
+```python
+video.index_spoken_words(force=True)
+results = video.search(query="your query", search_type=SearchType.semantic)
+stream_url = results.compile()
+print(stream_url)
+```
+
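+## Tips
+
+* **Index once, search many times**: indexing is the expensive operation. Once it is done, searches are fast.
+* **Combine index types**: index both spoken words and scenes to enable every search type on the same video.
+* **Refine queries**: semantic search works best with descriptive natural-language phrases rather than single keywords.
+* **Use keyword search for precision**: when you need exact term matches, keyword search avoids semantic drift.
+* **Handle "No results found"**: `video.search()` raises `InvalidRequestError` when nothing matches. Always wrap search calls in try/except and treat `"No results found"` as an empty result set.
+* **Filter scene search noise**: for vague queries, semantic scene search can return low-relevance results. Use `score_threshold=0.3` (or higher) to filter the noise.
+* **Idempotent indexing**: use `index_spoken_words(force=True)` so repeated calls are safe. `index_scenes()` has no `force` parameter; wrap it in try/except and use `re.search(r"id\s+([a-f0-9]+)", str(e))` to extract the existing `scene_index_id` from the error message.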
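The "No results found" tip can be wrapped once and reused. In this sketch `shots_or_empty` is a hypothetical helper; the exception type is passed in so the block runs without the SDK installed, and with the SDK you would pass `InvalidRequestError` from `videodb.exceptions`.

```python
from typing import Callable, List, Type


def shots_or_empty(search: Callable[[], object],
                   not_found: Type[Exception] = Exception) -> List[object]:
    """Run `search()` and return its shots, or [] when nothing matched.

    Any other error, including a `not_found` error with a different
    message, is re-raised unchanged.
    """
    try:
        return list(search().get_shots())
    except not_found as exc:
        if "No results found" in str(exc):
            return []
        raise


# With the SDK this might look like (not executed here):
#   shots = shots_or_empty(lambda: video.search("action items"), InvalidRequestError)
```

Passing the search as a callable keeps the helper independent of which search arguments (`search_type`, `index_type`, `score_threshold`) a given call needs.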
--- a/docs/zh-CN/skills/videodb/reference/streaming.md
+++ b/docs/zh-CN/skills/videodb/reference/streaming.md
@@ -0,0 +1,406 @@
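+# Streaming and playback
+
+VideoDB generates streams on demand and returns HLS-compatible URLs that play immediately in any standard video player. There is no render time or export wait: edits, search results, and compositions can be streamed right away.
+
+## Prerequisites
+
+A video **must be uploaded** to a collection before a stream can be generated. For search-based streaming, the video must also be **indexed** (spoken words and/or scenes). See [search.md](search.md) for indexing details.
+
+## Core concepts
+
+### Stream generation
+
+Every video, search result, and timeline in VideoDB can produce a **stream URL**. The URL points to an HLS (HTTP Live Streaming) manifest that is compiled on demand.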
+
+```python
+# From a video
+stream_url = video.generate_stream()
+
+# From a timeline
+stream_url = timeline.generate_stream()
+
+# From search results
+stream_url = results.compile()
+```
+
+## Streaming a single video
+
+### Basic playback
+
+```python
+import videodb
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+# Generate stream URL
+stream_url = video.generate_stream()
+print(f"Stream: {stream_url}")
+
+# Open in default browser
+video.play()
+```
+
+### With subtitles
+
+```python
+# Index and add subtitles first
+video.index_spoken_words(force=True)
+stream_url = video.add_subtitle()
+
+# Returned URL already includes subtitles
+print(f"Subtitled stream: {stream_url}")
+```
+
+### Specific segments
+
+Stream only part of a video by passing a timeline of timestamp ranges:
+
+```python
+# Stream seconds 10-30 and 60-90
+stream_url = video.generate_stream(timeline=[(10, 30), (60, 90)])
+print(f"Segment stream: {stream_url}")
+```
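When the ranges come from user input or several searches, it may help to tidy them before building the `timeline` argument. The sketch assumes you want chronological, non-overlapping playback; the source does not say how the API treats overlapping or unordered ranges, and `normalize_ranges` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Range = Tuple[float, float]


def normalize_ranges(ranges: Iterable[Range]) -> List[Range]:
    """Sort ranges, drop empty ones, and merge any that overlap or touch."""
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if end <= start:
            continue  # skip empty or reversed ranges
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


print(normalize_ranges([(60, 90), (10, 30), (25, 40), (5, 5)]))
# → [(10, 40), (60, 90)]
```

The result can then be passed as `video.generate_stream(timeline=normalize_ranges(...))`.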
+
+## Streaming a timeline composition
+
+Build a multi-asset composition and stream it right away:
+
+```python
+import videodb
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, AudioAsset, ImageAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+video = coll.get_video(video_id)
+music = coll.get_audio(music_id)
+
+timeline = Timeline(conn)
+
+# Main video content
+timeline.add_inline(VideoAsset(asset_id=video.id))
+
+# Background music overlay (starts at second 0)
+timeline.add_overlay(0, AudioAsset(asset_id=music.id))
+
+# Text overlay at the beginning
+timeline.add_overlay(0, TextAsset(
+    text="Live Demo",
+    duration=3,
+    style=TextStyle(fontsize=48, fontcolor="white", boxcolor="#000000"),
+))
+
+# Generate the composed stream
+stream_url = timeline.generate_stream()
+print(f"Composed stream: {stream_url}")
+```
+
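+**Important:** `add_inline()` only accepts `VideoAsset`. For `AudioAsset`, `ImageAsset`, and `TextAsset`, use `add_overlay()`.
+
+For detailed timeline editing, see [editor.md](editor.md).
+
+## Streaming search results
+
+Compile search results into a single stream that contains every matching segment: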
+
+```python
+from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
+
+video.index_spoken_words(force=True)
+try:
+    results = video.search("key announcement", search_type=SearchType.semantic)
+
+    # Compile all matching shots into one stream
+    stream_url = results.compile()
+    print(f"Search results stream: {stream_url}")
+
+    # Or play directly
+    results.play()
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        print("No matching announcement segments were found.")
+    else:
+        raise
+```
+
+### Streaming individual search results
+
+```python
+from videodb.exceptions import InvalidRequestError
+
+try:
+    results = video.search("product demo", search_type=SearchType.semantic)
+    for i, shot in enumerate(results.get_shots()):
+        stream_url = shot.generate_stream()
+        print(f"Hit {i+1} [{shot.start:.1f}s-{shot.end:.1f}s]: {stream_url}")
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        print("No product demo segments matched the query.")
+    else:
+        raise
+```
+
+## Audio playback
+
+Get a signed playback URL for audio content:
+
+```python
+audio = coll.get_audio(audio_id)
+playback_url = audio.generate_url()
+print(f"Audio URL: {playback_url}")
+```
+
+## Complete workflow examples
+
+### Search-to-stream pipeline
+
+Combine search, timeline composition, and streaming in one workflow:
+
+```python
+import videodb
+from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+video.index_spoken_words(force=True)
+
+# Search for key moments
+queries = ["introduction", "main demo", "Q&A"]
+timeline = Timeline(conn)
+timeline_offset = 0.0
+
+for query in queries:
+    try:
+        results = video.search(query, search_type=SearchType.semantic)
+        shots = results.get_shots()
+    except InvalidRequestError as exc:
+        if "No results found" in str(exc):
+            shots = []
+        else:
+            raise
+
+    if not shots:
+        continue
+
+    # Add the section label where this batch starts in the compiled timeline
+    timeline.add_overlay(timeline_offset, TextAsset(
+        text=query.title(),
+        duration=2,
+        style=TextStyle(fontsize=36, fontcolor="white", boxcolor="#222222"),
+    ))
+
+    for shot in shots:
+        timeline.add_inline(
+            VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
+        )
+        timeline_offset += shot.end - shot.start
+
+stream_url = timeline.generate_stream()
+print(f"Dynamic compilation: {stream_url}")
+```
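The `timeline_offset` bookkeeping in the pipeline above is plain arithmetic: each section label is placed at the running total of the shot durations added before it. A standalone sketch of that calculation, using made-up `(start, end)` pairs in place of real shots (`section_offsets` is a hypothetical helper):

```python
from typing import Dict, List, Tuple

Shot = Tuple[float, float]  # (start, end) within the source video, in seconds


def section_offsets(sections: List[Tuple[str, List[Shot]]]) -> Dict[str, float]:
    """Map each non-empty section label to the timeline second where it starts."""
    offsets: Dict[str, float] = {}
    cursor = 0.0
    for label, shots in sections:
        if not shots:
            continue  # sections with no search hits get no label
        offsets[label] = cursor
        cursor += sum(end - start for start, end in shots)
    return offsets


plan = [
    ("Introduction", [(0.0, 12.5), (40.0, 45.0)]),
    ("Main Demo", []),
    ("Q&A", [(300.0, 320.0)]),
]
print(section_offsets(plan))
# → {'Introduction': 0.0, 'Q&A': 17.5}
```

Offsets depend on clip lengths in the compiled timeline, not on where the shots sat in the source video, which is why the "Q&A" label lands at 17.5 s rather than 300 s.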
+
+### Multi-video stream
+
+Combine segments from different videos into a single stream:
+
+```python
+import videodb
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+video_clips = [
+    {"id": "vid_001", "start": 0, "end": 15},
+    {"id": "vid_002", "start": 10, "end": 30},
+    {"id": "vid_003", "start": 5, "end": 25},
+]
+
+timeline = Timeline(conn)
+for clip in video_clips:
+    timeline.add_inline(
+        VideoAsset(asset_id=clip["id"], start=clip["start"], end=clip["end"])
+    )
+
+stream_url = timeline.generate_stream()
+print(f"Multi-video stream: {stream_url}")
+```
+
+### Conditional stream assembly
+
+Build the stream dynamically depending on which searches return results:
+
+```python
+import videodb
+from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+video = coll.get_video("your-video-id")
+
+video.index_spoken_words(force=True)
+
+timeline = Timeline(conn)
+
+# Try to find specific content; fall back to full video
+topics = ["opening remarks", "technical deep dive", "closing"]
+
+found_any = False
+timeline_offset = 0.0
+for topic in topics:
+    try:
+        results = video.search(topic, search_type=SearchType.semantic)
+        shots = results.get_shots()
+    except InvalidRequestError as exc:
+        if "No results found" in str(exc):
+            shots = []
+        else:
+            raise
+
+    if shots:
+        found_any = True
+        timeline.add_overlay(timeline_offset, TextAsset(
+            text=topic.title(),
+            duration=2,
+            style=TextStyle(fontsize=32, fontcolor="white", boxcolor="#1a1a2e"),
+        ))
+        for shot in shots:
+            timeline.add_inline(
+                VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
+            )
+            timeline_offset += shot.end - shot.start
+
+if found_any:
+    stream_url = timeline.generate_stream()
+    print(f"Curated stream: {stream_url}")
+else:
+    # Fall back to full video stream
+    stream_url = video.generate_stream()
+    print(f"Full video stream: {stream_url}")
+```
+
+### Live event recap
+
+Turn an event recording into a streamable recap with multiple sections:
+
+```python
+import videodb
+from videodb import SearchType
+from videodb.exceptions import InvalidRequestError
+from videodb.timeline import Timeline
+from videodb.asset import VideoAsset, AudioAsset, ImageAsset, TextAsset, TextStyle
+
+conn = videodb.connect()
+coll = conn.get_collection()
+
+# Upload event recording
+event = coll.upload(url="https://example.com/event-recording.mp4")
+event.index_spoken_words(force=True)
+
+# Generate background music
+music = coll.generate_music(
+    prompt="upbeat corporate background music",
+    duration=120,
+)
+
+# Generate title image
+title_img = coll.generate_image(
+    prompt="modern event recap title card, dark background, professional",
+    aspect_ratio="16:9",
+)
+
+# Build the recap timeline
+timeline = Timeline(conn)
+timeline_offset = 0.0
+
+# Main video segments from search
+try:
+    keynote = event.search("keynote announcement", search_type=SearchType.semantic)
+    keynote_shots = keynote.get_shots()[:5]
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        keynote_shots = []
+    else:
+        raise
+if keynote_shots:
+    keynote_start = timeline_offset
+    for shot in keynote_shots:
+        timeline.add_inline(
+            VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
+        )
+        timeline_offset += shot.end - shot.start
+else:
+    keynote_start = None
+
+try:
+    demo = event.search("product demo", search_type=SearchType.semantic)
+    demo_shots = demo.get_shots()[:5]
+except InvalidRequestError as exc:
+    if "No results found" in str(exc):
+        demo_shots = []
+    else:
+        raise
+if demo_shots:
+    demo_start = timeline_offset
+    for shot in demo_shots:
+        timeline.add_inline(
+            VideoAsset(asset_id=shot.video_id, start=shot.start, end=shot.end)
+        )
+        timeline_offset += shot.end - shot.start
+else:
+    demo_start = None
+
+# Overlay title card image
+timeline.add_overlay(0, ImageAsset(
+    asset_id=title_img.id, width=100, height=100, x=80, y=20, duration=5
+))
+
+# Overlay section labels at the correct timeline offsets
+if keynote_start is not None:
+    timeline.add_overlay(max(5, keynote_start), TextAsset(
+        text="Keynote Highlights",
+        duration=3,
+        style=TextStyle(fontsize=40, fontcolor="white", boxcolor="#0d1117"),
+    ))
+if demo_start is not None:
+    timeline.add_overlay(max(5, demo_start), TextAsset(
+        text="Demo Highlights",
+        duration=3,
+        style=TextStyle(fontsize=36, fontcolor="white", boxcolor="#0d1117"),
+    ))
+
+# Overlay background music
+timeline.add_overlay(0, AudioAsset(
+    asset_id=music.id, fade_in_duration=3
+))
+
+# Stream the final recap
+stream_url = timeline.generate_stream()
+print(f"Event recap: {stream_url}")
+```
+
+***
+
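+## Tips
+
+* **HLS compatibility**: stream URLs return an HLS manifest (`.m3u8`). They play natively in Safari and through hls.js or a similar library in other browsers.
+* **On-demand compilation**: streams are compiled server-side when requested. The first playback may have a short compilation delay; later playbacks of the same composition are cached.
+* **Caching**: a second call to `video.generate_stream()` (with no arguments) returns the cached stream URL instead of recompiling.
+* **Segment streams**: `video.generate_stream(timeline=[(start, end)])` is the quickest way to stream a specific clip without building a full `Timeline` object.
+* **Inline vs. overlay**: `add_inline()` only accepts `VideoAsset` and places assets in sequence on the main track. `add_overlay()` accepts `AudioAsset`, `ImageAsset`, and `TextAsset` and layers them on top at the given start time.
+* **TextStyle defaults**: `TextStyle` defaults to `font='Sans'` and `fontcolor='black'`. For a text background color, use `boxcolor` (not `bgcolor`).
+* **Combine with generation**: use `coll.generate_music(prompt, duration)` and `coll.generate_image(prompt, aspect_ratio)` to create assets for timeline compositions.
+* **Playback**: `.play()` opens the stream URL in the default system browser. For programmatic use, work with the URL string directly.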
--- a/docs/zh-CN/skills/videodb/reference/use-cases.md
+++ b/docs/zh-CN/skills/videodb/reference/use-cases.md
@@ -0,0 +1,142 @@
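+# Use cases
+
+Common workflows and what VideoDB makes possible. For code details, see [api-reference.md](api-reference.md), [capture.md](capture.md), [editor.md](editor.md), and [search.md](search.md).
+
+***
+
+## Video search and highlights
+
+### Create highlight reels
+
+Upload a long video (conference talk, lecture, meeting recording), search for key moments by topic ("product launch", "Q&A session", "demo"), and automatically compile the matching segments into a shareable highlight reel.
+
+### Build a searchable video library
+
+Bulk-upload videos into a collection, index the spoken content for search, then query across the whole library. Find a specific topic within hundreds of hours of content in moments.
+
+### Extract specific clips
+
+Search for segments that match a query ("budget discussion", "action items") and extract each match as a standalone clip with its own stream URL.
+
+***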
+
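+## Video enhancement
+
+### Add professional polish
+
+Take raw footage and enhance it with:
+
+* Subtitles generated automatically from speech
+* Custom thumbnails at specific timestamps
+* Background music overlays
+* Intro/outro sequences with generated images
+
+### AI-enhanced content
+
+Combine existing video with generative AI:
+
+* Generate text summaries from the transcript
+* Create background music that matches the video length
+* Generate title cards and overlay images
+* Mix all elements into a polished final output
+
+***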
+
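+## Real-time recording (desktop/meetings)
+
+### Screen + audio recording with AI
+
+Capture the screen, microphone, and system audio at the same time. In real time you get:
+
+* **Live transcript** - speech converted to text as it happens
+* **Audio summaries** - periodic AI-generated summaries of the discussion
+* **Visual index** - AI descriptions of on-screen activity
+
+### Meeting recording with summaries
+
+Record a meeting and transcribe every participant live. Receive periodic summaries with key discussion points, decisions, and action items, delivered in real time.
+
+### Screen activity tracking
+
+Track on-screen activity through AI-generated descriptions:
+
+* "The user is browsing a spreadsheet in Google Sheets"
+* "The user switched to a code editor with a Python file"
+* "A video call with screen sharing is in progress"
+
+### Post-session processing
+
+When the recording ends, it is exported as a permanent video. You can then:
+
+* Generate a searchable transcript
+* Search the recording for specific topics
+* Extract clips of important moments
+* Share through a stream URL or player link
+
+***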
+
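+## Live stream intelligence (RTSP/RTMP)
+
+### Connect external streams
+
+Ingest live video from RTSP/RTMP sources (security cameras, encoders, broadcasts). Process and index the content in real time.
+
+### Real-time event detection
+
+Define the events to detect in a live stream:
+
+* "A person enters a restricted area"
+* "A traffic violation at an intersection"
+* "A product is visible on the shelf"
+
+When an event occurs, receive an alert over WebSocket or a webhook.
+
+### Live stream search
+
+Search recorded live stream content. Find specific moments in hours of continuous footage and generate clips.
+
+***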
+
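+## Content moderation and safety
+
+### Automated content review
+
+Index video scenes with AI and search for problematic content. Flag videos that contain violence, inappropriate material, or policy violations.
+
+### Profanity detection
+
+Detect and locate profanity in audio. Optionally overlay a beep at the detected timestamps.
+
+***
+
+## Platform integration
+
+### Reformatting for social media
+
+Reformat video for different platforms:
+
+* Vertical (9:16) for TikTok, Reels, Shorts
+* Square (1:1) for the Instagram feed
+* Landscape (16:9) for YouTube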
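To get a feel for what each target ratio means for a given source, the arithmetic for a center crop can be sketched as follows. This is generic geometry, not a description of how VideoDB reframes video; `center_crop_size` is a hypothetical helper, and rounding down to even dimensions is a common encoder convention assumed here.

```python
from typing import Tuple


def center_crop_size(width: int, height: int,
                     ratio_w: int, ratio_h: int) -> Tuple[int, int]:
    """Largest even-sized (w, h) with ratio_w:ratio_h that fits in width x height."""
    if width * ratio_h > height * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        new_h = height
        new_w = height * ratio_w // ratio_h
    else:
        # Source is taller than (or equal to) the target: keep full width.
        new_w = width
        new_h = width * ratio_h // ratio_w
    # Round each dimension down to a multiple of 2.
    return new_w - new_w % 2, new_h - new_h % 2


for name, (rw, rh) in {"vertical": (9, 16), "square": (1, 1), "landscape": (16, 9)}.items():
    print(name, center_crop_size(1920, 1080, rw, rh))
```

For a 1920x1080 source this shows how much of the frame a vertical crop discards compared with a square one.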
+
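+### Transcoding for distribution
+
+Change resolution, bitrate, or quality for different distribution targets. Output optimized streams for web, mobile, or broadcast.
+
+### Generating shareable links
+
+Every operation produces a playable stream URL. Embed it in a web player, share it directly, or integrate it with an existing platform.
+
+***
+
+## Workflow summary
+
+| Goal | VideoDB approach |
+|------|------------------|
+| Find moments in a video | Index speech/scenes → search → compile clips |
+| Create a highlight reel | Search multiple topics → build timeline → generate stream |
+| Add subtitles | Index speech → add subtitle overlay |
+| Record screen + AI | Start recording → run AI pipelines → export video |
+| Monitor a live stream | Connect RTSP → index scenes → create alerts |
+| Reformat for social media | Reframe to the target aspect ratio |
+| Merge clips | Build a timeline from multiple assets → generate stream |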