feat: add orchestration workflows and harness skills

2026-04-17 23:53:30 +08:00 · 2026-03-12 08:50:24 -07:00
parent 51eec12764
commit cb43402d7d
46 changed files with 5618 additions and 80 deletions
--- a/.agents/skills/fal-ai-media/SKILL.md
+++ b/.agents/skills/fal-ai-media/SKILL.md
@@ -0,0 +1,277 @@
+---
+name: fal-ai-media
+description: Unified media generation via fal.ai MCP — image, video, and audio. Covers text-to-image (Nano Banana), text/image-to-video (Seedance, Kling, Veo 3), text-to-speech (CSM-1B), and video-to-audio (ThinkSound). Use when the user wants to generate images, videos, or audio with AI.
+origin: ECC
+---
+
+# fal.ai Media Generation
+
+Generate images, videos, and audio using fal.ai models via MCP.
+
+## When to Activate
+
+- User wants to generate images from text prompts
+- Creating videos from text or images
+- Generating speech, music, or sound effects
+- Any media generation task
+- User says "generate image", "create video", "text to speech", "make a thumbnail", or similar
+
+## MCP Requirement
+
+fal.ai MCP server must be configured. Add to `~/.claude.json`:
+
+```json
+"fal-ai": {
+  "command": "npx",
+  "args": ["-y", "fal-ai-mcp-server"],
+  "env": { "FAL_KEY": "YOUR_FAL_KEY_HERE" }
+}
+```
+
+Get an API key at [fal.ai](https://fal.ai).
+
+## MCP Tools
+
+The fal.ai MCP provides these tools:
+- `search` — Find available models by keyword
+- `find` — Get model details and parameters
+- `generate` — Run a model with parameters
+- `result` — Check async generation status
+- `status` — Check job status
+- `cancel` — Cancel a running job
+- `estimate_cost` — Estimate generation cost
+- `models` — List popular models
+- `upload` — Upload files for use as inputs
+
+---
+
+## Image Generation
+
+### Nano Banana 2 (Fast)
+Best for: quick iterations, drafts, text-to-image, image editing.
+
+```
+generate(
+  model_name: "fal-ai/nano-banana-2",
+  input: {
+    "prompt": "a futuristic cityscape at sunset, cyberpunk style",
+    "image_size": "landscape_16_9",
+    "num_images": 1,
+    "seed": 42
+  }
+)
+```
+
+### Nano Banana Pro (High Fidelity)
+Best for: production images, realism, typography, detailed prompts.
+
+```
+generate(
+  model_name: "fal-ai/nano-banana-pro",
+  input: {
+    "prompt": "professional product photo of wireless headphones on marble surface, studio lighting",
+    "image_size": "square",
+    "num_images": 1,
+    "guidance_scale": 7.5
+  }
+)
+```
+
+### Common Image Parameters
+
+| Param | Type | Options | Notes |
+|-------|------|---------|-------|
+| `prompt` | string | required | Describe what you want |
+| `image_size` | string | `square`, `portrait_4_3`, `landscape_16_9`, `portrait_16_9`, `landscape_4_3` | Aspect ratio |
+| `num_images` | number | 1-4 | How many to generate |
+| `seed` | number | any integer | Reproducibility |
+| `guidance_scale` | number | 1-20 | How closely to follow the prompt (higher = more literal) |
+
+### Image Editing
+Use Nano Banana 2 with an input image for inpainting, outpainting, or style transfer:
+
+```
+# First upload the source image
+upload(file_path: "/path/to/image.png")
+
+# Then generate with image input
+generate(
+  model_name: "fal-ai/nano-banana-2",
+  input: {
+    "prompt": "same scene but in watercolor style",
+    "image_url": "<uploaded_url>",
+    "image_size": "landscape_16_9"
+  }
+)
+```
+
+---
+
+## Video Generation
+
+### Seedance 1.0 Pro (ByteDance)
+Best for: text-to-video, image-to-video with high motion quality.
+
+```
+generate(
+  model_name: "fal-ai/seedance-1-0-pro",
+  input: {
+    "prompt": "a drone flyover of a mountain lake at golden hour, cinematic",
+    "duration": "5s",
+    "aspect_ratio": "16:9",
+    "seed": 42
+  }
+)
+```
+
+### Kling Video v3 Pro
+Best for: text/image-to-video with native audio generation.
+
+```
+generate(
+  model_name: "fal-ai/kling-video/v3/pro",
+  input: {
+    "prompt": "ocean waves crashing on a rocky coast, dramatic clouds",
+    "duration": "5s",
+    "aspect_ratio": "16:9"
+  }
+)
+```
+
+### Veo 3 (Google DeepMind)
+Best for: video with generated sound, high visual quality.
+
+```
+generate(
+  model_name: "fal-ai/veo-3",
+  input: {
+    "prompt": "a bustling Tokyo street market at night, neon signs, crowd noise",
+    "aspect_ratio": "16:9"
+  }
+)
+```
+
+### Image-to-Video
+Start from an existing image:
+
+```
+generate(
+  model_name: "fal-ai/seedance-1-0-pro",
+  input: {
+    "prompt": "camera slowly zooms out, gentle wind moves the trees",
+    "image_url": "<uploaded_image_url>",
+    "duration": "5s"
+  }
+)
+```
+
+### Video Parameters
+
+| Param | Type | Options | Notes |
+|-------|------|---------|-------|
+| `prompt` | string | required | Describe the video |
+| `duration` | string | `"5s"`, `"10s"` | Video length |
+| `aspect_ratio` | string | `"16:9"`, `"9:16"`, `"1:1"` | Frame ratio |
+| `seed` | number | any integer | Reproducibility |
+| `image_url` | string | URL | Source image for image-to-video |
+
+---
+
+## Audio Generation
+
+### CSM-1B (Conversational Speech)
+Text-to-speech with natural, conversational quality.
+
+```
+generate(
+  model_name: "fal-ai/csm-1b",
+  input: {
+    "text": "Hello, welcome to the demo. Let me show you how this works.",
+    "speaker_id": 0
+  }
+)
+```
+
+### ThinkSound (Video-to-Audio)
+Generate matching audio from video content.
+
+```
+generate(
+  model_name: "fal-ai/thinksound",
+  input: {
+    "video_url": "<video_url>",
+    "prompt": "ambient forest sounds with birds chirping"
+  }
+)
+```
+
+### ElevenLabs (via API, no MCP)
+For professional voice synthesis, use ElevenLabs directly:
+
+```python
+import os
+import requests
+
+resp = requests.post(
+    "https://api.elevenlabs.io/v1/text-to-speech/<voice_id>",
+    headers={
+        "xi-api-key": os.environ["ELEVENLABS_API_KEY"],
+        "Content-Type": "application/json"
+    },
+    json={
+        "text": "Your text here",
+        "model_id": "eleven_turbo_v2_5",
+        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75}
+    }
+)
+with open("output.mp3", "wb") as f:
+    f.write(resp.content)
+```
+
+### VideoDB Generative Audio
+If VideoDB is configured, use its generative audio:
+
+```python
+# Voice generation
+audio = coll.generate_voice(text="Your narration here", voice="alloy")
+
+# Music generation
+music = coll.generate_music(prompt="upbeat electronic background music", duration=30)
+
+# Sound effects
+sfx = coll.generate_sound_effect(prompt="thunder crack followed by rain")
+```
+
+---
+
+## Cost Estimation
+
+Before generating, check estimated cost:
+
+```
+estimate_cost(model_name: "fal-ai/nano-banana-pro", input: {...})
+```
+
+## Model Discovery
+
+Find models for specific tasks:
+
+```
+search(query: "text to video")
+find(model_name: "fal-ai/seedance-1-0-pro")
+models()
+```
+
+## Tips
+
+- Use `seed` for reproducible results when iterating on prompts
+- Start with lower-cost models (Nano Banana 2) for prompt iteration, then switch to Pro for finals
+- For video, keep prompts descriptive but concise — focus on motion and scene
+- Image-to-video produces more controlled results than pure text-to-video
+- Check `estimate_cost` before running expensive video generations
+
+## Related Skills
+
+- `videodb` — Video processing, editing, and streaming
+- `video-editing` — AI-powered video editing workflows
+- `content-engine` — Content creation for social platforms
--- a/.agents/skills/fal-ai-media/agents/openai.yaml
+++ b/.agents/skills/fal-ai-media/agents/openai.yaml
@@ -0,0 +1,7 @@
+interface:
+  display_name: "fal.ai Media"
+  short_description: "AI image, video, and audio generation via fal.ai"
+  brand_color: "#F43F5E"
+  default_prompt: "Generate images, videos, or audio using fal.ai models"
+policy:
+  allow_implicit_invocation: true