feat: deliver v1.8.0 harness reliability and parity updates

2026-06-14 20:21:23 +08:00 · 2026-03-04 14:48:06 -08:00
parent 32e9c293f0
commit 48b883d741
84 changed files with 2990 additions and 725 deletions
@@ -0,0 +1,73 @@
+---
+name: agent-harness-construction
+description: Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.
+origin: ECC
+---
+
+# Agent Harness Construction
+
+Use this skill when you are improving how an agent plans, calls tools, recovers from errors, and converges on completion.
+
+## Core Model
+
+Agent output quality is constrained by:
+1. Action space quality
+2. Observation quality
+3. Recovery quality
+4. Context budget quality
+
+## Action Space Design
+
+1. Use stable, explicit tool names.
+2. Keep inputs schema-first and narrow.
+3. Return deterministic output shapes.
+4. Avoid catch-all tools unless isolation is impossible.
+
+## Granularity Rules
+
+- Use micro-tools for high-risk operations (deploy, migration, permissions).
+- Use medium tools for common edit/read/search loops.
+- Use macro-tools only when round-trip overhead is the dominant cost.
+
+## Observation Design
+
+Every tool response should include:
+- `status`: success|warning|error
+- `summary`: one-line result
+- `next_actions`: actionable follow-ups
+- `artifacts`: file paths / IDs
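A minimal sketch of a response with the four fields above, assuming a JSON envelope and single-line inputs. The helper names `json_escape` and `emit_observation` are hypothetical and do not come from the ECC codebase.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumes single-line input; control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# $1 = status, $2 = summary, $3 = one next action, $4 = one artifact path or ID
emit_observation() {
  case "$1" in
    success|warning|error) ;;
    *) echo "invalid status: $1" >&2; return 2 ;;
  esac
  printf '{"status":"%s","summary":"%s","next_actions":["%s"],"artifacts":["%s"]}\n' \
    "$1" "$(json_escape "$2")" "$(json_escape "$3")" "$(json_escape "$4")"
}

emit_observation success "wrote 3 files" "run tests" "src/app.ts"
```

Rejecting unknown `status` values up front keeps the output shape deterministic, which is the property the Action Space Design rules ask for.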
+
+## Error Recovery Contract
+
+For every error path, include:
+- root cause hint
+- safe retry instruction
+- explicit stop condition
+
+## Context Budgeting
+
+1. Keep system prompt minimal and invariant.
+2. Move large guidance into skills loaded on demand.
+3. Prefer references to files over inlining long documents.
+4. Compact at phase boundaries, not arbitrary token thresholds.
+
+## Architecture Pattern Guidance
+
+- ReAct: best for exploratory tasks with an uncertain path.
+- Function-calling: best for structured deterministic flows.
+- Hybrid (recommended): ReAct planning + typed tool execution.
+
+## Benchmarking
+
+Track:
+- completion rate
+- retries per task
+- pass@1 and pass@3
+- cost per successful task
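One way to derive these numbers from a run log, sketched under assumptions: the log format (`<task-id> <success|failure> <retries> <cost-usd>` per line) and the function name `summarize_runs` are hypothetical, and cost per successful task is taken here to mean total spend divided by the number of successes.

```shell
# Reads run records on stdin, one per line:
#   <task-id> <success|failure> <retries> <cost-usd>
summarize_runs() {
  awk '
    { total++; retries += $3; cost += $4 }
    $2 == "success" { ok++ }
    END {
      if (total == 0) { print "no runs"; exit }
      printf "completion_rate=%.2f\n", ok / total
      printf "retries_per_task=%.2f\n", retries / total
      if (ok > 0) { printf "cost_per_success=%.2f\n", cost / ok }
      else        { print "cost_per_success=n/a" }
    }'
}

printf '%s\n' 't1 success 0 0.10' 't2 failure 2 0.30' 't3 success 1 0.20' | summarize_runs
```

Failed runs still count toward the cost numerator, so the metric penalizes spend on tasks that never complete.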
+
+## Anti-Patterns
+
+- Too many tools with overlapping semantics.
+- Opaque tool output with no recovery hints.
+- Error-only output without next steps.
+- Context overloading with irrelevant references.
@@ -0,0 +1,63 @@
+---
+name: agentic-engineering
+description: Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
+origin: ECC
+---
+
+# Agentic Engineering
+
+Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.
+
+## Operating Principles
+
+1. Define completion criteria before execution.
+2. Decompose work into agent-sized units.
+3. Route model tiers by task complexity.
+4. Measure with evals and regression checks.
+
+## Eval-First Loop
+
+1. Define capability eval and regression eval.
+2. Run baseline and capture failure signatures.
+3. Execute implementation.
+4. Re-run evals and compare deltas.
+
+## Task Decomposition
+
+Apply the 15-minute unit rule:
+- each unit should be independently verifiable
+- each unit should have a single dominant risk
+- each unit should expose a clear done condition
+
+## Model Routing
+
+- Haiku: classification, boilerplate transforms, narrow edits
+- Sonnet: implementation and refactors
+- Opus: architecture, root-cause analysis, multi-file invariants
+
+## Session Strategy
+
+- Continue the session for closely coupled units.
+- Start a fresh session after major phase transitions.
+- Compact after milestone completion, not during active debugging.
+
+## Review Focus for AI-Generated Code
+
+Prioritize:
+- invariants and edge cases
+- error boundaries
+- security and auth assumptions
+- hidden coupling and rollout risk
+
+Do not spend review cycles on style-only disagreements when automated formatting and linting already enforce style.
+
+## Cost Discipline
+
+Track per task:
+- model
+- token estimate
+- retries
+- wall-clock time
+- success/failure
+
+Escalate model tier only when lower tier fails with a clear reasoning gap.
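The escalation rule can be sketched as a small lookup. The function name `next_tier` and the failure-class label `reasoning-gap` are hypothetical; only the tier order (Haiku, Sonnet, Opus) comes from the routing list above.

```shell
# $1 = current tier, $2 = failure class observed on the last attempt
next_tier() {
  # Any failure other than a reasoning gap stays on the same tier.
  [ "$2" = "reasoning-gap" ] || { echo "$1"; return 0; }
  case "$1" in
    haiku)  echo sonnet ;;
    sonnet) echo opus ;;
    *)      echo "$1" ;;   # top tier or unknown tier: nothing to escalate to
  esac
}

next_tier haiku reasoning-gap   # prints: sonnet
next_tier haiku timeout         # prints: haiku
```

Keeping timeouts and tool errors on the same tier avoids the unbounded cost drift that comes from escalating on every failure.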
@@ -0,0 +1,51 @@
+---
+name: ai-first-engineering
+description: Engineering operating model for teams where AI agents generate a large share of implementation output.
+origin: ECC
+---
+
+# AI-First Engineering
+
+Use this skill when designing process, reviews, and architecture for teams shipping with AI-assisted code generation.
+
+## Process Shifts
+
+1. Planning quality matters more than typing speed.
+2. Eval coverage matters more than anecdotal confidence.
+3. Review focus shifts from syntax to system behavior.
+
+## Architecture Requirements
+
+Prefer architectures that are agent-friendly:
+- explicit boundaries
+- stable contracts
+- typed interfaces
+- deterministic tests
+
+Avoid implicit behavior spread across hidden conventions.
+
+## Code Review in AI-First Teams
+
+Review for:
+- behavior regressions
+- security assumptions
+- data integrity
+- failure handling
+- rollout safety
+
+Minimize time spent on style issues already covered by automation.
+
+## Hiring and Evaluation Signals
+
+Strong AI-first engineers:
+- decompose ambiguous work cleanly
+- define measurable acceptance criteria
+- produce high-signal prompts and evals
+- enforce risk controls under delivery pressure
+
+## Testing Standard
+
+Raise the testing bar for generated code:
+- required regression coverage for touched domains
+- explicit edge-case assertions
+- integration checks for interface boundaries
@@ -6,6 +6,11 @@ origin: ECC

 # Autonomous Loops Skill

+> Compatibility note (v1.8.0): `autonomous-loops` is retained for one release.
+> The canonical skill name is now `continuous-agent-loop`. New loop guidance
+> should be authored there, while this skill remains available to avoid
+> breaking existing workflows.
+
 Patterns, architectures, and reference implementations for running Claude Code autonomously in loops. Covers everything from simple `claude -p` pipelines to full RFC-driven multi-agent DAG orchestration.

 ## When to Use
@@ -0,0 +1,45 @@
+---
+name: continuous-agent-loop
+description: Patterns for continuous autonomous agent loops with quality gates, evals, and recovery controls.
+origin: ECC
+---
+
+# Continuous Agent Loop
+
+This is the canonical loop skill name from v1.8 onward. It supersedes `autonomous-loops`, which remains available for one release for compatibility.
+
+## Loop Selection Flow
+
+```text
+Start
+  |
+  +-- Need strict CI/PR control? -- yes --> continuous-pr
+  |
+  +-- Need RFC decomposition? -- yes --> rfc-dag
+  |
+  +-- Need exploratory parallel generation? -- yes --> infinite
+  |
+  +-- default --> sequential
+```
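The flow above can be read as a first-match decision chain. This is an illustrative sketch only: the function name `select_loop` and its yes/no arguments are hypothetical, while the returned names are the four loop patterns from the diagram.

```shell
# $1 = need strict CI/PR control?  $2 = need RFC decomposition?
# $3 = need exploratory parallel generation?   Each argument is "yes" or "no".
select_loop() {
  if   [ "$1" = "yes" ]; then echo "continuous-pr"
  elif [ "$2" = "yes" ]; then echo "rfc-dag"
  elif [ "$3" = "yes" ]; then echo "infinite"
  else                        echo "sequential"
  fi
}

select_loop no yes no   # prints: rfc-dag
```

Because the checks run top to bottom, a task that needs both PR control and RFC decomposition resolves to `continuous-pr`, matching the order of the branches in the diagram.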
+
+## Combined Pattern
+
+Recommended production stack:
+1. RFC decomposition (`ralphinho-rfc-pipeline`)
+2. quality gates (`plankton-code-quality` + `/quality-gate`)
+3. eval loop (`eval-harness`)
+4. session persistence (`nanoclaw-repl`)
+
+## Failure Modes
+
+- loop churn without measurable progress
+- repeated retries with same root cause
+- merge queue stalls
+- cost drift from unbounded escalation
+
+## Recovery
+
+- freeze loop
+- run `/harness-audit`
+- reduce scope to failing unit
+- replay with explicit acceptance criteria
@@ -0,0 +1,133 @@
+#!/usr/bin/env bash
+# Continuous Learning v2 - Observer background loop
+
+set +e
+unset CLAUDECODE
+
+SLEEP_PID=""
+USR1_FIRED=0
+
+cleanup() {
+  [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
+  if [ -f "$PID_FILE" ] && [ "$(cat "$PID_FILE" 2>/dev/null)" = "$$" ]; then
+    rm -f "$PID_FILE"
+  fi
+  exit 0
+}
+trap cleanup TERM INT
+
+analyze_observations() {
+  if [ ! -f "$OBSERVATIONS_FILE" ]; then
+    return
+  fi
+
+  obs_count=$(wc -l < "$OBSERVATIONS_FILE" 2>/dev/null || echo 0)
+  if [ "$obs_count" -lt "$MIN_OBSERVATIONS" ]; then
+    return
+  fi
+
+  echo "[$(date)] Analyzing $obs_count observations for project ${PROJECT_NAME}..." >> "$LOG_FILE"
+
+  if [ "${CLV2_IS_WINDOWS:-false}" = "true" ] && [ "${ECC_OBSERVER_ALLOW_WINDOWS:-false}" != "true" ]; then
+    echo "[$(date)] Skipping claude analysis on Windows due to known non-interactive hang issue (#295). Set ECC_OBSERVER_ALLOW_WINDOWS=true to override." >> "$LOG_FILE"
+    return
+  fi
+
+  if ! command -v claude >/dev/null 2>&1; then
+    echo "[$(date)] claude CLI not found, skipping analysis" >> "$LOG_FILE"
+    return
+  fi
+
+  prompt_file="$(mktemp "${TMPDIR:-/tmp}/ecc-observer-prompt.XXXXXX")"
+  cat > "$prompt_file" <<PROMPT
+Read ${OBSERVATIONS_FILE} and identify patterns for the project ${PROJECT_NAME} (user corrections, error resolutions, repeated workflows, tool preferences).
+If you find 3+ occurrences of the same pattern, create an instinct file in ${INSTINCTS_DIR}/<id>.md.
+
+CRITICAL: Every instinct file MUST use this exact format:
+
+---
+id: kebab-case-name
+trigger: when <specific condition>
+confidence: <0.3-0.85 based on frequency: 3-5 times=0.5, 6-10=0.7, 11+=0.85>
+domain: <one of: code-style, testing, git, debugging, workflow, file-patterns>
+source: session-observation
+scope: project
+project_id: ${PROJECT_ID}
+project_name: ${PROJECT_NAME}
+---
+
+# Title
+
+## Action
+<what to do, one clear sentence>
+
+## Evidence
+- Observed N times in session <id>
+- Pattern: <description>
+- Last observed: <date>
+
+Rules:
+- Be conservative, only clear patterns with 3+ observations
+- Use narrow, specific triggers
+- Never include actual code snippets, only describe patterns
+- If a similar instinct already exists in ${INSTINCTS_DIR}/, update it instead of creating a duplicate
+- The YAML frontmatter (between --- markers) with id field is MANDATORY
+- If a pattern seems universal (not project-specific), set scope to global instead of project
+- Examples of global patterns: always validate user input, prefer explicit error handling
+- Examples of project patterns: use React functional components, follow Django REST framework conventions
+PROMPT
+
+  timeout_seconds="${ECC_OBSERVER_TIMEOUT_SECONDS:-120}"
+  exit_code=0
+
+  claude --model haiku --max-turns 3 --print < "$prompt_file" >> "$LOG_FILE" 2>&1 &
+  claude_pid=$!
+
+  (
+    sleep "$timeout_seconds"
+    if kill -0 "$claude_pid" 2>/dev/null; then
+      echo "[$(date)] Claude analysis timed out after ${timeout_seconds}s; terminating process" >> "$LOG_FILE"
+      kill "$claude_pid" 2>/dev/null || true
+    fi
+  ) &
+  watchdog_pid=$!
+
+  wait "$claude_pid"
+  exit_code=$?
+  kill "$watchdog_pid" 2>/dev/null || true
+  rm -f "$prompt_file"
+
+  if [ "$exit_code" -ne 0 ]; then
+    echo "[$(date)] Claude analysis failed (exit $exit_code)" >> "$LOG_FILE"
+  fi
+
+  if [ -f "$OBSERVATIONS_FILE" ]; then
+    archive_dir="${PROJECT_DIR}/observations.archive"
+    mkdir -p "$archive_dir"
+    mv "$OBSERVATIONS_FILE" "$archive_dir/processed-$(date +%Y%m%d-%H%M%S)-$$.jsonl" 2>/dev/null || true
+  fi
+}
+
+on_usr1() {
+  [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
+  SLEEP_PID=""
+  USR1_FIRED=1
+  analyze_observations
+}
+trap on_usr1 USR1
+
+echo "$$" > "$PID_FILE"
+echo "[$(date)] Observer started for ${PROJECT_NAME} (PID: $$)" >> "$LOG_FILE"
+
+while true; do
+  sleep "$OBSERVER_INTERVAL_SECONDS" &
+  SLEEP_PID=$!
+  wait "$SLEEP_PID" 2>/dev/null
+  SLEEP_PID=""
+
+  if [ "$USR1_FIRED" -eq 1 ]; then
+    USR1_FIRED=0
+  else
+    analyze_observations
+  fi
+done
@@ -23,6 +23,7 @@ set -e

 SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 SKILL_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
+OBSERVER_LOOP_SCRIPT="${SCRIPT_DIR}/observer-loop.sh"

 # Source shared project detection helper
 # This sets: PROJECT_ID, PROJECT_NAME, PROJECT_ROOT, PROJECT_DIR
@@ -74,6 +75,13 @@ OBSERVER_INTERVAL_SECONDS=$((OBSERVER_INTERVAL_MINUTES * 60))
 echo "Project: ${PROJECT_NAME} (${PROJECT_ID})"
 echo "Storage: ${PROJECT_DIR}"

+# Windows/Git-Bash detection (Issue #295)
+UNAME_LOWER="$(uname -s 2>/dev/null | tr '[:upper:]' '[:lower:]')"
+IS_WINDOWS=false
+case "$UNAME_LOWER" in
+  *mingw*|*msys*|*cygwin*) IS_WINDOWS=true ;;
+esac
+
 case "${1:-start}" in
  stop)
    if [ -f "$PID_FILE" ]; then
@@ -135,8 +143,13 @@ case "${1:-start}" in

    echo "Starting observer agent for ${PROJECT_NAME}..."

+    if [ ! -x "$OBSERVER_LOOP_SCRIPT" ]; then
+      echo "Observer loop script not found or not executable: $OBSERVER_LOOP_SCRIPT"
+      exit 1
+    fi
+
    # The observer loop — fully detached with nohup, IO redirected to log.
-    # Variables passed safely via env to avoid shell injection from special chars in paths.
+    # Variables are passed via env; observer-loop.sh handles analysis/retry flow.
    nohup env \
      CONFIG_DIR="$CONFIG_DIR" \
      PID_FILE="$PID_FILE" \
@@ -148,116 +161,8 @@ case "${1:-start}" in
      PROJECT_ID="$PROJECT_ID" \
      MIN_OBSERVATIONS="$MIN_OBSERVATIONS" \
      OBSERVER_INTERVAL_SECONDS="$OBSERVER_INTERVAL_SECONDS" \
-      /bin/bash -c '
-      set +e
-      unset CLAUDECODE
-
-      SLEEP_PID=""
-      USR1_FIRED=0
-
-      cleanup() {
-        [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
-        # Only remove PID file if it still belongs to this process
-        if [ -f "$PID_FILE" ] && [ "$(cat "$PID_FILE" 2>/dev/null)" = "$$" ]; then
-          rm -f "$PID_FILE"
-        fi
-        exit 0
-      }
-      trap cleanup TERM INT
-
-      analyze_observations() {
-        if [ ! -f "$OBSERVATIONS_FILE" ]; then
-          return
-        fi
-        obs_count=$(wc -l < "$OBSERVATIONS_FILE" 2>/dev/null || echo 0)
-        if [ "$obs_count" -lt "$MIN_OBSERVATIONS" ]; then
-          return
-        fi
-
-        echo "[$(date)] Analyzing $obs_count observations for project ${PROJECT_NAME}..." >> "$LOG_FILE"
-
-        # Use Claude Code with Haiku to analyze observations
-        # The prompt specifies project-scoped instinct creation
-        if command -v claude &> /dev/null; then
-          exit_code=0
-          claude --model haiku --max-turns 3 --print \
-            "Read $OBSERVATIONS_FILE and identify patterns for the project '${PROJECT_NAME}' (user corrections, error resolutions, repeated workflows, tool preferences).
-If you find 3+ occurrences of the same pattern, create an instinct file in $INSTINCTS_DIR/<id>.md.
-
-CRITICAL: Every instinct file MUST use this exact format:
-
---
-id: kebab-case-name
-trigger: \"when <specific condition>\"
-confidence: <0.3-0.85 based on frequency: 3-5 times=0.5, 6-10=0.7, 11+=0.85>
-domain: <one of: code-style, testing, git, debugging, workflow, file-patterns>
-source: session-observation
-scope: project
-project_id: ${PROJECT_ID}
-project_name: ${PROJECT_NAME}
---
-
-# Title
-
-## Action
-<what to do, one clear sentence>
-
-## Evidence
- Observed N times in session <id>
- Pattern: <description>
- Last observed: <date>
-
-Rules:
- Be conservative, only clear patterns with 3+ observations
- Use narrow, specific triggers
- Never include actual code snippets, only describe patterns
- If a similar instinct already exists in $INSTINCTS_DIR/, update it instead of creating a duplicate
- The YAML frontmatter (between --- markers) with id field is MANDATORY
- If a pattern seems universal (not project-specific), set scope to 'global' instead of 'project'
- Examples of global patterns: 'always validate user input', 'prefer explicit error handling'
- Examples of project patterns: 'use React functional components', 'follow Django REST framework conventions'" \
-            >> "$LOG_FILE" 2>&1 || exit_code=$?
-          if [ "$exit_code" -ne 0 ]; then
-            echo "[$(date)] Claude analysis failed (exit $exit_code)" >> "$LOG_FILE"
-          fi
-        else
-          echo "[$(date)] claude CLI not found, skipping analysis" >> "$LOG_FILE"
-        fi
-
-        if [ -f "$OBSERVATIONS_FILE" ]; then
-          archive_dir="${PROJECT_DIR}/observations.archive"
-          mkdir -p "$archive_dir"
-          mv "$OBSERVATIONS_FILE" "$archive_dir/processed-$(date +%Y%m%d-%H%M%S)-$$.jsonl" 2>/dev/null || true
-        fi
-      }
-
-      on_usr1() {
-        # Kill pending sleep to avoid leak, then analyze
-        [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
-        SLEEP_PID=""
-        USR1_FIRED=1
-        analyze_observations
-      }
-      trap on_usr1 USR1
-
-      echo "$$" > "$PID_FILE"
-      echo "[$(date)] Observer started for ${PROJECT_NAME} (PID: $$)" >> "$LOG_FILE"
-
-      while true; do
-        # Interruptible sleep — allows USR1 trap to fire immediately
-        sleep "$OBSERVER_INTERVAL_SECONDS" &
-        SLEEP_PID=$!
-        wait $SLEEP_PID 2>/dev/null
-        SLEEP_PID=""
-
-        # Skip scheduled analysis if USR1 already ran it
-        if [ "$USR1_FIRED" -eq 1 ]; then
-          USR1_FIRED=0
-        else
-          analyze_observations
-        fi
-      done
-    ' >> "$LOG_FILE" 2>&1 &
+      CLV2_IS_WINDOWS="$IS_WINDOWS" \
+      "$OBSERVER_LOOP_SCRIPT" >> "$LOG_FILE" 2>&1 &

    # Wait for PID file
    sleep 2
@@ -116,4 +116,4 @@ Homunculus v2 takes a more sophisticated approach:
 4. **Domain tagging** - code-style, testing, git, debugging, etc.
 5. **Evolution path** - Cluster related instincts into skills/commands

-See: `/Users/affoon/Documents/tasks/12-continuous-learning-v2.md` for full spec.
+See: `docs/continuous-learning-v2-spec.md` for full spec.
@@ -0,0 +1,50 @@
+---
+name: enterprise-agent-ops
+description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
+origin: ECC
+---
+
+# Enterprise Agent Ops
+
+Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.
+
+## Operational Domains
+
+1. runtime lifecycle (start, pause, stop, restart)
+2. observability (logs, metrics, traces)
+3. safety controls (scopes, permissions, kill switches)
+4. change management (rollout, rollback, audit)
+
+## Baseline Controls
+
+- immutable deployment artifacts
+- least-privilege credentials
+- environment-level secret injection
+- hard timeout and retry budgets
+- audit log for high-risk actions
+
+## Metrics to Track
+
+- success rate
+- mean retries per task
+- time to recovery
+- cost per successful task
+- failure class distribution
+
+## Incident Pattern
+
+When failure spikes:
+1. freeze new rollout
+2. capture representative traces
+3. isolate failing route
+4. patch with smallest safe change
+5. run regression + security checks
+6. resume gradually
+
+## Deployment Integrations
+
+This skill pairs with:
+- PM2 workflows
+- systemd services
+- container orchestrators
+- CI/CD gates
@@ -234,3 +234,37 @@ Capability: 5/5 passed (pass@3: 100%)
 Regression: 3/3 passed (pass^3: 100%)
 Status: SHIP IT
 ```
+
+## Product Evals (v1.8)
+
+Use product evals when behavior quality cannot be captured by unit tests alone.
+
+### Grader Types
+
+1. Code grader (deterministic assertions)
+2. Rule grader (regex/schema constraints)
+3. Model grader (LLM-as-judge rubric)
+4. Human grader (manual adjudication for ambiguous outputs)
+
+### pass@k Guidance
+
+- `pass@1`: direct reliability
+- `pass@3`: practical reliability under controlled retries
+- `pass^3`: stability test (all 3 runs must pass)
+
+Recommended thresholds:
+- Capability evals: pass@3 >= 0.90
+- Regression evals: pass^3 = 1.00 for release-critical paths
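As a rough illustration of the difference between the two metrics for a single eval, assuming each run is recorded as the word `pass` or `fail`: the helper names `pass_at_k` and `pass_hat_k` are hypothetical.

```shell
# Each argument is the outcome of one run of the same eval: "pass" or "fail".

# pass@k: succeeds if at least one of the k runs passed.
pass_at_k() {
  for r in "$@"; do
    [ "$r" = "pass" ] && return 0
  done
  return 1
}

# pass^k: succeeds only if every one of the k runs passed.
pass_hat_k() {
  [ "$#" -gt 0 ] || return 1
  for r in "$@"; do
    [ "$r" = "pass" ] || return 1
  done
  return 0
}

if pass_at_k fail pass fail;  then echo "pass@3: ok"; else echo "pass@3: failed"; fi
if pass_hat_k pass pass fail; then echo "pass^3: ok"; else echo "pass^3: failed"; fi
```

The first check reports `ok` and the second reports `failed`: one passing run out of three satisfies pass@3, but a single failing run breaks pass^3, which is why the stricter metric suits release-critical regression paths.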
+
+### Eval Anti-Patterns
+
+- Overfitting prompts to known eval examples
+- Measuring only happy-path outputs
+- Ignoring cost and latency drift while chasing pass rates
+- Allowing flaky graders in release gates
+
+### Minimal Eval Artifact Layout
+
+- `.claude/evals/<feature>.md` definition
+- `.claude/evals/<feature>.log` run history
+- `docs/releases/<version>/eval-summary.md` release snapshot
@@ -0,0 +1,33 @@
+---
+name: nanoclaw-repl
+description: Operate and extend NanoClaw v2, ECC's zero-dependency session-aware REPL built on claude -p.
+origin: ECC
+---
+
+# NanoClaw REPL
+
+Use this skill when running or extending `scripts/claw.js`.
+
+## Capabilities
+
+- persistent markdown-backed sessions
+- model switching with `/model`
+- dynamic skill loading with `/load`
+- session branching with `/branch`
+- cross-session search with `/search`
+- history compaction with `/compact`
+- export to md/json/txt with `/export`
+- session metrics with `/metrics`
+
+## Operating Guidance
+
+1. Keep sessions task-focused.
+2. Branch before high-risk changes.
+3. Compact after major milestones.
+4. Export before sharing or archival.
+
+## Extension Rules
+
+- keep zero external runtime dependencies
+- preserve markdown-as-database compatibility
+- keep command handlers deterministic and local
@@ -194,3 +194,46 @@ Plankton's `.claude/hooks/config.json` controls all behavior:
 - Plankton (credit: @alxfazio)
 - Plankton REFERENCE.md — Full architecture documentation (credit: @alxfazio)
 - Plankton SETUP.md — Detailed installation guide (credit: @alxfazio)
+
+## ECC v1.8 Additions
+
+### Copyable Hook Profile
+
+Set strict quality behavior:
+
+```bash
+export ECC_HOOK_PROFILE=strict
+export ECC_QUALITY_GATE_FIX=true
+export ECC_QUALITY_GATE_STRICT=true
+```
+
+### Language Gate Table
+
+- TypeScript/JavaScript: Biome preferred, Prettier fallback
+- Python: Ruff format/check
+- Go: gofmt
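A sketch of how the table might map a file path to a check command. The function name `gate_command` and the exact command lines are assumptions about a typical setup, not ECC's hook implementation. The function only prints the command, so it is safe to run without the tools installed.

```shell
# Prints the check command for a file, or returns 1 if no gate applies.
# Assumption: Biome is preferred whenever it is on PATH, otherwise Prettier.
gate_command() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx)
      if command -v biome >/dev/null 2>&1; then
        echo "biome check $1"
      else
        echo "prettier --check $1"
      fi
      ;;
    *.py) echo "ruff format --check $1 && ruff check $1" ;;
    *.go) echo "gofmt -l $1" ;;
    *)    return 1 ;;   # no gate configured for this file type
  esac
}

gate_command cmd/server/main.go   # prints: gofmt -l cmd/server/main.go
```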
+
+### Config Tamper Guard
+
+During quality enforcement, flag changes to config files made in the same iteration:
+
+- `biome.json`, `.eslintrc*`, `prettier.config*`, `tsconfig.json`, `pyproject.toml`
+
+If config is changed to suppress violations, require explicit review before merge.
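A minimal sketch of the guard, assuming the changed paths arrive on stdin (for example from `git diff --name-only`). The function name `flag_config_changes` is hypothetical; the patterns are the ones listed above, matched against the file's base name.

```shell
# Prints every changed path whose base name matches a guarded config pattern.
# Returns non-zero when at least one config file was touched.
flag_config_changes() {
  flagged=0
  while IFS= read -r path; do
    case "${path##*/}" in
      biome.json|.eslintrc*|prettier.config*|tsconfig.json|pyproject.toml)
        echo "config changed: $path"
        flagged=1
        ;;
    esac
  done
  [ "$flagged" -eq 0 ]
}

printf '%s\n' src/app.ts biome.json packages/web/.eslintrc.json \
  | flag_config_changes || echo "review required before merge"
```

This only detects that a config file changed. Deciding whether the change suppresses violations still needs the explicit review described above.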
+
+### CI Integration Pattern
+
+Use the same commands in CI as local hooks:
+
+1. run formatter checks
+2. run lint/type checks
+3. fail fast on strict mode
+4. publish remediation summary
+
+### Health Metrics
+
+Track:
+- edits flagged by gates
+- average remediation time
+- repeat violations by category
+- merge blocks due to gate failures
@@ -0,0 +1,67 @@
+---
+name: ralphinho-rfc-pipeline
+description: RFC-driven multi-agent DAG execution pattern with quality gates, merge queues, and work unit orchestration.
+origin: ECC
+---
+
+# Ralphinho RFC Pipeline
+
+Inspired by [humanplane](https://github.com/humanplane) style RFC decomposition patterns and multi-unit orchestration workflows.
+
+Use this skill when a feature is too large for a single agent pass and must be split into independently verifiable work units.
+
+## Pipeline Stages
+
+1. RFC intake
+2. DAG decomposition
+3. Unit assignment
+4. Unit implementation
+5. Unit validation
+6. Merge queue and integration
+7. Final system verification
+
+## Unit Spec Template
+
+Each work unit should include:
+- `id`
+- `depends_on`
+- `scope`
+- `acceptance_tests`
+- `risk_level`
+- `rollback_plan`
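The `depends_on` field is what turns a set of units into a DAG. As a sketch, a merge order can be derived with the standard `tsort` utility, assuming a hypothetical one-line-per-unit format of `<id> <dependency>...`. The unit names and the function name `merge_order` are illustrative.

```shell
# Input: one unit per line, "<id> [<depends_on> ...]".
# Output: unit IDs in an order where every dependency precedes its dependents.
merge_order() {
  awk '{
    if (NF == 1) print $1, $1                # no dependencies: a self-pair keeps the unit in the output
    for (i = 2; i <= NF; i++) print $i, $1   # tsort expects "<before> <after>" pairs
  }' | tsort
}

printf '%s\n' 'schema' 'api schema' 'ui api' | merge_order
```

For this chain the only valid order is `schema`, `api`, `ui`. When units are independent of each other, several orders are valid and `tsort` may return any of them.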
+
+## Complexity Tiers
+
+- Tier 1: isolated file edits, deterministic tests
+- Tier 2: multi-file behavior changes, moderate integration risk
+- Tier 3: schema/auth/perf/security changes
+
+## Quality Pipeline per Unit
+
+1. research
+2. implementation plan
+3. implementation
+4. tests
+5. review
+6. merge-ready report
+
+## Merge Queue Rules
+
+- Never merge a unit with unresolved dependency failures.
+- Always rebase unit branches on latest integration branch.
+- Re-run integration tests after each queued merge.
+
+## Recovery
+
+If a unit stalls:
+- evict from active queue
+- snapshot findings
+- regenerate narrowed unit scope
+- retry with updated constraints
+
+## Outputs
+
+- RFC execution log
+- unit scorecards
+- dependency graph snapshot
+- integration risk summary