mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-04-12 20:53:34 +08:00
feat: deliver v1.8.0 harness reliability and parity updates
73
skills/agent-harness-construction/SKILL.md
Normal file
@@ -0,0 +1,73 @@
---
name: agent-harness-construction
description: Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.
origin: ECC
---

# Agent Harness Construction

Use this skill when you are improving how an agent plans, calls tools, recovers from errors, and converges on completion.

## Core Model

Agent output quality is constrained by:
1. Action space quality
2. Observation quality
3. Recovery quality
4. Context budget quality

## Action Space Design

1. Use stable, explicit tool names.
2. Keep inputs schema-first and narrow.
3. Return deterministic output shapes.
4. Avoid catch-all tools unless isolation is impossible.

## Granularity Rules

- Use micro-tools for high-risk operations (deploy, migration, permissions).
- Use medium tools for common edit/read/search loops.
- Use macro-tools only when round-trip overhead is the dominant cost.

## Observation Design

Every tool response should include:
- `status`: success|warning|error
- `summary`: one-line result
- `next_actions`: actionable follow-ups
- `artifacts`: file paths / IDs
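
As a sketch, the response contract above could be emitted by a small shell helper. The field names follow this skill's contract; the helper name and sample values are hypothetical:

```shell
# Hypothetical helper: emit a tool response in the shape described above.
# Single-element arrays keep the sketch simple; a real tool would serialize lists.
tool_response() {
  status="$1"; summary="$2"; next_actions="$3"; artifacts="$4"
  printf '{"status":"%s","summary":"%s","next_actions":["%s"],"artifacts":["%s"]}\n' \
    "$status" "$summary" "$next_actions" "$artifacts"
}

tool_response success "created migration" "run tests" "db/migrations/0042.sql"
```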

## Error Recovery Contract

For every error path, include:
- root cause hint
- safe retry instruction
- explicit stop condition
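
A minimal sketch of an error payload carrying these three fields; the key names are assumptions matching this skill's contract, not a fixed ECC schema:

```shell
# Illustrative error payload with the three recovery fields from the contract above.
error_response() {
  printf '{"status":"error","root_cause_hint":"%s","safe_retry":"%s","stop_condition":"%s"}\n' \
    "$1" "$2" "$3"
}

error_response "lockfile out of date" "run install once, then retry build" "stop after 2 failed retries"
```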

## Context Budgeting

1. Keep system prompt minimal and invariant.
2. Move large guidance into skills loaded on demand.
3. Prefer references to files over inlining long documents.
4. Compact at phase boundaries, not arbitrary token thresholds.

## Architecture Pattern Guidance

- ReAct: best for exploratory tasks with uncertain path.
- Function-calling: best for structured deterministic flows.
- Hybrid (recommended): ReAct planning + typed tool execution.

## Benchmarking

Track:
- completion rate
- retries per task
- pass@1 and pass@3
- cost per successful task

## Anti-Patterns

- Too many tools with overlapping semantics.
- Opaque tool output with no recovery hints.
- Error-only output without next steps.
- Context overloading with irrelevant references.
63
skills/agentic-engineering/SKILL.md
Normal file
@@ -0,0 +1,63 @@
---
name: agentic-engineering
description: Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
origin: ECC
---

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define capability eval and regression eval.
2. Run baseline and capture failure signatures.
3. Execute implementation.
4. Re-run evals and compare deltas.

## Task Decomposition

Apply the 15-minute unit rule:
- each unit should be independently verifiable
- each unit should have a single dominant risk
- each unit should expose a clear done condition

## Model Routing

- Haiku: classification, boilerplate transforms, narrow edits
- Sonnet: implementation and refactors
- Opus: architecture, root-cause analysis, multi-file invariants
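
The routing table above can be sketched as a lookup. The tier names come from this skill; the function name and task labels are hypothetical:

```shell
# Map a task label to a model tier, mirroring the routing table above.
route_model() {
  case "$1" in
    classify|boilerplate|narrow-edit)   echo haiku ;;
    implement|refactor)                 echo sonnet ;;
    architecture|root-cause|invariants) echo opus ;;
    *)                                  echo sonnet ;;  # default to the middle tier
  esac
}

route_model refactor   # -> sonnet
```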

## Session Strategy

- Continue session for closely-coupled units.
- Start fresh session after major phase transitions.
- Compact after milestone completion, not during active debugging.

## Review Focus for AI-Generated Code

Prioritize:
- invariants and edge cases
- error boundaries
- security and auth assumptions
- hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.

## Cost Discipline

Track per task:
- model
- token estimate
- retries
- wall-clock time
- success/failure

Escalate model tier only when lower tier fails with a clear reasoning gap.
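
One way to capture these per-task fields, assuming an append-only CSV log; the log path and column order are illustrative, not an ECC convention:

```shell
# Append one CSV record per task: timestamp, model, tokens, retries, seconds, result.
log_task_cost() {
  model="$1"; tokens="$2"; retries="$3"; seconds="$4"; result="$5"
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$model,$tokens,$retries,$seconds,$result" >> task-costs.csv
}

log_task_cost sonnet 18500 1 240 success
tail -n 1 task-costs.csv
```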
51
skills/ai-first-engineering/SKILL.md
Normal file
@@ -0,0 +1,51 @@
---
name: ai-first-engineering
description: Engineering operating model for teams where AI agents generate a large share of implementation output.
origin: ECC
---

# AI-First Engineering

Use this skill when designing process, reviews, and architecture for teams shipping with AI-assisted code generation.

## Process Shifts

1. Planning quality matters more than typing speed.
2. Eval coverage matters more than anecdotal confidence.
3. Review focus shifts from syntax to system behavior.

## Architecture Requirements

Prefer architectures that are agent-friendly:
- explicit boundaries
- stable contracts
- typed interfaces
- deterministic tests

Avoid implicit behavior spread across hidden conventions.

## Code Review in AI-First Teams

Review for:
- behavior regressions
- security assumptions
- data integrity
- failure handling
- rollout safety

Minimize time spent on style issues already covered by automation.

## Hiring and Evaluation Signals

Strong AI-first engineers:
- decompose ambiguous work cleanly
- define measurable acceptance criteria
- produce high-signal prompts and evals
- enforce risk controls under delivery pressure

## Testing Standard

Raise the testing bar for generated code:
- required regression coverage for touched domains
- explicit edge-case assertions
- integration checks for interface boundaries
@@ -6,6 +6,11 @@ origin: ECC

# Autonomous Loops Skill

> Compatibility note (v1.8.0): `autonomous-loops` is retained for one release.
> The canonical skill name is now `continuous-agent-loop`. New loop guidance
> should be authored there, while this skill remains available to avoid
> breaking existing workflows.

Patterns, architectures, and reference implementations for running Claude Code autonomously in loops. Covers everything from simple `claude -p` pipelines to full RFC-driven multi-agent DAG orchestration.

## When to Use

45
skills/continuous-agent-loop/SKILL.md
Normal file
@@ -0,0 +1,45 @@
---
name: continuous-agent-loop
description: Patterns for continuous autonomous agent loops with quality gates, evals, and recovery controls.
origin: ECC
---

# Continuous Agent Loop

This is the v1.8+ canonical loop skill name. It supersedes `autonomous-loops` while keeping compatibility for one release.

## Loop Selection Flow

```text
Start
|
+-- Need strict CI/PR control? -- yes --> continuous-pr
|
+-- Need RFC decomposition? -- yes --> rfc-dag
|
+-- Need exploratory parallel generation? -- yes --> infinite
|
+-- default --> sequential
```
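
The decision flow above can be sketched as a function over yes/no answers; the loop names match the diagram, while the function name is illustrative:

```shell
# Pick a loop type from three yes/no answers, in the order the diagram checks them.
select_loop() {
  ci_pr="$1"; rfc="$2"; exploratory="$3"
  if   [ "$ci_pr" = yes ];       then echo continuous-pr
  elif [ "$rfc" = yes ];         then echo rfc-dag
  elif [ "$exploratory" = yes ]; then echo infinite
  else                                echo sequential
  fi
}

select_loop no yes no   # -> rfc-dag
```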

## Combined Pattern

Recommended production stack:
1. RFC decomposition (`ralphinho-rfc-pipeline`)
2. quality gates (`plankton-code-quality` + `/quality-gate`)
3. eval loop (`eval-harness`)
4. session persistence (`nanoclaw-repl`)

## Failure Modes

- loop churn without measurable progress
- repeated retries with same root cause
- merge queue stalls
- cost drift from unbounded escalation

## Recovery

- freeze loop
- run `/harness-audit`
- reduce scope to failing unit
- replay with explicit acceptance criteria
133
skills/continuous-learning-v2/agents/observer-loop.sh
Executable file
@@ -0,0 +1,133 @@
#!/usr/bin/env bash
# Continuous Learning v2 - Observer background loop

set +e
unset CLAUDECODE

SLEEP_PID=""
USR1_FIRED=0

cleanup() {
  [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
  if [ -f "$PID_FILE" ] && [ "$(cat "$PID_FILE" 2>/dev/null)" = "$$" ]; then
    rm -f "$PID_FILE"
  fi
  exit 0
}
trap cleanup TERM INT

analyze_observations() {
  if [ ! -f "$OBSERVATIONS_FILE" ]; then
    return
  fi

  obs_count=$(wc -l < "$OBSERVATIONS_FILE" 2>/dev/null || echo 0)
  if [ "$obs_count" -lt "$MIN_OBSERVATIONS" ]; then
    return
  fi

  echo "[$(date)] Analyzing $obs_count observations for project ${PROJECT_NAME}..." >> "$LOG_FILE"

  if [ "${CLV2_IS_WINDOWS:-false}" = "true" ] && [ "${ECC_OBSERVER_ALLOW_WINDOWS:-false}" != "true" ]; then
    echo "[$(date)] Skipping claude analysis on Windows due to known non-interactive hang issue (#295). Set ECC_OBSERVER_ALLOW_WINDOWS=true to override." >> "$LOG_FILE"
    return
  fi

  if ! command -v claude >/dev/null 2>&1; then
    echo "[$(date)] claude CLI not found, skipping analysis" >> "$LOG_FILE"
    return
  fi

  prompt_file="$(mktemp "${TMPDIR:-/tmp}/ecc-observer-prompt.XXXXXX")"
  cat > "$prompt_file" <<PROMPT
Read ${OBSERVATIONS_FILE} and identify patterns for the project ${PROJECT_NAME} (user corrections, error resolutions, repeated workflows, tool preferences).
If you find 3+ occurrences of the same pattern, create an instinct file in ${INSTINCTS_DIR}/<id>.md.

CRITICAL: Every instinct file MUST use this exact format:

---
id: kebab-case-name
trigger: when <specific condition>
confidence: <0.3-0.85 based on frequency: 3-5 times=0.5, 6-10=0.7, 11+=0.85>
domain: <one of: code-style, testing, git, debugging, workflow, file-patterns>
source: session-observation
scope: project
project_id: ${PROJECT_ID}
project_name: ${PROJECT_NAME}
---

# Title

## Action
<what to do, one clear sentence>

## Evidence
- Observed N times in session <id>
- Pattern: <description>
- Last observed: <date>

Rules:
- Be conservative, only clear patterns with 3+ observations
- Use narrow, specific triggers
- Never include actual code snippets, only describe patterns
- If a similar instinct already exists in ${INSTINCTS_DIR}/, update it instead of creating a duplicate
- The YAML frontmatter (between --- markers) with id field is MANDATORY
- If a pattern seems universal (not project-specific), set scope to global instead of project
- Examples of global patterns: always validate user input, prefer explicit error handling
- Examples of project patterns: use React functional components, follow Django REST framework conventions
PROMPT

  timeout_seconds="${ECC_OBSERVER_TIMEOUT_SECONDS:-120}"
  exit_code=0

  claude --model haiku --max-turns 3 --print < "$prompt_file" >> "$LOG_FILE" 2>&1 &
  claude_pid=$!

  (
    sleep "$timeout_seconds"
    if kill -0 "$claude_pid" 2>/dev/null; then
      echo "[$(date)] Claude analysis timed out after ${timeout_seconds}s; terminating process" >> "$LOG_FILE"
      kill "$claude_pid" 2>/dev/null || true
    fi
  ) &
  watchdog_pid=$!

  wait "$claude_pid"
  exit_code=$?
  kill "$watchdog_pid" 2>/dev/null || true
  rm -f "$prompt_file"

  if [ "$exit_code" -ne 0 ]; then
    echo "[$(date)] Claude analysis failed (exit $exit_code)" >> "$LOG_FILE"
  fi

  if [ -f "$OBSERVATIONS_FILE" ]; then
    archive_dir="${PROJECT_DIR}/observations.archive"
    mkdir -p "$archive_dir"
    mv "$OBSERVATIONS_FILE" "$archive_dir/processed-$(date +%Y%m%d-%H%M%S)-$$.jsonl" 2>/dev/null || true
  fi
}

on_usr1() {
  [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
  SLEEP_PID=""
  USR1_FIRED=1
  analyze_observations
}
trap on_usr1 USR1

echo "$$" > "$PID_FILE"
echo "[$(date)] Observer started for ${PROJECT_NAME} (PID: $$)" >> "$LOG_FILE"

while true; do
  sleep "$OBSERVER_INTERVAL_SECONDS" &
  SLEEP_PID=$!
  wait "$SLEEP_PID" 2>/dev/null
  SLEEP_PID=""

  if [ "$USR1_FIRED" -eq 1 ]; then
    USR1_FIRED=0
  else
    analyze_observations
  fi
done
@@ -23,6 +23,7 @@ set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SKILL_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
OBSERVER_LOOP_SCRIPT="${SCRIPT_DIR}/observer-loop.sh"

# Source shared project detection helper
# This sets: PROJECT_ID, PROJECT_NAME, PROJECT_ROOT, PROJECT_DIR
@@ -74,6 +75,13 @@ OBSERVER_INTERVAL_SECONDS=$((OBSERVER_INTERVAL_MINUTES * 60))
echo "Project: ${PROJECT_NAME} (${PROJECT_ID})"
echo "Storage: ${PROJECT_DIR}"

# Windows/Git-Bash detection (Issue #295)
UNAME_LOWER="$(uname -s 2>/dev/null | tr '[:upper:]' '[:lower:]')"
IS_WINDOWS=false
case "$UNAME_LOWER" in
  *mingw*|*msys*|*cygwin*) IS_WINDOWS=true ;;
esac

case "${1:-start}" in
  stop)
    if [ -f "$PID_FILE" ]; then
@@ -135,8 +143,13 @@ case "${1:-start}" in

echo "Starting observer agent for ${PROJECT_NAME}..."

if [ ! -x "$OBSERVER_LOOP_SCRIPT" ]; then
  echo "Observer loop script not found or not executable: $OBSERVER_LOOP_SCRIPT"
  exit 1
fi

# The observer loop — fully detached with nohup, IO redirected to log.
# Variables are passed via env to avoid shell injection from special chars in paths;
# observer-loop.sh handles the analysis/retry flow.
nohup env \
  CONFIG_DIR="$CONFIG_DIR" \
  PID_FILE="$PID_FILE" \
@@ -148,116 +161,8 @@ case "${1:-start}" in
  PROJECT_ID="$PROJECT_ID" \
  MIN_OBSERVATIONS="$MIN_OBSERVATIONS" \
  OBSERVER_INTERVAL_SECONDS="$OBSERVER_INTERVAL_SECONDS" \
  /bin/bash -c '
    set +e
    unset CLAUDECODE

    SLEEP_PID=""
    USR1_FIRED=0

    cleanup() {
      [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
      # Only remove PID file if it still belongs to this process
      if [ -f "$PID_FILE" ] && [ "$(cat "$PID_FILE" 2>/dev/null)" = "$$" ]; then
        rm -f "$PID_FILE"
      fi
      exit 0
    }
    trap cleanup TERM INT

    analyze_observations() {
      if [ ! -f "$OBSERVATIONS_FILE" ]; then
        return
      fi
      obs_count=$(wc -l < "$OBSERVATIONS_FILE" 2>/dev/null || echo 0)
      if [ "$obs_count" -lt "$MIN_OBSERVATIONS" ]; then
        return
      fi

      echo "[$(date)] Analyzing $obs_count observations for project ${PROJECT_NAME}..." >> "$LOG_FILE"

      # Use Claude Code with Haiku to analyze observations
      # The prompt specifies project-scoped instinct creation
      if command -v claude &> /dev/null; then
        exit_code=0
        claude --model haiku --max-turns 3 --print \
          "Read $OBSERVATIONS_FILE and identify patterns for the project '${PROJECT_NAME}' (user corrections, error resolutions, repeated workflows, tool preferences).
If you find 3+ occurrences of the same pattern, create an instinct file in $INSTINCTS_DIR/<id>.md.

CRITICAL: Every instinct file MUST use this exact format:

---
id: kebab-case-name
trigger: \"when <specific condition>\"
confidence: <0.3-0.85 based on frequency: 3-5 times=0.5, 6-10=0.7, 11+=0.85>
domain: <one of: code-style, testing, git, debugging, workflow, file-patterns>
source: session-observation
scope: project
project_id: ${PROJECT_ID}
project_name: ${PROJECT_NAME}
---

# Title

## Action
<what to do, one clear sentence>

## Evidence
- Observed N times in session <id>
- Pattern: <description>
- Last observed: <date>

Rules:
- Be conservative, only clear patterns with 3+ observations
- Use narrow, specific triggers
- Never include actual code snippets, only describe patterns
- If a similar instinct already exists in $INSTINCTS_DIR/, update it instead of creating a duplicate
- The YAML frontmatter (between --- markers) with id field is MANDATORY
- If a pattern seems universal (not project-specific), set scope to 'global' instead of 'project'
- Examples of global patterns: 'always validate user input', 'prefer explicit error handling'
- Examples of project patterns: 'use React functional components', 'follow Django REST framework conventions'" \
          >> "$LOG_FILE" 2>&1 || exit_code=$?
        if [ "$exit_code" -ne 0 ]; then
          echo "[$(date)] Claude analysis failed (exit $exit_code)" >> "$LOG_FILE"
        fi
      else
        echo "[$(date)] claude CLI not found, skipping analysis" >> "$LOG_FILE"
      fi

      if [ -f "$OBSERVATIONS_FILE" ]; then
        archive_dir="${PROJECT_DIR}/observations.archive"
        mkdir -p "$archive_dir"
        mv "$OBSERVATIONS_FILE" "$archive_dir/processed-$(date +%Y%m%d-%H%M%S)-$$.jsonl" 2>/dev/null || true
      fi
    }

    on_usr1() {
      # Kill pending sleep to avoid leak, then analyze
      [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
      SLEEP_PID=""
      USR1_FIRED=1
      analyze_observations
    }
    trap on_usr1 USR1

    echo "$$" > "$PID_FILE"
    echo "[$(date)] Observer started for ${PROJECT_NAME} (PID: $$)" >> "$LOG_FILE"

    while true; do
      # Interruptible sleep — allows USR1 trap to fire immediately
      sleep "$OBSERVER_INTERVAL_SECONDS" &
      SLEEP_PID=$!
      wait $SLEEP_PID 2>/dev/null
      SLEEP_PID=""

      # Skip scheduled analysis if USR1 already ran it
      if [ "$USR1_FIRED" -eq 1 ]; then
        USR1_FIRED=0
      else
        analyze_observations
      fi
    done
  ' >> "$LOG_FILE" 2>&1 &
  CLV2_IS_WINDOWS="$IS_WINDOWS" \
    "$OBSERVER_LOOP_SCRIPT" >> "$LOG_FILE" 2>&1 &

  # Wait for PID file
  sleep 2
@@ -116,4 +116,4 @@ Homunculus v2 takes a more sophisticated approach:
4. **Domain tagging** - code-style, testing, git, debugging, etc.
5. **Evolution path** - Cluster related instincts into skills/commands

See: `/Users/affoon/Documents/tasks/12-continuous-learning-v2.md` for full spec.
See: `docs/continuous-learning-v2-spec.md` for full spec.
50
skills/enterprise-agent-ops/SKILL.md
Normal file
@@ -0,0 +1,50 @@
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
origin: ECC
---

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
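
As an illustration, one of these metrics can be derived from a simple results log; the one-record-per-line "<status> <cost>" format is an assumption, not an ECC convention:

```shell
# Cost per successful task: total spend divided by number of successes.
cost_per_success() {
  awk '{ total += $2 } $1 == "success" { n++ }
       END { if (n) printf "%.2f\n", total / n; else print "n/a" }' "$1"
}

printf 'success 0.40\nfailure 0.90\nsuccess 0.20\n' > /tmp/agent-results.log
cost_per_success /tmp/agent-results.log   # 1.50 total / 2 successes -> 0.75
```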

## Incident Pattern

When failure spikes:
1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually

## Deployment Integrations

This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates
@@ -234,3 +234,37 @@ Capability: 5/5 passed (pass@3: 100%)
Regression: 3/3 passed (pass^3: 100%)
Status: SHIP IT
```

## Product Evals (v1.8)

Use product evals when behavior quality cannot be captured by unit tests alone.

### Grader Types

1. Code grader (deterministic assertions)
2. Rule grader (regex/schema constraints)
3. Model grader (LLM-as-judge rubric)
4. Human grader (manual adjudication for ambiguous outputs)

### pass@k Guidance

- `pass@1`: direct reliability
- `pass@3`: practical reliability under controlled retries
- `pass^3`: stability test (all 3 runs must pass)

Recommended thresholds:
- Capability evals: pass@3 >= 0.90
- Regression evals: pass^3 = 1.00 for release-critical paths
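
The thresholds above can be expressed as a release-gate check; the numbers come from this section, while the function name is illustrative:

```shell
# Gate passes only if pass@3 >= 0.90 and pass^3 == 1.00.
gate_ok() {
  pass_at_3="$1"; pass_hat_3="$2"
  awk -v a="$pass_at_3" -v b="$pass_hat_3" 'BEGIN { exit !(a >= 0.90 && b == 1.00) }'
}

if gate_ok 0.93 1.00; then echo SHIP; else echo HOLD; fi   # -> SHIP
```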

### Eval Anti-Patterns

- Overfitting prompts to known eval examples
- Measuring only happy-path outputs
- Ignoring cost and latency drift while chasing pass rates
- Allowing flaky graders in release gates

### Minimal Eval Artifact Layout

- `.claude/evals/<feature>.md` definition
- `.claude/evals/<feature>.log` run history
- `docs/releases/<version>/eval-summary.md` release snapshot
33
skills/nanoclaw-repl/SKILL.md
Normal file
@@ -0,0 +1,33 @@
---
name: nanoclaw-repl
description: Operate and extend NanoClaw v2, ECC's zero-dependency session-aware REPL built on claude -p.
origin: ECC
---

# NanoClaw REPL

Use this skill when running or extending `scripts/claw.js`.

## Capabilities

- persistent markdown-backed sessions
- model switching with `/model`
- dynamic skill loading with `/load`
- session branching with `/branch`
- cross-session search with `/search`
- history compaction with `/compact`
- export to md/json/txt with `/export`
- session metrics with `/metrics`

## Operating Guidance

1. Keep sessions task-focused.
2. Branch before high-risk changes.
3. Compact after major milestones.
4. Export before sharing or archival.

## Extension Rules

- keep zero external runtime dependencies
- preserve markdown-as-database compatibility
- keep command handlers deterministic and local
@@ -194,3 +194,46 @@ Plankton's `.claude/hooks/config.json` controls all behavior:
- Plankton (credit: @alxfazio)
- Plankton REFERENCE.md — Full architecture documentation (credit: @alxfazio)
- Plankton SETUP.md — Detailed installation guide (credit: @alxfazio)

## ECC v1.8 Additions

### Copyable Hook Profile

Set strict quality behavior:

```bash
export ECC_HOOK_PROFILE=strict
export ECC_QUALITY_GATE_FIX=true
export ECC_QUALITY_GATE_STRICT=true
```

### Language Gate Table

- TypeScript/JavaScript: Biome preferred, Prettier fallback
- Python: Ruff format/check
- Go: gofmt
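
A sketch of the gate table as an extension-to-command lookup; the tool commands mirror the table, and fallback handling is simplified:

```shell
# Resolve a formatter command for a file path, per the gate table above.
format_cmd() {
  case "$1" in
    *.ts|*.tsx|*.js|*.jsx)
      # Biome preferred, Prettier fallback
      if command -v biome >/dev/null 2>&1; then echo "biome format --write"
      else echo "prettier --write"; fi ;;
    *.py) echo "ruff format" ;;
    *.go) echo "gofmt -w" ;;
    *)    echo "" ;;   # no gate for this extension
  esac
}

format_cmd main.py   # -> ruff format
```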

### Config Tamper Guard

During quality enforcement, flag changes to config files made in the same iteration:

- `biome.json`, `.eslintrc*`, `prettier.config*`, `tsconfig.json`, `pyproject.toml`

If a config is changed to suppress violations, require explicit review before merge.

### CI Integration Pattern

Use the same commands in CI as in local hooks:

1. run formatter checks
2. run lint/type checks
3. fail fast on strict mode
4. publish remediation summary

### Health Metrics

Track:
- edits flagged by gates
- average remediation time
- repeat violations by category
- merge blocks due to gate failures
67
skills/ralphinho-rfc-pipeline/SKILL.md
Normal file
@@ -0,0 +1,67 @@
---
name: ralphinho-rfc-pipeline
description: RFC-driven multi-agent DAG execution pattern with quality gates, merge queues, and work unit orchestration.
origin: ECC
---

# Ralphinho RFC Pipeline

Inspired by [humanplane](https://github.com/humanplane) style RFC decomposition patterns and multi-unit orchestration workflows.

Use this skill when a feature is too large for a single agent pass and must be split into independently verifiable work units.

## Pipeline Stages

1. RFC intake
2. DAG decomposition
3. Unit assignment
4. Unit implementation
5. Unit validation
6. Merge queue and integration
7. Final system verification

## Unit Spec Template

Each work unit should include:
- `id`
- `depends_on`
- `scope`
- `acceptance_tests`
- `risk_level`
- `rollback_plan`
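
A quick way to enforce the template, assuming unit specs are stored as flat `key:` files; the file format and paths are illustrative:

```shell
# Check that a unit spec file contains every required key from the template above.
unit_spec_ok() {
  for key in id depends_on scope acceptance_tests risk_level rollback_plan; do
    grep -q "^${key}:" "$1" || { echo "missing: $key"; return 1; }
  done
  echo ok
}

printf 'id: u1\ndepends_on: []\nscope: api\nacceptance_tests: t1\nrisk_level: 2\nrollback_plan: revert\n' > /tmp/unit.yml
unit_spec_ok /tmp/unit.yml   # -> ok
```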

## Complexity Tiers

- Tier 1: isolated file edits, deterministic tests
- Tier 2: multi-file behavior changes, moderate integration risk
- Tier 3: schema/auth/perf/security changes

## Quality Pipeline per Unit

1. research
2. implementation plan
3. implementation
4. tests
5. review
6. merge-ready report

## Merge Queue Rules

- Never merge a unit with unresolved dependency failures.
- Always rebase unit branches on latest integration branch.
- Re-run integration tests after each queued merge.

## Recovery

If a unit stalls:
- evict from active queue
- snapshot findings
- regenerate narrowed unit scope
- retry with updated constraints

## Outputs

- RFC execution log
- unit scorecards
- dependency graph snapshot
- integration risk summary