From e09c548edfc5f6943d2077b42250e9b81b6113b4 Mon Sep 17 00:00:00 2001
From: Affaan Mustafa
Date: Sun, 5 Apr 2026 20:10:54 -0700
Subject: [PATCH] feat: add agent introspection debugging skill

---
 .../agent-introspection-debugging/SKILL.md    | 153 ++++++++++++++++++
 AGENTS.md                                     |   4 +-
 README.md                                     |   6 +-
 README.zh-CN.md                               |   2 +-
 WORKING-CONTEXT.md                            |   1 +
 docs/zh-CN/AGENTS.md                          |   4 +-
 docs/zh-CN/README.md                          |   6 +-
 manifests/install-modules.json                |   1 +
 skills/agent-introspection-debugging/SKILL.md | 153 ++++++++++++++++++
 9 files changed, 319 insertions(+), 11 deletions(-)
 create mode 100644 .agents/skills/agent-introspection-debugging/SKILL.md
 create mode 100644 skills/agent-introspection-debugging/SKILL.md

diff --git a/.agents/skills/agent-introspection-debugging/SKILL.md b/.agents/skills/agent-introspection-debugging/SKILL.md
new file mode 100644
index 00000000..ea5a2c58
--- /dev/null
+++ b/.agents/skills/agent-introspection-debugging/SKILL.md
@@ -0,0 +1,153 @@
+---
+name: agent-introspection-debugging
+description: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
+origin: ECC
+---
+
+# Agent Introspection Debugging
+
+Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.
+
+This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.
+
+## When to Activate
+
+- Maximum tool call / loop-limit failures
+- Repeated retries with no forward progress
+- Context growth or prompt drift that starts degrading output quality
+- File-system or environment state mismatch between expectation and reality
+- Tool failures that are likely recoverable with diagnosis and a smaller corrective action
+
+## Scope Boundaries
+
+Activate this skill for:
+- capturing failure state before retrying blindly
+- diagnosing common agent-specific failure patterns
+- applying contained recovery actions
+- producing a structured human-readable debug report
+
+Do not use this skill as the primary source for:
+- feature verification after code changes; use `verification-loop`
+- framework-specific debugging when a narrower ECC skill already exists
+- runtime promises the current harness cannot enforce automatically
+
+## Four-Phase Loop
+
+### Phase 1: Failure Capture
+
+Before trying to recover, record the failure precisely.
+
+Capture:
+- error type, message, and stack trace when available
+- last meaningful tool call sequence
+- what the agent was trying to do
+- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
+- current environment assumptions: cwd, branch, relevant service state, expected files
+
+Minimum capture template:
+
+```markdown
+## Failure Capture
+- Session / task:
+- Goal in progress:
+- Error:
+- Last successful step:
+- Last failed tool / command:
+- Repeated pattern seen:
+- Environment assumptions to verify:
+```
+
+### Phase 2: Root-Cause Diagnosis
+
+Match the failure to a known pattern before changing anything.
+
+| Pattern | Likely Cause | Check |
+| --- | --- | --- |
+| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
+| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
+| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
+| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
+| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
+| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |
+
+Diagnosis questions:
+- is this a logic failure, state failure, environment failure, or policy failure?
+- did the agent lose the real objective and start optimizing the wrong subtask?
+- is the failure deterministic or transient?
+- what is the smallest reversible action that would validate the diagnosis?
+
+### Phase 3: Contained Recovery
+
+Recover with the smallest action that changes the diagnosis surface.
+
+Safe recovery actions:
+- stop repeated retries and restate the hypothesis
+- trim low-signal context and keep only the active goal, blockers, and evidence
+- re-check the actual filesystem / branch / process state
+- narrow the task to one failing command, one file, or one test
+- switch from speculative reasoning to direct observation
+- escalate to a human when the failure is high-risk or externally blocked
+
+Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.
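The Phase 2 check "inspect the last N tool calls for repetition" could be approximated as below. This is only a minimal sketch and not part of the skill or of any ECC runtime: the `(tool name, arguments)` tuple shape, the function name `find_repeated_call`, and the default window and threshold are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

# Hypothetical record shape: (tool name, serialized arguments).
ToolCall = Tuple[str, str]


def find_repeated_call(
    calls: Sequence[ToolCall], window: int = 10, threshold: int = 3
) -> Optional[ToolCall]:
    """Return the most frequent call among the last `window` calls if it
    occurs at least `threshold` times; otherwise return None."""
    recent = calls[-window:]
    if not recent:
        return None
    call, count = Counter(recent).most_common(1)[0]
    return call if count >= threshold else None


# Illustrative history: the same test command was run three times.
history = [
    ("read_file", "src/app.py"),
    ("run_tests", "tests/test_app.py"),
    ("edit_file", "src/app.py"),
    ("run_tests", "tests/test_app.py"),
    ("run_tests", "tests/test_app.py"),
]
looping = find_repeated_call(history)
print(looping)
```

A non-`None` result is a cue to stop retrying and restate the hypothesis, not proof of a loop: identical calls can be legitimate, for example re-running a test after each edit.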
+
+Contained recovery checklist:
+
+```markdown
+## Recovery Action
+- Diagnosis chosen:
+- Smallest action taken:
+- Why this is safe:
+- What evidence would prove the fix worked:
+```
+
+### Phase 4: Introspection Report
+
+End with a report that makes the recovery legible to the next agent or human.
+
+```markdown
+## Agent Self-Debug Report
+- Session / task:
+- Failure:
+- Root cause:
+- Recovery action:
+- Result: success | partial | blocked
+- Token / time burn risk:
+- Follow-up needed:
+- Preventive change to encode later:
+```
+
+## Recovery Heuristics
+
+Prefer these interventions in order:
+
+1. Restate the real objective in one sentence.
+2. Verify the world state instead of trusting memory.
+3. Shrink the failing scope.
+4. Run one discriminating check.
+5. Only then retry.
+
+Bad pattern:
+- retrying the same action three times with slightly different wording
+
+Good pattern:
+- capture failure
+- classify the pattern
+- run one direct check
+- change the plan only if the check supports it
+
+## Integration with ECC
+
+- Use `verification-loop` after recovery if code was changed.
+- Use `continuous-learning-v2` when the failure pattern is worth turning into an instinct or later skill.
+- Use `council` when the issue is not technical failure but decision ambiguity.
+- Use `workspace-surface-audit` if the failure came from conflicting local state or repo drift.
+
+## Output Standard
+
+When this skill is active, do not end with “I fixed it” alone.
+
+Always provide:
+- the failure pattern
+- the root-cause hypothesis
+- the recovery action
+- the evidence that the situation is now better or still blocked
diff --git a/AGENTS.md b/AGENTS.md
index 9aad5972..3412f269 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -1,6 +1,6 @@
 # Everything Claude Code (ECC) — Agent Instructions
 
-This is a **production-ready AI coding plugin** providing 47 specialized agents, 180 skills, 79 commands, and automated hook workflows for software development.
+This is a **production-ready AI coding plugin** providing 47 specialized agents, 181 skills, 79 commands, and automated hook workflows for software development.
 
 **Version:** 1.10.0
 
@@ -146,7 +146,7 @@ Troubleshoot failures: check test isolation → verify mocks → fix implementat
 
 ```
 agents/ — 47 specialized subagents
-skills/ — 180 workflow skills and domain knowledge
+skills/ — 181 workflow skills and domain knowledge
 commands/ — 79 slash commands
 hooks/ — Trigger-based automations
 rules/ — Always-follow guidelines (common + per-language)
diff --git a/README.md b/README.md
index eb856afa..8d63889d 100644
--- a/README.md
+++ b/README.md
@@ -236,7 +236,7 @@ For manual install instructions see the README in the `rules/` folder. When copy
 /plugin list ecc@ecc
 ```
 
-**That's it!** You now have access to 47 agents, 180 skills, and 79 legacy command shims.
+**That's it!** You now have access to 47 agents, 181 skills, and 79 legacy command shims.
 
 ### Multi-model commands require additional setup
 
@@ -1154,7 +1154,7 @@ The configuration is automatically detected from `.opencode/opencode.json`.
 |---------|-------------|----------|--------|
 | Agents | PASS: 47 agents | PASS: 12 agents | **Claude Code leads** |
 | Commands | PASS: 79 commands | PASS: 31 commands | **Claude Code leads** |
-| Skills | PASS: 180 skills | PASS: 37 skills | **Claude Code leads** |
+| Skills | PASS: 181 skills | PASS: 37 skills | **Claude Code leads** |
 | Hooks | PASS: 8 event types | PASS: 11 events | **OpenCode has more!** |
 | Rules | PASS: 29 rules | PASS: 13 instructions | **Claude Code leads** |
 | MCP Servers | PASS: 14 servers | PASS: Full | **Full parity** |
@@ -1263,7 +1263,7 @@ ECC is the **first plugin to maximize every major AI coding tool**. Here's how e
 |---------|------------|------------|-----------|----------|
 | **Agents** | 47 | Shared (AGENTS.md) | Shared (AGENTS.md) | 12 |
 | **Commands** | 79 | Shared | Instruction-based | 31 |
-| **Skills** | 180 | Shared | 10 (native format) | 37 |
+| **Skills** | 181 | Shared | 10 (native format) | 37 |
 | **Hook Events** | 8 types | 15 types | None yet | 11 types |
 | **Hook Scripts** | 20+ scripts | 16 scripts (DRY adapter) | N/A | Plugin hooks |
 | **Rules** | 34 (common + lang) | 34 (YAML frontmatter) | Instruction-based | 13 instructions |
diff --git a/README.zh-CN.md b/README.zh-CN.md
index 86849294..e56b323f 100644
--- a/README.zh-CN.md
+++ b/README.zh-CN.md
@@ -106,7 +106,7 @@ cp -r everything-claude-code/rules/perl ~/.claude/rules/
 /plugin list ecc@ecc
 ```
 
-**完成!** 你现在可以使用 47 个代理、180 个技能和 79 个命令。
+**完成!** 你现在可以使用 47 个代理、181 个技能和 79 个命令。
 
 ### multi-* 命令需要额外配置
 
diff --git a/WORKING-CONTEXT.md b/WORKING-CONTEXT.md
index 2f6aa614..82ae07ba 100644
--- a/WORKING-CONTEXT.md
+++ b/WORKING-CONTEXT.md
@@ -92,6 +92,7 @@ Keep this file detailed for only the current sprint, blockers, and next actions.
 
 - 2026-04-05: Continued `#1213` overlap cleanup by narrowing `coding-standards` into the baseline cross-project conventions layer instead of deleting it. The skill now explicitly points detailed React/UI guidance to `frontend-patterns`, backend/API structure to `backend-patterns` / `api-design`, and keeps only reusable naming, readability, immutability, and code-quality expectations.
 - 2026-04-05: Added a packaging regression guard for the OpenCode release path after `#1287` showed the published `v1.10.0` artifact was still stale. `tests/scripts/build-opencode.test.js` now asserts the `npm pack --dry-run` tarball includes `.opencode/dist/index.js` plus compiled plugin/tool entrypoints, so future releases cannot silently omit the built OpenCode payload.
+- 2026-04-05: Landed `skills/agent-introspection-debugging` for `#829` as an ECC-native self-debugging framework. It is intentionally guidance-first rather than fake runtime automation: capture failure state, classify the pattern, apply the smallest contained recovery action, then emit a structured introspection report and hand off to `verification-loop` / `continuous-learning-v2` when appropriate.
 - 2026-04-05: Fixed the `main` npm CI break after the latest direct ports. `package-lock.json` had drifted behind `package.json` on the `globals` devDependency (`^17.1.0` vs `^17.4.0`), which caused all npm-based GitHub Actions jobs to fail at `npm ci`. Refreshed the lockfile only, verified `npm ci --ignore-scripts`, and kept the mixed-lock workspace otherwise untouched.
 - 2026-04-05: Direct-ported the useful discoverability part of `#1221` without duplicating a second healthcare compliance system. Added `skills/hipaa-compliance/SKILL.md` as a thin HIPAA-specific entrypoint that points into the canonical `healthcare-phi-compliance` / `healthcare-reviewer` lane, and wired both healthcare privacy skills into the `security` install module for selective installs.
 - 2026-04-05: Direct-ported the audited blockchain/web3 security lane from `#1222` into `main` as four self-contained skills: `defi-amm-security`, `evm-token-decimals`, `llm-trading-agent-security`, and `nodejs-keccak256`. These are now part of the `security` install module instead of living as an unmerged fork PR.
diff --git a/docs/zh-CN/AGENTS.md b/docs/zh-CN/AGENTS.md
index c22bd5d4..0bad9c1c 100644
--- a/docs/zh-CN/AGENTS.md
+++ b/docs/zh-CN/AGENTS.md
@@ -1,6 +1,6 @@
 # Everything Claude Code (ECC) — 智能体指令
 
-这是一个**生产就绪的 AI 编码插件**,提供 47 个专业代理、180 项技能、79 条命令以及自动化钩子工作流,用于软件开发。
+这是一个**生产就绪的 AI 编码插件**,提供 47 个专业代理、181 项技能、79 条命令以及自动化钩子工作流,用于软件开发。
 
 **版本:** 1.10.0
 
@@ -147,7 +147,7 @@
 
 ```
 agents/ — 47 个专业子代理
-skills/ — 180 个工作流技能和领域知识
+skills/ — 181 个工作流技能和领域知识
 commands/ — 79 个斜杠命令
 hooks/ — 基于触发的自动化
 rules/ — 始终遵循的指导方针(通用 + 每种语言)
diff --git a/docs/zh-CN/README.md b/docs/zh-CN/README.md
index e81b8c3b..3ea2d83a 100644
--- a/docs/zh-CN/README.md
+++ b/docs/zh-CN/README.md
@@ -209,7 +209,7 @@ npx ecc-install typescript
 /plugin list ecc@ecc
 ```
 
-**搞定!** 你现在可以使用 47 个智能体、180 项技能和 79 个命令了。
+**搞定!** 你现在可以使用 47 个智能体、181 项技能和 79 个命令了。
 
 ***
 
@@ -1096,7 +1096,7 @@ opencode
 |---------|-------------|----------|--------|
 | 智能体 | PASS: 47 个 | PASS: 12 个 | **Claude Code 领先** |
 | 命令 | PASS: 79 个 | PASS: 31 个 | **Claude Code 领先** |
-| 技能 | PASS: 180 项 | PASS: 37 项 | **Claude Code 领先** |
+| 技能 | PASS: 181 项 | PASS: 37 项 | **Claude Code 领先** |
 | 钩子 | PASS: 8 种事件类型 | PASS: 11 种事件 | **OpenCode 更多!** |
 | 规则 | PASS: 29 条 | PASS: 13 条指令 | **Claude Code 领先** |
 | MCP 服务器 | PASS: 14 个 | PASS: 完整 | **完全对等** |
@@ -1208,7 +1208,7 @@ ECC 是**第一个最大化利用每个主要 AI 编码工具的插件**。以
 |---------|------------|------------|-----------|----------|
 | **智能体** | 47 | 共享 (AGENTS.md) | 共享 (AGENTS.md) | 12 |
 | **命令** | 79 | 共享 | 基于指令 | 31 |
-| **技能** | 180 | 共享 | 10 (原生格式) | 37 |
+| **技能** | 181 | 共享 | 10 (原生格式) | 37 |
 | **钩子事件** | 8 种类型 | 15 种类型 | 暂无 | 11 种类型 |
 | **钩子脚本** | 20+ 个脚本 | 16 个脚本 (DRY 适配器) | N/A | 插件钩子 |
 | **规则** | 34 (通用 + 语言) | 34 (YAML 前页) | 基于指令 | 13 条指令 |
diff --git a/manifests/install-modules.json b/manifests/install-modules.json
index dc00316c..29c4b841 100644
--- a/manifests/install-modules.json
+++ b/manifests/install-modules.json
@@ -200,6 +200,7 @@
       "description": "Evaluation, TDD, verification, learning, and compaction skills.",
       "paths": [
         "skills/agent-sort",
+        "skills/agent-introspection-debugging",
         "skills/ai-regression-testing",
         "skills/configure-ecc",
         "skills/code-tour",
diff --git a/skills/agent-introspection-debugging/SKILL.md b/skills/agent-introspection-debugging/SKILL.md
new file mode 100644
index 00000000..ea5a2c58
--- /dev/null
+++ b/skills/agent-introspection-debugging/SKILL.md
@@ -0,0 +1,153 @@
+---
+name: agent-introspection-debugging
+description: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
+origin: ECC
+---
+
+# Agent Introspection Debugging
+
+Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.
+
+This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.
+
+## When to Activate
+
+- Maximum tool call / loop-limit failures
+- Repeated retries with no forward progress
+- Context growth or prompt drift that starts degrading output quality
+- File-system or environment state mismatch between expectation and reality
+- Tool failures that are likely recoverable with diagnosis and a smaller corrective action
+
+## Scope Boundaries
+
+Activate this skill for:
+- capturing failure state before retrying blindly
+- diagnosing common agent-specific failure patterns
+- applying contained recovery actions
+- producing a structured human-readable debug report
+
+Do not use this skill as the primary source for:
+- feature verification after code changes; use `verification-loop`
+- framework-specific debugging when a narrower ECC skill already exists
+- runtime promises the current harness cannot enforce automatically
+
+## Four-Phase Loop
+
+### Phase 1: Failure Capture
+
+Before trying to recover, record the failure precisely.
+
+Capture:
+- error type, message, and stack trace when available
+- last meaningful tool call sequence
+- what the agent was trying to do
+- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
+- current environment assumptions: cwd, branch, relevant service state, expected files
+
+Minimum capture template:
+
+```markdown
+## Failure Capture
+- Session / task:
+- Goal in progress:
+- Error:
+- Last successful step:
+- Last failed tool / command:
+- Repeated pattern seen:
+- Environment assumptions to verify:
+```
+
+### Phase 2: Root-Cause Diagnosis
+
+Match the failure to a known pattern before changing anything.
+
+| Pattern | Likely Cause | Check |
+| --- | --- | --- |
+| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
+| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
+| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
+| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
+| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
+| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |
+
+Diagnosis questions:
+- is this a logic failure, state failure, environment failure, or policy failure?
+- did the agent lose the real objective and start optimizing the wrong subtask?
+- is the failure deterministic or transient?
+- what is the smallest reversible action that would validate the diagnosis?
+
+### Phase 3: Contained Recovery
+
+Recover with the smallest action that changes the diagnosis surface.
+
+Safe recovery actions:
+- stop repeated retries and restate the hypothesis
+- trim low-signal context and keep only the active goal, blockers, and evidence
+- re-check the actual filesystem / branch / process state
+- narrow the task to one failing command, one file, or one test
+- switch from speculative reasoning to direct observation
+- escalate to a human when the failure is high-risk or externally blocked
+
+Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.
+
+Contained recovery checklist:
+
+```markdown
+## Recovery Action
+- Diagnosis chosen:
+- Smallest action taken:
+- Why this is safe:
+- What evidence would prove the fix worked:
+```
+
+### Phase 4: Introspection Report
+
+End with a report that makes the recovery legible to the next agent or human.
+
+```markdown
+## Agent Self-Debug Report
+- Session / task:
+- Failure:
+- Root cause:
+- Recovery action:
+- Result: success | partial | blocked
+- Token / time burn risk:
+- Follow-up needed:
+- Preventive change to encode later:
+```
+
+## Recovery Heuristics
+
+Prefer these interventions in order:
+
+1. Restate the real objective in one sentence.
+2. Verify the world state instead of trusting memory.
+3. Shrink the failing scope.
+4. Run one discriminating check.
+5. Only then retry.
+
+Bad pattern:
+- retrying the same action three times with slightly different wording
+
+Good pattern:
+- capture failure
+- classify the pattern
+- run one direct check
+- change the plan only if the check supports it
+
+## Integration with ECC
+
+- Use `verification-loop` after recovery if code was changed.
+- Use `continuous-learning-v2` when the failure pattern is worth turning into an instinct or later skill.
+- Use `council` when the issue is not technical failure but decision ambiguity.
+- Use `workspace-surface-audit` if the failure came from conflicting local state or repo drift.
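The heuristic "verify the world state instead of trusting memory" could look roughly like the sketch below, which compares assumed cwd, files, and branch against what is actually on disk. It is an illustration under stated assumptions, not something the skill ships: `check_assumptions` and `current_branch` are hypothetical helper names, and the only external command used is the standard `git rev-parse --abbrev-ref HEAD`.

```python
import os
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional


def current_branch(repo_dir: Path) -> Optional[str]:
    """Return the checked-out branch name, or None when git is unavailable
    or `repo_dir` is not inside a repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
    except OSError:  # git not installed, or repo_dir does not exist
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def check_assumptions(
    expected_cwd: Path,
    expected_files: Iterable[str],
    expected_branch: Optional[str] = None,
) -> List[str]:
    """Compare assumed state with observed state and list every mismatch."""
    mismatches: List[str] = []
    actual_cwd = Path(os.getcwd()).resolve()
    if actual_cwd != expected_cwd.resolve():
        mismatches.append(f"cwd is {actual_cwd}, expected {expected_cwd.resolve()}")
    for name in expected_files:
        if not (expected_cwd / name).exists():
            mismatches.append(f"missing file: {name}")
    if expected_branch is not None:
        branch = current_branch(expected_cwd)
        if branch != expected_branch:
            mismatches.append(f"branch is {branch}, expected {expected_branch}")
    return mismatches


if __name__ == "__main__":
    # Hypothetical usage: the file name is only an example.
    for problem in check_assumptions(Path.cwd(), ["README.md"]):
        print(problem)
```

An empty list means the stated assumptions held. It says nothing about assumptions that were never listed, so the "Environment assumptions to verify" line of the capture template still has to be filled in by hand.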
+
+## Output Standard
+
+When this skill is active, do not end with “I fixed it” alone.
+
+Always provide:
+- the failure pattern
+- the root-cause hypothesis
+- the recovery action
+- the evidence that the situation is now better or still blocked
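One way to make the Output Standard hard to skip is to build the Phase 4 report from structured fields and refuse to render it when a required item is empty. The sketch below is an assumption-laden illustration rather than ECC code: the `SelfDebugReport` class, its field names, and the `evidence` field (taken from the Output Standard list, not from the report template) are hypothetical, and only a subset of the template's fields is modeled.

```python
from dataclasses import dataclass

# The three outcomes named in the report template.
ALLOWED_RESULTS = ("success", "partial", "blocked")


@dataclass
class SelfDebugReport:
    task: str
    failure: str
    root_cause: str
    recovery_action: str
    result: str
    evidence: str
    follow_up: str = "none"

    def to_markdown(self) -> str:
        """Render the report, rejecting unknown results and empty fields."""
        if self.result not in ALLOWED_RESULTS:
            raise ValueError(f"result must be one of {ALLOWED_RESULTS}")
        required = {
            "failure": self.failure,
            "root_cause": self.root_cause,
            "recovery_action": self.recovery_action,
            "evidence": self.evidence,
        }
        missing = [name for name, value in required.items() if not value.strip()]
        if missing:
            raise ValueError(f"missing required fields: {', '.join(missing)}")
        lines = [
            "## Agent Self-Debug Report",
            f"- Session / task: {self.task}",
            f"- Failure: {self.failure}",
            f"- Root cause: {self.root_cause}",
            f"- Recovery action: {self.recovery_action}",
            f"- Result: {self.result}",
            f"- Evidence: {self.evidence}",
            f"- Follow-up needed: {self.follow_up}",
        ]
        return "\n".join(lines)


# Illustrative values only.
report = SelfDebugReport(
    task="fix failing import test",
    failure="file missing after write",
    root_cause="write went to the wrong cwd",
    recovery_action="re-checked path and rewrote the file in the repo root",
    result="partial",
    evidence="file now exists; test not yet re-run",
)
print(report.to_markdown())
```

Raising on an empty `evidence` field is a deliberate choice in this sketch: it turns a bare "I fixed it" into an error instead of a report.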