feat: add agent introspection debugging skill

2026-06-16 05:01:32 +08:00 · 2026-04-05 20:10:54 -07:00
parent c2994ba24f
commit e09c548edf
9 changed files with 319 additions and 11 deletions
@@ -0,0 +1,153 @@
+---
+name: agent-introspection-debugging
+description: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
+origin: ECC
+---
+
+# Agent Introspection Debugging
+
+Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.
+
+This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.
+
+## When to Activate
+
+- Maximum tool call / loop-limit failures
+- Repeated retries with no forward progress
+- Context growth or prompt drift that starts degrading output quality
+- File-system or environment state mismatch between expectation and reality
+- Tool failures that are likely recoverable with diagnosis and a smaller corrective action
+
+## Scope Boundaries
+
+Activate this skill for:
+- capturing failure state before retrying blindly
+- diagnosing common agent-specific failure patterns
+- applying contained recovery actions
+- producing a structured human-readable debug report
+
+Do not use this skill as the primary source for:
+- feature verification after code changes; use `verification-loop`
+- framework-specific debugging when a narrower ECC skill already exists
+- runtime promises the current harness cannot enforce automatically
+
+## Four-Phase Loop
+
+### Phase 1: Failure Capture
+
+Before trying to recover, record the failure precisely.
+
+Capture:
+- error type, message, and stack trace when available
+- last meaningful tool call sequence
+- what the agent was trying to do
+- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
+- current environment assumptions: cwd, branch, relevant service state, expected files
+
+Minimum capture template:
+
+```markdown
+## Failure Capture
+- Session / task:
+- Goal in progress:
+- Error:
+- Last successful step:
+- Last failed tool / command:
+- Repeated pattern seen:
+- Environment assumptions to verify:
+```
+
+### Phase 2: Root-Cause Diagnosis
+
+Match the failure to a known pattern before changing anything.
+
+| Pattern | Likely Cause | Check |
+| --- | --- | --- |
+| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
+| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
+| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
+| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
+| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
+| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |
+
+Diagnosis questions:
+- is this a logic failure, state failure, environment failure, or policy failure?
+- did the agent lose the real objective and start optimizing the wrong subtask?
+- is the failure deterministic or transient?
+- what is the smallest reversible action that would validate the diagnosis?
+
+### Phase 3: Contained Recovery
+
+Recover with the smallest action that would confirm or rule out the diagnosis.
+
+Safe recovery actions:
+- stop repeated retries and restate the hypothesis
+- trim low-signal context and keep only the active goal, blockers, and evidence
+- re-check the actual filesystem / branch / process state
+- narrow the task to one failing command, one file, or one test
+- switch from speculative reasoning to direct observation
+- escalate to a human when the failure is high-risk or externally blocked
+
+Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.
+
+Contained recovery checklist:
+
+```markdown
+## Recovery Action
+- Diagnosis chosen:
+- Smallest action taken:
+- Why this is safe:
+- What evidence would prove the fix worked:
+```
+
+### Phase 4: Introspection Report
+
+End with a report that makes the recovery legible to the next agent or human.
+
+```markdown
+## Agent Self-Debug Report
+- Session / task:
+- Failure:
+- Root cause:
+- Recovery action:
+- Result: success | partial | blocked
+- Token / time burn risk:
+- Follow-up needed:
+- Preventive change to encode later:
+```
+
+## Recovery Heuristics
+
+Prefer these interventions in order:
+
+1. Restate the real objective in one sentence.
+2. Verify the world state instead of trusting memory.
+3. Shrink the failing scope.
+4. Run one discriminating check.
+5. Only then retry.
+
+Bad pattern:
+- retrying the same action three times with slightly different wording
+
+Good pattern:
+- capture failure
+- classify the pattern
+- run one direct check
+- change the plan only if the check supports it
+
+## Integration with ECC
+
+- Use `verification-loop` after recovery if code was changed.
+- Use `continuous-learning-v2` when the failure pattern is worth turning into an instinct or later skill.
+- Use `council` when the issue is not technical failure but decision ambiguity.
+- Use `workspace-surface-audit` if the failure came from conflicting local state or repo drift.
+
+## Output Standard
+
+When this skill is active, do not end with “I fixed it” alone.
+
+Always provide:
+- the failure pattern
+- the root-cause hypothesis
+- the recovery action
+- the evidence that the situation is now better or still blocked
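The Phase 2 check "inspect the last N tool calls for repetition" can be made concrete with a few lines of code. The sketch below is illustrative only and is not part of this commit: `ToolCall`, `find_repeated_calls`, the `window` and `threshold` defaults, and the sample history are all hypothetical names and values, and it assumes the harness exposes tool calls as simple (tool, args) records.

```python
from collections import Counter
from typing import NamedTuple


class ToolCall(NamedTuple):
    """Hypothetical record of one tool invocation (not an ECC type)."""
    tool: str
    args: str


def find_repeated_calls(
    history: list[ToolCall], window: int = 10, threshold: int = 3
) -> list[tuple[ToolCall, int]]:
    """Return calls seen at least `threshold` times in the last `window` entries."""
    recent = history[-window:] if window > 0 else []
    counts = Counter(recent)
    return [(call, n) for call, n in counts.most_common() if n >= threshold]


# Assumed sample history: the same test command is run three times.
history = [
    ToolCall("read_file", "src/app.py"),
    ToolCall("run_tests", "tests/test_app.py"),
    ToolCall("edit_file", "src/app.py"),
    ToolCall("run_tests", "tests/test_app.py"),
    ToolCall("run_tests", "tests/test_app.py"),
]

for call, n in find_repeated_calls(history):
    print(f"{call.tool} {call.args!r} repeated {n} times")
```

A non-empty result is only a signal to stop and restate the hypothesis, as Phase 3 recommends. It does not by itself say why the loop happened.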
@@ -1,6 +1,6 @@
 # Everything Claude Code (ECC) — Agent Instructions

-This is a **production-ready AI coding plugin** providing 47 specialized agents, 180 skills, 79 commands, and automated hook workflows for software development.
+This is a **production-ready AI coding plugin** providing 47 specialized agents, 181 skills, 79 commands, and automated hook workflows for software development.

 **Version:** 1.10.0

@@ -146,7 +146,7 @@ Troubleshoot failures: check test isolation → verify mocks → fix implementat

 ```
 agents/          — 47 specialized subagents
-skills/          — 180 workflow skills and domain knowledge
+skills/          — 181 workflow skills and domain knowledge
 commands/        — 79 slash commands
 hooks/           — Trigger-based automations
 rules/           — Always-follow guidelines (common + per-language)
@@ -236,7 +236,7 @@ For manual install instructions see the README in the `rules/` folder. When copy
 /plugin list ecc@ecc
 ```

-**That's it!** You now have access to 47 agents, 180 skills, and 79 legacy command shims.
+**That's it!** You now have access to 47 agents, 181 skills, and 79 legacy command shims.

 ### Multi-model commands require additional setup

@@ -1154,7 +1154,7 @@ The configuration is automatically detected from `.opencode/opencode.json`.
 |---------|-------------|----------|--------|
 | Agents | PASS: 47 agents | PASS: 12 agents | **Claude Code leads** |
 | Commands | PASS: 79 commands | PASS: 31 commands | **Claude Code leads** |
-| Skills | PASS: 180 skills | PASS: 37 skills | **Claude Code leads** |
+| Skills | PASS: 181 skills | PASS: 37 skills | **Claude Code leads** |
 | Hooks | PASS: 8 event types | PASS: 11 events | **OpenCode has more!** |
 | Rules | PASS: 29 rules | PASS: 13 instructions | **Claude Code leads** |
 | MCP Servers | PASS: 14 servers | PASS: Full | **Full parity** |
@@ -1263,7 +1263,7 @@ ECC is the **first plugin to maximize every major AI coding tool**. Here's how e
 |---------|------------|------------|-----------|----------|
 | **Agents** | 47 | Shared (AGENTS.md) | Shared (AGENTS.md) | 12 |
 | **Commands** | 79 | Shared | Instruction-based | 31 |
-| **Skills** | 180 | Shared | 10 (native format) | 37 |
+| **Skills** | 181 | Shared | 10 (native format) | 37 |
 | **Hook Events** | 8 types | 15 types | None yet | 11 types |
 | **Hook Scripts** | 20+ scripts | 16 scripts (DRY adapter) | N/A | Plugin hooks |
 | **Rules** | 34 (common + lang) | 34 (YAML frontmatter) | Instruction-based | 13 instructions |
@@ -106,7 +106,7 @@ cp -r everything-claude-code/rules/perl ~/.claude/rules/
 /plugin list ecc@ecc
 ```

-**Done!** You now have access to 47 agents, 180 skills, and 79 commands.
+**Done!** You now have access to 47 agents, 181 skills, and 79 commands.

 ### multi-* commands require additional setup

@@ -92,6 +92,7 @@ Keep this file detailed for only the current sprint, blockers, and next actions.

 - 2026-04-05: Continued `#1213` overlap cleanup by narrowing `coding-standards` into the baseline cross-project conventions layer instead of deleting it. The skill now explicitly points detailed React/UI guidance to `frontend-patterns`, backend/API structure to `backend-patterns` / `api-design`, and keeps only reusable naming, readability, immutability, and code-quality expectations.
 - 2026-04-05: Added a packaging regression guard for the OpenCode release path after `#1287` showed the published `v1.10.0` artifact was still stale. `tests/scripts/build-opencode.test.js` now asserts the `npm pack --dry-run` tarball includes `.opencode/dist/index.js` plus compiled plugin/tool entrypoints, so future releases cannot silently omit the built OpenCode payload.
+- 2026-04-05: Landed `skills/agent-introspection-debugging` for `#829` as an ECC-native self-debugging framework. It is intentionally guidance-first rather than fake runtime automation: capture failure state, classify the pattern, apply the smallest contained recovery action, then emit a structured introspection report and hand off to `verification-loop` / `continuous-learning-v2` when appropriate.
 - 2026-04-05: Fixed the `main` npm CI break after the latest direct ports. `package-lock.json` had drifted behind `package.json` on the `globals` devDependency (`^17.1.0` vs `^17.4.0`), which caused all npm-based GitHub Actions jobs to fail at `npm ci`. Refreshed the lockfile only, verified `npm ci --ignore-scripts`, and kept the mixed-lock workspace otherwise untouched.
 - 2026-04-05: Direct-ported the useful discoverability part of `#1221` without duplicating a second healthcare compliance system. Added `skills/hipaa-compliance/SKILL.md` as a thin HIPAA-specific entrypoint that points into the canonical `healthcare-phi-compliance` / `healthcare-reviewer` lane, and wired both healthcare privacy skills into the `security` install module for selective installs.
 - 2026-04-05: Direct-ported the audited blockchain/web3 security lane from `#1222` into `main` as four self-contained skills: `defi-amm-security`, `evm-token-decimals`, `llm-trading-agent-security`, and `nodejs-keccak256`. These are now part of the `security` install module instead of living as an unmerged fork PR.
@@ -1,6 +1,6 @@
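 # Everything Claude Code (ECC) — Agent Instructions

-This is a **production-ready AI coding plugin** providing 47 specialized agents, 180 skills, 79 commands, and automated hook workflows for software development.
+This is a **production-ready AI coding plugin** providing 47 specialized agents, 181 skills, 79 commands, and automated hook workflows for software development.

 **Version:** 1.10.0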

@@ -147,7 +147,7 @@

 ```
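 agents/          — 47 specialized subagents
-skills/          — 180 workflow skills and domain knowledge
+skills/          — 181 workflow skills and domain knowledge
 commands/        — 79 slash commands
 hooks/           — Trigger-based automations
 rules/           — Always-follow guidelines (common + per-language)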
@@ -209,7 +209,7 @@ npx ecc-install typescript
 /plugin list ecc@ecc
 ```

-**That's it!** You now have access to 47 agents, 180 skills, and 79 commands.
+**That's it!** You now have access to 47 agents, 181 skills, and 79 commands.

 ***

@@ -1096,7 +1096,7 @@ opencode
 |---------|-------------|----------|--------|
 | Agents | PASS: 47 agents | PASS: 12 agents | **Claude Code leads** |
 | Commands | PASS: 79 commands | PASS: 31 commands | **Claude Code leads** |
-| Skills | PASS: 180 skills | PASS: 37 skills | **Claude Code leads** |
+| Skills | PASS: 181 skills | PASS: 37 skills | **Claude Code leads** |
 | Hooks | PASS: 8 event types | PASS: 11 events | **OpenCode has more!** |
 | Rules | PASS: 29 rules | PASS: 13 instructions | **Claude Code leads** |
 | MCP Servers | PASS: 14 servers | PASS: Full | **Full parity** |
@@ -1208,7 +1208,7 @@ ECC 是**第一个最大化利用每个主要 AI 编码工具的插件**。以
 |---------|------------|------------|-----------|----------|
 | **Agents** | 47 | Shared (AGENTS.md) | Shared (AGENTS.md) | 12 |
 | **Commands** | 79 | Shared | Instruction-based | 31 |
-| **Skills** | 180 | Shared | 10 (native format) | 37 |
+| **Skills** | 181 | Shared | 10 (native format) | 37 |
 | **Hook Events** | 8 types | 15 types | None yet | 11 types |
 | **Hook Scripts** | 20+ scripts | 16 scripts (DRY adapter) | N/A | Plugin hooks |
 | **Rules** | 34 (common + lang) | 34 (YAML frontmatter) | Instruction-based | 13 instructions |
@@ -200,6 +200,7 @@
      "description": "Evaluation, TDD, verification, learning, and compaction skills.",
      "paths": [
        "skills/agent-sort",
+        "skills/agent-introspection-debugging",
        "skills/ai-regression-testing",
        "skills/configure-ecc",
        "skills/code-tour",
@@ -0,0 +1,153 @@
+---
+name: agent-introspection-debugging
+description: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
+origin: ECC
+---
+
+# Agent Introspection Debugging
+
+Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.
+
+This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.
+
+## When to Activate
+
+- Maximum tool call / loop-limit failures
+- Repeated retries with no forward progress
+- Context growth or prompt drift that starts degrading output quality
+- File-system or environment state mismatch between expectation and reality
+- Tool failures that are likely recoverable with diagnosis and a smaller corrective action
+
+## Scope Boundaries
+
+Activate this skill for:
+- capturing failure state before retrying blindly
+- diagnosing common agent-specific failure patterns
+- applying contained recovery actions
+- producing a structured human-readable debug report
+
+Do not use this skill as the primary source for:
+- feature verification after code changes; use `verification-loop`
+- framework-specific debugging when a narrower ECC skill already exists
+- runtime promises the current harness cannot enforce automatically
+
+## Four-Phase Loop
+
+### Phase 1: Failure Capture
+
+Before trying to recover, record the failure precisely.
+
+Capture:
+- error type, message, and stack trace when available
+- last meaningful tool call sequence
+- what the agent was trying to do
+- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
+- current environment assumptions: cwd, branch, relevant service state, expected files
+
+Minimum capture template:
+
+```markdown
+## Failure Capture
+- Session / task:
+- Goal in progress:
+- Error:
+- Last successful step:
+- Last failed tool / command:
+- Repeated pattern seen:
+- Environment assumptions to verify:
+```
+
+### Phase 2: Root-Cause Diagnosis
+
+Match the failure to a known pattern before changing anything.
+
+| Pattern | Likely Cause | Check |
+| --- | --- | --- |
+| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
+| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
+| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
+| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
+| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
+| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |
+
+Diagnosis questions:
+- is this a logic failure, state failure, environment failure, or policy failure?
+- did the agent lose the real objective and start optimizing the wrong subtask?
+- is the failure deterministic or transient?
+- what is the smallest reversible action that would validate the diagnosis?
+
+### Phase 3: Contained Recovery
+
+Recover with the smallest action that would confirm or rule out the diagnosis.
+
+Safe recovery actions:
+- stop repeated retries and restate the hypothesis
+- trim low-signal context and keep only the active goal, blockers, and evidence
+- re-check the actual filesystem / branch / process state
+- narrow the task to one failing command, one file, or one test
+- switch from speculative reasoning to direct observation
+- escalate to a human when the failure is high-risk or externally blocked
+
+Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.
+
+Contained recovery checklist:
+
+```markdown
+## Recovery Action
+- Diagnosis chosen:
+- Smallest action taken:
+- Why this is safe:
+- What evidence would prove the fix worked:
+```
+
+### Phase 4: Introspection Report
+
+End with a report that makes the recovery legible to the next agent or human.
+
+```markdown
+## Agent Self-Debug Report
+- Session / task:
+- Failure:
+- Root cause:
+- Recovery action:
+- Result: success | partial | blocked
+- Token / time burn risk:
+- Follow-up needed:
+- Preventive change to encode later:
+```
+
+## Recovery Heuristics
+
+Prefer these interventions in order:
+
+1. Restate the real objective in one sentence.
+2. Verify the world state instead of trusting memory.
+3. Shrink the failing scope.
+4. Run one discriminating check.
+5. Only then retry.
+
+Bad pattern:
+- retrying the same action three times with slightly different wording
+
+Good pattern:
+- capture failure
+- classify the pattern
+- run one direct check
+- change the plan only if the check supports it
+
+## Integration with ECC
+
+- Use `verification-loop` after recovery if code was changed.
+- Use `continuous-learning-v2` when the failure pattern is worth turning into an instinct or later skill.
+- Use `council` when the issue is not technical failure but decision ambiguity.
+- Use `workspace-surface-audit` if the failure came from conflicting local state or repo drift.
+
+## Output Standard
+
+When this skill is active, do not end with “I fixed it” alone.
+
+Always provide:
+- the failure pattern
+- the root-cause hypothesis
+- the recovery action
+- the evidence that the situation is now better or still blocked
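For the `429` / quota exhaustion row in the Phase 2 table, "inspect retry spacing" can be done from the attempt timestamps alone. The following is a minimal sketch under stated assumptions, not code from this commit: `retry_gaps`, `has_backoff`, and the sample timestamps are hypothetical, and the rule used here (gaps never shrink and the last gap is longer than the first) is just one simple way to decide whether any backoff is present.

```python
def retry_gaps(timestamps: list[float]) -> list[float]:
    """Seconds between consecutive attempts, in order."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def has_backoff(timestamps: list[float]) -> bool:
    """True when the gaps never shrink and at least one gap grows."""
    gaps = retry_gaps(timestamps)
    if len(gaps) < 2:
        return False  # too few attempts to judge spacing
    never_shrinks = all(b >= a for a, b in zip(gaps, gaps[1:]))
    return never_shrinks and gaps[-1] > gaps[0]


# Assumed sample data: attempt times in seconds since the first call.
storm = [0.0, 0.5, 1.0, 1.5]       # fixed 0.5 s spacing, no backoff
backed_off = [0.0, 1.0, 3.0, 7.0]  # gaps of 1, 2, and 4 s

print(has_backoff(storm))       # False
print(has_backoff(backed_off))  # True
```

If the check reports no backoff, the contained recovery is to stop retrying and record the spacing in the Failure Capture, rather than to issue another call.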