feat: add agent introspection debugging skill

2026-06-13 19:51:24 +08:00 · 2026-04-05 20:10:54 -07:00
parent c2994ba24f
commit e09c548edf
9 changed files with 319 additions and 11 deletions
@@ -0,0 +1,153 @@
+---
+name: agent-introspection-debugging
+description: Structured self-debugging workflow for AI agent failures using capture, diagnosis, contained recovery, and introspection reports.
+origin: ECC
+---
+
+# Agent Introspection Debugging
+
+Use this skill when an agent run is failing repeatedly, consuming tokens without progress, looping on the same tools, or drifting away from the intended task.
+
+This is a workflow skill, not a hidden runtime. It teaches the agent to debug itself systematically before escalating to a human.
+
+## When to Activate
+
+- Maximum tool call / loop-limit failures
+- Repeated retries with no forward progress
+- Context growth or prompt drift that starts degrading output quality
+- File-system or environment state mismatch between expectation and reality
+- Tool failures that are likely recoverable with diagnosis and a smaller corrective action
+
+## Scope Boundaries
+
+Activate this skill for:
+- capturing failure state before retrying blindly
+- diagnosing common agent-specific failure patterns
+- applying contained recovery actions
+- producing a structured human-readable debug report
+
+Do not use this skill as the primary source for:
+- feature verification after code changes; use `verification-loop`
+- framework-specific debugging when a narrower ECC skill already exists
+- runtime promises the current harness cannot enforce automatically
+
+## Four-Phase Loop
+
+### Phase 1: Failure Capture
+
+Before trying to recover, record the failure precisely.
+
+Capture:
+- error type, message, and stack trace when available
+- last meaningful tool call sequence
+- what the agent was trying to do
+- current context pressure: repeated prompts, oversized pasted logs, duplicated plans, or runaway notes
+- current environment assumptions: cwd, branch, relevant service state, expected files
+
+Minimum capture template:
+
+```markdown
+## Failure Capture
+- Session / task:
+- Goal in progress:
+- Error:
+- Last successful step:
+- Last failed tool / command:
+- Repeated pattern seen:
+- Environment assumptions to verify:
+```
+
+### Phase 2: Root-Cause Diagnosis
+
+Match the failure to a known pattern before changing anything.
+
+| Pattern | Likely Cause | Check |
+| --- | --- | --- |
+| Maximum tool calls / repeated same command | loop or no-exit observer path | inspect the last N tool calls for repetition |
+| Context overflow / degraded reasoning | unbounded notes, repeated plans, oversized logs | inspect recent context for duplication and low-signal bulk |
+| `ECONNREFUSED` / timeout | service unavailable or wrong port | verify service health, URL, and port assumptions |
+| `429` / quota exhaustion | retry storm or missing backoff | count repeated calls and inspect retry spacing |
+| file missing after write / stale diff | race, wrong cwd, or branch drift | re-check path, cwd, git status, and actual file existence |
+| tests still failing after “fix” | wrong hypothesis | isolate the exact failing test and re-derive the bug |
+
+Diagnosis questions:
+- is this a logic failure, state failure, environment failure, or policy failure?
+- did the agent lose the real objective and start optimizing the wrong subtask?
+- is the failure deterministic or transient?
+- what is the smallest reversible action that would validate the diagnosis?
+
+### Phase 3: Contained Recovery
+
+Recover with the smallest action that changes the diagnosis surface.
+
+Safe recovery actions:
+- stop repeated retries and restate the hypothesis
+- trim low-signal context and keep only the active goal, blockers, and evidence
+- re-check the actual filesystem / branch / process state
+- narrow the task to one failing command, one file, or one test
+- switch from speculative reasoning to direct observation
+- escalate to a human when the failure is high-risk or externally blocked
+
+Do not claim unsupported auto-healing actions like “reset agent state” or “update harness config” unless you are actually doing them through real tools in the current environment.
+
+Contained recovery checklist:
+
+```markdown
+## Recovery Action
+- Diagnosis chosen:
+- Smallest action taken:
+- Why this is safe:
+- What evidence would prove the fix worked:
+```
+
+### Phase 4: Introspection Report
+
+End with a report that makes the recovery legible to the next agent or human.
+
+```markdown
+## Agent Self-Debug Report
+- Session / task:
+- Failure:
+- Root cause:
+- Recovery action:
+- Result: success | partial | blocked
+- Token / time burn risk:
+- Follow-up needed:
+- Preventive change to encode later:
+```
+
+## Recovery Heuristics
+
+Prefer these interventions in order:
+
+1. Restate the real objective in one sentence.
+2. Verify the world state instead of trusting memory.
+3. Shrink the failing scope.
+4. Run one discriminating check.
+5. Only then retry.
+
+Bad pattern:
+- retrying the same action three times with slightly different wording
+
+Good pattern:
+- capture failure
+- classify the pattern
+- run one direct check
+- change the plan only if the check supports it
+
+## Integration with ECC
+
+- Use `verification-loop` after recovery if code was changed.
+- Use `continuous-learning-v2` when the failure pattern is worth turning into an instinct or later skill.
+- Use `council` when the issue is not technical failure but decision ambiguity.
+- Use `workspace-surface-audit` if the failure came from conflicting local state or repo drift.
+
+## Output Standard
+
+When this skill is active, do not end with “I fixed it” alone.
+
+Always provide:
+- the failure pattern
+- the root-cause hypothesis
+- the recovery action
+- the evidence that the situation is now better or still blocked