feat: add ecc tools cost audit workflow

2026-06-15 12:41:26 +08:00 · 2026-04-05 15:19:56 -07:00
parent bf5961e8d1
commit a77c8c3f85
8 changed files with 175 additions and 12 deletions
@@ -1,6 +1,6 @@
 # Everything Claude Code (ECC) — Agent Instructions

-This is a **production-ready AI coding plugin** providing 38 specialized agents, 158 skills, 72 commands, and automated hook workflows for software development.
+This is a **production-ready AI coding plugin** providing 38 specialized agents, 159 skills, 72 commands, and automated hook workflows for software development.

 **Version:** 1.10.0

@@ -146,7 +146,7 @@ Troubleshoot failures: check test isolation → verify mocks → fix implementat

 ```
 agents/          — 38 specialized subagents
-skills/          — 158 workflow skills and domain knowledge
+skills/          — 159 workflow skills and domain knowledge
 commands/        — 72 slash commands
 hooks/           — Trigger-based automations
 rules/           — Always-follow guidelines (common + per-language)
@@ -85,7 +85,7 @@ This repo is the raw code only. The guides explain everything.
 ### v1.10.0 — Surface Refresh, Operator Workflows, and ECC 2.0 Alpha (Apr 2026)

 - **Public surface synced to the live repo** — metadata, catalog counts, plugin manifests, and install-facing docs now match the actual OSS surface: 38 agents, 156 skills, and 72 legacy command shims.
- **Operator and outbound workflow expansion** — `brand-voice`, `social-graph-ranker`, `connections-optimizer`, `customer-billing-ops`, `google-workspace-ops`, `project-flow-ops`, and `workspace-surface-audit` round out the operator lane.
+- **Operator and outbound workflow expansion** — `brand-voice`, `social-graph-ranker`, `connections-optimizer`, `customer-billing-ops`, `ecc-tools-cost-audit`, `google-workspace-ops`, `project-flow-ops`, and `workspace-surface-audit` round out the operator lane.
 - **Media and launch tooling** — `manim-video`, `remotion-video-creation`, and upgraded social publishing surfaces make technical explainers and launch content part of the same system.
 - **Framework and product surface growth** — `nestjs-patterns`, richer Codex/OpenCode install surfaces, and expanded cross-harness packaging keep the repo usable beyond Claude Code alone.
 - **ECC 2.0 alpha is in-tree** — the Rust control-plane prototype in `ecc2/` now builds locally and exposes `dashboard`, `start`, `sessions`, `status`, `stop`, `resume`, and `daemon` commands. It is usable as an alpha, not yet a general release.
@@ -236,7 +236,7 @@ For manual install instructions see the README in the `rules/` folder. When copy
 /plugin list ecc@ecc
 ```

-**That's it!** You now have access to 38 agents, 158 skills, and 72 legacy command shims.
+**That's it!** You now have access to 38 agents, 159 skills, and 72 legacy command shims.

 ### Multi-model commands require additional setup

@@ -1154,7 +1154,7 @@ The configuration is automatically detected from `.opencode/opencode.json`.
 |---------|-------------|----------|--------|
 | Agents | PASS: 38 agents | PASS: 12 agents | **Claude Code leads** |
 | Commands | PASS: 72 commands | PASS: 31 commands | **Claude Code leads** |
-| Skills | PASS: 158 skills | PASS: 37 skills | **Claude Code leads** |
+| Skills | PASS: 159 skills | PASS: 37 skills | **Claude Code leads** |
 | Hooks | PASS: 8 event types | PASS: 11 events | **OpenCode has more!** |
 | Rules | PASS: 29 rules | PASS: 13 instructions | **Claude Code leads** |
 | MCP Servers | PASS: 14 servers | PASS: Full | **Full parity** |
@@ -1263,7 +1263,7 @@ ECC is the **first plugin to maximize every major AI coding tool**. Here's how e
 |---------|------------|------------|-----------|----------|
 | **Agents** | 38 | Shared (AGENTS.md) | Shared (AGENTS.md) | 12 |
 | **Commands** | 72 | Shared | Instruction-based | 31 |
-| **Skills** | 158 | Shared | 10 (native format) | 37 |
+| **Skills** | 159 | Shared | 10 (native format) | 37 |
 | **Hook Events** | 8 types | 15 types | None yet | 11 types |
 | **Hook Scripts** | 20+ scripts | 16 scripts (DRY adapter) | N/A | Plugin hooks |
 | **Rules** | 34 (common + lang) | 34 (YAML frontmatter) | Instruction-based | 13 instructions |
@@ -106,7 +106,7 @@ cp -r everything-claude-code/rules/perl ~/.claude/rules/
 /plugin list ecc@ecc
 ```

-**Done!** You now have access to 38 agents, 158 skills, and 72 commands.
+**Done!** You now have access to 38 agents, 159 skills, and 72 commands.

 ### multi-* commands require additional setup

@@ -138,3 +138,5 @@ Keep this file detailed for only the current sprint, blockers, and next actions.
 - 2026-04-05: Shipped `846ffb7` (`chore: ship v1.10.0 release surface refresh`). This updated README/plugin metadata/package versions, synced the explicit plugin agent inventory, bumped stale star/fork/contributor counts, created `docs/releases/1.10.0/*`, tagged and released `v1.10.0`, and posted the announcement discussion at `#1272`.
 - 2026-04-05: Salvaged the reusable Hermes-branch operator skills in `6eba30f` without replaying the full branch. Added `skills/github-ops`, `skills/knowledge-ops`, and `skills/hookify-rules`, wired them into install modules, and re-synced the repo to `159` skills. `knowledge-ops` was explicitly adapted to the current workspace model: live code in cloned repos, active truth in GitHub/Linear, broader non-code context in the KB/archive layers.
 - 2026-04-05: Fixed the remaining OpenCode npm-publish gap in `db6d52e`. The root package now builds `.opencode/dist` during `prepack`, includes the compiled OpenCode plugin assets in the published tarball, and carries a dedicated regression test (`tests/scripts/build-opencode.test.js`) so the package no longer ships only raw TypeScript source for that surface.
+- 2026-04-05: Fixed the stale-row bug in `.github/workflows/monthly-metrics.yml` with `bf5961e`. The workflow now refreshes the current month row in issue `#1087` instead of early-returning when the month already exists, and the dispatched run updated the April snapshot to the current star/fork/release counts.
+- 2026-04-05: Recovered the useful cost-control workflow from the divergent Hermes branch as a small ECC-native operator skill instead of replaying the branch. `skills/ecc-tools-cost-audit/SKILL.md` is now wired into `operator-workflows` and focused on webhook -> queue -> worker tracing, burn containment, quota bypass, premium-model leakage, and retry fanout in the sibling `ECC-Tools` repo.
@@ -1,6 +1,6 @@
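 # Everything Claude Code (ECC) — Agent Instructions

-This is a **production-ready AI coding plugin** providing 38 specialized agents, 158 skills, 72 commands, and automated hook workflows for software development.
+This is a **production-ready AI coding plugin** providing 38 specialized agents, 159 skills, 72 commands, and automated hook workflows for software development.

 **Version:** 1.10.0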

@@ -147,7 +147,7 @@

 ```
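 agents/          — 38 specialized subagents
-skills/          — 158 workflow skills and domain knowledge
+skills/          — 159 workflow skills and domain knowledge
 commands/        — 72 slash commands
 hooks/           — Trigger-based automations
 rules/           — Always-follow guidelines (common + per-language)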
@@ -209,7 +209,7 @@ npx ecc-install typescript
 /plugin list ecc@ecc
 ```

-**All set!** You now have access to 38 agents, 158 skills, and 72 commands.
+**All set!** You now have access to 38 agents, 159 skills, and 72 commands.

 ***

@@ -1096,7 +1096,7 @@ opencode
 |---------|-------------|----------|--------|
 | Agents | PASS: 38 agents | PASS: 12 agents | **Claude Code leads** |
 | Commands | PASS: 72 commands | PASS: 31 commands | **Claude Code leads** |
-| Skills | PASS: 158 skills | PASS: 37 skills | **Claude Code leads** |
+| Skills | PASS: 159 skills | PASS: 37 skills | **Claude Code leads** |
 | Hooks | PASS: 8 event types | PASS: 11 events | **OpenCode has more!** |
 | Rules | PASS: 29 rules | PASS: 13 instructions | **Claude Code leads** |
 | MCP Servers | PASS: 14 servers | PASS: Full | **Full parity** |
@@ -1208,7 +1208,7 @@ ECC 是**第一个最大化利用每个主要 AI 编码工具的插件**。以
 |---------|------------|------------|-----------|----------|
 | **Agents** | 38 | Shared (AGENTS.md) | Shared (AGENTS.md) | 12 |
 | **Commands** | 72 | Shared | Instruction-based | 31 |
-| **Skills** | 158 | Shared | 10 (native format) | 37 |
+| **Skills** | 159 | Shared | 10 (native format) | 37 |
 | **Hook Events** | 8 types | 15 types | None yet | 11 types |
 | **Hook Scripts** | 20+ scripts | 16 scripts (DRY adapter) | N/A | Plugin hooks |
 | **Rules** | 34 (common + lang) | 34 (YAML frontmatter) | Instruction-based | 13 instructions |
@@ -315,6 +315,7 @@
      "paths": [
        "skills/connections-optimizer",
        "skills/customer-billing-ops",
+        "skills/ecc-tools-cost-audit",
        "skills/github-ops",
        "skills/google-workspace-ops",
        "skills/jira-integration",
@@ -0,0 +1,160 @@
+---
+name: ecc-tools-cost-audit
+description: Evidence-first ECC Tools burn and billing audit workflow. Use when investigating runaway PR creation, quota bypass, premium-model leakage, duplicate jobs, or GitHub App cost spikes in the ECC Tools repo.
+origin: ECC
+---
+
+# ECC Tools Cost Audit
+
+Use this skill when the user suspects the ECC Tools GitHub App is burning cost, over-creating PRs, bypassing usage limits, or routing free users into premium analysis paths.
+
+This is a focused operator workflow for the sibling [ECC-Tools](../../ECC-Tools) repo. It is not a generic billing skill and it is not a repo-wide code review pass.
+
+## Skill Stack
+
+Pull these ECC-native skills into the workflow when relevant:
+
+- `autonomous-loops` for bounded multi-step audits that cross webhooks, queues, billing, and retries
+- `agentic-engineering` for breaking the request path into discrete, provable units
+- `customer-billing-ops` when repo behavior and customer-impact math must be separated cleanly
+- `search-first` before inventing helpers or re-implementing repo-local utilities
+- `security-review` when auth, usage gates, entitlements, or secrets are touched
+- `verification-loop` for proving rerun safety and exact post-fix state
+- `tdd-workflow` when the fix needs regression coverage in the worker, router, or billing paths
+
+## When To Use
+
+- the user mentions ECC Tools burn rate, PR recursion, over-created PRs, usage-limit bypass, or premium-model leakage
+- the task is in the sibling `ECC-Tools` repo and depends on webhook handlers, queue workers, usage reservation, PR creation logic, or paid-gate enforcement
+- a customer report says the app created too many PRs, billed incorrectly, or analyzed code without producing a usable result
+
+## Scope Guardrails
+
+- work in the sibling `ECC-Tools` repo, not in `everything-claude-code`
+- start read-only unless the user clearly asked for a fix
+- do not mutate unrelated billing, checkout, or UI flows while tracing analysis burn
+- treat app-generated branches and app-generated PRs as red-flag recursion paths until proved otherwise
+- separate three things explicitly:
+  - repo-side burn root cause
+  - customer-facing billing impact
+  - product or entitlement gaps that need backlog follow-up
+
+## Workflow
+
+### 1. Freeze repo scope
+
+- switch into the sibling `ECC-Tools` repo
+- check branch and local diff first
+- identify the exact surface under audit:
+  - webhook router
+  - queue producer
+  - queue consumer
+  - PR creation path
+  - usage reservation / billing path
+  - model routing path
+
+### 2. Trace ingress before theorizing
+
+- inspect `src/index.*` or the main entrypoint first
+- map every enqueue path before suggesting a fix
+- confirm which GitHub events share a queue type
+- confirm whether push, pull_request, synchronize, comment, or manual re-run events can converge on the same expensive path
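The ingress mapping above can be recorded as a small table while reading the entrypoint. This is a minimal sketch, assuming hypothetical event and job names (`analysis`, `notify`, `manual_rerun`); the real ECC-Tools router may use different ones. It groups events by the job type they enqueue so that convergence on one expensive path is visible at a glance.

```typescript
// Hypothetical event and job names for illustration only.
type GitHubEvent =
  | "push"
  | "pull_request.opened"
  | "pull_request.synchronize"
  | "issue_comment"
  | "manual_rerun";
type QueueJob = "analysis" | "notify";

interface EnqueuePath {
  event: GitHubEvent;
  job: QueueJob;
}

// One row per enqueue path found while reading the entrypoint (assumed data).
const enqueuePaths: EnqueuePath[] = [
  { event: "push", job: "analysis" },
  { event: "pull_request.opened", job: "analysis" },
  { event: "pull_request.synchronize", job: "analysis" },
  { event: "issue_comment", job: "notify" },
  { event: "manual_rerun", job: "analysis" },
];

// Group events by the job type they enqueue.
function eventsByJob(paths: EnqueuePath[]): Map<QueueJob, GitHubEvent[]> {
  const grouped = new Map<QueueJob, GitHubEvent[]>();
  for (const { event, job } of paths) {
    const events = grouped.get(job) ?? [];
    events.push(event);
    grouped.set(job, events);
  }
  return grouped;
}

const convergence = eventsByJob(enqueuePaths);
```

Any job type with more than one event behind it is a candidate for the "one queue type for all triggers" pattern and deserves a closer look at the worker.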
+
+### 3. Trace the worker and side effects
+
+- inspect the queue consumer or scheduled worker that handles analysis
+- confirm whether a queued analysis always ends in:
+  - PR creation
+  - branch creation
+  - file updates
+  - premium model calls
+  - usage increments
+- if analysis can spend tokens and then fail before output is persisted, classify it as burn-with-broken-output
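The burn-with-broken-output classification can be made explicit when reviewing worker traces. A minimal sketch, assuming a hypothetical `JobTrace` shape with a token count and a persistence flag; neither field name comes from the ECC-Tools repo.

```typescript
// Hypothetical trace shape; field names are assumptions for illustration.
interface JobTrace {
  tokensSpent: number;
  outputPersisted: boolean;
}

type BurnClass = "no-spend" | "spend-with-output" | "burn-with-broken-output";

// Classify a finished job by whether it spent tokens and whether the result was saved.
function classifyJob(trace: JobTrace): BurnClass {
  if (trace.tokensSpent <= 0) return "no-spend";
  return trace.outputPersisted ? "spend-with-output" : "burn-with-broken-output";
}
```

A job that spent tokens but failed on PR creation or a branch collision lands in the third class, which is the one to count when estimating wasted spend.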
+
+### 4. Audit the high-signal burn paths
+
+#### PR multiplication
+
+- inspect PR helpers and branch naming
+- check dedupe, synchronize-event handling, and existing-PR reuse
+- if app-generated branches can re-enter analysis, treat that as a priority-0 recursion risk
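One way to picture the recursion and dedupe checks is a guard that runs before any analysis job is enqueued. This is a sketch under stated assumptions: the branch prefix `ecc-tools/`, the `senderIsApp` flag, and the `repo@sha` dedupe key are all hypothetical, not taken from the ECC-Tools code.

```typescript
// Hypothetical branch prefix used by the app for its own PR branches.
const APP_BRANCH_PREFIX = "ecc-tools/";

// Hypothetical, simplified view of an incoming webhook event.
interface IncomingEvent {
  repo: string;
  branch: string;
  headSha: string;
  senderIsApp: boolean;
}

// Return true only if the event is safe and new enough to analyze.
function shouldAnalyze(event: IncomingEvent, seen: Set<string>): boolean {
  // Recursion guard: never analyze the app's own branches or pushes.
  if (event.senderIsApp || event.branch.startsWith(APP_BRANCH_PREFIX)) return false;
  // Dedupe: one analysis per repo and commit, however many events mention it.
  const key = `${event.repo}@${event.headSha}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}
```

In a real deployment the `seen` set would have to live in shared storage rather than process memory, since webhook deliveries can land on different workers.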
+
+#### Quota bypass
+
+- inspect where quota is checked versus where usage is reserved or incremented
+- if quota is checked before enqueue but usage is charged only inside the worker, treat concurrent front-door passes as a real race
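The race described above can be reproduced in a few lines. This sketch uses a hypothetical in-memory `UsageLedger`; it contrasts a check-only gate with a combined check-and-reserve. In production the reservation would need an atomic store operation, which this example does not attempt to model.

```typescript
// Hypothetical in-memory ledger; a real one would sit behind an atomic store.
class UsageLedger {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Racy gate: reads usage but reserves nothing.
  hasQuota(): boolean {
    return this.used < this.limit;
  }

  // Worker-side increment, applied after the gate has already been passed.
  charge(): void {
    this.used += 1;
  }

  // Safer gate: check and reserve in a single step.
  tryReserve(): boolean {
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  total(): number {
    return this.used;
  }
}

// Racy flow: three requests pass the gate before any worker charges usage.
const racy = new UsageLedger(1);
const admitted = [1, 2, 3].filter(() => racy.hasQuota()).length;
for (let i = 0; i < admitted; i++) racy.charge();

// Reserved flow: only the first request gets through.
const safe = new UsageLedger(1);
const reserved = [1, 2, 3].filter(() => safe.tryReserve()).length;
```

With a limit of one, the racy flow admits all three requests and ends at three units of usage, while the reserved flow admits exactly one.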
+
+#### Premium-model leakage
+
+- inspect model selection, tier branching, and provider routing
+- verify whether free or capped users can still hit premium analyzers when premium keys are present
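The leak usually comes from routing on key presence alone. A minimal sketch of tier-aware selection, assuming hypothetical tier and model names (`free`, `pro`, `cheap-model`, `premium-model`) that stand in for whatever the repo actually uses:

```typescript
// Hypothetical tier and model identifiers for illustration.
type Tier = "free" | "pro";
type Model = "cheap-model" | "premium-model";

// Leaky variant: a configured premium key routes everyone to the premium model.
function selectModelLeaky(_tier: Tier, premiumKeyPresent: boolean): Model {
  return premiumKeyPresent ? "premium-model" : "cheap-model";
}

// Gated variant: the tier is checked before key presence matters.
function selectModel(tier: Tier, premiumKeyPresent: boolean): Model {
  if (tier === "pro" && premiumKeyPresent) return "premium-model";
  return "cheap-model";
}
```

Comparing the two functions for a free user with keys present is a quick way to state the bug in a regression test.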
+
+#### Retry burn
+
+- inspect retry loops, duplicate queue jobs, and deterministic failure reruns
+- if the same non-transient error can spend analysis repeatedly, fix that before quality improvements
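A retry policy that separates transient from deterministic failures keeps the same error from spending analysis repeatedly. This sketch assumes a hypothetical `Failure` shape and an attempt cap of three; both are illustrative choices, not values from the repo.

```typescript
// Hypothetical failure classification; how errors map to kinds is repo-specific.
type FailureKind = "transient" | "deterministic";

interface Failure {
  kind: FailureKind;
  attempt: number; // 1-based count of attempts made so far
}

const MAX_ATTEMPTS = 3; // assumed cap

// Retry only transient failures, and only while under the attempt cap.
function shouldRetry(failure: Failure): boolean {
  if (failure.kind === "deterministic") return false;
  return failure.attempt < MAX_ATTEMPTS;
}
```

Deterministic failures, such as a branch collision or a validation error, are rejected on the first attempt so they cannot consume further model calls.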
+
+### 5. Fix in burn order
+
+If the user asked for code changes, prioritize fixes in this order:
+
+1. stop automatic PR multiplication
+2. stop quota bypass
+3. stop premium leakage
+4. stop duplicate-job fanout and pointless retries
+5. close rerun/update safety gaps
+
+Keep the pass bounded to one to three direct fixes unless the same root cause clearly spans multiple files.
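The ordering and the one-to-three bound can be expressed as a small planning helper. A sketch only: the category labels and the `Finding` shape are hypothetical names that mirror the list above.

```typescript
// Categories listed in burn order, highest priority first (labels are assumptions).
const BURN_ORDER = [
  "pr-multiplication",
  "quota-bypass",
  "premium-leakage",
  "retry-fanout",
  "rerun-safety",
] as const;

type BurnCategory = (typeof BURN_ORDER)[number];

interface Finding {
  category: BurnCategory;
  file: string; // path cited as evidence for the finding
}

// Sort findings by burn order and keep at most maxFixes of them.
function planFixes(findings: Finding[], maxFixes = 3): Finding[] {
  return [...findings]
    .sort((a, b) => BURN_ORDER.indexOf(a.category) - BURN_ORDER.indexOf(b.category))
    .slice(0, maxFixes);
}
```

Findings that fall outside the bound stay in the audit notes as backlog follow-up rather than being fixed in the same pass.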
+
+### 6. Verify with the smallest proving steps
+
+- rerun only the targeted tests or integration slices that cover the changed path
+- verify whether the burn path is now:
+  - blocked
+  - deduped
+  - downgraded to cheaper analysis
+  - or rejected early
+- state the final status exactly:
+  - changed locally
+  - verified locally
+  - pushed
+  - deployed
+  - still blocked
+
+## High-Signal Failure Patterns
+
+### 1. One queue type for all triggers
+
+If pushes, PR syncs, and manual audits all enqueue the same job and the worker always creates a PR, every analysis run becomes PR spam.
+
+### 2. Post-enqueue usage reservation
+
+If usage is checked at the front door but only incremented in the worker, concurrent requests can all pass the gate and exceed quota.
+
+### 3. Free tier on premium path
+
+If free queued jobs can still route into Anthropic or another premium provider when keys exist, that is real spend leakage even if the user never sees the premium result.
+
+### 4. App-generated branches re-enter the webhook
+
+If `pull_request.synchronize`, branch pushes, or comment-triggered runs fire on app-owned branches, the app can recursively analyze its own output.
+
+### 5. Expensive work before persistence safety
+
+If the system can spend tokens and then fail on PR creation, file update, or branch collision, it is burning cost without shipping value.
+
+## Pitfalls
+
+- do not begin with broad repo wandering; settle webhook -> queue -> worker first
+- do not mix customer billing inference with code-backed product truth
+- do not fix lower-value quality issues before the highest-burn path is contained
+- do not claim burn is fixed until the narrow proving step has been rerun
+- do not push or deploy unless the user asked
+- do not touch unrelated repo-local changes if they are already in progress
+
+## Verification
+
+- root causes cite exact file paths and code areas
+- fixes are ordered by burn impact, not code neatness
+- proving commands are named
+- final status distinguishes local change, verification, push, and deployment