Commit Graph

607 Commits

Author SHA1 Message Date
Affaan Mustafa
e53933de1b docs: refine billing readback dashboard blocker 2026-05-18 02:03:37 -04:00
Affaan Mustafa
80f6c27957 Merge PR #1976 provider response guards 2026-05-18 01:05:37 -04:00
Affaan Mustafa
eb0d893948 fix: harden openai-compatible provider responses 2026-05-18 01:04:28 -04:00
Affaan Mustafa
044d1863d0 test: skip insaits monitor subprocesses without python 2026-05-18 00:47:05 -04:00
Affaan Mustafa
c276639bc7 docs: mirror marketplace billing provenance gate 2026-05-18 00:36:01 -04:00
Affaan Mustafa
d14191bed8 test: align dashboard fixture with May 18 evidence 2026-05-17 22:29:06 -04:00
Affaan Mustafa
5475db4f97 test: point platform audit at May 18 evidence 2026-05-17 22:22:02 -04:00
Affaan Mustafa
9b1d891870 fix(hooks): persist metrics warning dedup 2026-05-17 21:41:24 -04:00
Affaan Mustafa
4cafdb8304 fix(hooks): suppress repeated metrics warning breadcrumbs 2026-05-17 21:41:24 -04:00
Jamkris
086e44c964 fix(hooks): log fail-open breadcrumb on parse/read errors in metrics bridge
coderabbitai flagged: the two `catch` blocks in `readSessionCost`
silently swallowed every failure mode. A malformed `costs.jsonl`
row, a permission error opening the file, or any other unexpected
I/O failure would silently return zero cost — masking real
problems and feeding stale or zero numbers into
`ecc-context-monitor.js` (which then injects them as
`additionalContext` into the live model turn).

Fix two things, both fail-open-preserving:

1. **Inner JSON.parse catch** — count malformed lines and write
   one aggregated breadcrumb per call:

     [ecc-metrics-bridge] skipped N malformed line(s) in <path>

   Aggregating (rather than per-line) keeps a log-flooded
   `costs.jsonl` diagnosable without overwhelming stderr.

2. **Outer fs.readFileSync catch** — write a breadcrumb on real
   errors, but stay silent on `ENOENT`. The "no costs.jsonl yet"
   case is genuinely normal (no Stop event has fired this session)
   and producing noise on every PreToolUse before the first Stop
   would be reviewer-visible spam. All other error codes
   (`EACCES`, `EISDIR`, `EMFILE`, …) get:

     [ecc-metrics-bridge] failing open after <name> reading <path>: <msg>

In both cases the function still returns the zero-cost fallback
so the bridge never breaks tool execution — only the
diagnosability changes.

Two new regression tests in
`tests/hooks/ecc-metrics-bridge.test.js`:

  ✓ readSessionCost writes a stderr breadcrumb when malformed
    lines are skipped — feeds 4 rows (2 valid, 2 malformed),
    asserts the last valid row still wins AND captured stderr
    contains "skipped 2 malformed line(s)".

  ✓ readSessionCost stays silent when costs.jsonl does not exist
    (ENOENT) — uses a fresh tmp HOME with no metrics dir, asserts
    zero return AND empty stderr.

Test count: 16 → 18; `npm test` green; `yarn lint` clean.
2026-05-17 21:41:24 -04:00
Jamkris
63c9788f50 fix(hooks): scan full costs.jsonl when locating session row
`readSessionCost` read only the trailing 8 KiB of
`~/.claude/metrics/costs.jsonl` to "avoid scanning entire file".
That ceiling is the opposite-sign sibling of the double-count bug
fixed in the previous commit: once a session's most recent
cumulative row gets pushed past the 8 KiB window by newer rows
from other sessions, the bridge silently reports `totalCost: 0`,
`totalIn: 0`, `totalOut: 0` for that session — same false signal
to `ecc-context-monitor.js`, same wrong number injected into the
live model turn as `additionalContext`.

`cost-tracker.js` has no rotation policy, so on any non-trivial
workstation costs.jsonl grows past 8 KiB within minutes of normal
use. For users who keep multiple concurrent sessions, this means
the second-and-later sessions silently report zero almost
immediately.

Reproduced before this commit:

  $ HOME=/tmp/eccc node -e '
      const fs = require("fs");
      const m = require("./scripts/hooks/ecc-metrics-bridge.js");
      // S1 row at file start, then 200 rows of OTHER-session noise (~16 KiB).
      // S1 is the row we want, but it sits past the 8 KiB tail.
      const s1 = `{"session_id":"S1","estimated_cost_usd":0.5,"input_tokens":500,"output_tokens":250}`;
      const other = `{"session_id":"OTHER","estimated_cost_usd":1,"input_tokens":100,"output_tokens":50}`;
      fs.mkdirSync("/tmp/eccc/.claude/metrics", { recursive: true });
      fs.writeFileSync("/tmp/eccc/.claude/metrics/costs.jsonl",
        [s1, ...Array(200).fill(other)].join("\\n") + "\\n");
      console.log(JSON.stringify(m.readSessionCost("S1")));'
  {"totalCost":0,"totalIn":0,"totalOut":0}

Expected: `{"totalCost":0.5, "totalIn":500, "totalOut":250}` (the
S1 row that exists in the file).
Actual: zero — the row is past the 8 KiB tail.

Fix: drop the `fs.openSync` + bounded `fs.readSync` + position
arithmetic in favour of `fs.readFileSync(costsPath, 'utf8')` and
iterate every line. Each row is ~150 bytes; even 100k rows is
~15 MB and a single sync read on PreToolUse is in the low ms.
If file rotation lands in `cost-tracker.js` later, this scan
becomes proportionally cheaper.

After this commit the reproduction above returns
`{"totalCost":0.5, "totalIn":500, "totalOut":250}`.

Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
`readSessionCost finds session row beyond the old 8 KiB tail
boundary`. The test asserts the costs.jsonl fixture is > 8 KiB
before reading so any reintroduction of a bounded tail would
re-fail the test (i.e. the assertion is the contract, not the
specific number 8192).

Together with the previous commit, both directions of the
metrics-bridge cost-reporting bug are closed.
2026-05-17 21:41:24 -04:00
Jamkris
4f21ed2acf fix(hooks): use last cumulative row for session cost in metrics bridge
`ecc-metrics-bridge.js#readSessionCost` summed the
`estimated_cost_usd`, `input_tokens`, and `output_tokens` of
every matching row in `~/.claude/metrics/costs.jsonl`. That breaks
the documented contract of `scripts/hooks/cost-tracker.js`, which
explicitly states (in its module docblock):

  Cumulative behavior: Stop fires per assistant response, not
  per session. Each row therefore represents the cumulative
  session total up to that point. To get per-session cost, take
  the last row per session_id.

Summing N cumulative rows over-counts by roughly (N+1)/2 ×. For a
session with 3 rows at 0.01, 0.02, 0.03 USD (true running total
0.03), the bridge today reports 0.06 USD. The over-counted value
feeds `ecc-context-monitor.js`, which then trips its
COST_NOTICE_USD / COST_WARNING_USD / COST_CRITICAL_USD thresholds
on phantom spend AND injects the inflated number as
`additionalContext` into the live model turn — so the agent
itself is told a wrong cost.

Reproduced on `main` before this commit:

  $ cat > /tmp/eccc/.claude/metrics/costs.jsonl <<EOF
  {"session_id":"S1","estimated_cost_usd":0.01,"input_tokens":333,"output_tokens":166}
  {"session_id":"S1","estimated_cost_usd":0.02,"input_tokens":666,"output_tokens":333}
  {"session_id":"S1","estimated_cost_usd":0.03,"input_tokens":1000,"output_tokens":500}
  EOF

  $ HOME=/tmp/eccc node -e 'const m = require("./scripts/hooks/ecc-metrics-bridge.js"); \
      console.log(JSON.stringify(m.readSessionCost("S1")))'
  {"totalCost":0.06,"totalIn":1999,"totalOut":999}

Expected: `{"totalCost":0.03,"totalIn":1000,"totalOut":500}` (the
last cumulative row).
Actual: 2× over-count.

Fix: replace `+=` with `=` in the matching branch so the assigned
values reflect the most recent row encountered. The iteration
order is file order, which is also event time order, so the last
assignment wins — exactly the contract cost-tracker writes
against.

After this commit the reproduction above returns
`{"totalCost":0.03,"totalIn":1000,"totalOut":500}`.

Regression test in `tests/hooks/ecc-metrics-bridge.test.js`:
`readSessionCost returns the LAST cumulative row, not the sum
(cost-tracker contract)`. The existing
`readSessionCost does not include unrelated default-session rows`
test happened to pass even with the bug because it only had one
target-session row — single-row sessions are coincidentally
correct under both formulas. The new test uses three rows so the
two formulas diverge.

A second issue in the same function — the 8 KiB tail-only read
silently drops older rows once a session's recent cumulative
totals scroll past that window — is fixed in the next commit.
2026-05-17 21:41:24 -04:00
Jamkris
7bb3172041 test(ci): coverage for round-1 fixes (quoted write-all, dedup, lifecycle scope)
Three test changes in response to the round-1 review:

1. **Add quoted-write-all coverage** (cubic P0 follow-up).
   Two new cases assert the regex now matches the double-quoted and
   single-quoted YAML forms of `permissions: "write-all"`:
     - `rejects double-quoted permissions: "write-all"`
     - `rejects single-quoted permissions: 'write-all'`
   Both fixtures trigger only the persist-credentials gate, so they
   exercise the WRITE_ALL_PATTERN OR-clause in isolation.

2. **Add expression+ref dedup coverage** (greptile P2 follow-up).
   `emits a single violation when both expressionPattern and refPattern
   match the same step` — uses `refs/pull/${{ … head.sha }}/merge` as
   the fixture (which matches both patterns) and counts ERROR lines for
   the `pull_request_target` rule, asserting exactly one. Re-introducing
   the duplicate-push bug would re-fail this test immediately.

3. **Drop the `npm ci without --ignore-scripts under write-all` test**
   (greptile P2). That test happened to pass under the previous
   `--ignore-scripts` regex, but `UNSAFE_INSTALL_PATTERNS` (added in
   `f7035b56`) fires unconditionally for every workflow regardless of
   permissions. So the test was exercising a pre-existing code path
   that has nothing to do with WRITE_ALL_PATTERN. Reviewer flagged this
   could mislead future contributors into thinking lifecycle-script
   enforcement is gated on write permissions.

   Replaced by the surrounding `rejects checkout credential persistence
   in workflows with permissions: write-all` test (already present) and
   the new quoted-form tests above, which all exercise the actual
   persist-credentials gate that the WRITE_ALL_PATTERN clause newly
   activates.

Test count: 22 → 24 (added 3 new, dropped 1). All green; `yarn lint`
clean.

The cohort comment above the write-all block was also tightened to
explicitly note that "the lifecycle-script gate already fires
unconditionally for every workflow" so the next reader sees the
distinction up front.
2026-05-17 21:19:29 -04:00
Jamkris
cdbc925d89 fix(ci): flag refs/pull checkouts under pull_request_target
The `pull_request_target` rule's `expressionPattern` matches only
the canonical `github.event.pull_request.head.{ref,sha,repo.full_name}`
interpolations. It does not match the second canonical form of
the same exploit — fetching `refs/pull/<N>/{head,merge}` directly:

  - uses: actions/checkout@v4
    with:
      ref: refs/pull/${{ github.event.pull_request.number }}/merge

The merge-ref variant is what GitHub's own security guidance calls
out as the highest-severity privilege-escalation pattern under
`pull_request_target`: it materialises the PR's merge commit
(attacker code spliced with base), executes inside a workflow that
has full repo-scoped tokens, and gives the attacker the chance to
exfiltrate secrets or push to default branches. `refs/pull/N/head`
is functionally equivalent — same source, same trust boundary.

Reproduced on `main` before this commit:

  $ cat /tmp/bad.yml
  name: bad
  on: { pull_request_target: { types: [opened] } }
  permissions: { contents: read }
  jobs:
    do:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
          with:
            ref: refs/pull/${{ github.event.pull_request.number }}/merge
            persist-credentials: false
        - run: npm ci --ignore-scripts

  $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
  Validated workflow security for 1 workflow files
  $ echo $?
  0

Expected: violation flagging the refs/pull checkout under pull_request_target.
Actual: passes silently.

Fix: add a `refPattern` to the `pull_request_target` rule:

    /^\s*ref:\s*['"]?[^'"\n]*refs\/(?:remotes\/)?pull\/[^'"\n\s]+/m

and apply it per checkout step inside the existing
event-gated loop. The pattern matches the ref VALUE so it catches
all interpolation shapes — `refs/pull/123/head`,
`refs/pull/${{ github.event.pull_request.number }}/merge`,
`${{ env.FOO }}/refs/pull/N/head` — without enumerating the
possible interpolations themselves.

Scoping: the rule is already gated on the workflow containing
`pull_request_target:`, so non-privileged `pull_request` workflows
that legitimately check out a PR ref are not affected.

After this commit the reproduction above exits 1 with:

  ERROR: bad.yml:10 - pull_request_target must not checkout an untrusted pull_request head ref/repository

Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
  - rejects pull_request_target + refs/pull/<N>/merge
  - rejects pull_request_target + hardcoded refs/pull/<N>/head
  - allows pull_request_target with no `with.ref:` (base-ref checkout —
    the safe pattern from GitHub's own guidance)

Test count: 17 → 20 in this file; full `yarn test` still green.

Together with the previous commit, this closes the two
independent `validate-workflow-security.js` bypasses I found.
2026-05-17 21:19:29 -04:00
Jamkris
7f971b7e6f fix(ci): treat 'permissions: write-all' as a write-permission gate
`WRITE_PERMISSION_PATTERN` in `validate-workflow-security.js`
enumerates named GitHub Actions scopes (`contents: write`,
`issues: write`, etc.) to decide whether a workflow needs to:
  - disable `persist-credentials` on `actions/checkout`
  - pass `--ignore-scripts` to `npm ci`

The pattern misses the top-level shorthand `permissions:
write-all`, which is the strictly broader form — it grants every
named scope write access in a single line. As a result, a
workflow that opts into write-all currently slips both gates.

Reproduced on `main` before this commit:

  $ cat /tmp/bad.yml
  name: bad
  on: [push]
  permissions: write-all
  jobs:
    do:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        - run: npm ci

  $ ECC_WORKFLOWS_DIR=/tmp node scripts/ci/validate-workflow-security.js
  Validated workflow security for 1 workflow files
  $ echo $?
  0

Expected: at least two violations (missing `persist-credentials:
false`, missing `--ignore-scripts`).
Actual: passes silently.

Fix: add a sibling pattern `WRITE_ALL_PATTERN` that matches
`^\s*permissions:\s*write-all\b` and OR it with
`WRITE_PERMISSION_PATTERN` at the single gate. Both top-level
and job-level `permissions:` blocks satisfy the `^\s*` prefix.

After this commit the reproduction above exits 1 with:

  ERROR: bad.yml:8 - workflows with write permissions must disable checkout credential persistence
  ERROR: bad.yml:9 - workflows with write permissions must install npm dependencies with --ignore-scripts

Three new regression tests in `tests/ci/validate-workflow-security.test.js`:
  - rejects write-all + credential-persisting checkout
  - rejects write-all + `npm ci` without `--ignore-scripts`
  - allows write-all when both gates are satisfied (no over-block)

Test count: 14 → 17 in this file; full `yarn test` still green.

A separate `refs/pull/N/merge` bypass under `pull_request_target`
exists in the same validator and is fixed in the next commit.
2026-05-17 21:19:29 -04:00
Affaan Mustafa
666b4e2261 fix(installer): harden locale docs install 2026-05-17 20:46:04 -04:00
Affaan Mustafa
519c592a12 fix: skip disabled discussion queries in platform audit 2026-05-17 20:32:09 -04:00
Affaan Mustafa
36d390aa7d security: cover gh-token-monitor token persistence 2026-05-17 17:46:35 -04:00
Affaan Mustafa
3215e655ef Add preview pack smoke gate 2026-05-17 15:35:23 -04:00
Affaan Mustafa
355c4f12cf Refresh Linear progress readiness detection 2026-05-17 15:13:42 -04:00
Affaan Mustafa
f397216aa0 Track legacy localization tail in readiness dashboard 2026-05-17 14:47:29 -04:00
Affaan Mustafa
f9bf94b246 Refresh operator dashboard readiness markers 2026-05-17 14:28:16 -04:00
Affaan Mustafa
744f416997 test: normalize zed install path assertion 2026-05-17 07:27:05 -04:00
Affaan Mustafa
2371a3cf05 feat: add zed install target 2026-05-17 07:06:49 -04:00
Affaan Mustafa
fb6d4a7104 fix: tighten supply-chain ioc markers 2026-05-17 06:48:01 -04:00
Affaan Mustafa
b47dfa95a3 fix: add context monitor cost warning opt-out 2026-05-17 01:53:57 -04:00
Affaan Mustafa
d6d1adbb2f test: cover agentshield dashboard promotion states 2026-05-17 01:31:30 -04:00
Affaan Mustafa
cc5c255529 docs: mirror agentshield policy promotion gate 2026-05-17 01:31:30 -04:00
Affaan Mustafa
6d130cfcd5 fix: reduce observer hook scanner signatures 2026-05-16 15:26:25 -04:00
Affaan Mustafa
609eb25898 Record AgentShield policy export milestone 2026-05-16 13:17:31 -04:00
Affaan Mustafa
aaabe5949e fix: recognize legacy salvage manual review backlog 2026-05-16 03:50:30 -04:00
Affaan Mustafa
7420441512 fix: tighten supply-chain IOC package matching 2026-05-16 03:30:30 -04:00
Affaan Mustafa
fc2d23de80 docs: record AgentShield fleet review items 2026-05-16 02:20:50 -04:00
Affaan Mustafa
6976a2a7dd docs: record ECC Tools harness fleet routing 2026-05-16 02:02:17 -04:00
Affaan Mustafa
fb28e469f1 docs: record ECC Tools hosted finding evidence 2026-05-16 01:49:46 -04:00
Affaan Mustafa
a1cf97e3f2 docs: record ECC Tools fleet evidence consumption 2026-05-16 01:38:11 -04:00
Affaan Mustafa
cc83a85eb8 docs: record AgentShield fleet routing evidence 2026-05-16 01:24:20 -04:00
Affaan Mustafa
fe49a31e9a docs: record AgentShield evidence-pack inspect evidence 2026-05-16 01:03:06 -04:00
Affaan Mustafa
1eb7b0809d docs: record AgentShield plugin-cache evidence 2026-05-16 00:27:48 -04:00
Kris Pahel
50ac061f9e chore: update statusline ANSI color palette
- Replace blinking red (5;31m) with bold red (1;31m) for critical context bar
- Replace cyan metrics (36m) with sky blue (38;5;117m)
- Replace plain bold task (1m) with bold bright white (1;97m)
- Update test assertion to match new bold red code
2026-05-15 23:18:01 -04:00
Affaan Mustafa
714200fd20 Cover advisory source renderer branches 2026-05-15 23:09:54 -04:00
Affaan Mustafa
2b387fb761 Cover advisory source refresh branches 2026-05-15 23:09:54 -04:00
Affaan Mustafa
a8e3bcb00f Add supply-chain advisory source refresh 2026-05-15 23:09:54 -04:00
Affaan Mustafa
2d46c00763 Cover operator dashboard render branches 2026-05-15 22:28:16 -04:00
Affaan Mustafa
3315f0ed61 Include operator dashboard in publish surface test 2026-05-15 22:28:16 -04:00
Affaan Mustafa
e26b5132c2 Align platform audit with generated dashboard 2026-05-15 22:28:16 -04:00
Affaan Mustafa
50f375bc2c Add repeatable operator readiness dashboard 2026-05-15 22:28:16 -04:00
Affaan Mustafa
f7035b5644 Harden CI installs against supply-chain lifecycle hooks 2026-05-15 17:29:03 -04:00
Affaan Mustafa
6951b8d5d2 Add scheduled supply-chain watch workflow 2026-05-15 16:56:49 -04:00
Affaan Mustafa
6887f2952d Add discussion audit gate 2026-05-15 16:26:57 -04:00