Commit Graph

1933 Commits

Author SHA1 Message Date
Jamkris
8149beac02 test(lib): make concurrent-write test actually concurrent + use regex matcher for assert.throws
Two round-1 review findings in `tests/lib/session-bridge.test.js`,
both about test correctness rather than the underlying fix:

1. **greptile P1 + coderabbitai Major + cubic P2 (all three): concurrent-write test ran sequentially.**

   The test spawned two child processes with two consecutive
   `spawnSync` calls. Because `spawnSync` blocks until the child
   exits, the second writer started *after* the first finished —
   the two writers never overlapped, so the rename race the fix
   targets was never actually exercised. The test would have passed
   with the old broken `${target}.tmp` suffix.

   Fix: introduce a one-off "race runner" helper that runs inside
   its own subprocess and uses async `spawn` to start both writers
   simultaneously. The runner waits for both to exit (the event
   loop is local to the runner subprocess, so this stays compatible
   with the synchronous test harness used elsewhere in this file)
   and reports both exit codes plus stderrs on stdout. The test
   then calls the runner via `spawnSync` and parses the result.
   Both writer children now overlap for the duration of their 200
   `writeBridgeAtomic` calls each, which is enough wall time to
   reliably trigger the rename race against the pre-fix code.

   Verified: with the fixed `${target}.${pid}.${nonce}.tmp` suffix,
   the test passes; with the old fixed `${target}.tmp` suffix
   reintroduced, it fails as expected (one writer hits ENOENT on
   roughly half its rename calls).

2. **greptile P2 + cubic P3: `assert.throws` used a string as the second argument.**

   Node deprecated passing a string as the second argument to
   `assert.throws` years ago: the string is silently treated as
   the assertion failure message (what to print when the function
   does *not* throw) rather than as an error matcher. The check
   passed for any thrown error, not just the rename failure.

   Fix: pass a regex matcher as the second arg and keep the
   explanatory text as the third. The regex matches `EISDIR`,
   `EPERM`, `ENOTDIR`, or `ENOENT` because `renameSync` of a
   regular tmp file onto an existing directory raises different
   codes on Linux / macOS / BSD — making the matcher portable
   across CI runners.

Test count unchanged at 14; `npm test` green; `npm run lint` clean.

The two helper files (`tests/__tmp_bridge_writer.js`,
`tests/__tmp_bridge_race_runner.js`) are written and unlinked
inside the test's try/finally so they never persist beyond the
test run.
2026-05-19 09:44:53 +09:00
Jamkris
a2b98be4a3 test(lib): concurrent writeBridgeAtomic + tmp-cleanup regression
Two regression tests pin down the previous two commits' atomic-rename
fixes:

1. **concurrent writes don't throw ENOENT or corrupt the file** —
   spawns two child Node processes (`tests/__tmp_bridge_writer.js`
   created in-test, cleaned up in finally) that each call
   `writeBridgeAtomic(sid, …)` 200 times against the same session
   ID with independent payloads. Asserts both subprocesses exit 0
   (the previous implementation produced ENOENT on roughly 50% of
   rename calls, all swallowed by the in-test catch) and the final
   bridge file is parseable JSON belonging to one of the two writers
   (last-writer-wins is fine; the contract is *no corruption* and
   *no rename ENOENT*, not data preservation).

2. **tmp file cleanup on rename failure** — pre-creates a directory
   at the target bridge path so `renameSync(tmp, target)` fails,
   calls `writeBridgeAtomic`, asserts the call throws AND that no
   tmp file with the writer's `pid.<nonce>.tmp` prefix is left
   behind in `os.tmpdir()`. The previous code had no cleanup; the
   fix's `try/catch + unlinkSync` keeps tmpdir from accumulating
   orphan files across repeated rename failures.

The first test deliberately writes independent payloads from each
subprocess so this regression doesn't try to claim a property the
fix doesn't actually deliver (read-modify-write race in the caller
is a separate issue and out of scope per PR body).

Test count: 12 → 14 in `tests/lib/session-bridge.test.js`;
`npm test` green; `npm run lint` clean.
2026-05-19 09:31:35 +09:00
Jamkris
74fda58a9c fix(hooks): use unique tmp suffix in writeWarnState (ecc-context-monitor)
Mirror the previous commit's `writeBridgeAtomic` fix on the
companion `writeWarnState` in `ecc-context-monitor.js`. Same shape:
fixed `${target}.tmp` → `${target}.${process.pid}.${randomNonce}.tmp`,
plus best-effort cleanup of the tmp file on `renameSync` failure
(throws original error after cleanup).

`writeWarnState` debounces the context-monitor's threshold alarms
(`COST_NOTICE_USD`, `COST_WARNING_USD`, `COST_CRITICAL_USD`, plus the
context-remaining and loop-detection ones). Without unique suffixes,
two PostToolUse subprocesses racing on the warn-state file produce
either a corrupted JSON debounce-state on disk or an ENOENT throw
that the hook catches and swallows — either way the next warn-state
read returns the default `{callsSinceWarn: 0, lastSeverity: null}`
and the threshold alarms re-fire or stop firing erratically. Users
see warning messages flicker or vanish; debounce no longer works.

Three call sites in this repo now share the same atomic-write
contract:
- `writeBridgeAtomic` (scripts/lib/session-bridge.js) — primary
- `writeCostWarningIfChanged` (scripts/hooks/ecc-metrics-bridge.js) — cost cache
- `writeWarnState` (this file) — debounce state

`yarn lint` clean. Regression test covering both `writeBridgeAtomic`
and `writeWarnState` under concurrent load lands in the next commit.
2026-05-19 09:30:07 +09:00
Jamkris
95602cc27b fix(lib): use unique tmp suffix in writeBridgeAtomic to eliminate ENOENT race
`writeBridgeAtomic` wrote to a fixed `${target}.tmp` path before
calling `renameSync`. When two processes write to the same session
bridge concurrently (e.g. PostToolUse `ecc-metrics-bridge` + the
background `ecc-statusline`, both calling `writeBridgeAtomic(sessionId, ...)`),
the canonical atomic-rename race fires:

  1. Process A: writeFileSync(target.tmp, JSON_A) — tmp file exists.
  2. Process B: writeFileSync(target.tmp, JSON_B) — tmp file overwritten.
  3. Process A: renameSync(target.tmp, target) — succeeds; target = JSON_B
     (A's payload silently corrupted en-route).
  4. Process B: renameSync(target.tmp, target) — throws ENOENT (the
     rename consumed the file).

Every caller in the repo wraps `writeBridgeAtomic` in `try {} catch {}`,
so the ENOENT exception is swallowed and the user-visible symptom is
just "the bridge file occasionally contains the wrong process's
payload" with no diagnostic.

Reproduced before this commit:

  $ # two concurrent writers, each calling writeBridgeAtomic 500 times
  $ # against the same session ID
  [A] errors=244   # 244 ENOENT exceptions swallowed
  [B] errors=248   # ditto

After this commit the same workload reports 0 errors in both
subprocesses: tmp paths no longer collide.

Fix: change `${target}.tmp` to
`${target}.${process.pid}.${crypto.randomBytes(4).toString('hex')}.tmp`,
matching the pattern already used by `writeCostWarningIfChanged` in
`scripts/hooks/ecc-metrics-bridge.js` (commit 9b1d8918). The pid +
4-byte nonce gives each writer process a distinct tmp path, so step 2
above no longer overwrites step 1's payload and step 4 no longer
races step 3.

Also added: on `renameSync` failure, attempt `fs.unlinkSync(tmp)` so
a writer that fails (disk full, permission, parent dir gone) does
not leak its tmp file. The cleanup is best-effort and the original
error is still re-thrown.

**Scope clarification.** This commit closes the atomic-rename
primitive's race only. The *read-modify-write* race in callers —
two writers each read the same bridge state, increment, and write
back, the second clobbering the first — is a separate concern that
needs locking or per-writer logs, and is intentionally out of scope
for this PR. The cost-tracker / metrics-bridge callers tolerate
last-writer-wins on their cumulative aggregates today and this
commit does not change that contract.

The companion `writeWarnState` in `ecc-context-monitor.js` has the
same fixed-suffix pattern and the same race; that fix lands in the
next commit so each can be reviewed against its own diff.
2026-05-19 09:29:09 +09:00
Affaan Mustafa
cb81f1b0fe docs: narrow ecc tools billing blocker 2026-05-18 16:45:31 -04:00
Affaan Mustafa
7e2cdeaeb5 docs: refresh rc1 operator evidence 2026-05-18 16:27:09 -04:00
Affaan Mustafa
4470e2e670 docs: refresh rc1 publication evidence 2026-05-18 16:12:37 -04:00
Affaan Mustafa
67e63e63f9 docs: align publication readiness evidence 2026-05-18 15:36:39 -04:00
Affaan Mustafa
fe7b4f2ba3 docs: regenerate operator readiness dashboard 2026-05-18 15:24:25 -04:00
Affaan Mustafa
0f1775e30b docs: refresh release blockers evidence 2026-05-18 15:23:48 -04:00
Affaan Mustafa
12ac22e674 docs: add discussion response playbook 2026-05-18 14:39:11 -04:00
Affaan Mustafa
c032e07b1e docs: refresh may 18 release evidence 2026-05-18 14:24:50 -04:00
Affaan Mustafa
97567a91e7 test: normalize release workflow line endings 2026-05-18 13:53:26 -04:00
Affaan Mustafa
7911af4a39 security: scope release oidc publishing 2026-05-18 13:41:10 -04:00
Affaan Mustafa
386326df8e fix: treat MCP HTTP 406 probes as reachable 2026-05-18 12:48:52 -04:00
Affaan Mustafa
b41e6fb3d0 docs: refresh publication readiness gate 2026-05-18 10:49:49 -04:00
Affaan Mustafa
99e01ded7d docs: refresh operator dashboard evidence 2026-05-18 10:32:26 -04:00
Affaan Mustafa
2ba0c62d8a docs: mirror agentshield fleet ticket evidence 2026-05-18 10:24:21 -04:00
Affaan Mustafa
9abe721bfe docs: refresh release readiness evidence 2026-05-18 09:30:14 -04:00
Affaan Mustafa
680aeff0fb test: enforce release publication checklist in readiness gates 2026-05-18 09:10:51 -04:00
Affaan Mustafa
6c0fbfb6c5 docs: add release plugin publication checklist 2026-05-18 08:56:17 -04:00
Affaan Mustafa
0e88e6a4dd docs: refresh zero queue dashboard 2026-05-18 06:37:10 -04:00
Affaan Mustafa
cdc92de42a docs: finish owner queue cleanup 2026-05-18 06:35:44 -04:00
Affaan Mustafa
25dc518e1d docs: regenerate owner queue dashboard 2026-05-18 06:17:31 -04:00
Affaan Mustafa
08807e7fd6 docs: record owner-wide queue cleanup 2026-05-18 06:16:45 -04:00
Affaan Mustafa
feeaa97511 docs: regenerate operator readiness dashboard 2026-05-18 05:38:44 -04:00
Affaan Mustafa
5e8f412cb5 docs: refresh ecc tools billing blocker evidence 2026-05-18 05:38:14 -04:00
Affaan Mustafa
4d6fc194ea fix: include blender skill in install manifest 2026-05-18 04:54:17 -04:00
Affaan Mustafa
aae735d458 docs: regenerate operator readiness dashboard 2026-05-18 04:30:43 -04:00
Affaan Mustafa
ff3eaff137 docs: refresh billing readback gate evidence 2026-05-18 04:30:09 -04:00
Da Wei
922d2d8f8b Add Blender motion state inspection skill
Adds the Blender motion state inspection skill with maintainer refinements for tools metadata, usage guidance, meter-scale threshold assumptions, and Blender interpreter notes.
2026-05-18 04:11:31 -04:00
Affaan Mustafa
bf17737969 test: stabilize repair lifecycle on Windows 2026-05-18 03:48:51 -04:00
Affaan Mustafa
f92f15199c docs: refresh target billing dashboard evidence 2026-05-18 03:28:36 -04:00
Affaan Mustafa
fb4b0c8dce docs: mirror target billing readback gate 2026-05-18 03:27:42 -04:00
Affaan Mustafa
aa634df9e5 docs: record clean preview pack smoke 2026-05-18 02:48:41 -04:00
Affaan Mustafa
742bc58d97 docs: refresh release evidence after ioc scanner hardening 2026-05-18 02:45:30 -04:00
Affaan Mustafa
04d4d81938 fix: ignore defensive ioc deny rules 2026-05-18 02:29:59 -04:00
Affaan Mustafa
99e9f118bd docs: refresh evidence head after billing mirror 2026-05-18 02:18:22 -04:00
Affaan Mustafa
f010f78332 docs: refresh dashboard after wrangler billing mirror 2026-05-18 02:04:21 -04:00
Affaan Mustafa
e53933de1b docs: refine billing readback dashboard blocker 2026-05-18 02:03:37 -04:00
Affaan Mustafa
10313d847a docs: mirror ecc tools wrangler billing readback 2026-05-18 02:00:46 -04:00
Affaan Mustafa
aa4ae863f8 docs: refresh release evidence after provider guard merge 2026-05-18 01:30:51 -04:00
Affaan Mustafa
80f6c27957 Merge PR #1976 provider response guards 2026-05-18 01:05:37 -04:00
Affaan Mustafa
eb0d893948 fix: harden openai-compatible provider responses 2026-05-18 01:04:28 -04:00
Your Name
cc62e89152 fix: guard against empty choices in OpenAI and AstraFlow LLM providers
The OpenAI-compatible API can return HTTP 200 with an empty choices list
or choices[0].message = None (content-filtered responses on Gemini,
overwhelmed Ollama instances). Without a guard, both sites raise an
unhandled IndexError or AttributeError crashing the provider.

Added guard in OpenAIProvider.generate() and AstraFlowProvider.generate().
2026-05-17 23:49:00 -05:00
Affaan Mustafa
044d1863d0 test: skip insaits monitor subprocesses without python 2026-05-18 00:47:05 -04:00
Affaan Mustafa
43822b9c1a docs: refresh operator readiness dashboard 2026-05-18 00:36:30 -04:00
Affaan Mustafa
c276639bc7 docs: mirror marketplace billing provenance gate 2026-05-18 00:36:01 -04:00
Affaan Mustafa
804f8ab79a docs: refresh dashboard for billing readback 2026-05-18 00:01:16 -04:00
Affaan Mustafa
34cc0c1856 docs: mirror billing kv readback gate 2026-05-18 00:00:37 -04:00