Mirror the previous commit's `writeBridgeAtomic` fix on the
companion `writeWarnState` in `ecc-context-monitor.js`. Same shape:
fixed `${target}.tmp` → `${target}.${process.pid}.${randomNonce}.tmp`,
plus best-effort cleanup of the tmp file on `renameSync` failure
(throws original error after cleanup).
`writeWarnState` debounces the context-monitor's threshold alarms
(`COST_NOTICE_USD`, `COST_WARNING_USD`, `COST_CRITICAL_USD`, plus the
context-remaining and loop-detection ones). Without unique suffixes,
two PostToolUse subprocesses racing on the warn-state file produce
either a corrupted JSON debounce-state on disk or an ENOENT throw
that the hook catches and swallows — either way the next warn-state
read returns the default `{callsSinceWarn: 0, lastSeverity: null}`
and the threshold alarms re-fire or stop firing erratically. Users
see warning messages flicker or vanish; debounce no longer works.
Three call sites in this repo now share the same atomic-write
contract:
- `writeBridgeAtomic` (scripts/lib/session-bridge.js) — primary
- `writeCostWarningIfChanged` (scripts/hooks/ecc-metrics-bridge.js) — cost cache
- `writeWarnState` (this file) — debounce state
`yarn lint` clean. Regression test covering both `writeBridgeAtomic`
and `writeWarnState` under concurrent load lands in the next commit.
`writeBridgeAtomic` wrote to a fixed `${target}.tmp` path before
calling `renameSync`. When two processes write to the same session
bridge concurrently (e.g. PostToolUse `ecc-metrics-bridge` + the
background `ecc-statusline`, both calling `writeBridgeAtomic(sessionId, ...)`),
the canonical atomic-rename race fires:
1. Process A: writeFileSync(target.tmp, JSON_A) — tmp file exists.
2. Process B: writeFileSync(target.tmp, JSON_B) — tmp file overwritten.
3. Process A: renameSync(target.tmp, target) — succeeds; target = JSON_B
(A's payload silently corrupted en-route).
4. Process B: renameSync(target.tmp, target) — throws ENOENT (the
rename consumed the file).
Every caller in the repo wraps `writeBridgeAtomic` in `try {} catch {}`,
so the ENOENT exception is swallowed and the user-visible symptom is
just "the bridge file occasionally contains the wrong process's
payload" with no diagnostic.
Reproduced before this commit:
$ # two concurrent writers, each calling writeBridgeAtomic 500 times
$ # against the same session ID
[A] errors=244 # 244 ENOENT exceptions swallowed
[B] errors=248 # ditto
After this commit the same workload reports 0 errors in both
subprocesses: tmp paths no longer collide.
Fix: change `${target}.tmp` to
`${target}.${process.pid}.${crypto.randomBytes(4).toString('hex')}.tmp`,
matching the pattern already used by `writeCostWarningIfChanged` in
`scripts/hooks/ecc-metrics-bridge.js` (commit 9b1d8918). The pid +
4-byte nonce gives each writer process a distinct tmp path, so step 2
above no longer overwrites step 1's payload and step 4 no longer
races step 3.
Also added: on `renameSync` failure, attempt `fs.unlinkSync(tmp)` so
a writer that fails (disk full, permission, parent dir gone) does
not leak its tmp file. The cleanup is best-effort and the original
error is still re-thrown.
**Scope clarification.** This commit closes the atomic-rename
primitive's race only. The *read-modify-write* race in callers —
two writers each read the same bridge state, increment, and write
back, the second clobbering the first — is a separate concern that
needs locking or per-writer logs, and is intentionally out of scope
for this PR. The cost-tracker / metrics-bridge callers tolerate
last-writer-wins on their cumulative aggregates today and this
commit does not change that contract.
The companion `writeWarnState` in `ecc-context-monitor.js` has the
same fixed-suffix pattern and the same race; that fix lands in the
next commit so each can be reviewed against its own diff.
Adds the Blender motion state inspection skill with maintainer refinements for tools metadata, usage guidance, meter-scale threshold assumptions, and Blender interpreter notes.
The OpenAI-compatible API can return HTTP 200 with an empty choices list
or choices[0].message = None (content-filtered responses on Gemini,
overwhelmed Ollama instances). Without a guard, both sites raise an
unhandled IndexError or AttributeError crashing the provider.
Added guard in OpenAIProvider.generate() and AstraFlowProvider.generate().