fix: stability batch — hook stdin truncation, Codex exa TOML, Stop hook JSON, GateGuard repetition (#2227)

* fix(hooks): fail open on oversized stdin instead of echoing truncated JSON (#2222)

run-with-flags.js capped stdin at 1MB but every fallthrough path still
echoed the truncated string to stdout. The harness parses hook stdout as
JSON, got a document cut mid-stream, and blocked the tool call, so any
Edit/Write with a >1MB hook payload was permanently blocked by every
registered pre-write hook, before ECC_HOOK_PROFILE / ECC_DISABLED_HOOKS
gating could run.

- Exit 0 with empty stdout (no opinion) when the stdin cap trips, before
  any echo or gating logic.
- Flush stdout via write callback before process.exit: exiting right
  after stdout.write() dropped everything past the ~64KB pipe buffer,
  cutting even sub-cap pass-through payloads mid-JSON.

Regression tests cover the enabled, disabled, and missing-arg paths for
oversized payloads plus full echo of sub-cap >64KB payloads.

* fix(codex): stop emitting invalid exa url entry, align merge with connector policy (#2224)

The Codex MCP merge declared exa with a url key, but Codex's
[mcp_servers.*] TOML schema is stdio-only: the url key makes the entire
config.toml fail to load, bricking both the codex CLI and the desktop
app. Every install/update re-injected the line because the urlEntry
branch treated the broken entry as present.

- ECC_SERVERS now emits only the current default set per
  docs/MCP-CONNECTOR-POLICY.md: chrome-devtools (stdio, command/args).
  Retired servers (supabase, playwright, context7, exa, github, memory,
  sequential-thinking) are never re-emitted; existing user-managed
  entries are untouched.
- The merge now repairs the exact ECC-emitted broken form (url-only exa
  entry) on every run so re-running the installer fixes broken configs
  instead of preserving them. User stdio exa entries (command +
  mcp-remote) are left alone.
- check-codex-global-state.sh requires chrome-devtools instead of the
  retired set, and flags url-only exa entries with a repair hint.

Tests cover repair, re-run idempotence, stdio-entry preservation, and
no-retired-server emission in add, update, dry-run, and disabled modes.

* fix(hooks): never echo truncated stdin from Stop hooks (#2090)

Stop hooks follow the ECC pass-through convention (echo stdin on
stdout), but every echoing Stop hook capped stdin and echoed the capped
string. The Stop payload carries last_assistant_message, so a long final
assistant message produced a JSON document cut mid-stream on stdout,
which the harness reports as 'Stop hook error: JSON validation failed'
across the whole Stop chain.

Reproduced: a Stop payload with a >64KB last_assistant_message run
through run-with-flags + cost-tracker emitted exactly 65536 bytes of
invalid JSON (cost-tracker capped stdin at 64KB, far below realistic
Stop payloads).

- cost-tracker: raise the cap to 1MB (matching all other hooks) and
  suppress the pass-through echo when stdin was truncated.
- check-console-log, stop-format-typecheck, desktop-notify: suppress the
  echo when stdin was truncated; flush stdout before process.exit so
  sub-cap payloads are not cut at the ~64KB pipe buffer.
- All hooks keep exiting 0 (fail-open); diagnostics go to stderr.

New stop-hooks-stdout test asserts the contract for every registered
Stop hook: stdout is empty or valid JSON, exit code 0, for realistic
100KB payloads and oversized >1MB payloads, via the production runner
and via direct invocation. Updated the old hooks.test.js case that
codified the truncated-echo behavior.

* fix(hooks): dampen GateGuard fact-force repetition in long sessions (#2142)

In long autonomous sessions the fact-force gate produced 10+
near-identical 'state facts -> blocked -> restate -> retry' blocks in
one context window, which measurably raises the odds of the model
collapsing into a degenerate single-token repetition loop.

- Track a per-session fact_force_denials counter in GateGuard state
  (merged max across concurrent writers, reset with the session, robust
  to malformed on-disk values).
- The first GATEGUARD_FACT_FORCE_FULL_DENIALS denials (default 3) keep
  the full four-fact block; later denials emit a condensed single-line
  message that carries the denial ordinal, so consecutive denials are
  structurally different and never textually identical.
- True retries of the same target remain allowed without re-prompting
  (unchanged).

Destructive-Bash and routine-Bash gates are unchanged, as are the
ECC_GATEGUARD=off / ECC_DISABLED_HOOKS escape hatches.

Eight new tests cover budget counting, condensed format, ordinal
advancement, retry pass-through, env tuning, malformed state, MultiEdit
dampening, and destructive-gate exemption.

* fix(hooks): keep security hooks able to block on oversized stdin (#2222)

Refine the truncation fail-open: instead of skipping the hook entirely,
the runner now suppresses only its own raw-echo when stdin was
truncated. The hook still executes and receives the truncated flag
(run() context / ECC_HOOK_INPUT_TRUNCATED), so config-protection keeps
blocking truncated protected-config payloads (its test requires exit 2)
while pass-through hooks fail open with empty stdout as before.

* style: apply repo formatter to touched hook files
2026-06-12 19:23:07 +08:00 · 2026-06-11 00:31:33 -04:00
parent 3bdb4a5e12
commit 6319c7d309
14 changed files with 846 additions and 151 deletions
--- a/scripts/hooks/check-console-log.js
+++ b/scripts/hooks/check-console-log.js
@@ -28,20 +28,40 @@ const EXCLUDED_PATTERNS = [

 const MAX_STDIN = 1024 * 1024; // 1MB limit
 let data = '';
+let truncated = false;
 process.stdin.setEncoding('utf8');

 process.stdin.on('data', chunk => {
  if (data.length < MAX_STDIN) {
    const remaining = MAX_STDIN - data.length;
    data += chunk.substring(0, remaining);
+    if (chunk.length > remaining) truncated = true;
+  } else {
+    truncated = true;
  }
 });

+/**
+ * Echo stdin back (ECC pass-through convention), then exit once the pipe has
+ * flushed. Truncated stdin is never echoed: a JSON document cut mid-stream is
+ * reported by the harness as a Stop hook JSON validation failure (#2090).
+ */
+function passThroughAndExit() {
+  if (truncated) {
+    log('[Hook] check-console-log: stdin exceeded 1MB; suppressing pass-through (fail-open)');
+    process.exit(0);
+  }
+  if (!data) {
+    process.exit(0);
+  }
+  process.stdout.write(data, () => process.exit(0));
+}
+
 process.stdin.on('end', () => {
  try {
    if (!isGitRepo()) {
-      process.stdout.write(data);
-      process.exit(0);
+      passThroughAndExit();
+      return;
    }

    const files = getGitModifiedFiles(['\\.tsx?$', '\\.jsx?$'])
@@ -65,7 +85,6 @@ process.stdin.on('end', () => {
    log(`[Hook] check-console-log error: ${err.message}`);
  }

-  // Always output the original data
-  process.stdout.write(data);
-  process.exit(0);
+  // Always output the original data (unless truncated)
+  passThroughAndExit();
 });
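
Review note: the capped-read pattern in this hunk (repeated in the cost-tracker, desktop-notify, and stop-format-typecheck hunks) can be sketched as a pure function, which makes the truncation flag easy to check without a real stdin stream. This is only an illustration under stated assumptions: `createCappedReader` and `passThroughText` are hypothetical names that do not exist in the repository. Note also that `String.length` counts UTF-16 code units, so the "1MB" cap is a character budget rather than an exact byte count.

```javascript
// Hypothetical helpers for illustration only; not part of the repository.
const MAX_STDIN = 1024 * 1024; // same cap value the hooks use

// Accumulates chunks up to `max` characters and records whether any input
// had to be dropped, mirroring the 'data' handler in the hunk above.
function createCappedReader(max = MAX_STDIN) {
  let data = '';
  let truncated = false;
  return {
    push(chunk) {
      if (data.length < max) {
        const remaining = max - data.length;
        data += chunk.substring(0, remaining);
        // Part of this chunk did not fit under the cap.
        if (chunk.length > remaining) truncated = true;
      } else {
        // The cap was already reached by an earlier chunk.
        truncated = true;
      }
    },
    result() {
      return { data, truncated };
    }
  };
}

// What a pass-through hook should write to stdout: nothing at all when the
// input was cut, because a partial JSON document fails harness validation.
function passThroughText({ data, truncated }) {
  return truncated ? '' : data;
}
```

A hook could feed `process.stdin` chunks into `push()` and write `passThroughText(reader.result())` on `end`; the empty string then corresponds to the "no opinion" outcome described in the commit message.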
--- a/scripts/hooks/cost-tracker.js
+++ b/scripts/hooks/cost-tracker.js
@@ -128,12 +128,22 @@ function sumUsageFromTranscript(transcriptPath) {
  return { inputTokens, outputTokens, cacheWriteTokens, cacheReadTokens, model };
 }

-const MAX_STDIN = 64 * 1024;
+// 1MB, matching the other Stop hooks. The Stop payload carries
+// last_assistant_message, which routinely exceeded the old 64KB cap and
+// made this hook echo a JSON document cut mid-stream (#2090).
+const MAX_STDIN = 1024 * 1024;
 let raw = '';
+let truncated = false;

 process.stdin.setEncoding('utf8');
 process.stdin.on('data', chunk => {
-  if (raw.length < MAX_STDIN) raw += chunk.substring(0, MAX_STDIN - raw.length);
+  if (raw.length < MAX_STDIN) {
+    const remaining = MAX_STDIN - raw.length;
+    raw += chunk.substring(0, remaining);
+    if (chunk.length > remaining) truncated = true;
+  } else {
+    truncated = true;
+  }
 });

 process.stdin.on('end', () => {
@@ -201,6 +211,11 @@ process.stdin.on('end', () => {
    // Non-blocking — never fail the Stop hook.
  }

-  // Pass stdin through (required by ECC hook convention).
+  // Pass stdin through (ECC hook convention) — but never echo truncated
+  // stdin: invalid JSON on stdout is reported as a Stop hook failure (#2090).
+  if (truncated) {
+    process.stderr.write('[Hook] cost-tracker: stdin exceeded 1MB; suppressing pass-through (fail-open)\n');
+    return;
+  }
  process.stdout.write(raw);
 });
--- a/scripts/hooks/desktop-notify.js
+++ b/scripts/hooks/desktop-notify.js
@@ -236,15 +236,26 @@ module.exports = { run };
 if (require.main === module) {
  const MAX_STDIN = 1024 * 1024;
  let data = '';
+  let truncated = false;

  process.stdin.setEncoding('utf8');
  process.stdin.on('data', chunk => {
    if (data.length < MAX_STDIN) {
-      data += chunk.substring(0, MAX_STDIN - data.length);
+      const remaining = MAX_STDIN - data.length;
+      data += chunk.substring(0, remaining);
+      if (chunk.length > remaining) truncated = true;
+    } else {
+      truncated = true;
    }
  });
  process.stdin.on('end', () => {
    const output = run(data);
+    // Never echo truncated stdin — invalid JSON on stdout is reported as a
+    // Stop hook failure (#2090).
+    if (truncated) {
+      log('[DesktopNotify] stdin exceeded 1MB; suppressing pass-through (fail-open)');
+      return;
+    }
    if (output) process.stdout.write(output);
  });
 }
--- a/scripts/hooks/gateguard-fact-force.js
+++ b/scripts/hooks/gateguard-fact-force.js
@@ -592,6 +592,7 @@ function saveState(state) {

    let mergedChecked = Array.isArray(state.checked) ? state.checked : [];
    let mergedLastActive = typeof state.last_active === 'number' ? state.last_active : 0;
+    let mergedDenials = getDenialCount(state);

    try {
      if (fs.existsSync(stateFile)) {
@@ -602,6 +603,7 @@ function saveState(state) {
        if (typeof diskState.last_active === 'number') {
          mergedLastActive = Math.max(mergedLastActive, diskState.last_active);
        }
+        mergedDenials = Math.max(mergedDenials, getDenialCount(diskState));
      }
    } catch (_) {
      /* ignore malformed or transient disk state */
@@ -609,7 +611,8 @@ function saveState(state) {

    const finalState = {
      checked: pruneCheckedEntries(mergedChecked),
-      last_active: Math.max(mergedLastActive, Date.now())
+      last_active: Math.max(mergedLastActive, Date.now()),
+      fact_force_denials: mergedDenials
    };

    // Atomic write: temp file + rename prevents partial reads
@@ -652,6 +655,48 @@ function markChecked(key) {
  return true;
 }

+// --- Fact-force denial dampening (#2142) ---
+//
+// In long sessions the near-identical four-fact deny blocks accumulate in
+// the context window and measurably raise the odds of the model dropping
+// into a degenerate repetition loop. Emit the full four-fact block only for
+// the first GATEGUARD_FACT_FORCE_FULL_DENIALS denials per session (default
+// 3); afterwards emit a condensed single-line denial that carries the
+// denial ordinal, so consecutive denials are structurally different and
+// never textually identical. True retries of an already-gated target are
+// unaffected (they were always allowed). Destructive-Bash and routine-Bash
+// gates are unchanged.
+
+const DEFAULT_FULL_DENIALS = 3;
+
+function getFullDenialBudget() {
+  const raw = Number.parseInt(process.env.GATEGUARD_FACT_FORCE_FULL_DENIALS || '', 10);
+  if (Number.isInteger(raw) && raw >= 0) {
+    return raw;
+  }
+  return DEFAULT_FULL_DENIALS;
+}
+
+function getDenialCount(state) {
+  const n = Number(state && state.fact_force_denials);
+  return Number.isFinite(n) && n >= 0 ? Math.floor(n) : 0;
+}
+
+/**
+ * Record a first-touch target AND count the fact-force denial in the same
+ * state write. Returns the new denial ordinal (1-based) plus whether the
+ * write persisted.
+ */
+function markCheckedAndCountDenial(key) {
+  const state = loadState();
+  if (!state.checked.includes(key)) {
+    state.checked.push(key);
+  }
+  const denials = getDenialCount(state) + 1;
+  state.fact_force_denials = denials;
+  return { ok: saveState(state), denials };
+}
+
 function isChecked(key) {
  const state = loadState();
  const found = state.checked.includes(key);
@@ -792,6 +837,20 @@ function writeGateMsg(filePath) {
  ].join('\n');
 }

+/**
+ * Condensed single-line denial used after the full-block budget is spent
+ * (#2142). Carries the denial ordinal so consecutive denials differ
+ * textually, and a one-line recovery hint instead of the multi-line block.
+ */
+function condensedGateMsg(action, filePath, ordinal) {
+  const safe = sanitizePath(filePath);
+  return (
+    `[Fact-Forcing Gate] (denial #${ordinal} this session) First ${action} of ${safe}: ` +
+    "briefly state importers/callers, affected API, data schemas if any, and the user's verbatim instruction, then retry. " +
+    '(ECC_GATEGUARD=off disables this gate.)'
+  );
+}
+
 function destructiveBashMsg() {
  return [
    '[Fact-Forcing Gate]',
@@ -902,9 +961,14 @@ function run(rawInput) {
    }

    if (!isChecked(filePath)) {
-      if (!markChecked(filePath)) {
+      const { ok, denials } = markCheckedAndCountDenial(filePath);
+      if (!ok) {
        return allowWithStateWarning();
      }
+      if (denials > getFullDenialBudget()) {
+        const action = toolName === 'Edit' ? 'edit' : 'creation';
+        return denyResult(condensedGateMsg(action, filePath, denials), { includeRecoveryHint: false });
+      }
      return denyResult(toolName === 'Edit' ? editGateMsg(filePath) : writeGateMsg(filePath));
    }

@@ -920,9 +984,13 @@ function run(rawInput) {
    for (const edit of edits) {
      const filePath = edit.file_path || '';
      if (filePath && !isClaudeSettingsPath(filePath) && !isChecked(filePath)) {
-        if (!markChecked(filePath)) {
+        const { ok, denials } = markCheckedAndCountDenial(filePath);
+        if (!ok) {
          return allowWithStateWarning();
        }
+        if (denials > getFullDenialBudget()) {
+          return denyResult(condensedGateMsg('edit', filePath, denials), { includeRecoveryHint: false });
+        }
        return denyResult(editGateMsg(filePath));
      }
    }
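
Review note: the budget logic above is easier to reason about in isolation. The sketch below copies the parsing rules of `getFullDenialBudget` and `getDenialCount` from this diff, with one deliberate change for testability: the environment is passed in as a parameter instead of being read from `process.env` directly. `denialFormat` is a hypothetical helper that only names which message shape a given denial ordinal would receive; the real code calls `denyResult` with the full or condensed message.

```javascript
const DEFAULT_FULL_DENIALS = 3;

// Same rule as the diff: a non-negative integer wins, anything else
// (unset, empty, negative, non-numeric) falls back to the default.
// Assumption: `env` is injectable here; the hook reads process.env.
function getFullDenialBudget(env = process.env) {
  const raw = Number.parseInt(env.GATEGUARD_FACT_FORCE_FULL_DENIALS || '', 10);
  return Number.isInteger(raw) && raw >= 0 ? raw : DEFAULT_FULL_DENIALS;
}

// Tolerates missing or malformed on-disk state by collapsing it to 0.
function getDenialCount(state) {
  const n = Number(state && state.fact_force_denials);
  return Number.isFinite(n) && n >= 0 ? Math.floor(n) : 0;
}

// Hypothetical helper: ordinals are 1-based, so with a budget of 3 the
// 4th denial is the first condensed one.
function denialFormat(ordinal, budget) {
  return ordinal > budget ? 'condensed' : 'full';
}
```

With the default budget, five consecutive first-touch denials would map to full, full, full, condensed, condensed. Setting `GATEGUARD_FACT_FORCE_FULL_DENIALS=0` makes every denial condensed, since any ordinal is greater than zero.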
--- a/scripts/hooks/run-with-flags.js
+++ b/scripts/hooks/run-with-flags.js
@@ -45,40 +45,52 @@ function writeStderr(stderr) {
  process.stderr.write(stderr.endsWith('\n') ? stderr : `${stderr}\n`);
 }

-function emitHookResult(raw, output) {
+/**
+ * Write stdout fully, then exit. `process.exit()` immediately after
+ * `process.stdout.write()` drops anything beyond the ~64KB pipe buffer,
+ * which cut large pass-through payloads mid-JSON and made the harness
+ * treat the hook as failed (#2222). The write callback fires only after
+ * the chunk is flushed to the pipe.
+ */
+function exitWithStdout(text, exitCode) {
+  if (typeof text !== 'string' || text.length === 0) {
+    process.exit(exitCode);
+  }
+  process.stdout.write(text, () => process.exit(exitCode));
+}
+
+function resolveHookResult(raw, output) {
  if (typeof output === 'string' || Buffer.isBuffer(output)) {
-    process.stdout.write(String(output));
-    return 0;
+    return { stdout: String(output), exitCode: 0 };
  }

  if (output && typeof output === 'object') {
    writeStderr(output.stderr);
+    const exitCode = Number.isInteger(output.exitCode) ? output.exitCode : 0;

    if (Object.prototype.hasOwnProperty.call(output, 'additionalContext')) {
-      process.stdout.write(buildPreToolUseAdditionalContext(output.additionalContext));
-    } else if (Object.prototype.hasOwnProperty.call(output, 'stdout')) {
-      process.stdout.write(String(output.stdout ?? ''));
-    } else if (!Number.isInteger(output.exitCode) || output.exitCode === 0) {
-      process.stdout.write(raw);
+      return { stdout: buildPreToolUseAdditionalContext(output.additionalContext), exitCode };
    }
-
-    return Number.isInteger(output.exitCode) ? output.exitCode : 0;
+    if (Object.prototype.hasOwnProperty.call(output, 'stdout')) {
+      return { stdout: String(output.stdout ?? ''), exitCode };
+    }
+    return { stdout: exitCode === 0 ? raw : '', exitCode };
  }

-  process.stdout.write(raw);
-  return 0;
+  return { stdout: raw, exitCode: 0 };
 }

-function writeLegacySpawnOutput(raw, result) {
+function resolveLegacySpawnStdout(raw, result) {
  const stdout = typeof result.stdout === 'string' ? result.stdout : '';
  if (stdout) {
-    process.stdout.write(stdout);
-    return;
+    return stdout;
  }

  if (Number.isInteger(result.status) && result.status === 0) {
-    process.stdout.write(raw);
+    return raw;
  }
+
+  return '';
 }

 function getPluginRoot() {
@@ -92,14 +104,25 @@ async function main() {
  const [, , hookId, relScriptPath, profilesCsv] = process.argv;
  const { raw, truncated } = await readStdinRaw();

+  // Oversized payloads: never echo the truncated string — a JSON document
+  // cut mid-stream is treated by the harness as a hook failure, blocking the
+  // tool call (#2222). Empty stdout + exit 0 means "no opinion", so
+  // pass-through paths fail open. The hook itself still runs and receives
+  // the truncated flag (run() context / ECC_HOOK_INPUT_TRUNCATED), so
+  // security hooks like config-protection can still choose to block.
+  const sanitizeEcho = text => (truncated && text === raw ? '' : text);
+  if (truncated) {
+    process.stderr.write(`[Hook] stdin exceeded ${MAX_STDIN} bytes for ${hookId || 'unknown'}; suppressing pass-through (fail-open unless the hook blocks)\n`);
+  }
+
  if (!hookId || !relScriptPath) {
-    process.stdout.write(raw);
-    process.exit(0);
+    exitWithStdout(sanitizeEcho(raw), 0);
+    return;
  }

  if (!isHookEnabled(hookId, { profiles: profilesCsv })) {
-    process.stdout.write(raw);
-    process.exit(0);
+    exitWithStdout(sanitizeEcho(raw), 0);
+    return;
  }

  const pluginRoot = getPluginRoot();
@@ -109,14 +132,14 @@ async function main() {
  // Prevent path traversal outside the plugin root
  if (!scriptPath.startsWith(resolvedRoot + path.sep)) {
    process.stderr.write(`[Hook] Path traversal rejected for ${hookId}: ${scriptPath}\n`);
-    process.stdout.write(raw);
-    process.exit(0);
+    exitWithStdout(sanitizeEcho(raw), 0);
+    return;
  }

  if (!fs.existsSync(scriptPath)) {
    process.stderr.write(`[Hook] Script not found for ${hookId}: ${scriptPath}\n`);
-    process.stdout.write(raw);
-    process.exit(0);
+    exitWithStdout(sanitizeEcho(raw), 0);
+    return;
  }

  // Prefer direct require() when the hook exports a run(rawInput) function.
@@ -147,12 +170,13 @@ async function main() {
        truncated,
        maxStdin: MAX_STDIN
      });
-      process.exit(emitHookResult(raw, output));
+      const result = resolveHookResult(raw, output);
+      exitWithStdout(sanitizeEcho(result.stdout), result.exitCode);
    } catch (runErr) {
      process.stderr.write(`[Hook] run() error for ${hookId}: ${runErr.message}\n`);
-      process.stdout.write(raw);
+      exitWithStdout(sanitizeEcho(raw), 0);
    }
-    process.exit(0);
+    return;
  }

  // Legacy path: spawn a child Node process for hooks without run() export
@@ -171,20 +195,17 @@ async function main() {
    timeout: 30000
  });

-  writeLegacySpawnOutput(raw, result);
+  const legacyStdout = sanitizeEcho(resolveLegacySpawnStdout(raw, result));
  if (result.stderr) process.stderr.write(result.stderr);

  if (result.error || result.signal || result.status === null) {
-    const failureDetail = result.error
-      ? result.error.message
-      : result.signal
-        ? `terminated by signal ${result.signal}`
-        : 'missing exit status';
+    const failureDetail = result.error ? result.error.message : result.signal ? `terminated by signal ${result.signal}` : 'missing exit status';
    writeStderr(`[Hook] legacy hook execution failed for ${hookId}: ${failureDetail}`);
-    process.exit(1);
+    exitWithStdout(legacyStdout, 1);
+    return;
  }

-  process.exit(Number.isInteger(result.status) ? result.status : 0);
+  exitWithStdout(legacyStdout, Number.isInteger(result.status) ? result.status : 0);
 }

 main().catch(err => {
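
Review note: two ideas in the runner changes above are worth seeing side by side: the echo is suppressed only when the text about to be written is the truncated raw input itself, and the process exits only from the write callback. The sketch below is a minimal illustration, not the runner's code. `makeSanitizeEcho` is a hypothetical factory standing in for the closure over `raw` and `truncated` inside `main()`, and the `out` and `exit` parameters on `exitWithStdout` are assumptions added so the behavior can be checked without terminating the process.

```javascript
// Hypothetical factory; in the runner this is an inline arrow function.
// Only an exact echo of the truncated input is blanked. Output that a hook
// produced itself (for example a block decision) passes through unchanged.
function makeSanitizeEcho(raw, truncated) {
  return text => (truncated && text === raw ? '' : text);
}

// Write fully, then exit. `out` and `exit` are injectable for illustration;
// the runner uses process.stdout and process.exit directly.
function exitWithStdout(
  text,
  exitCode,
  out = process.stdout,
  exit = code => process.exit(code)
) {
  if (typeof text !== 'string' || text.length === 0) {
    exit(exitCode);
    return;
  }
  // Exit only from the write callback, after the stream has handled the chunk.
  out.write(text, () => exit(exitCode));
}
```

One consequence of the equality check is that a hook returning a modified copy of truncated input would still be echoed, so hooks that transform stdin would presumably need to consult the truncated flag themselves.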
--- a/scripts/hooks/stop-format-typecheck.js
+++ b/scripts/hooks/stop-format-typecheck.js
@@ -196,13 +196,30 @@ function run(rawInput) {

 if (require.main === module) {
  let stdinData = '';
+  let truncated = false;
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', chunk => {
-    if (stdinData.length < MAX_STDIN) stdinData += chunk.substring(0, MAX_STDIN - stdinData.length);
+    if (stdinData.length < MAX_STDIN) {
+      const remaining = MAX_STDIN - stdinData.length;
+      stdinData += chunk.substring(0, remaining);
+      if (chunk.length > remaining) truncated = true;
+    } else {
+      truncated = true;
+    }
  });
  process.stdin.on('end', () => {
-    process.stdout.write(run(stdinData));
-    process.exit(0);
+    const output = run(stdinData);
+    // Never echo truncated stdin (invalid JSON would be reported as a Stop
+    // hook failure, #2090); flush stdout before exiting so large payloads
+    // are not cut at the pipe buffer.
+    if (truncated) {
+      process.stderr.write('[Hook] stop-format-typecheck: stdin exceeded 1MB; suppressing pass-through (fail-open)\n');
+      process.exit(0);
+    }
+    if (!output) {
+      process.exit(0);
+    }
+    process.stdout.write(output, () => process.exit(0));
  });
 }