fix: stability batch — hook stdin truncation, Codex exa TOML, Stop hook JSON, GateGuard repetition (#2227)

* fix(hooks): fail open on oversized stdin instead of echoing truncated JSON (#2222)

run-with-flags.js capped stdin at 1MB but every fallthrough path still echoed the truncated string to stdout. The harness parses hook stdout as JSON, got a document cut mid-stream, and blocked the tool call, so any Edit/Write with a >1MB hook payload was permanently blocked by every registered pre-write hook, before ECC_HOOK_PROFILE / ECC_DISABLED_HOOKS gating could run.

- Exit 0 with empty stdout (no opinion) when the stdin cap trips, before any echo or gating logic.
- Flush stdout via write callback before process.exit: exiting right after stdout.write() dropped everything past the ~64KB pipe buffer, cutting even sub-cap pass-through payloads mid-JSON.

Regression tests cover the enabled, disabled, and missing-arg paths for oversized payloads plus full echo of sub-cap >64KB payloads.

* fix(codex): stop emitting invalid exa url entry, align merge with connector policy (#2224)

The Codex MCP merge declared exa with a url key, but Codex's [mcp_servers.*] TOML schema is stdio-only; the url key makes the entire config.toml fail to load, bricking both the codex CLI and the desktop app. Every install/update re-injected the line because the urlEntry branch treated the broken entry as present.

- ECC_SERVERS now emits only the current default set per docs/MCP-CONNECTOR-POLICY.md: chrome-devtools (stdio, command/args). Retired servers (supabase, playwright, context7, exa, github, memory, sequential-thinking) are never re-emitted; existing user-managed entries are untouched.
- The merge now repairs the exact ECC-emitted broken form (url-only exa entry) on every run so re-running the installer fixes broken configs instead of preserving them. User stdio exa entries (command + mcp-remote) are left alone.
- check-codex-global-state.sh requires chrome-devtools instead of the retired set, and flags url-only exa entries with a repair hint.

Tests cover repair, re-run idempotence, stdio-entry preservation, and no-retired-server emission in add, update, dry-run, and disabled modes.

* fix(hooks): never echo truncated stdin from Stop hooks (#2090)

Stop hooks follow the ECC pass-through convention (echo stdin on stdout), but every echoing Stop hook capped stdin and echoed the capped string. The Stop payload carries last_assistant_message, so a long final assistant message produced a JSON document cut mid-stream on stdout, which the harness reports as 'Stop hook error: JSON validation failed' across the whole Stop chain.

Reproduced: a Stop payload with a >64KB last_assistant_message run through run-with-flags + cost-tracker emitted exactly 65536 bytes of invalid JSON (cost-tracker capped stdin at 64KB, far below realistic Stop payloads).

- cost-tracker: raise the cap to 1MB (matching all other hooks) and suppress the pass-through echo when stdin was truncated.
- check-console-log, stop-format-typecheck, desktop-notify: suppress the echo when stdin was truncated; flush stdout before process.exit so sub-cap payloads are not cut at the ~64KB pipe buffer.
- All hooks keep exiting 0 (fail-open); diagnostics go to stderr.

New stop-hooks-stdout test asserts the contract for every registered Stop hook: stdout is empty or valid JSON, exit code 0, for realistic 100KB payloads and oversized >1MB payloads, via the production runner and via direct invocation. Updated the old hooks.test.js case that codified the truncated-echo behavior.

* fix(hooks): dampen GateGuard fact-force repetition in long sessions (#2142)

In long autonomous sessions the fact-force gate produced 10+ near-identical 'state facts -> blocked -> restate -> retry' blocks in one context window, which measurably raises the odds of the model collapsing into a degenerate single-token repetition loop.

- Track a per-session fact_force_denials counter in GateGuard state (merged max across concurrent writers, reset with the session, robust to malformed on-disk values).
- The first GATEGUARD_FACT_FORCE_FULL_DENIALS denials (default 3) keep the full four-fact block; later denials emit a condensed single-line message that carries the denial ordinal, so consecutive denials are structurally different and never textually identical.
- True retries of the same target remain allowed without re-prompting (unchanged).

Destructive-Bash and routine-Bash gates are unchanged, as are the ECC_GATEGUARD=off / ECC_DISABLED_HOOKS escape hatches.

Eight new tests cover budget counting, condensed format, ordinal advancement, retry pass-through, env tuning, malformed state, MultiEdit dampening, and destructive-gate exemption.

* fix(hooks): keep security hooks able to block on oversized stdin (#2222)

Refine the truncation fail-open: instead of skipping the hook entirely, the runner now suppresses only its own raw-echo when stdin was truncated. The hook still executes and receives the truncated flag (run() context / ECC_HOOK_INPUT_TRUNCATED), so config-protection keeps blocking truncated protected-config payloads (its test requires exit 2) while pass-through hooks fail open with empty stdout as before.

* style: apply repo formatter to touched hook files
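
For readers who want the stdin-cap and flush behavior from the hook fixes above in concrete form, here is a minimal sketch of the pass-through contract. It is not the code in run-with-flags.js or any Stop hook: `createCappedBuffer`, `passThroughOutput`, `writeThenExit`, `main`, and `MAX_STDIN_CHARS` are hypothetical names, and the cap is counted in string characters rather than bytes to keep the example short.

```javascript
'use strict';

// Assumed cap for this sketch; the commit message cites a 1MB stdin cap.
const MAX_STDIN_CHARS = 1024 * 1024;

// Accumulate stdin chunks up to maxChars and remember whether anything was dropped.
function createCappedBuffer(maxChars = MAX_STDIN_CHARS) {
  let data = '';
  let truncated = false;
  return {
    push(chunk) {
      if (truncated) return; // already over the cap: ignore further input
      const remaining = maxChars - data.length;
      if (chunk.length > remaining) {
        data += chunk.slice(0, remaining);
        truncated = true;
      } else {
        data += chunk;
      }
    },
    result() {
      return { data, truncated };
    },
  };
}

// Pass-through convention: echo stdin on stdout, but emit nothing when the
// input was truncated, so the harness never parses JSON cut mid-stream.
function passThroughOutput({ data, truncated }) {
  return truncated ? '' : data;
}

// Exit only after the write callback fires, so output larger than the pipe
// buffer is not lost to an early process.exit().
function writeThenExit(text, code, stream = process.stdout, exit = c => process.exit(c)) {
  if (!text) {
    exit(code);
    return;
  }
  stream.write(text, () => exit(code));
}

// Wiring for a real hook process; defined here but not invoked by the sketch.
function main() {
  const buffer = createCappedBuffer();
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', chunk => buffer.push(chunk));
  process.stdin.on('end', () => {
    const input = buffer.result();
    if (input.truncated) {
      // Diagnostics go to stderr so stdout stays empty or valid JSON.
      console.error('[hook] stdin exceeded cap; suppressing pass-through echo');
    }
    writeThenExit(passThroughOutput(input), 0); // fail open: always exit 0
  });
}
```

Calling `main()` at the bottom of a hook script would wire this to the real stdin and stdout. The stream and exit parameters of `writeThenExit` are injectable only so the flush ordering can be checked without terminating the process.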
2026-06-12 19:23:07 +08:00 · 2026-06-11 00:31:33 -04:00
parent 3bdb4a5e12
commit 6319c7d309
14 changed files with 846 additions and 151 deletions
--- a/tests/scripts/codex-hooks.test.js
+++ b/tests/scripts/codex-hooks.test.js
@@ -261,7 +261,7 @@ if (
 else failed++;

 if (
-  test('merge-mcp-config dry-run appends all recommended servers without mutating target', () => {
+  test('merge-mcp-config dry-run appends the current default set without mutating target', () => {
    const tempDir = createTempDir('mcp-merge-dry-run-');
    const configPath = path.join(tempDir, 'config.toml');
    const original = '';
@@ -272,9 +272,12 @@ if (

      assert.strictEqual(result.status, 0, `${result.stdout}\n${result.stderr}`);
      assert.match(result.stdout, /Package manager: npm \(exec: npx\)/);
-      assert.match(result.stdout, /\[add\] mcp_servers\.supabase/);
-      assert.match(result.stdout, /\[mcp_servers\.github\]/);
+      assert.match(result.stdout, /\[add\] mcp_servers\.chrome-devtools/);
+      assert.match(result.stdout, /\[mcp_servers\.chrome-devtools\]/);
      assert.match(result.stdout, /Dry run/);
+      // Retired defaults (June 2026 connector policy) must not be emitted.
+      assert.doesNotMatch(result.stdout, /mcp_servers\.(supabase|playwright|context7|exa|github|memory|sequential-thinking)\b/);
+      assert.doesNotMatch(result.stdout, /url = /);
      assert.strictEqual(fs.readFileSync(configPath, 'utf8'), original);
    } finally {
      cleanup(tempDir);
@@ -296,14 +299,17 @@ if (

      const merged = fs.readFileSync(configPath, 'utf8');
      const parsed = TOML.parse(merged);
-      assert.strictEqual(parsed.mcp_servers.exa.url, 'https://mcp.exa.ai/mcp');
-      assert.strictEqual(parsed.mcp_servers.github.command, 'bash');
-      assert.deepStrictEqual(parsed.mcp_servers.memory.args, ['@modelcontextprotocol/server-memory']);
-      assert.strictEqual(parsed.mcp_servers.supabase.tool_timeout_sec, 120);
+      assert.strictEqual(parsed.mcp_servers['chrome-devtools'].command, 'npx');
+      assert.deepStrictEqual(parsed.mcp_servers['chrome-devtools'].args, ['chrome-devtools-mcp@latest']);
+      assert.strictEqual(parsed.mcp_servers['chrome-devtools'].startup_timeout_sec, 30);
+      // No retired server may be (re-)emitted — exa's url form broke Codex (#2224).
+      assert.strictEqual(parsed.mcp_servers.exa, undefined);
+      assert.strictEqual(parsed.mcp_servers.github, undefined);
+      assert.strictEqual(parsed.mcp_servers.supabase, undefined);

      const second = runNode(mergeMcpConfigScript, [configPath], deterministicPackageEnv);
      assert.strictEqual(second.status, 0, `${second.stdout}\n${second.stderr}`);
-      assert.match(second.stdout, /\[ok\] mcp_servers\.github/);
+      assert.match(second.stdout, /\[ok\] mcp_servers\.chrome-devtools/);
      assert.match(second.stdout, /All ECC MCP servers already present/);
      assert.strictEqual(fs.readFileSync(configPath, 'utf8'), merged);
    } finally {
@@ -315,24 +321,88 @@ if (
 else failed++;

 if (
-  test('merge-mcp-config update dry-run reports canonical and legacy section refreshes', () => {
+  test('merge-mcp-config repairs the invalid exa url entry from earlier ECC versions (#2224)', () => {
+    const tempDir = createTempDir('mcp-merge-exa-repair-');
+    const configPath = path.join(tempDir, 'config.toml');
+    const original = [
+      '[mcp_servers.github]',
+      'command = "npx"',
+      'args = ["-y", "@modelcontextprotocol/server-github"]',
+      '',
+      '[mcp_servers.exa]',
+      'url = "https://mcp.exa.ai/mcp"',
+      '',
+    ].join('\n');
+
+    try {
+      fs.writeFileSync(configPath, original);
+      const result = runNode(mergeMcpConfigScript, [configPath], deterministicPackageEnv);
+
+      assert.strictEqual(result.status, 0, `${result.stdout}\n${result.stderr}`);
+      assert.match(result.stdout, /\[repair\] mcp_servers\.exa/);
+
+      const updated = fs.readFileSync(configPath, 'utf8');
+      const parsed = TOML.parse(updated);
+      assert.strictEqual(parsed.mcp_servers.exa, undefined, 'invalid exa url entry must be removed');
+      assert.doesNotMatch(updated, /url = "https:\/\/mcp\.exa\.ai\/mcp"/);
+      // User-managed servers are untouched; current default is added.
+      assert.strictEqual(parsed.mcp_servers.github.command, 'npx');
+      assert.strictEqual(parsed.mcp_servers['chrome-devtools'].command, 'npx');
+
+      // Re-running must not re-introduce the invalid entry.
+      const second = runNode(mergeMcpConfigScript, [configPath], deterministicPackageEnv);
+      assert.strictEqual(second.status, 0, `${second.stdout}\n${second.stderr}`);
+      assert.doesNotMatch(fs.readFileSync(configPath, 'utf8'), /mcp_servers\.exa/);
+    } finally {
+      cleanup(tempDir);
+    }
+  })
+)
+  passed++;
+else failed++;
+
+if (
+  test('merge-mcp-config leaves a user-managed stdio exa entry untouched', () => {
+    const tempDir = createTempDir('mcp-merge-exa-stdio-');
+    const configPath = path.join(tempDir, 'config.toml');
+    const original = [
+      '[mcp_servers.exa]',
+      'command = "npx"',
+      'args = ["-y", "mcp-remote", "https://mcp.exa.ai/mcp"]',
+      'startup_timeout_sec = 30',
+      '',
+    ].join('\n');
+
+    try {
+      fs.writeFileSync(configPath, original);
+      const result = runNode(mergeMcpConfigScript, [configPath], deterministicPackageEnv);
+
+      assert.strictEqual(result.status, 0, `${result.stdout}\n${result.stderr}`);
+      assert.doesNotMatch(result.stdout, /\[repair\]/);
+
+      const parsed = TOML.parse(fs.readFileSync(configPath, 'utf8'));
+      assert.strictEqual(parsed.mcp_servers.exa.command, 'npx');
+      assert.deepStrictEqual(parsed.mcp_servers.exa.args, ['-y', 'mcp-remote', 'https://mcp.exa.ai/mcp']);
+    } finally {
+      cleanup(tempDir);
+    }
+  })
+)
+  passed++;
+else failed++;
+
+if (
+  test('merge-mcp-config update dry-run refreshes managed sections and leaves user servers alone', () => {
    const tempDir = createTempDir('mcp-merge-update-dry-run-');
    const configPath = path.join(tempDir, 'config.toml');
    const original = [
+      '[mcp_servers.chrome-devtools]',
+      'command = "custom"',
+      'args = ["old"]',
+      '',
      '[mcp_servers.context7]',
-      'command = "custom"',
-      'args = ["old"]',
-      '',
-      '[mcp_servers.context7-mcp]',
      'command = "npx"',
-      'args = ["legacy"]',
-      '',
-      '[mcp_servers.supabase]',
-      'command = "custom"',
-      'args = ["old"]',
-      '',
-      '[mcp_servers.supabase.env]',
-      'SUPABASE_ACCESS_TOKEN = "token"',
+      'args = ["-y", "@upstash/context7-mcp@latest"]',
      '',
    ].join('\n');

@@ -341,11 +411,10 @@ if (
      const result = runNode(mergeMcpConfigScript, [configPath, '--update-mcp', '--dry-run'], deterministicPackageEnv);

      assert.strictEqual(result.status, 0, `${result.stdout}\n${result.stderr}`);
-      assert.match(result.stdout, /\[remove\] mcp_servers\.context7/);
-      assert.match(result.stdout, /\[remove\] mcp_servers\.context7-mcp/);
-      assert.match(result.stdout, /\[remove\] mcp_servers\.supabase/);
-      assert.match(result.stdout, /\[mcp_servers\.supabase\]/);
-      assert.match(result.stdout, /\[mcp_servers\.context7\]/);
+      assert.match(result.stdout, /\[remove\] mcp_servers\.chrome-devtools/);
+      assert.match(result.stdout, /\[mcp_servers\.chrome-devtools\]/);
+      // Retired servers are no longer ECC-managed: never removed or re-added.
+      assert.doesNotMatch(result.stdout, /\[remove\] mcp_servers\.context7/);
      assert.strictEqual(fs.readFileSync(configPath, 'utf8'), original);
    } finally {
      cleanup(tempDir);
@@ -356,38 +425,31 @@ if (
 else failed++;

 if (
-  test('merge-mcp-config removes disabled legacy servers without appending replacements', () => {
+  test('merge-mcp-config removes disabled servers without appending replacements', () => {
    const tempDir = createTempDir('mcp-merge-disabled-');
    const configPath = path.join(tempDir, 'config.toml');
    const original = [
-      '[mcp_servers.context7-mcp]',
+      '[mcp_servers.chrome-devtools]',
      'command = "npx"',
-      'args = ["legacy"]',
-      '',
-      '[mcp_servers.exa]',
-      'url = "https://mcp.exa.ai/mcp"',
+      'args = ["chrome-devtools-mcp@latest"]',
      '',
    ].join('\n');
-    const allServersDisabled = 'supabase,playwright,context7,exa,github,memory,sequential-thinking';

    try {
      fs.writeFileSync(configPath, original);
      const result = runNode(mergeMcpConfigScript, [configPath], {
        ...deterministicPackageEnv,
-        ECC_DISABLED_MCPS: allServersDisabled,
+        ECC_DISABLED_MCPS: 'chrome-devtools',
      });

      assert.strictEqual(result.status, 0, `${result.stdout}\n${result.stderr}`);
      assert.match(result.stdout, /Disabled via ECC_DISABLED_MCPS/);
-      assert.match(result.stdout, /\[skip\] mcp_servers\.context7 \(disabled\)/);
-      assert.match(result.stdout, /\[skip\] mcp_servers\.exa \(disabled\)/);
-      assert.match(result.stdout, /\[update\] mcp_servers\.context7-mcp \(disabled\)/);
-      assert.match(result.stdout, /\[update\] mcp_servers\.exa \(disabled\)/);
-      assert.match(result.stdout, /Done\. Removed 2 disabled server\(s\)\./);
+      assert.match(result.stdout, /\[skip\] mcp_servers\.chrome-devtools \(disabled\)/);
+      assert.match(result.stdout, /\[update\] mcp_servers\.chrome-devtools \(disabled\)/);
+      assert.match(result.stdout, /Done\. Removed 1 server section\(s\)\./);

      const updated = fs.readFileSync(configPath, 'utf8');
-      assert.doesNotMatch(updated, /context7-mcp/);
-      assert.doesNotMatch(updated, /mcp_servers\.exa/);
+      assert.doesNotMatch(updated, /chrome-devtools/);
    } finally {
      cleanup(tempDir);
    }
@@ -454,7 +516,10 @@ if (
      assert.strictEqual(parsedConfig.agents.explorer.config_file, 'agents/explorer.toml');
      assert.strictEqual(parsedConfig.agents.reviewer.config_file, 'agents/reviewer.toml');
      assert.strictEqual(parsedConfig.agents.docs_researcher.config_file, 'agents/docs-researcher.toml');
-      assert.ok(parsedConfig.mcp_servers.exa);
+      // Current default connector is added; retired servers are not emitted,
+      // and pre-existing user-managed entries are preserved untouched.
+      assert.ok(parsedConfig.mcp_servers['chrome-devtools']);
+      assert.strictEqual(parsedConfig.mcp_servers.exa, undefined);
      assert.ok(parsedConfig.mcp_servers.github);
      assert.ok(parsedConfig.mcp_servers.memory);
      assert.ok(parsedConfig.mcp_servers['sequential-thinking']);