29 Commits

Author SHA1 Message Date
Affaan Mustafa
6319c7d309 fix: stability batch — hook stdin truncation, Codex exa TOML, Stop hook JSON, GateGuard repetition (#2227)
* fix(hooks): fail open on oversized stdin instead of echoing truncated JSON (#2222)

run-with-flags.js capped stdin at 1MB but every fallthrough path still
echoed the truncated string to stdout. The harness parses hook stdout as
JSON, got a document cut mid-stream, and blocked the tool call — so any
Edit/Write with a >1MB hook payload was permanently blocked by every
registered pre-write hook, before ECC_HOOK_PROFILE / ECC_DISABLED_HOOKS
gating could run.

- Exit 0 with empty stdout (no opinion) when the stdin cap trips, before
  any echo or gating logic.
- Flush stdout via write callback before process.exit: exiting right
  after stdout.write() dropped everything past the ~64KB pipe buffer,
  cutting even sub-cap pass-through payloads mid-JSON.

Regression tests cover the enabled, disabled, and missing-arg paths for
oversized payloads plus full echo of sub-cap >64KB payloads.
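
A minimal sketch of the two fixes above, not the actual run-with-flags.js code. The helper names (`capInput`, `passThroughOutput`, `writeThenFinish`) and the constant name are hypothetical; only the 1MB cap, the empty-stdout fail-open, and the flush-before-exit idea come from the commit message.

```javascript
'use strict';

// Assumed name for the 1MB stdin cap described in the commit message.
const MAX_STDIN_BYTES = 1024 * 1024;

// Cap a raw stdin buffer and report whether anything was cut off.
function capInput(buf, maxBytes = MAX_STDIN_BYTES) {
  const truncated = buf.length > maxBytes;
  return { text: buf.subarray(0, maxBytes).toString('utf8'), truncated };
}

// A pass-through hook prints nothing ("no opinion") when its input was
// truncated, because a JSON document cut mid-stream would not parse.
function passThroughOutput(raw, truncated) {
  return truncated ? '' : raw;
}

// Run `done` only after the stream's write callback fires, so an exit
// placed in `done` cannot drop data that is still buffered.
function writeThenFinish(stream, data, done) {
  if (!data) {
    done();
    return;
  }
  stream.write(data, () => done());
}
```

Inside a hook this might be used as `writeThenFinish(process.stdout, passThroughOutput(text, truncated), () => process.exit(0))`, which keeps the exit code at 0 in both the truncated and the pass-through case.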

* fix(codex): stop emitting invalid exa url entry, align merge with connector policy (#2224)

The Codex MCP merge declared exa with a url key, but Codex's
[mcp_servers.*] TOML schema is stdio-only — the url key makes the
entire config.toml fail to load, bricking both the codex CLI and the
desktop app. Every install/update re-injected the line because the
urlEntry branch treated the broken entry as present.

- ECC_SERVERS now emits only the current default set per
  docs/MCP-CONNECTOR-POLICY.md: chrome-devtools (stdio, command/args).
  Retired servers (supabase, playwright, context7, exa, github, memory,
  sequential-thinking) are never re-emitted; existing user-managed
  entries are untouched.
- The merge now repairs the exact ECC-emitted broken form (url-only
  exa entry) on every run so re-running the installer fixes broken
  configs instead of preserving them. User stdio exa entries
  (command + mcp-remote) are left alone.
- check-codex-global-state.sh requires chrome-devtools instead of the
  retired set, and flags url-only exa entries with a repair hint.

Tests cover repair, re-run idempotence, stdio-entry preservation, and
no-retired-server emission in add, update, dry-run, and disabled modes.
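
The repair rule can be sketched on an already-parsed server table. This is an illustration under assumptions: `servers` stands for the parsed `[mcp_servers.*]` table as a plain object, the function names are hypothetical, and "repair" is assumed here to mean dropping the url-only entry (the commit message does not spell out the exact rewrite).

```javascript
'use strict';

// True for the broken form described above: a url key and no stdio command.
function isUrlOnlyEntry(entry) {
  return Boolean(entry) && typeof entry.url === 'string' && entry.command === undefined;
}

// Return a copy of the table with a url-only exa entry removed.
// Entries with a command (user-managed stdio servers) are left untouched.
function repairServers(servers) {
  const repaired = { ...servers };
  if (isUrlOnlyEntry(repaired.exa)) {
    delete repaired.exa; // assumption: repair = drop the invalid entry
  }
  return repaired;
}
```

Because the function only reacts to the url-only shape, running it twice gives the same result as running it once, which is the idempotence property the tests above are said to cover.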

* fix(hooks): never echo truncated stdin from Stop hooks (#2090)

Stop hooks follow the ECC pass-through convention (echo stdin on
stdout), but every echoing Stop hook capped stdin and echoed the capped
string. The Stop payload carries last_assistant_message, so a long
final assistant message produced a JSON document cut mid-stream on
stdout, which the harness reports as 'Stop hook error: JSON validation
failed' across the whole Stop chain.

Reproduced: a Stop payload with a >64KB last_assistant_message run
through run-with-flags + cost-tracker emitted exactly 65536 bytes of
invalid JSON (cost-tracker capped stdin at 64KB — far below realistic
Stop payloads).

- cost-tracker: raise the cap to 1MB (matching all other hooks) and
  suppress the pass-through echo when stdin was truncated.
- check-console-log, stop-format-typecheck, desktop-notify: suppress
  the echo when stdin was truncated; flush stdout before process.exit
  so sub-cap payloads are not cut at the ~64KB pipe buffer.
- All hooks keep exiting 0 (fail-open); diagnostics go to stderr.

New stop-hooks-stdout test asserts the contract for every registered
Stop hook: stdout is empty or valid JSON, exit code 0 — for realistic
100KB payloads and oversized >1MB payloads, via the production runner
and via direct invocation. Updated the old hooks.test.js case that
codified the truncated-echo behavior.
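
The stdout contract stated above (empty or valid JSON) can be expressed as a small checker. This is a sketch only; `isValidHookStdout` is a hypothetical name and not the helper used in the stop-hooks-stdout test.

```javascript
'use strict';

// A Stop hook's stdout satisfies the contract if it is empty
// or parses as a complete JSON document.
function isValidHookStdout(stdout) {
  if (stdout === '') return true;
  try {
    JSON.parse(stdout);
    return true;
  } catch {
    return false;
  }
}
```

A payload with a 100KB `last_assistant_message` that is cut at 65536 bytes fails this check, which is the failure mode the commit reproduces.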

* fix(hooks): dampen GateGuard fact-force repetition in long sessions (#2142)

In long autonomous sessions the fact-force gate produced 10+
near-identical 'state facts -> blocked -> restate -> retry' blocks in
one context window, which measurably raises the odds of the model
collapsing into a degenerate single-token repetition loop.

- Track a per-session fact_force_denials counter in GateGuard state
  (merged max across concurrent writers, reset with the session, robust
  to malformed on-disk values).
- The first GATEGUARD_FACT_FORCE_FULL_DENIALS denials (default 3) keep
  the full four-fact block; later denials emit a condensed single-line
  message that carries the denial ordinal, so consecutive denials are
  structurally different and never textually identical.
- True retries of the same target remain allowed without re-prompting
  (unchanged). Destructive-Bash and routine-Bash gates are unchanged,
  as are the ECC_GATEGUARD=off / ECC_DISABLED_HOOKS escape hatches.

Eight new tests cover budget counting, condensed format, ordinal
advancement, retry pass-through, env tuning, malformed state, MultiEdit
dampening, and destructive-gate exemption.
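
The dampening scheme can be sketched as follows. The env var name, the default of 3, the max-merge, and the ordinal in the condensed message come from the commit message; the function names and the exact message wording are hypothetical.

```javascript
'use strict';

const DEFAULT_FULL_DENIALS = 3;

// Read the full-block budget from the environment, falling back to the default.
function fullDenialBudget(env = process.env) {
  const parsed = Number.parseInt(env.GATEGUARD_FACT_FORCE_FULL_DENIALS, 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : DEFAULT_FULL_DENIALS;
}

// Treat malformed on-disk values as zero.
function normalizeCount(value) {
  return Number.isInteger(value) && value >= 0 ? value : 0;
}

// Concurrent writers keep the larger counter so no denial is lost.
function mergeDenials(onDisk, inMemory) {
  return Math.max(normalizeCount(onDisk), normalizeCount(inMemory));
}

// Full multi-line block within the budget, condensed single line after it.
// The ordinal makes consecutive condensed messages textually different.
function denialMessage(ordinal, target, budget) {
  if (ordinal <= budget) {
    return [
      `[GateGuard] Denial #${ordinal} for ${target}. Present these facts first:`,
      '1. Importers of the file',
      '2. Affected public API',
      '3. Data schemas involved',
      '4. The user instruction, quoted'
    ].join('\n');
  }
  return `[GateGuard] Denial #${ordinal} for ${target}: facts still required before retry.`;
}
```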

* fix(hooks): keep security hooks able to block on oversized stdin (#2222)

Refine the truncation fail-open: instead of skipping the hook entirely,
the runner now suppresses only its own raw-echo when stdin was
truncated. The hook still executes and receives the truncated flag
(run() context / ECC_HOOK_INPUT_TRUNCATED), so config-protection keeps
blocking truncated protected-config payloads (its test requires exit 2)
while pass-through hooks fail open with empty stdout as before.

* style: apply repo formatter to touched hook files
2026-06-11 00:31:33 -04:00
bymle
0cb8907e14 fix(gateguard): gate force/path git checkout as destructive (#2158)
* fix(gateguard): gate force/path git checkout as destructive

The destructive-command gate's `checkout` handler only flagged
`git checkout -- <path>`. It missed `git checkout --force` / `-f <branch>`
and `git checkout .`, all of which discard uncommitted working-tree changes,
so they bypassed the gate (once the once-per-session routine-Bash gate was
satisfied, they ran with no challenge). The sibling `switch` handler already
covers these force forms; mirror it for `checkout`.
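
A rough sketch of the classification described above, assuming the command has already been tokenized and `args` holds the tokens after `git checkout`. The function name is hypothetical, and combined short flags such as `-fb` are deliberately not handled here.

```javascript
'use strict';

// True when a `git checkout` invocation can discard uncommitted changes.
function isDestructiveCheckout(args) {
  // Forced checkout: `git checkout --force` or `git checkout -f <branch>`.
  if (args.includes('--force') || args.includes('-f')) return true;
  // Path form: `git checkout -- <path>` (something must follow the separator).
  const sep = args.indexOf('--');
  if (sep !== -1 && sep < args.length - 1) return true;
  // Whole-tree form: `git checkout .`
  return args.includes('.');
}
```

For example, `isDestructiveCheckout(['-f', 'main'])` is true, while a plain branch switch such as `isDestructiveCheckout(['main'])` is false.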

* test(gateguard): document Test 7b force-checkout case

---------

Co-authored-by: bymle <229636660+bymle@users.noreply.github.com>
2026-06-07 13:26:08 +08:00
Gaurav Dubey
4afdb90800 feat(gateguard): add env knobs for routine bash gate + extra destructive patterns (#2161)
* feat(gateguard): add env knobs for routine bash gate + extra destructive patterns

The JS port of gateguard-fact-force has two bash gates: a destructive
gate (rm -rf, drop table, git push --force, etc.) that operators want
to keep, and a once-per-session routine gate that fires on the very
first bash invocation regardless of intent. Operators on hosts where
the routine gate is friction without signal (Cursor, OpenCode, etc.)
have been maintaining local patches that get clobbered on every plugin
update; the Python upstream gateguard-ai already exposes equivalent
config via .gateguard.yml.

Adds two env vars, both off-by-default so existing behavior is
preserved:

- GATEGUARD_BASH_ROUTINE_DISABLED — truthy values (1, true, on, yes,
  enabled) skip the routine bash gate. Destructive gate is unaffected.
- GATEGUARD_BASH_EXTRA_DESTRUCTIVE — regex source string for additional
  destructive patterns. Matches against the same quote-stripped,
  subshell-flattened command the built-in DESTRUCTIVE_SQL_DD regex sees,
  so a custom phrase inside $(...) or backticks is also caught. A
  malformed regex is logged once to stderr and treated as not configured
  rather than crashing the hook (hooks must never block tool execution
  unexpectedly).

Twelve new tests pin both env vars (truthy aliases, falsy values, unset
baseline, destructive-gate-still-fires, alternation members, malformed
regex degrades safely, custom phrase inside command substitution).
Existing 2619/2619 tests still pass; eslint clean.
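
The two knobs can be sketched like this. The truthy aliases and the degrade-on-malformed-regex behavior come from the commit message; the function names, the trimming and lowercasing of the value, and the warning text are assumptions.

```javascript
'use strict';

// Truthy aliases listed for GATEGUARD_BASH_ROUTINE_DISABLED.
const TRUTHY = new Set(['1', 'true', 'on', 'yes', 'enabled']);

// Assumption: values are compared case-insensitively after trimming.
function isTruthy(value) {
  return typeof value === 'string' && TRUTHY.has(value.trim().toLowerCase());
}

// Compile GATEGUARD_BASH_EXTRA_DESTRUCTIVE; a malformed pattern is reported
// through `warn` and treated as not configured instead of throwing.
function compileExtraDestructive(source, warn = msg => process.stderr.write(`${msg}\n`)) {
  if (!source) return null;
  try {
    return new RegExp(source);
  } catch (err) {
    warn(`[GateGuard] ignoring invalid GATEGUARD_BASH_EXTRA_DESTRUCTIVE: ${err.message}`);
    return null;
  }
}
```

A caller would test the returned regex against the same normalized command string the built-in destructive patterns see, and skip the check when the result is `null`.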

Fixes #2078

* fix(gateguard): reset extra-destructive warn-once gate when env value changes

Both reviewers (CodeRabbit + cubic) flagged that
extraDestructiveWarnLogged was never reset when GATEGUARD_BASH_EXTRA_DESTRUCTIVE
flipped from one invalid regex to a different invalid regex. The
sticky boolean meant a long-running process saw bad-pattern-a's
warning then silently swallowed bad-pattern-b's parse failure.

Fix: clear extraDestructiveWarnLogged whenever the cache key changes
(i.e. before the regex compile attempt). The warn-once-per-distinct-
pattern invariant now matches the per-key cache invariant.

Adds a same-process regression test via loadDirectHook() that spies on
process.stderr.write and asserts: same bad pattern warns once across
multiple invocations; switching to a different bad pattern emits a
second warning; switching to a valid regex emits zero warnings.
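
The warn-once-per-distinct-pattern invariant might look like the sketch below. `createExtraPatternLoader` and its internals are hypothetical; the one point taken from the commit is that the warn flag is cleared whenever the cache key changes, before the compile attempt.

```javascript
'use strict';

// Returns a loader that caches the compiled regex per env value and
// warns at most once for each distinct invalid pattern.
function createExtraPatternLoader(warn) {
  let cacheKey = null;
  let cachedRegex = null;
  let warnLogged = false;

  return function load(source) {
    const key = source || '';
    if (key !== cacheKey) {
      cacheKey = key;
      cachedRegex = null;
      warnLogged = false; // reset before the compile attempt
    } else if (cachedRegex) {
      return cachedRegex;
    }
    if (!key) return null;
    try {
      cachedRegex = new RegExp(key);
    } catch (err) {
      if (!warnLogged) {
        warn(`[GateGuard] invalid extra destructive pattern: ${err.message}`);
        warnLogged = true;
      }
    }
    return cachedRegex;
  };
}
```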
2026-06-07 13:01:30 +08:00
luyua9
14d88e517b fix(gateguard): preserve quoted git introspection args
2026-05-19 13:24:17 -04:00
SeungHyun
8cfadfea28 fix(hooks): close grouped command bypasses in gateguard (#1912)
Inspect executable bodies inside plain subshells and brace groups before applying destructive command classifiers.

Co-authored-by: Jamkris <82251632+Jamkris@users.noreply.github.com>
2026-05-15 01:39:15 -04:00
SeungHyun
0e169fecbc fix: harden GateGuard destructive bash tokenizer
Co-authored-by: Jamkris <dltmdgus1412@gmail.com>
2026-05-13 02:43:04 -04:00
Affaan Mustafa
7b964402ee fix: bypass GateGuard file gates in subagents (#1710)
2026-05-11 01:51:24 -04:00
Affaan Mustafa
bb40978e31 fix: show correct gateguard hook recovery id
2026-04-30 11:26:15 -04:00
Affaan Mustafa
7c5452f4fa fix: keep gateguard destructive gate strict
2026-04-30 11:26:15 -04:00
Affaan Mustafa
cfe770a735 fix: add gateguard recovery escape hatch
2026-04-30 11:26:15 -04:00
Affaan Mustafa
95bef977c1 fix: fail open on gateguard state write errors
2026-04-30 08:15:27 -04:00
Affaan Mustafa
1188aeafc4 fix: refine gateguard destructive git detection
2026-04-29 22:41:22 -04:00
Affaan Mustafa
c3ea7a1e5e fix: preserve gateguard concurrent state writes (#1623)
2026-04-29 19:31:11 -04:00
Junming
20041294d9 fix(gateguard): rewrite routineBashMsg to use fact-presentation pattern (#1531)
* fix(gateguard): rewrite routineBashMsg to use fact-presentation pattern

The imperative 'Quote user's instruction verbatim. Then retry.' phrasing
triggers Claude Code's runtime anti-prompt-injection filter, deadlocking
the first Bash call of every session. The sibling gates (edit, write,
destructive) use multi-point fact-list framing that the runtime accepts.

Align routineBashMsg with that pattern to restore the gate's intended
behavior without changing run(), state schema, or any public API.

Closes #1530

* docs(gateguard): sync SKILL.md routine gate spec with new message format

CodeRabbit flagged that skills/gateguard/SKILL.md still described the
pre-fix imperative message. Update the Routine Bash Gate section to
match the numbered fact-list format used by the new routineBashMsg().
2026-04-21 18:02:16 -04:00
Affaan Mustafa
8776c4f8f3 fix: harden urgent install and gateguard patch
2026-04-14 19:44:08 -07:00
Affaan Mustafa
3be24a5704 fix: restore urgent PR CI health
2026-04-14 19:26:24 -07:00
Affaan Mustafa
76b6e22b4d fix: unblock urgent install and gateguard regressions
2026-04-14 19:23:07 -07:00
Affaan Mustafa
6c67566767 fix: keep gateguard session state alive
2026-04-13 00:58:50 -07:00
seto
dd2962ee92 fix: 5 bugs + 2 tests from 3-agent deep bughunt
Bugs fixed:
- B1: JS gate messages still said "cat one real record" -> redacted/synthetic
- B2: Destructive bash key used 200-char truncation (collision bypass) -> SHA256 hash
- B3: sanitizePath only stripped \n\r -> now strips null bytes, bidi overrides, all control chars
- B4: Tool name matching was case-sensitive (latent bypass) -> lookup map normalization
- B5: SKILL.md Gate Types missing MultiEdit -> added with explanation
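
Fixes B2 and B3 can be illustrated with the sketch below. The function names and the `__destructive__` key prefix are hypothetical, and the character ranges are one reasonable reading of "null bytes, bidi overrides, all control chars", not the exact set used by the hook.

```javascript
'use strict';

const crypto = require('crypto');

// B2: key the destructive-bash state on a SHA-256 of the full command, so two
// long commands sharing their first 200 characters no longer collide.
function destructiveKey(command) {
  const digest = crypto.createHash('sha256').update(command, 'utf8').digest('hex');
  return `__destructive__${digest}`;
}

// B3: strip C0/C1 control characters (including NUL and newlines) and
// Unicode bidi marks, overrides, and isolates from a path before display.
function sanitizePath(filePath) {
  return String(filePath).replace(
    /[\u0000-\u001f\u007f-\u009f\u200e\u200f\u202a-\u202e\u2066-\u2069]/g,
    ''
  );
}
```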

Tests added:
- T1: MultiEdit gate denies first unchecked file (CRITICAL - was untested)
- T2: MultiEdit allows after all files gated

11/11 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:32:46 +09:00
seto
4dbed5ff5b fix: cubic-dev-ai round 2 — 3 issues across SKILL.md + pruning
P1: Gate message asked for raw production data records — changed to
    "redacted or synthetic values" to prevent sensitive data exfiltration

P2: SKILL.md description now includes MultiEdit (was missing after
    MultiEdit gate was added in previous commit)

P2: Session key pruning now caps __prefixed keys at 50 to prevent
    unbounded growth even in theoretical edge cases

9/9 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 16:11:33 +09:00
seto
5540282dcb fix: remove unnecessary disk I/O + fix test cleanup
- isChecked() no longer calls saveState() — read-only operation
  should not write to disk (was causing 3x writes per tool call)
- Test cleanup uses fs.rmSync(recursive) instead of fs.rmdirSync
  which failed with ENOTEMPTY when .tmp files remained

9/9 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:41:58 +09:00
seto
67256194a0 fix: P1 test state-file PID mismatch + P2 session key eviction
P1 (cubic-dev-ai): Test process PID differs from spawned hook PID,
so test was seeding/clearing wrong state file. Fix: pass fixed
CLAUDE_SESSION_ID='gateguard-test-session' to spawned hooks.

P2 (cubic-dev-ai): Pruning checked array could evict __bash_session__
and other session keys, causing gates to re-fire mid-session. Fix:
preserve __prefixed keys during pruning, only evict file-path entries.
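
The P2 pruning rule can be sketched as below, assuming `checked` is an array of string keys ordered oldest first. `pruneChecked` is a hypothetical name; the only behavior taken from the commit is that `__`-prefixed session keys survive and file-path entries are evicted.

```javascript
'use strict';

// Drop the oldest file-path entries until the list fits `maxEntries`,
// never evicting session keys such as __bash_session__.
function pruneChecked(checked, maxEntries) {
  let excess = checked.length - maxEntries;
  if (excess <= 0) return checked;
  return checked.filter(key => {
    if (key.startsWith('__') || excess <= 0) return true;
    excess -= 1;
    return false; // evict this file-path entry
  });
}
```

Relative order is preserved, so the most recently checked files remain at the end of the list.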

9/9 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:40:13 +09:00
seto
6ed1c643e7 fix: MultiEdit gate bypass — handle edits[].file_path correctly
P1 bug reported by greptile-apps: MultiEdit uses toolInput.edits[].file_path,
not toolInput.file_path. The gate was silently allowing all MultiEdit calls.

Fix: separate MultiEdit into its own branch that iterates edits array
and gates on the first unchecked file_path.
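
A minimal sketch of the MultiEdit branch described above. `firstUncheckedPath` and the `isChecked` callback are hypothetical names; the `edits[].file_path` shape is the one stated in the commit message.

```javascript
'use strict';

// Return the first file_path in toolInput.edits that has not been gated yet,
// or null when every edited file was already checked.
function firstUncheckedPath(toolInput, isChecked) {
  const edits = Array.isArray(toolInput && toolInput.edits) ? toolInput.edits : [];
  for (const edit of edits) {
    const filePath = edit && edit.file_path;
    if (typeof filePath === 'string' && filePath !== '' && !isChecked(filePath)) {
      return filePath;
    }
  }
  return null;
}
```

A `null` result means the call can be allowed; any other result is the path to name in the denial message.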

9/9 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:37:39 +09:00
seto
45823fcede fix: session-scoped state to prevent cross-session race
Addresses reviewer feedback from @affaan-m:

1. State keyed by CLAUDE_SESSION_ID / ECC_SESSION_ID
   - Falls back to pid-based isolation when env vars absent
   - State file: state-{sessionId}.json (was .session_state.json)

2. Atomic write+rename semantics
   - Write to temp file, then fs.renameSync to final path
   - Prevents partial reads from concurrent hooks

3. Bounded checked list (MAX_CHECKED_ENTRIES = 500)
   - Prunes to last 500 entries when cap exceeded
   - Stale session files auto-deleted after 1 hour
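
Points 1 and 2 can be sketched as follows. The `state-{sessionId}.json` name, the env vars, the pid fallback, and the write-then-rename step come from the commit message; the function names, the precedence between the two env vars, the id sanitizing, and the temp-file naming are assumptions.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Resolve the per-session state file, falling back to a pid-based id.
function stateFilePath(stateDir, env = process.env, pid = process.pid) {
  const sessionId = env.CLAUDE_SESSION_ID || env.ECC_SESSION_ID || `pid-${pid}`;
  const safeId = String(sessionId).replace(/[^a-zA-Z0-9_-]/g, '_');
  return path.join(stateDir, `state-${safeId}.json`);
}

// Write to a temp file in the same directory, then rename over the target,
// so a concurrent reader sees either the old or the new file, never a partial one.
function saveStateAtomic(filePath, state) {
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(state), 'utf8');
  fs.renameSync(tmpPath, filePath);
}
```

Keeping the temp file next to the target matters because a rename is only atomic within a single filesystem.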

9/9 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:30:34 +09:00
seto
9a64e0d271 fix: gate MultiEdit tool alongside Edit/Write
MultiEdit was bypassing the fact-forcing gate because only Edit and
Write were checked. Now MultiEdit triggers the same edit gate (list
importers, public API, data schemas) before allowing file modifications.

Updated both the hook logic and hooks.json matcher pattern.

Addresses coderabbit/greptile/cubic-dev: "MultiEdit bypasses gate"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 18:18:16 +09:00
seto
b6a290d061 fix: allow destructive bash retry after facts presented
Destructive bash gate previously denied every invocation with no
isChecked call, creating an infinite deny loop. Now gates per-command
on first attempt and allows retry after the model presents the required
facts (targets, rollback plan, user instruction).

Addresses greptile P1: "Destructive bash gate permanently blocks"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 18:08:15 +09:00
seto
96139b2dad fix: address P2 review feedback (coderabbitai, cubic-dev-ai)
- GATEGUARD_STATE_DIR env var for test isolation (hook + tests)
- Exit code assertions on all 9 tests (no vacuous passes)
- Non-vacuous allow-path assertions (verify pass-through preserves input)
- Robust newline-injection assertion
- clearState() now reports errors instead of swallowing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 18:04:09 +09:00
seto
8a2d13187c fix: address P1 review feedback from greptile bot
1. Use run-with-flags.js wrapper (supports ECC_HOOK_PROFILE, ECC_DISABLED_HOOKS)
2. Add session timeout (30min inactivity = state reset, fixes "once ever" bug)
3. Add 9 integration tests (deny/allow/timeout/sanitize/disable)

Refactored hook to module.exports.run() pattern for direct require() by
run-with-flags.js (~50-100ms faster per invocation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 17:42:32 +09:00
seto
5a03922934 feat(hooks,skills): add gateguard fact-forcing pre-action gate
A PreToolUse hook that forces Claude to investigate before editing.
Instead of self-evaluation ("are you sure?"), it demands concrete facts:
importers, public API, data schemas, user instruction.

A/B tested: +2.25 quality points (9.0 vs 6.75) across two independent tasks.

- scripts/hooks/gateguard-fact-force.js — standalone Node.js hook
- skills/gateguard/SKILL.md — skill documentation
- hooks/hooks.json — PreToolUse entries for Edit|Write and Bash

Full package with config: pip install gateguard-ai
Repo: https://github.com/zunoworks/gateguard

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 11:41:33 +09:00