Three fixes for the positive feedback loop causing runaway memory usage:
1. SIGUSR1 throttling in observe.sh: Signal observer only every 20
observations (configurable via ECC_OBSERVER_SIGNAL_EVERY_N) instead
of on every tool call. Uses a counter file to track invocations.
2. Re-entrancy guard in observer-loop.sh on_usr1(): ANALYZING flag
prevents parallel Claude analysis processes from spawning when
signals arrive while analysis is already running.
3. Cooldown + tail-based sampling in observer-loop.sh:
- 60s cooldown between analyses (ECC_OBSERVER_ANALYSIS_COOLDOWN)
- Only last 500 lines sent to LLM (ECC_OBSERVER_MAX_ANALYSIS_LINES)
instead of the entire observations file
Closes#521
* feat: add project cooldown log to prevent rapid observer re-spawn
Adds session-guardian.sh, called by observer-loop.sh before each Haiku
spawn. It reads ~/.claude/observer-last-run.log and blocks the cycle if
the same project was observed within OBSERVER_INTERVAL_SECONDS (default
300s).
Prevents self-referential loops where a spawned session triggers
observe.sh, which signals the observer before the cooldown has elapsed.
Uses a mkdir-based lock for safe concurrent access across multiple
simultaneously-observed projects. Log entries use tab-delimited format
to handle paths containing spaces. Fails open on lock contention.
Config:
OBSERVER_INTERVAL_SECONDS default: 300
OBSERVER_LAST_RUN_LOG default: ~/.claude/observer-last-run.log
No external dependencies. Works on macOS, Linux, Windows (Git Bash/MSYS2).
* feat: extend session-guardian with time window and idle detection gates
Adds Gate 1 (active hours check) and Gate 3 (system idle detection) to
session-guardian.sh, building on the per-project cooldown log from PR 1.
Gate 1 — Time Window:
- OBSERVER_ACTIVE_HOURS_START/END (default 800–2300 local time)
- Uses date +%k%M with 10# prefix to avoid octal crash at midnight
- Toolless on all platforms; set both vars to 0 to disable
Gate 3 — Idle Detection:
- macOS: ioreg + awk (built-in, no deps)
- Linux: xprintidle if available, else fail open
- Windows (Git Bash/MSYS2): PowerShell GetLastInputInfo via Add-Type
- Unknown/headless: always returns 0 (fail open)
- OBSERVER_MAX_IDLE_SECONDS=0 disables gate
Fixes in this commit:
- 10# base-10 prefix prevents octal arithmetic crash on midnight minutes
containing digits 8 or 9 (e.g. 00:08 = "008" is invalid octal)
- PowerShell output piped through tr -d '\r' to strip Windows CRLF;
also uses [long] cast to avoid TickCount 32-bit overflow after 24 days
- mktemp now uses log file directory instead of TMPDIR to ensure
same-filesystem mv on Linux (atomic rename instead of copy+unlink)
- mkdir -p failure exits 0 (fail open) rather than crashing under set -e
- Numeric validation on last_spawn prevents arithmetic error on corrupt log
Gate execution order: 1 (time, ~0ms) → 2 (cooldown, ~1ms) → 3 (idle, ~50ms)
* fix: harden session guardian gates
---------
Co-authored-by: Affaan Mustafa <affaan@dcube.ai>
* fix(observe): add 5-layer automated session guard to prevent self-loop observations
observe.sh currently fires for ALL hook events including automated/programmatic
sessions: the ECC observer's own Haiku analysis runs, claude-mem observer
sessions, CI pipelines, and any other tool that spawns `claude --print`.
This causes an infinite feedback loop where automated sessions generate
observations that trigger more automated analysis, burning Haiku tokens with
no human activity.
Add a 5-layer guard block after the `disabled` check:
Layer 1: agent_id payload field — only present in subagent hooks; skip any
subagent-scoped session (always automated by definition).
Layer 2: CLAUDE_CODE_ENTRYPOINT env var — Claude Code sets this to sdk-ts,
sdk-py, sdk-cli, mcp, or remote for programmatic/SDK invocations.
Skip if any non-cli entrypoint is detected. This is universal: catches
any tool using the Anthropic SDK without requiring tool cooperation.
Layer 3: ECC_HOOK_PROFILE=minimal — existing ECC mechanism; respect it here
to suppress non-essential hooks in observer contexts.
Layer 4: ECC_SKIP_OBSERVE=1 — cooperative env var any external tool can set
before spawning automated sessions (explicit opt-out contract).
Layer 5: CWD path exclusions — skip sessions whose working directory matches
known observer-session path patterns. Configurable via
ECC_OBSERVE_SKIP_PATHS (comma-separated substrings, default:
"observer-sessions,.claude-mem").
Also fix observer-loop.sh to set ECC_SKIP_OBSERVE=1 and ECC_HOOK_PROFILE=minimal
before spawning the Haiku analysis subprocess, making the observer loop
self-aware and closing the ECC→ECC self-observation loop without needing
external coordination.
Fixes: observe.sh fires unconditionally on automated sessions (#398)
* fix(observe): address review feedback — reorder guards cheapest-first, fix empty pattern bug
Two issues flagged by Copilot and CodeRabbit in PR #399:
1. Layer ordering: the agent_id check spawns a Python subprocess but ran
before the cheap env-var checks (CLAUDE_CODE_ENTRYPOINT, ECC_HOOK_PROFILE,
ECC_SKIP_OBSERVE). Reorder to put all env-var checks first (Layers 1-3),
then the subprocess-requiring agent_id check (Layer 4). Automated sessions
that set env vars — the common case — now exit without spawning Python.
2. Empty pattern bug in Layer 5: if ECC_OBSERVE_SKIP_PATHS contains a trailing
comma or spaces after commas (e.g. "path1, path2" or "path1,"), _pattern
becomes empty or whitespace-only, and the glob *""* matches every CWD,
silently disabling all observations. Fix: trim leading/trailing whitespace
from each pattern and skip empty patterns with `continue`.
* fix: fail closed for non-cli entrypoints
---------
Co-authored-by: Affaan Mustafa <affaan@dcube.ai>
- Redirect observer output to temp log before appending to main log
- Check temp log for confirmation/permission language immediately after start
- Fail closed with exit 2 if detected, preventing retry loops
* fix(continuous-learning-v2): observer background process crashes immediately
Three bugs prevent the observer from running:
1. Nested session detection: When launched from a Claude Code session,
the child process inherits CLAUDECODE env var, causing `claude` CLI
to refuse with "cannot be launched inside another session". Fix: unset
CLAUDECODE in the background process.
2. set -e kills the loop: The parent script's `set -e` is inherited by
the subshell. When `claude` exits non-zero (e.g. max turns reached),
the entire observer loop dies. Fix: `set +e` in the background process.
3. Subshell dies when parent exits: `( ... ) & disown` loses IO handles
when the parent shell exits, killing the background process. Fix: use
`nohup /bin/bash -c '...'` for full detachment, and `sleep & wait`
to allow SIGUSR1 to interrupt sleep without killing the process.
Additionally, the prompt for Haiku now includes the exact instinct file
format inline (YAML frontmatter with id/trigger/confidence/domain/source
fields), since the previous prompt referenced "the observer agent spec"
which Haiku could not actually read, resulting in instinct files that
the CLI parser could not parse.
* fix: address review feedback on observer process management
- Use `env` to pass variables to child process instead of quote-splicing,
avoiding shell injection risk from special chars in paths
- Add USR1_FIRED flag to prevent double analysis when SIGUSR1 interrupts
the sleep/wait cycle
- Track SLEEP_PID and kill it in both TERM trap and USR1 handler to
prevent orphaned sleep processes from accumulating
- Consolidate cleanup logic into a dedicated cleanup() function
* fix: guard PID file cleanup against race condition on restart
Only remove PID file in cleanup trap if it still belongs to the
current process, preventing a restarted observer from losing its
PID file when the old process exits.
- Fix MD012 trailing blank lines in commands/projects.md and commands/promote.md
- Fix MD050 strong-style in continuous-learning-v2 (escape __tests__ as inline code)
- Extract doc-file-warning hook to standalone script to fix hooks validator regex parsing
- Update session-end test to match #317 behavior (always update summary content)
- Allow shell script hooks in integration test format validation
All 992 tests passing.
- Add try-catch around readFileSync in validate-agents, validate-commands,
validate-skills to handle TOCTOU races and file read errors
- Add validate-hooks.js and all test suites to package.json test script
(was only running 4/5 validators and 0/4 test files)
- Fix shell variable injection in observe.sh: use os.environ instead of
interpolating $timestamp/$OBSERVATIONS_FILE into Python string literals
- Fix $? always being 0 in start-observer.sh: capture exit code before
conditional since `if !` inverts the status
- Add OLD_VERSION validation in release.sh and use pipe delimiter in sed
to avoid issues with slash-containing values
- Add jq dependency check in evaluate-session.sh before parsing config
- Sync .cursor/ copies of all modified shell scripts
Core library fixes:
- session-manager.js: wrap all statSync calls in try-catch to prevent
TOCTOU crashes when files are deleted between readdir and stat
- session-manager.js: use birthtime||ctime fallback for Linux compat
- session-manager.js: remove redundant existsSync before readFile
- utils.js: fix findFiles TOCTOU race on statSync inside readdir loop
Hook improvements:
- Add 1MB stdin buffer limits to all PostToolUse hooks to prevent
unbounded memory growth from large payloads
- suggest-compact.js: use fd-based atomic read+write for counter file
to reduce race window between concurrent invocations
- session-end.js: log when transcript file is missing, check
replaceInFile return value for failed timestamp updates
- start-observer.sh: log claude CLI failures instead of silently
swallowing them, check observations file exists before analysis
Test fixes:
- Fix blocking hook tests to send matching input (dev server command)
and expect correct exit code 2 instead of 1