Files
everything-claude-code/skills/continuous-learning-v2/scripts/detect-project.sh
jtzingsheim1 9661a6f042 fix(hooks): scrub secrets and harden hook security (#348)
* fix(hooks): scrub secrets and harden hook security

- Scrub common secret patterns (api_key, token, password, etc.) from
  observation logs before persisting to JSONL (observe.sh)
- Auto-purge observation files older than 30 days (observe.sh)
- Strip embedded credentials from git remote URLs before saving to
  projects.json (detect-project.sh)
- Add command prefix allowlist to runCommand — only git, node, npx,
  which, where are permitted (utils.js)
- Sanitize CLAUDE_SESSION_ID in temp file paths to prevent path
  traversal (suggest-compact.js)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(hooks): address review feedback from CodeRabbit and Cubic

- Reject shell command-chaining operators (;|&`) in runCommand, strip
  quoted sections before checking to avoid false positives (utils.js)
- Remove command string from blocked error message to avoid leaking
  secrets (utils.js)
- Fix Python regex quoting: switch outer shell string from double to
  single quotes so regex compiles correctly (observe.sh)
- Add optional auth scheme match (Bearer, Basic) to secret scrubber
  regex (observe.sh)
- Scope auto-purge to current project dir and match only archived
  files (observations-*.jsonl), not live queue (observe.sh)
- Add second fallback after session ID sanitization to prevent empty
  string (suggest-compact.js)
- Preserve backward compatibility when credential stripping changes
  project hash — detect and migrate legacy directories
  (detect-project.sh)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(hooks): block $() substitution, fix Bearer redaction, add security tests

- Add $ and \n to blocked shell metacharacters in runCommand to prevent
  command substitution via $(cmd) and newline injection (utils.js)
- Make auth scheme group capturing so Bearer/Basic is preserved in
  redacted output instead of being silently dropped (observe.sh)
- Add 10 unit tests covering runCommand allowlist blocking (rm, curl,
  bash prefixes) and metacharacter rejection (;|&`$ chaining), plus
  error message leak prevention (utils.test.js)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(hooks): scrub parse-error fallback, strengthen security tests

Address remaining reviewer feedback from CodeRabbit and Cubic:

- Scrub secrets in observe.sh parse-error fallback path (was writing
  raw unsanitized input to observations file)
- Remove redundant re.IGNORECASE flag ((?i) inline flag already set)
- Add inline comment documenting quote-stripping limitation trade-off
- Fix misleading test name for error-output test
- Add 5 new security tests: single-quote passthrough, mixed
  quoted+unquoted metacharacters, prefix boundary (no trailing space),
  npx acceptance, and newline injection
- Improve existing quoted-metacharacter test to actually exercise
  quote-stripping logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): block $() and backtick inside quotes in runCommand

Shell evaluates $() and backticks inside double quotes, so checking
only the unquoted portion was insufficient. Now $ and ` are rejected
anywhere in the command string, while ; | & remain quote-aware.

Addresses CodeRabbit and Cubic review feedback on PR #348.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 14:47:31 -08:00

161 lines
5.7 KiB
Bash
Executable File

#!/bin/bash
# Continuous Learning v2 - Project Detection Helper
#
# Shared logic for detecting current project context.
# Sourced by observe.sh and start-observer.sh.
#
# Exports:
# _CLV2_PROJECT_ID - Short hash identifying the project (or "global")
# _CLV2_PROJECT_NAME - Human-readable project name
# _CLV2_PROJECT_ROOT - Absolute path to project root
# _CLV2_PROJECT_DIR - Project-scoped storage directory under homunculus
#
# Also sets unprefixed convenience aliases:
# PROJECT_ID, PROJECT_NAME, PROJECT_ROOT, PROJECT_DIR
#
# Detection priority:
# 1. CLAUDE_PROJECT_DIR env var (if set)
# 2. git remote URL (hashed for uniqueness across machines)
# 3. git repo root path (fallback, machine-specific)
# 4. "global" (no project context detected)
_CLV2_HOMUNCULUS_DIR="${HOME}/.claude/homunculus"
_CLV2_PROJECTS_DIR="${_CLV2_HOMUNCULUS_DIR}/projects"
_CLV2_REGISTRY_FILE="${_CLV2_HOMUNCULUS_DIR}/projects.json"
_clv2_detect_project() {
local project_root=""
local project_name=""
local project_id=""
local source_hint=""
# 1. Try CLAUDE_PROJECT_DIR env var
if [ -n "$CLAUDE_PROJECT_DIR" ] && [ -d "$CLAUDE_PROJECT_DIR" ]; then
project_root="$CLAUDE_PROJECT_DIR"
source_hint="env"
fi
# 2. Try git repo root from CWD (only if git is available)
if [ -z "$project_root" ] && command -v git &>/dev/null; then
project_root=$(git rev-parse --show-toplevel 2>/dev/null || true)
if [ -n "$project_root" ]; then
source_hint="git"
fi
fi
# 3. No project detected — fall back to global
if [ -z "$project_root" ]; then
_CLV2_PROJECT_ID="global"
_CLV2_PROJECT_NAME="global"
_CLV2_PROJECT_ROOT=""
_CLV2_PROJECT_DIR="${_CLV2_HOMUNCULUS_DIR}"
return 0
fi
# Derive project name from directory basename
project_name=$(basename "$project_root")
# Derive project ID: prefer git remote URL hash (portable across machines),
# fall back to path hash (machine-specific but still useful)
local remote_url=""
if command -v git &>/dev/null; then
if [ "$source_hint" = "git" ] || [ -d "${project_root}/.git" ]; then
remote_url=$(git -C "$project_root" remote get-url origin 2>/dev/null || true)
fi
fi
# Compute hash from the original remote URL (legacy, for backward compatibility)
local legacy_hash_input="${remote_url:-$project_root}"
# Strip embedded credentials from remote URL (e.g., https://ghp_xxxx@github.com/...)
if [ -n "$remote_url" ]; then
remote_url=$(printf '%s' "$remote_url" | sed -E 's|://[^@]+@|://|')
fi
local hash_input="${remote_url:-$project_root}"
# Use SHA256 via python3 (portable across macOS/Linux, no shasum/sha256sum divergence)
project_id=$(printf '%s' "$hash_input" | python3 -c "import sys,hashlib; print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest()[:12])" 2>/dev/null)
# Fallback if python3 failed
if [ -z "$project_id" ]; then
project_id=$(printf '%s' "$hash_input" | shasum -a 256 2>/dev/null | cut -c1-12 || \
printf '%s' "$hash_input" | sha256sum 2>/dev/null | cut -c1-12 || \
echo "fallback")
fi
# Backward compatibility: if credentials were stripped and the hash changed,
# check if a project dir exists under the legacy hash and reuse it
if [ "$legacy_hash_input" != "$hash_input" ]; then
local legacy_id
legacy_id=$(printf '%s' "$legacy_hash_input" | python3 -c "import sys,hashlib; print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest()[:12])" 2>/dev/null)
if [ -n "$legacy_id" ] && [ -d "${_CLV2_PROJECTS_DIR}/${legacy_id}" ] && [ ! -d "${_CLV2_PROJECTS_DIR}/${project_id}" ]; then
# Migrate legacy directory to new hash
mv "${_CLV2_PROJECTS_DIR}/${legacy_id}" "${_CLV2_PROJECTS_DIR}/${project_id}" 2>/dev/null || project_id="$legacy_id"
fi
fi
# Export results
_CLV2_PROJECT_ID="$project_id"
_CLV2_PROJECT_NAME="$project_name"
_CLV2_PROJECT_ROOT="$project_root"
_CLV2_PROJECT_DIR="${_CLV2_PROJECTS_DIR}/${project_id}"
# Ensure project directory structure exists
mkdir -p "${_CLV2_PROJECT_DIR}/instincts/personal"
mkdir -p "${_CLV2_PROJECT_DIR}/instincts/inherited"
mkdir -p "${_CLV2_PROJECT_DIR}/observations.archive"
mkdir -p "${_CLV2_PROJECT_DIR}/evolved/skills"
mkdir -p "${_CLV2_PROJECT_DIR}/evolved/commands"
mkdir -p "${_CLV2_PROJECT_DIR}/evolved/agents"
# Update project registry (lightweight JSON mapping)
_clv2_update_project_registry "$project_id" "$project_name" "$project_root" "$remote_url"
}
_clv2_update_project_registry() {
local pid="$1"
local pname="$2"
local proot="$3"
local premote="$4"
mkdir -p "$(dirname "$_CLV2_REGISTRY_FILE")"
# Pass values via env vars to avoid shell→python injection.
# python3 reads them with os.environ, which is safe for any string content.
_CLV2_REG_PID="$pid" \
_CLV2_REG_PNAME="$pname" \
_CLV2_REG_PROOT="$proot" \
_CLV2_REG_PREMOTE="$premote" \
_CLV2_REG_FILE="$_CLV2_REGISTRY_FILE" \
python3 -c '
import json, os
from datetime import datetime, timezone
registry_path = os.environ["_CLV2_REG_FILE"]
try:
with open(registry_path) as f:
registry = json.load(f)
except (FileNotFoundError, json.JSONDecodeError):
registry = {}
registry[os.environ["_CLV2_REG_PID"]] = {
"name": os.environ["_CLV2_REG_PNAME"],
"root": os.environ["_CLV2_REG_PROOT"],
"remote": os.environ["_CLV2_REG_PREMOTE"],
"last_seen": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
}
with open(registry_path, "w") as f:
json.dump(registry, f, indent=2)
' 2>/dev/null || true
}
# Auto-detect on source
_clv2_detect_project
# Convenience aliases for callers (short names pointing to prefixed vars)
PROJECT_ID="$_CLV2_PROJECT_ID"
PROJECT_NAME="$_CLV2_PROJECT_NAME"
PROJECT_ROOT="$_CLV2_PROJECT_ROOT"
PROJECT_DIR="$_CLV2_PROJECT_DIR"