fix: port continuous-learning observer fixes

Ports continuous-learning observer signal, storage, remote normalization, and v1 deprecation fixes onto current main.
This commit is contained in:
Affaan Mustafa
2026-05-11 03:35:42 -04:00
committed by GitHub
parent e674a7dbd7
commit 12e1bc424d
17 changed files with 512 additions and 56 deletions

View File

@@ -26,7 +26,7 @@ An advanced learning system that turns your Claude Code sessions into reusable k
| Feature | v2.0 | v2.1 |
|---------|------|------|
| Storage | Global (~/.claude/homunculus/) | Project-scoped (projects/<hash>/) |
| Storage | Global (`~/.claude/homunculus/`) | Project-scoped (`${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/projects/<hash>/`) |
| Scope | All instincts apply everywhere | Project-scoped + global |
| Detection | None | git remote URL / repo path |
| Promotion | N/A | Project → global when seen in 2+ projects |
@@ -132,7 +132,21 @@ The system automatically detects your current project:
3. **`git rev-parse --show-toplevel`** -- fallback using repo path (machine-specific)
4. **Global fallback** -- if no project is detected, instincts go to global scope
Each project gets a 12-character hash ID (e.g., `a1b2c3d4e5f6`). A registry file at `~/.claude/homunculus/projects.json` maps IDs to human-readable names.
Each project gets a 12-character hash ID (e.g., `a1b2c3d4e5f6`). A registry file at `${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/projects.json` maps IDs to human-readable names.
### Data Directory
Continuous-learning-v2 stores observer data outside `~/.claude` so Claude Code's sensitive-path guard does not block background instinct writes:
1. `CLV2_HOMUNCULUS_DIR` when set to an absolute path
2. `$XDG_DATA_HOME/ecc-homunculus`
3. `$HOME/.local/share/ecc-homunculus`
Existing users with data at `~/.claude/homunculus` can migrate once:
```bash
bash skills/continuous-learning-v2/scripts/migrate-homunculus.sh
```
## Quick Start
@@ -173,7 +187,7 @@ The system creates directories automatically on first use, but you can also crea
```bash
# Global directories
mkdir -p ~/.claude/homunculus/{instincts/{personal,inherited},evolved/{agents,skills,commands},projects}
mkdir -p "${XDG_DATA_HOME:-$HOME/.local/share}/ecc-homunculus"/{instincts/{personal,inherited},evolved/{agents,skills,commands},projects}
# Project directories are auto-created when the hook first runs in a git repo
```
@@ -226,7 +240,7 @@ Other behavior (observation capture, instinct thresholds, project scoping, promo
## File Structure
```
~/.claude/homunculus/
${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/
+-- identity.json # Your profile, technical level
+-- projects.json # Registry: project hash -> name/path/remote
+-- observations.jsonl # Global observations (fallback)
@@ -322,7 +336,7 @@ Hooks fire **100% of the time**, deterministically. This means:
## Backward Compatibility
v2.1 is fully compatible with v2.0 and v1:
- Existing global instincts in `~/.claude/homunculus/instincts/` still work as global instincts
- Existing global instincts can be migrated from `~/.claude/homunculus/instincts/` with `scripts/migrate-homunculus.sh`
- Existing `~/.claude/skills/learned/` skills from v1 still work
- Stop hook still runs (but now also feeds into v2)
- Gradual migration: run both in parallel

View File

@@ -10,6 +10,7 @@ unset CLAUDECODE
SLEEP_PID=""
USR1_FIRED=0
PENDING_ANALYSIS=0
ANALYZING=0
LAST_ANALYSIS_EPOCH=0
# Minimum seconds between analyses (prevents rapid re-triggering)
@@ -258,14 +259,17 @@ PROMPT
on_usr1() {
[ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
SLEEP_PID=""
USR1_FIRED=1
# Re-entrancy guard: skip if analysis is already running (#521)
# Re-entrancy guard: defer the nudge so the main loop runs a follow-up
# analysis immediately after the current analysis finishes.
if [ "$ANALYZING" -eq 1 ]; then
echo "[$(date)] Analysis already in progress, skipping signal" >> "$LOG_FILE"
PENDING_ANALYSIS=1
echo "[$(date)] Analysis already in progress, deferring signal" >> "$LOG_FILE"
return
fi
USR1_FIRED=1
# Cooldown: skip if last analysis was too recent (#521)
now_epoch=$(date +%s)
elapsed=$(( now_epoch - LAST_ANALYSIS_EPOCH ))
@@ -290,6 +294,17 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
while true; do
exit_if_idle_without_sessions
if [ "$PENDING_ANALYSIS" -eq 1 ]; then
PENDING_ANALYSIS=0
USR1_FIRED=0
ANALYZING=1
analyze_observations
LAST_ANALYSIS_EPOCH=$(date +%s)
ANALYZING=0
continue
fi
sleep "$OBSERVER_INTERVAL_SECONDS" &
SLEEP_PID=$!
wait "$SLEEP_PID" 2>/dev/null
@@ -299,6 +314,9 @@ while true; do
if [ "$USR1_FIRED" -eq 1 ]; then
USR1_FIRED=0
else
ANALYZING=1
analyze_observations
LAST_ANALYSIS_EPOCH=$(date +%s)
ANALYZING=0
fi
done

View File

@@ -17,8 +17,8 @@ A background agent that analyzes observations from Claude Code sessions to detec
## Input
Reads observations from the **project-scoped** observations file:
- Project: `~/.claude/homunculus/projects/<project-hash>/observations.jsonl`
- Global fallback: `~/.claude/homunculus/observations.jsonl`
- Project: `${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/projects/<project-hash>/observations.jsonl`
- Global fallback: `${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/observations.jsonl`
```jsonl
{"timestamp":"2025-01-22T10:30:00Z","event":"tool_start","session":"abc123","tool":"Edit","input":"...","project_id":"a1b2c3d4e5f6","project_name":"my-react-app"}
@@ -66,8 +66,8 @@ When certain tools are consistently preferred:
## Output
Creates/updates instincts in the **project-scoped** instincts directory:
- Project: `~/.claude/homunculus/projects/<project-hash>/instincts/personal/`
- Global: `~/.claude/homunculus/instincts/personal/` (for universal patterns)
- Project: `${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/projects/<project-hash>/instincts/personal/`
- Global: `${XDG_DATA_HOME:-~/.local/share}/ecc-homunculus/instincts/personal/` (for universal patterns)
### Project-Scoped Instinct (default)

View File

@@ -35,9 +35,13 @@ PYTHON_CMD="${CLV2_PYTHON_CMD:-}"
# Configuration
# ─────────────────────────────────────────────
CONFIG_DIR="${HOME}/.claude/homunculus"
# shellcheck disable=SC1091
. "${SKILL_ROOT}/scripts/lib/homunculus-dir.sh"
CONFIG_DIR="$(_ecc_resolve_homunculus_dir)"
if [ -n "${CLV2_CONFIG:-}" ]; then
CONFIG_FILE="$CLV2_CONFIG"
elif [ -f "${CONFIG_DIR}/config.json" ]; then
CONFIG_FILE="${CONFIG_DIR}/config.json"
else
CONFIG_FILE="${SKILL_ROOT}/config.json"
fi

View File

@@ -115,7 +115,9 @@ fi
# Sourcing detect-project.sh creates project-scoped directories and updates
# projects.json, so automated sessions must return before that point.
CONFIG_DIR="${HOME}/.claude/homunculus"
# shellcheck disable=SC1091
. "$(dirname "$0")/../scripts/lib/homunculus-dir.sh"
CONFIG_DIR="$(_ecc_resolve_homunculus_dir)"
# Skip if disabled (check both default and CLV2_CONFIG-derived locations)
if [ -f "$CONFIG_DIR/disabled" ]; then
@@ -344,10 +346,12 @@ if [ -f "${CONFIG_DIR}/disabled" ]; then
OBSERVER_ENABLED=false
else
OBSERVER_ENABLED=false
CONFIG_FILE="${SKILL_ROOT}/config.json"
# Allow CLV2_CONFIG override
if [ -n "${CLV2_CONFIG:-}" ]; then
CONFIG_FILE="$CLV2_CONFIG"
elif [ -f "${CONFIG_DIR}/config.json" ]; then
CONFIG_FILE="${CONFIG_DIR}/config.json"
else
CONFIG_FILE="${SKILL_ROOT}/config.json"
fi
# Use effective config path for both existence check and reading
EFFECTIVE_CONFIG="$CONFIG_FILE"

View File

@@ -19,7 +19,9 @@
# 3. git repo root path (fallback, machine-specific)
# 4. "global" (no project context detected)
_CLV2_HOMUNCULUS_DIR="${HOME}/.claude/homunculus"
# shellcheck disable=SC1091
. "$(dirname "${BASH_SOURCE[0]}")/lib/homunculus-dir.sh"
_CLV2_HOMUNCULUS_DIR="$(_ecc_resolve_homunculus_dir)"
_CLV2_PROJECTS_DIR="${_CLV2_HOMUNCULUS_DIR}/projects"
_CLV2_REGISTRY_FILE="${_CLV2_HOMUNCULUS_DIR}/projects.json"
@@ -49,6 +51,30 @@ export CLV2_PYTHON_CMD
CLV2_OBSERVER_PROMPT_PATTERN='Can you confirm|requires permission|Awaiting (user confirmation|confirmation|approval|permission)|confirm I should proceed|once granted access|grant.*access'
export CLV2_OBSERVER_PROMPT_PATTERN
_clv2_normalize_remote_url() {
local url="$1"
[ -z "$url" ] && return 0
local is_network=0
case "$url" in
file://*) is_network=0 ;;
*://*) is_network=1 ;;
*@*:*) is_network=1 ;;
*) is_network=0 ;;
esac
url=$(printf '%s' "$url" | sed -E 's|://[^@]+@|://|')
url=$(printf '%s' "$url" | sed -E 's|^[A-Za-z][A-Za-z0-9+.-]*://||')
url=$(printf '%s' "$url" | sed -E 's|^[^@/:]+@([^:/]+):|\1/|')
url=$(printf '%s' "$url" | sed -E 's|\.git/?$||; s|/+$||')
if [ "$is_network" = "1" ]; then
printf '%s' "$url" | tr '[:upper:]' '[:lower:]'
else
printf '%s' "$url"
fi
}
_clv2_detect_project() {
local project_root=""
local project_name=""
@@ -94,15 +120,20 @@ _clv2_detect_project() {
fi
fi
# Compute hash from the original remote URL (legacy, for backward compatibility)
local legacy_hash_input="${remote_url:-$project_root}"
local raw_remote_url="$remote_url"
# Strip embedded credentials from remote URL (e.g., https://ghp_xxxx@github.com/...)
if [ -n "$remote_url" ]; then
remote_url=$(printf '%s' "$remote_url" | sed -E 's|://[^@]+@|://|')
fi
local hash_input="${remote_url:-$project_root}"
local legacy_hash_input="${remote_url:-$project_root}"
local normalized_remote=""
if [ -n "$remote_url" ]; then
normalized_remote=$(_clv2_normalize_remote_url "$remote_url")
fi
local hash_input="${normalized_remote:-${remote_url:-$project_root}}"
# Prefer Python for consistent SHA256 behavior across shells/platforms.
# Pass the value via env var and encode as UTF-8 inside Python so the hash
# is locale-independent (shells vary between UTF-8 / CP932 / CP1252, which
@@ -122,19 +153,33 @@ print(hashlib.sha256(s.encode("utf-8")).hexdigest()[:12])
echo "fallback")
fi
# Backward compatibility: if credentials were stripped and the hash changed,
# check if a project dir exists under the legacy hash and reuse it
if [ "$legacy_hash_input" != "$hash_input" ] && [ -n "$_CLV2_PYTHON_CMD" ]; then
local legacy_id=""
legacy_id=$(_CLV2_HASH_INPUT="$legacy_hash_input" "$_CLV2_PYTHON_CMD" -c '
# Backward compatibility: migrate a single legacy project directory from
# credential-stripped or raw remote hashes to the normalized remote hash.
if [ -n "$_CLV2_PYTHON_CMD" ] && [ ! -d "${_CLV2_PROJECTS_DIR}/${project_id}" ]; then
local legacy_inputs=()
[ -n "$legacy_hash_input" ] && [ "$legacy_hash_input" != "$hash_input" ] \
&& legacy_inputs+=("$legacy_hash_input")
[ -n "$raw_remote_url" ] && [ "$raw_remote_url" != "$hash_input" ] \
&& [ "$raw_remote_url" != "$legacy_hash_input" ] \
&& legacy_inputs+=("$raw_remote_url")
local legacy_input legacy_id
for legacy_input in "${legacy_inputs[@]}"; do
legacy_id=$(_CLV2_HASH_INPUT="$legacy_input" "$_CLV2_PYTHON_CMD" -c '
import os, hashlib
s = os.environ["_CLV2_HASH_INPUT"]
print(hashlib.sha256(s.encode("utf-8")).hexdigest()[:12])
' 2>/dev/null)
if [ -n "$legacy_id" ] && [ -d "${_CLV2_PROJECTS_DIR}/${legacy_id}" ] && [ ! -d "${_CLV2_PROJECTS_DIR}/${project_id}" ]; then
# Migrate legacy directory to new hash
mv "${_CLV2_PROJECTS_DIR}/${legacy_id}" "${_CLV2_PROJECTS_DIR}/${project_id}" 2>/dev/null || project_id="$legacy_id"
fi
if [ -n "$legacy_id" ] && [ "$legacy_id" != "$project_id" ] \
&& [ -d "${_CLV2_PROJECTS_DIR}/${legacy_id}" ]; then
if mv "${_CLV2_PROJECTS_DIR}/${legacy_id}" "${_CLV2_PROJECTS_DIR}/${project_id}" 2>/dev/null; then
break
else
project_id="$legacy_id"
break
fi
fi
done
fi
# Export results

View File

@@ -38,7 +38,48 @@ except ImportError:
# Configuration
# ─────────────────────────────────────────────
HOMUNCULUS_DIR = Path.home() / ".claude" / "homunculus"
def _resolve_homunculus_dir() -> Path:
override = os.environ.get("CLV2_HOMUNCULUS_DIR")
if override:
if Path(override).is_absolute():
return Path(override)
print(f"[ecc] CLV2_HOMUNCULUS_DIR={override!r} is not absolute; ignoring", file=sys.stderr)
xdg = os.environ.get("XDG_DATA_HOME")
if xdg:
if Path(xdg).is_absolute():
return Path(xdg) / "ecc-homunculus"
print(f"[ecc] XDG_DATA_HOME={xdg!r} is not absolute; ignoring", file=sys.stderr)
return Path.home() / ".local" / "share" / "ecc-homunculus"
def _strip_remote_credentials(remote_url: str) -> str:
return re.sub(r"://[^@]+@", "://", remote_url or "")
def _normalize_remote_url(remote_url: str) -> str:
if not remote_url:
return ""
is_network = (
not remote_url.startswith("file://")
and ("://" in remote_url or re.match(r"^[^@/:]+@[^:/]+:", remote_url) is not None)
)
normalized = _strip_remote_credentials(remote_url)
normalized = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", normalized)
normalized = re.sub(r"^[^@/:]+@([^:/]+):", r"\1/", normalized)
normalized = re.sub(r"\.git/?$", "", normalized)
normalized = re.sub(r"/+$", "", normalized)
return normalized.lower() if is_network else normalized
def _project_hash(value: str) -> str:
return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
HOMUNCULUS_DIR = _resolve_homunculus_dir()
PROJECTS_DIR = HOMUNCULUS_DIR / "projects"
REGISTRY_FILE = HOMUNCULUS_DIR / "projects.json"
@@ -177,11 +218,35 @@ def detect_project() -> dict:
except (subprocess.TimeoutExpired, FileNotFoundError):
pass
hash_source = remote_url if remote_url else project_root
project_id = hashlib.sha256(hash_source.encode()).hexdigest()[:12]
raw_remote_url = remote_url
if remote_url:
remote_url = _strip_remote_credentials(remote_url)
legacy_hash_source = remote_url if remote_url else project_root
normalized_remote = _normalize_remote_url(remote_url) if remote_url else ""
hash_source = normalized_remote if normalized_remote else legacy_hash_source
project_id = _project_hash(hash_source)
project_dir = PROJECTS_DIR / project_id
if not project_dir.exists():
legacy_sources = []
if legacy_hash_source and legacy_hash_source != hash_source:
legacy_sources.append(legacy_hash_source)
if raw_remote_url and raw_remote_url not in {hash_source, legacy_hash_source}:
legacy_sources.append(raw_remote_url)
for legacy_source in legacy_sources:
legacy_id = _project_hash(legacy_source)
legacy_dir = PROJECTS_DIR / legacy_id
if legacy_id != project_id and legacy_dir.exists():
try:
legacy_dir.rename(project_dir)
except OSError:
project_id = legacy_id
project_dir = legacy_dir
break
# Ensure project directory structure
for d in [
project_dir / "instincts" / "personal",

View File

@@ -0,0 +1,31 @@
#!/usr/bin/env bash
# Shared continuous-learning-v2 data-directory resolver.
#
# Resolution precedence:
# 1. CLV2_HOMUNCULUS_DIR, when absolute
# 2. XDG_DATA_HOME/ecc-homunculus, when XDG_DATA_HOME is absolute
# 3. HOME/.local/share/ecc-homunculus
_ecc_resolve_homunculus_dir() {
if [ -n "${CLV2_HOMUNCULUS_DIR:-}" ]; then
case "$CLV2_HOMUNCULUS_DIR" in
/*) printf '%s\n' "$CLV2_HOMUNCULUS_DIR"; return 0 ;;
*) printf '[ecc] CLV2_HOMUNCULUS_DIR=%s is not absolute; ignoring\n' "$CLV2_HOMUNCULUS_DIR" >&2 ;;
esac
fi
if [ -n "${XDG_DATA_HOME:-}" ]; then
case "$XDG_DATA_HOME" in
/*) printf '%s/ecc-homunculus\n' "$XDG_DATA_HOME"; return 0 ;;
*) printf '[ecc] XDG_DATA_HOME=%s is not absolute; ignoring\n' "$XDG_DATA_HOME" >&2 ;;
esac
fi
case "${HOME:-}" in
/*) printf '%s/.local/share/ecc-homunculus\n' "$HOME" ;;
*)
printf '[ecc] HOME=%s is not absolute; cannot resolve homunculus dir\n' "${HOME:-}" >&2
return 1
;;
esac
}

View File

@@ -0,0 +1,62 @@
#!/usr/bin/env bash
# One-shot migration from the legacy Claude config tree into the
# continuous-learning-v2 data directory.
set -euo pipefail
OLD="${HOME}/.claude/homunculus"
# shellcheck disable=SC1091
. "$(dirname "$0")/lib/homunculus-dir.sh"
NEW="$(_ecc_resolve_homunculus_dir)"
if [ "$NEW" = "$OLD" ]; then
echo "Resolved destination equals source ($OLD); nothing to migrate."
exit 0
fi
if [ ! -d "$OLD" ]; then
echo "Nothing to migrate (no $OLD)."
exit 0
fi
if command -v pgrep >/dev/null 2>&1; then
if pgrep -f "${HOME}.*observer-loop\\.sh" >/dev/null 2>&1; then
echo "Refusing to migrate: observer-loop.sh is running." >&2
echo "Exit all Claude Code sessions, then re-run." >&2
exit 1
fi
else
echo "Warning: pgrep not available; skipping running-observer check." >&2
fi
mkdir -p "$(dirname "$NEW")"
if [ ! -d "$NEW" ]; then
mv "$OLD" "$NEW"
echo "Moved $OLD -> $NEW"
elif [ -z "$(ls -A "$NEW" 2>/dev/null || true)" ]; then
rmdir "$NEW"
mv "$OLD" "$NEW"
echo "Moved $OLD -> $NEW (replaced empty destination)"
else
old_count="$(find "$OLD" -type f 2>/dev/null | wc -l | tr -d ' ')"
new_count="$(find "$NEW" -type f 2>/dev/null | wc -l | tr -d ' ')"
echo "Refusing to migrate: both paths exist with content." >&2
echo " Old: $OLD ($old_count files)" >&2
echo " New: $NEW ($new_count files)" >&2
echo "Resolve manually, then re-run." >&2
exit 1
fi
settings="${HOME}/.claude/settings.json"
if [ -f "$settings" ] && grep -q '"CLV2_CONFIG"' "$settings" 2>/dev/null; then
if grep -q '\.claude/homunculus' "$settings" 2>/dev/null; then
cat >&2 <<WARN
Advisory: ~/.claude/settings.json still sets CLV2_CONFIG under the old path.
Update it to: ${NEW}/config.json
(Not editing settings.json automatically.)
WARN
fi
fi

View File

@@ -1,10 +1,18 @@
---
name: continuous-learning
description: Automatically extract reusable patterns from Claude Code sessions and save them as learned skills for future use.
description: "[DEPRECATED - use continuous-learning-v2] Legacy v1 stop-hook skill extractor. v2 is a strict superset with instinct-based, project-scoped, hook-reliable learning. Do not invoke v1; route continuous learning, session learning, and pattern extraction requests to continuous-learning-v2."
origin: ECC
---
# Continuous Learning Skill
# Continuous Learning Skill - DEPRECATED
> **DEPRECATED 2026-04-28.** Use `continuous-learning-v2` instead. v2 is a strict superset: stop-hook observation becomes PreToolUse/PostToolUse observation, full skills become atomic instincts with confidence scoring, and global-only storage becomes project-scoped plus global promotion.
>
> This file is kept for archival reference and backward compatibility with existing installs.
---
## Original v1 Documentation (archival)
Automatically evaluates Claude Code sessions on end to extract reusable patterns that can be saved as learned skills.