fix: make detect-project.sh locale-independent and handle Windows backslash paths

Two bugs in skills/continuous-learning-v2/scripts/detect-project.sh that
silently split the same project into multiple project_id records:

1. Locale-dependent SHA-256 input (HIGH)
   The project_id hash was computed with
     printf '%s' "$hash_input" | python -c 'sys.stdin.buffer.read()'
   which ships shell-locale-encoded bytes to Python. On a system with a
   non-UTF-8 LC_ALL (e.g. ja_JP.CP932 / CP1252) the same project root
   produced a different 12-char hash than the UTF-8 locale would produce,
   so observations/instincts were silently written under a separate
   project directory. Fixed by passing the value via an env var and
   encoding as UTF-8 inside Python, making the hash locale-independent.

2. basename cannot split Windows backslash paths (MEDIUM)
   basename "C:\Users\...\ECC作成" returns the whole string on POSIX
   bash, so project_name was garbled whenever CLAUDE_PROJECT_DIR was
   passed as a native Windows path. Normalize backslashes to forward
   slashes before calling basename.

Both the primary project_id hash and the legacy-compat fallback hash
are updated to use the env-var / UTF-8 approach.

Verified: id is stable across en_US.UTF-8, ja_JP.UTF-8, ja_JP.CP932, C,
and POSIX locales; Windows-path input yields project_name=ECC作成;
ASCII-only paths regress-free.
This commit is contained in:
suusuu0927
2026-04-21 18:46:39 +09:00
parent 8bdf88e5ad
commit 797692d70f

View File

@@ -79,7 +79,11 @@ _clv2_detect_project() {
fi
# Derive project name from directory basename
project_name=$(basename "$project_root")
# Normalize Windows backslashes so basename works when CLAUDE_PROJECT_DIR
# is passed as e.g. C:\Users\...\project.
local _norm_root
_norm_root=$(printf '%s' "$project_root" | sed 's|\\|/|g')
project_name=$(basename "$_norm_root")
# Derive project ID: prefer git remote URL hash (portable across machines),
# fall back to path hash (machine-specific but still useful)
@@ -100,8 +104,15 @@ _clv2_detect_project() {
local hash_input="${remote_url:-$project_root}"
# Prefer Python for consistent SHA256 behavior across shells/platforms.
# Pass the value via env var and encode as UTF-8 inside Python so the hash
# is locale-independent (shells vary between UTF-8 / CP932 / CP1252, which
# would otherwise produce different hashes for the same non-ASCII path).
if [ -n "$_CLV2_PYTHON_CMD" ]; then
project_id=$(printf '%s' "$hash_input" | "$_CLV2_PYTHON_CMD" -c "import sys,hashlib; print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest()[:12])" 2>/dev/null)
project_id=$(_CLV2_HASH_INPUT="$hash_input" "$_CLV2_PYTHON_CMD" -c '
import os, hashlib
s = os.environ["_CLV2_HASH_INPUT"]
print(hashlib.sha256(s.encode("utf-8")).hexdigest()[:12])
' 2>/dev/null)
fi
# Fallback if Python is unavailable or hash generation failed.
@@ -115,7 +126,11 @@ _clv2_detect_project() {
# check if a project dir exists under the legacy hash and reuse it
if [ "$legacy_hash_input" != "$hash_input" ] && [ -n "$_CLV2_PYTHON_CMD" ]; then
local legacy_id=""
legacy_id=$(printf '%s' "$legacy_hash_input" | "$_CLV2_PYTHON_CMD" -c "import sys,hashlib; print(hashlib.sha256(sys.stdin.buffer.read()).hexdigest()[:12])" 2>/dev/null)
legacy_id=$(_CLV2_HASH_INPUT="$legacy_hash_input" "$_CLV2_PYTHON_CMD" -c '
import os, hashlib
s = os.environ["_CLV2_HASH_INPUT"]
print(hashlib.sha256(s.encode("utf-8")).hexdigest()[:12])
' 2>/dev/null)
if [ -n "$legacy_id" ] && [ -d "${_CLV2_PROJECTS_DIR}/${legacy_id}" ] && [ ! -d "${_CLV2_PROJECTS_DIR}/${project_id}" ]; then
# Migrate legacy directory to new hash
mv "${_CLV2_PROJECTS_DIR}/${legacy_id}" "${_CLV2_PROJECTS_DIR}/${project_id}" 2>/dev/null || project_id="$legacy_id"