feat(commands): improve learn-eval with checklist-based holistic verdict

Replace the 5-dimension numeric scoring rubric with a checklist + holistic verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md, consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]" to prevent skill file proliferation, and "Improve then Save" for iterative refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric scoring with modern frontier models
2026-05-16 13:53:04 +08:00 · 2026-03-08 19:35:28 +09:00
parent 4fa817cd7d
commit 32e11b8701
1 changed files with 51 additions and 27 deletions
--- a/commands/learn-eval.md
+++ b/commands/learn-eval.md
@@ -1,10 +1,10 @@
 ---
-description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project).
+description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)."
 ---

 # /learn-eval - Extract, Evaluate, then Save

-Extends `/learn` with a quality gate and save-location decision before writing any skill file.
+Extends `/learn` with a quality gate, save-location decision, and knowledge-placement awareness before writing any skill file.

 ## What to Extract

@@ -51,36 +51,60 @@ origin: auto-extracted
 [Trigger conditions]
 ```

-5. **Self-evaluate before saving** using this rubric:
+5. **Quality gate — Checklist + Holistic verdict**

-   | Dimension | 1 | 3 | 5 |
-   |-----------|---|---|---|
-   | Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns |
-   | Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered |
-   | Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned |
-   | Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value |
-   | Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered |
+   #### 5a. Required checklist (verify by actually reading files)

-   - Score each dimension 1–5
-   - If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-   - Show the user the scores table and the final draft
+   Execute **all** of the following before evaluating the draft:

-6. Ask user to confirm:
-   - Show: proposed save path + scores table + final draft
-   - Wait for explicit confirmation before writing
+   - [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+   - [ ] Check MEMORY.md (both project and global) for overlap
+   - [ ] Consider whether appending to an existing skill would suffice
+   - [ ] Confirm this is a reusable pattern, not a one-off fix

-7. Save to the determined location
+   #### 5b. Holistic verdict

-## Output Format for Step 5 (scores table)
+   Synthesize the checklist results and draft quality, then choose **one** of the following:

-| Dimension | Score | Rationale |
-|-----------|-------|-----------|
-| Specificity | N/5 | ... |
-| Actionability | N/5 | ... |
-| Scope Fit | N/5 | ... |
-| Non-redundancy | N/5 | ... |
-| Coverage | N/5 | ... |
-| **Total** | **N/25** | |
+   | Verdict | Meaning | Next Action |
+   |---------|---------|-------------|
+   | **Save** | Unique, specific, well-scoped | Proceed to Step 6 |
+   | **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) |
+   | **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 |
+   | **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop |
+
+   **Guideline dimensions** (informing the verdict, not scored):
+
+   - **Specificity & Actionability**: Contains code examples or commands that are immediately usable
+   - **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern
+   - **Uniqueness**: Provides value not covered by existing skills (informed by checklist results)
+   - **Reusability**: Realistic trigger scenarios exist in future sessions
+
+6. **Verdict-specific confirmation flow**
+
+   - **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+   - **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+   - **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb to the determined location
+
+## Output Format for Step 5
+
+```
+### Checklist
+- [x] skills/ grep: no overlap (or: overlap found → details)
+- [x] MEMORY.md: no overlap (or: overlap found → details)
+- [x] Existing skill append: new file appropriate (or: should append to [X])
+- [x] Reusability: confirmed (or: one-off → Drop)
+
+### Verdict: Save / Improve then Save / Absorb into [X] / Drop
+
+**Rationale:** (1-2 sentences explaining the verdict)
+```
+
+## Design Rationale
+
+This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped.

 ## Notes

@@ -88,4 +112,4 @@ origin: auto-extracted
 - Don't extract one-time issues (specific API outages, etc.)
 - Focus on patterns that will save time in future sessions
 - Keep skills focused — one pattern per skill
-- If Coverage score is low, add related variants before saving
+- When the verdict is Absorb, append to the existing skill rather than creating a new file
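
The first checklist item in step 5a (grep `~/.claude/skills/` by keyword) could be sketched roughly as below. This is an illustration only and not part of the commit: the function name `find_overlaps`, the case-insensitive substring matching, and the restriction to `*.md` files are all assumptions made for the example.

```python
from pathlib import Path


def find_overlaps(skills_dir: Path, keywords: list[str]) -> dict[str, list[str]]:
    """Map each Markdown file under skills_dir to the keywords it contains.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, which is an assumption rather than anything the command specifies.
    """
    hits: dict[str, list[str]] = {}
    if not skills_dir.is_dir():
        # No skills directory yet, so nothing can overlap.
        return hits
    lowered = [k.lower() for k in keywords]
    for path in sorted(skills_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        found = [k for k in lowered if k in text]
        if found:
            hits[str(path.relative_to(skills_dir))] = found
    return hits


if __name__ == "__main__":
    # Example keywords are placeholders for terms taken from the draft skill.
    overlaps = find_overlaps(Path.home() / ".claude" / "skills", ["retry", "backoff"])
    for name, found in overlaps.items():
        print(f"{name}: {', '.join(found)}")
```

A non-empty result would feed the "overlap found → details" branch of the checklist output and make "Absorb into [X]" a candidate verdict. An empty result only shows that the chosen keywords did not match, so the remaining checklist items still apply.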