Mirror of https://github.com/affaan-m/everything-claude-code.git
Synced 2026-03-30 13:43:26 +08:00
feat(commands): improve learn-eval with checklist-based holistic verdict
Replace the 5-dimension numeric scoring rubric with a checklist + holistic verdict system (Save / Improve then Save / Absorb into [X] / Drop).

Key improvements:
- Explicit pre-save checklist: grep skills/ for duplicates, check MEMORY.md, consider appending to existing skills, confirm reusability
- 4-way verdict instead of binary save/don't-save: adds "Absorb into [X]" to prevent skill file proliferation, and "Improve then Save" for iterative refinement
- Verdict-specific confirmation flows tailored to each outcome
- Design rationale explaining why holistic judgment outperforms numeric scoring with modern frontier models
Committed by: Affaan Mustafa
Parent: 4fa817cd7d
Commit: 32e11b8701
@@ -1,10 +1,10 @@
 ---
-description: Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project).
+description: "Extract reusable patterns from the session, self-evaluate quality before saving, and determine the right save location (Global vs Project)."
 ---
 
 # /learn-eval - Extract, Evaluate, then Save
 
-Extends `/learn` with a quality gate and save-location decision before writing any skill file.
+Extends `/learn` with a quality gate, save-location decision, and knowledge-placement awareness before writing any skill file.
 
 ## What to Extract
 
@@ -51,36 +51,60 @@ origin: auto-extracted
 [Trigger conditions]
 ```
 
-5. **Self-evaluate before saving** using this rubric:
+5. **Quality gate — Checklist + Holistic verdict**
 
-| Dimension | 1 | 3 | 5 |
-|-----------|---|---|---|
-| Specificity | Abstract principles only, no code examples | Representative code example present | Rich examples covering all usage patterns |
-| Actionability | Unclear what to do | Main steps are understandable | Immediately actionable, edge cases covered |
-| Scope Fit | Too broad or too narrow | Mostly appropriate, some boundary ambiguity | Name, trigger, and content perfectly aligned |
-| Non-redundancy | Nearly identical to another skill | Some overlap but unique perspective exists | Completely unique value |
-| Coverage | Covers only a fraction of the target task | Main cases covered, common variants missing | Main cases, edge cases, and pitfalls covered |
+#### 5a. Required checklist (verify by actually reading files)
 
-- Score each dimension 1–5
-- If any dimension scores 1–2, improve the draft and re-score until all dimensions are ≥ 3
-- Show the user the scores table and the final draft
+Execute **all** of the following before evaluating the draft:
 
-6. Ask user to confirm:
-- Show: proposed save path + scores table + final draft
-- Wait for explicit confirmation before writing
+- [ ] Grep `~/.claude/skills/` by keyword to check for content overlap
+- [ ] Check MEMORY.md (both project and global) for overlap
+- [ ] Consider whether appending to an existing skill would suffice
+- [ ] Confirm this is a reusable pattern, not a one-off fix
 
-7. Save to the determined location
+#### 5b. Holistic verdict
 
-## Output Format for Step 5 (scores table)
+Synthesize the checklist results and draft quality, then choose **one** of the following:
 
-| Dimension | Score | Rationale |
-|-----------|-------|-----------|
-| Specificity | N/5 | ... |
-| Actionability | N/5 | ... |
-| Scope Fit | N/5 | ... |
-| Non-redundancy | N/5 | ... |
-| Coverage | N/5 | ... |
-| **Total** | **N/25** | |
+| Verdict | Meaning | Next Action |
+|---------|---------|-------------|
+| **Save** | Unique, specific, well-scoped | Proceed to Step 6 |
+| **Improve then Save** | Valuable but needs refinement | List improvements → revise → re-evaluate (once) |
+| **Absorb into [X]** | Should be appended to an existing skill | Show target skill and additions → Step 6 |
+| **Drop** | Trivial, redundant, or too abstract | Explain reasoning and stop |
+
+**Guideline dimensions** (informing the verdict, not scored):
+
+- **Specificity & Actionability**: Contains code examples or commands that are immediately usable
+- **Scope Fit**: Name, trigger conditions, and content are aligned and focused on a single pattern
+- **Uniqueness**: Provides value not covered by existing skills (informed by checklist results)
+- **Reusability**: Realistic trigger scenarios exist in future sessions
+
+6. **Verdict-specific confirmation flow**
+
+- **Save**: Present save path + checklist results + 1-line verdict rationale + full draft → save after user confirmation
+- **Absorb into [X]**: Present target path + additions (diff format) + checklist results + verdict rationale → append after user confirmation
+- **Drop**: Show checklist results + reasoning only (no confirmation needed)
+
+7. Save / Absorb to the determined location
+
+## Output Format for Step 5
+
+```
+### Checklist
+- [x] skills/ grep: no overlap (or: overlap found → details)
+- [x] MEMORY.md: no overlap (or: overlap found → details)
+- [x] Existing skill append: new file appropriate (or: should append to [X])
+- [x] Reusability: confirmed (or: one-off → Drop)
+
+### Verdict: Save / Improve then Save / Absorb into [X] / Drop
+
+**Rationale:** (1-2 sentences explaining the verdict)
+```
+
+## Design Rationale
+
+This version replaces the previous 5-dimension numeric scoring rubric (Specificity, Actionability, Scope Fit, Non-redundancy, Coverage scored 1-5) with a checklist-based holistic verdict system. Modern frontier models (Opus 4.6+) have strong contextual judgment — forcing rich qualitative signals into numeric scores loses nuance and can produce misleading totals. The holistic approach lets the model weigh all factors naturally, producing more accurate save/drop decisions while the explicit checklist ensures no critical check is skipped.
 
 ## Notes
 
@@ -88,4 +112,4 @@ origin: auto-extracted
 - Don't extract one-time issues (specific API outages, etc.)
 - Focus on patterns that will save time in future sessions
 - Keep skills focused — one pattern per skill
-- If Coverage score is low, add related variants before saving
+- When the verdict is Absorb, append to the existing skill rather than creating a new file
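
The first checklist item in step 5a (grep `~/.claude/skills/` by keyword) is carried out by the agent itself, and the commit does not ship a script for it. As a rough illustration only, the same overlap check could be approximated in Python as below. The function names `find_overlaps` and `checklist_line`, the restriction to `*.md` files, and the case-insensitive substring match are all assumptions made for this sketch, not part of the command.

```python
from pathlib import Path


def find_overlaps(skills_dir: Path, keywords: list[str]) -> dict[str, list[str]]:
    """Map each Markdown file under skills_dir to the keywords it mentions.

    Hypothetical helper: a case-insensitive substring search standing in
    for the keyword grep described in step 5a.
    """
    hits: dict[str, list[str]] = {}
    if not skills_dir.is_dir():
        # No skills directory yet means nothing can overlap.
        return hits
    for path in sorted(skills_dir.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        matched = [kw for kw in keywords if kw.lower() in text]
        if matched:
            hits[path.relative_to(skills_dir).as_posix()] = matched
    return hits


def checklist_line(label: str, hits: dict[str, list[str]]) -> str:
    """Render one checklist entry in the style of the Step 5 output format."""
    if not hits:
        return f"- [x] {label}: no overlap"
    details = "; ".join(f"{name} ({', '.join(kws)})" for name, kws in hits.items())
    return f"- [x] {label}: overlap found → {details}"
```

A call such as `checklist_line("skills/ grep", find_overlaps(Path.home() / ".claude" / "skills", ["retry", "backoff"]))` would then yield one line for the checklist. A keyword hit only flags a candidate for overlap; deciding between Save and Absorb into [X] still requires reading the matching skill, as the checklist heading says.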