refactor: collapse legacy command bodies into skills

This commit is contained in:
Affaan Mustafa
2026-04-01 02:25:42 -07:00
parent 6833454778
commit 975100db55
19 changed files with 630 additions and 702 deletions

View File

@@ -1,120 +1,23 @@
# Eval Command
---
description: Legacy slash-entry shim for the eval-harness skill. Prefer the skill directly.
---
Manage eval-driven development workflow.
# Eval Command (Legacy Shim)
## Usage
Use this only if you still invoke `/eval`. The maintained workflow lives in `skills/eval-harness/SKILL.md`.
`/eval [define|check|report|list] [feature-name]`
## Canonical Surface
## Define Evals
`/eval define feature-name`
Create a new eval definition:
1. Create `.claude/evals/feature-name.md` with template:
```markdown
## EVAL: feature-name
Created: $(date)
### Capability Evals
- [ ] [Description of capability 1]
- [ ] [Description of capability 2]
### Regression Evals
- [ ] [Existing behavior 1 still works]
- [ ] [Existing behavior 2 still works]
### Success Criteria
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals
```
2. Prompt user to fill in specific criteria
## Check Evals
`/eval check feature-name`
Run evals for a feature:
1. Read eval definition from `.claude/evals/feature-name.md`
2. For each capability eval:
- Attempt to verify criterion
- Record PASS/FAIL
- Log attempt in `.claude/evals/feature-name.log`
3. For each regression eval:
- Run relevant tests
- Compare against baseline
- Record PASS/FAIL
4. Report current status:
```
EVAL CHECK: feature-name
========================
Capability: X/Y passing
Regression: X/Y passing
Status: IN PROGRESS / READY
```
## Report Evals
`/eval report feature-name`
Generate comprehensive eval report:
```
EVAL REPORT: feature-name
=========================
Generated: $(date)
CAPABILITY EVALS
----------------
[eval-1]: PASS (pass@1)
[eval-2]: PASS (pass@2) - required retry
[eval-3]: FAIL - see notes
REGRESSION EVALS
----------------
[test-1]: PASS
[test-2]: PASS
[test-3]: PASS
METRICS
-------
Capability pass@1: 67%
Capability pass@3: 100%
Regression pass^3: 100%
NOTES
-----
[Any issues, edge cases, or observations]
RECOMMENDATION
--------------
[SHIP / NEEDS WORK / BLOCKED]
```
## List Evals
`/eval list`
Show all eval definitions:
```
EVAL DEFINITIONS
================
feature-auth [3/5 passing] IN PROGRESS
feature-search [5/5 passing] READY
feature-export [0/4 passing] NOT STARTED
```
- Prefer the `eval-harness` skill directly.
- Keep this file only as a compatibility entry point.
## Arguments
$ARGUMENTS:
- `define <name>` - Create new eval definition
- `check <name>` - Run and check evals
- `report <name>` - Generate full report
- `list` - Show all evals
- `clean` - Remove old eval logs (keeps last 10 runs)
`$ARGUMENTS`
## Delegation
Apply the `eval-harness` skill.
- Support the same user intents as before: define, check, report, list, and cleanup.
- Keep evals capability-first, regression-backed, and evidence-based.
- Use the skill as the canonical evaluator instead of maintaining a separate command-specific playbook.