mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-04-12 04:33:29 +08:00
136 lines
3.9 KiB
Markdown

---
name: agentic-engineering
description: >
  Operate as an agentic engineer using eval-first execution, decomposition,
  and cost-aware model routing. Use when AI agents perform most implementation
  work and humans enforce quality and risk controls.
metadata:
  origin: ECC
---

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

1. Define a capability eval (the new behavior you want) and a regression eval (existing behavior that must not break).
2. Run both evals against the baseline and capture the failure signatures.
3. Execute the implementation.
4. Re-run the evals and compare the deltas against the baseline.

**Example workflow:**
```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```

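The loop can be sketched as a before/after comparison of eval results. This is a minimal sketch under stated assumptions. Each eval is modeled as a named zero-argument callable that returns `True` on a pass. `run_evals`, `compare`, and the toy `features` dict are hypothetical illustrations, not part of any test framework.

```python
from typing import Callable, Dict, List, Optional

# Assumption: an eval is a named zero-argument check that returns True on pass.
Eval = Callable[[], bool]
Results = Dict[str, Optional[str]]  # eval name -> None (pass) or failure signature


def run_evals(evals: Dict[str, Eval]) -> Results:
    """Run every eval and record None on pass or a short failure signature."""
    results: Results = {}
    for name, check in evals.items():
        try:
            results[name] = None if check() else "returned False"
        except Exception as exc:  # a crash is also a failure; keep type and message
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def _passed(results: Results, name: str) -> bool:
    return name in results and results[name] is None


def compare(baseline: Results, after: Results) -> Dict[str, List[str]]:
    """Classify each eval by how its outcome changed between the two runs."""
    names = sorted(after)
    return {
        "fixed": [n for n in names if _passed(after, n) and not _passed(baseline, n)],
        "regressed": [n for n in names if not _passed(after, n) and _passed(baseline, n)],
        "still_failing": [n for n in names
                          if not _passed(after, n) and not _passed(baseline, n)],
    }


# Toy system under development (hypothetical): named string transforms.
features: Dict[str, Callable[[str], str]] = {"upper": str.upper}

evals: Dict[str, Eval] = {
    # Capability eval: the new behavior we want; fails at baseline.
    "capability: title-case": lambda: features["title"]("ab cd") == "Ab Cd",
    # Regression eval: existing behavior that must keep passing.
    "regression: upper-case": lambda: features["upper"]("ab") == "AB",
}

baseline = run_evals(evals)    # step 2: baseline run, capture failure signatures
features["title"] = str.title  # step 3: "implement" the feature
after = run_evals(evals)       # step 4: re-run the evals
deltas = compare(baseline, after)
```

In this toy run, `deltas` lists the capability eval under `fixed` and nothing under `regressed`. A non-empty `regressed` list is the signal to stop and investigate before moving on.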
## Task Decomposition

Apply the 15-minute unit rule: size each unit at roughly 15 minutes of work, and make sure that:
- Each unit is independently verifiable
- Each unit has a single dominant risk
- Each unit exposes a clear done condition

**Good decomposition:**
```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**
```
Task: Add user authentication (2 hours, multiple risks)
```

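The unit rules can be encoded as a lightweight check over a task plan. The sketch below is hypothetical. The `Unit` fields, the `verify_cmd` field and its example command, the example done condition, and the choice to treat 15 minutes as a hard threshold are all illustrative assumptions, not features of any tool.

```python
from dataclasses import dataclass, field
from typing import List

MAX_MINUTES = 15  # assumption: treat the 15-minute rule as a hard budget


@dataclass
class Unit:
    name: str
    minutes: int
    risks: List[str] = field(default_factory=list)  # dominant risk(s) the unit carries
    done_condition: str = ""  # how we know the unit is finished
    verify_cmd: str = ""      # hypothetical: command that verifies the unit on its own


def check_unit(unit: Unit) -> List[str]:
    """Return rule violations for one unit; an empty list means it is well-sized."""
    problems = []
    if unit.minutes > MAX_MINUTES:
        problems.append(f"exceeds {MAX_MINUTES} min ({unit.minutes} min)")
    if len(unit.risks) != 1:
        problems.append(f"needs exactly one dominant risk, has {len(unit.risks)}")
    if not unit.done_condition:
        problems.append("missing done condition")
    if not unit.verify_cmd:
        problems.append("not independently verifiable")
    return problems


# Hypothetical encodings of the good and bad examples above.
good = Unit("Add password hashing", 15, ["security"],
            done_condition="stored passwords are hashed, never plaintext",
            verify_cmd="pytest tests/test_hashing.py")
bad = Unit("Add user authentication", 120,
           ["security", "API contract", "state", "auth logic"])
```

`check_unit(good)` returns an empty list. `check_unit(bad)` reports all four violations, which is the cue to split it along its risks as in the good decomposition.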
## Model Routing

Choose the model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code

- **Sonnet**: Implementation and refactors
  - Example: Implement feature, refactor module, write tests

- **Opus**: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture

**Cost discipline:** Escalate to a higher tier only when the lower tier fails because of a clear reasoning gap.

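Here is a minimal routing sketch, assuming each task has already been labeled with a coarse kind. The task-kind labels, the `route` and `escalate` helpers, the `"reasoning_gap"` failure label, and the default-to-Sonnet fallback are hypothetical. The tier names are the ones listed above, not API model identifiers.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Hypothetical task-kind labels mapped to the cheapest tier expected to handle them.
ROUTES = {
    "classification": Tier.HAIKU,
    "boilerplate": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


def route(task_kind: str) -> Tier:
    """Pick the starting tier; unknown kinds default to the middle tier (assumption)."""
    return ROUTES.get(task_kind, Tier.SONNET)


def escalate(current: Tier, failure_reason: str) -> Optional[Tier]:
    """Move up one tier only when the failure is a clear reasoning gap.

    Any other failure reason returns None, meaning: do not escalate; address
    the cause at the current tier instead.
    """
    if failure_reason != "reasoning_gap" or current == Tier.OPUS:
        return None
    return Tier(current + 1)
```

Keeping escalation as a separate, explicit step makes each tier change visible. That visibility is what the per-task cost tracking below relies on.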
## Session Strategy

- **Continue the session** for closely coupled units
  - Example: Implementing related functions in the same module

- **Start a fresh session** after major phase transitions
  - Example: Moving from implementation to testing

- **Compact the context after completing a milestone**, not during active debugging
  - Example: After a feature is complete, before starting the next feature

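The session rules can be read as a small decision function. This is a hypothetical sketch. The boolean inputs, the action labels, the priority order, and the fallback for uncoupled work are assumptions. The actual continue, new-session, and compact operations depend on the agent tool in use.

```python
from enum import Enum


class SessionAction(Enum):
    CONTINUE = "continue session"
    FRESH = "start fresh session"
    COMPACT = "compact, then continue"


def next_session_action(
    closely_coupled: bool,
    phase_transition: bool,
    milestone_complete: bool,
    debugging: bool,
) -> SessionAction:
    """Apply the session rules in a fixed priority order (the order is an assumption)."""
    if debugging:
        # Never compact mid-debugging: the failure context is still needed.
        return SessionAction.CONTINUE
    if phase_transition:
        return SessionAction.FRESH
    if milestone_complete:
        return SessionAction.COMPACT
    # Assumption: coupled follow-up work keeps the session; unrelated work starts clean.
    return SessionAction.CONTINUE if closely_coupled else SessionAction.FRESH
```

Debugging is checked first so that a finished milestone never triggers compaction while a failure is still being chased.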
## Review Focus for AI-Generated Code

Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not spend review cycles on style-only disagreements when automated formatters and linters already enforce style.

**Review checklist:**
- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

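To make the first checklist item concrete, the sketch below probes a function for null, empty, single-element, and boundary inputs. `naive_average` stands in for a hypothetical first draft under review, and `safe_average` is a hypothetical revision. The probe table is illustrative, not exhaustive.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Values = Optional[Sequence[float]]


def naive_average(values: Values) -> Optional[float]:
    """Hypothetical first draft: raises on None and divides by zero on []."""
    return sum(values) / len(values)


def safe_average(values: Values) -> Optional[float]:
    """Hypothetical revision: returns None for missing or empty input."""
    if not values:  # covers both None and an empty sequence
        return None
    return sum(values) / len(values)


# (input, expected) pairs: null, empty, single element, extreme boundary values.
EDGE_CASES: List[Tuple[Values, Optional[float]]] = [
    (None, None),
    ([], None),
    ([5.0], 5.0),
    ([-1e308, 1e308], 0.0),  # extreme magnitudes that cancel out
]


def probe(fn: Callable[[Values], Optional[float]]) -> List[str]:
    """Run every edge case; describe each crash or wrong result."""
    failures = []
    for given, expected in EDGE_CASES:
        try:
            got = fn(given)
        except Exception as exc:
            failures.append(f"{given!r}: raised {type(exc).__name__}")
            continue
        if got != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {got!r}")
    return failures
```

`probe(naive_average)` reports a `TypeError` for `None` and a `ZeroDivisionError` for `[]`, while `probe(safe_average)` returns an empty list.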
## Cost Discipline

Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**
```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```

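The tracked fields map naturally onto a small record type that can be aggregated per tier. This is a sketch. `TaskRecord` and `summarize` are hypothetical names, and the token counts are caller-supplied estimates, not values read from any billing API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRecord:
    task: str
    tier: str           # model tier used, e.g. "Haiku", "Sonnet", "Opus"
    input_tokens: int   # estimate
    output_tokens: int  # estimate
    retries: int
    minutes: float      # wall-clock time
    success: bool


def summarize(records: List[TaskRecord]) -> Dict[str, Dict[str, float]]:
    """Aggregate per tier: task count, success rate, retries, tokens, and time."""
    groups: Dict[str, List[TaskRecord]] = defaultdict(list)
    for r in records:
        groups[r.tier].append(r)
    summary: Dict[str, Dict[str, float]] = {}
    for tier, rs in groups.items():
        summary[tier] = {
            "tasks": len(rs),
            "success_rate": sum(r.success for r in rs) / len(rs),
            "total_retries": sum(r.retries for r in rs),
            "total_tokens": sum(r.input_tokens + r.output_tokens for r in rs),
            "total_minutes": sum(r.minutes for r in rs),
        }
    return summary


# The example record above, expressed as data (~5k/~2k tokens rounded to estimates).
login = TaskRecord("Implement user login", "Sonnet", 5_000, 2_000,
                   retries=1, minutes=8, success=True)
```

Grouping by tier makes it easy to see whether a cheaper tier's retries cost more than starting one tier higher would have.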
## When to Use This Skill

- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs

## Integration with Other Skills

- **tdd-workflow**: Combine with the eval-first loop for test-driven development
- **verification-loop**: Use for continuous validation during implementation
- **search-first**: Apply before implementation to find existing solutions
- **coding-standards**: Reference during the code review phase