| name | description | metadata |
|---|---|---|
| agentic-engineering | Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing. Use when AI agents perform most implementation work and humans enforce quality and risk controls. | |
Agentic Engineering
Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.
Operating Principles
- Define completion criteria before execution.
- Decompose work into agent-sized units.
- Route model tiers by task complexity.
- Measure with evals and regression checks.
Eval-First Loop
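- Define a capability eval (does the new behavior work?) and a regression eval (does existing behavior still work?).
- Run the baseline and capture failure signatures.
- Execute the implementation.
- Re-run the evals and compare deltas against the baseline.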
Example workflow:
1. Write a test that captures the desired behavior (the eval)
2. Run the test → capture baseline failures
3. Implement the feature
4. Re-run the test → verify improvements
5. Check for regressions in other tests
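The compare-deltas step above can be sketched as a small helper. This is a minimal illustration, not a prescribed tool: `EvalDelta`, `compare_runs`, and the test names are hypothetical, and each run is assumed to be a mapping from test name to pass/fail.

```python
from dataclasses import dataclass


@dataclass
class EvalDelta:
    fixed: set[str]          # failed at baseline, pass now
    regressed: set[str]      # passed at baseline, fail now
    still_failing: set[str]  # failed at baseline, still fail


def compare_runs(baseline: dict[str, bool], current: dict[str, bool]) -> EvalDelta:
    """Compare two eval runs keyed by test name (True means the test passed)."""
    fixed = {n for n, ok in current.items() if ok and baseline.get(n) is False}
    regressed = {n for n, ok in current.items() if not ok and baseline.get(n) is True}
    still_failing = {n for n, ok in current.items() if not ok and baseline.get(n) is False}
    return EvalDelta(fixed, regressed, still_failing)


# Hypothetical results: baseline captured before implementation, current after.
baseline = {"test_login_ok": False, "test_logout": True, "test_bad_password": False}
current = {"test_login_ok": True, "test_logout": False, "test_bad_password": False}

delta = compare_runs(baseline, current)
print(delta.fixed)      # {'test_login_ok'}
print(delta.regressed)  # {'test_logout'}
```

A non-empty `regressed` set is the signal to stop and investigate before moving to the next unit.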
Task Decomposition
Apply the 15-minute unit rule: scope each unit to roughly 15 minutes of agent work, and check that it meets all three conditions:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition
Good decomposition:
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
Bad decomposition:
Task: Add user authentication (2 hours, multiple risks)
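The three conditions of the unit rule can be checked mechanically before work is handed to an agent. The sketch below assumes a hypothetical `WorkUnit` record; the field names and the done-condition text are illustrative and do not come from any particular tool.

```python
from dataclasses import dataclass

MAX_UNIT_MINUTES = 15  # the 15-minute unit rule


@dataclass
class WorkUnit:
    name: str
    estimate_minutes: int
    risks: list[str]
    done_condition: str


def check_unit(unit: WorkUnit) -> list[str]:
    """Return rule violations; an empty list means the unit is well-sized."""
    problems = []
    if unit.estimate_minutes > MAX_UNIT_MINUTES:
        problems.append(f"estimate {unit.estimate_minutes} min exceeds {MAX_UNIT_MINUTES} min")
    if len(unit.risks) != 1:
        problems.append(f"expected one dominant risk, found {len(unit.risks)}")
    if not unit.done_condition.strip():
        problems.append("missing done condition")
    return problems


good = WorkUnit("Add password hashing", 15, ["security"],
                "hash/verify round-trip test passes")  # hypothetical done condition
bad = WorkUnit("Add user authentication", 120,
               ["security", "API contract", "state", "auth logic"], "")

print(check_unit(good))  # []
for problem in check_unit(bad):
    print(problem)
```

The bad example fails all three checks, which is the cue to split it into units like the four shown in the good decomposition.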
Model Routing
Choose model tier based on task complexity:
- Haiku: Classification, boilerplate transforms, narrow edits
  - Example: Rename variable, add type annotation, format code
- Sonnet: Implementation and refactors
  - Example: Implement feature, refactor module, write tests
- Opus: Architecture, root-cause analysis, multi-file invariants
  - Example: Design system, debug complex issue, review architecture
Cost discipline: Start at the lowest tier that fits the task, and escalate only when that tier fails with a clear reasoning gap.
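One way to encode this routing policy is a lookup table plus a one-step escalation rule. The task-kind labels, the `route` and `escalate` helpers, and the fallback to the middle tier for unknown kinds are all assumptions made for illustration; only the three tier names come from the list above.

```python
from enum import IntEnum


class Tier(IntEnum):
    HAIKU = 1
    SONNET = 2
    OPUS = 3


# Hypothetical task kinds mapped to the tiers described above.
DEFAULT_TIER = {
    "classification": Tier.HAIKU,
    "boilerplate": Tier.HAIKU,
    "narrow_edit": Tier.HAIKU,
    "implementation": Tier.SONNET,
    "refactor": Tier.SONNET,
    "architecture": Tier.OPUS,
    "root_cause_analysis": Tier.OPUS,
    "multi_file_invariant": Tier.OPUS,
}


def route(task_kind: str) -> Tier:
    """Pick the starting tier; unknown kinds fall back to SONNET (an assumption)."""
    return DEFAULT_TIER.get(task_kind, Tier.SONNET)


def escalate(current: Tier, failed: bool, reasoning_gap: bool) -> Tier:
    """Move up one tier only when the attempt failed with a clear reasoning gap."""
    if failed and reasoning_gap and current < Tier.OPUS:
        return Tier(current + 1)
    return current


tier = route("narrow_edit")
print(tier.name)                               # HAIKU
print(escalate(tier, True, True).name)         # SONNET
print(escalate(tier, True, False).name)        # HAIKU
```

A failure without a reasoning gap (for example a flaky test or missing context) keeps the same tier, since a larger model is unlikely to fix it.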
Session Strategy
- Continue session for closely-coupled units
  - Example: Implementing related functions in same module
- Start fresh session after major phase transitions
  - Example: Moving from implementation to testing
- Compact after milestone completion, not during active debugging
  - Example: After feature complete, before starting next feature
Review Focus for AI-Generated Code
Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk
Do not spend review cycles on style-only disagreements when automated formatting and linting already enforce style.
Review checklist:
- Edge cases handled (null, empty, boundary values)
- Error handling comprehensive
- Security assumptions validated
- No hidden coupling between modules
- Rollout risk assessed (breaking changes, migrations)
Cost Discipline
Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome
Example tracking:
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
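The tracked fields map naturally onto a small record type. The sketch below is one possible shape, assuming a hypothetical `TaskRecord` and `summarize` helper; the values are the approximate figures from the example above, not measured data.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    task: str
    model: str
    input_tokens: int   # estimate
    output_tokens: int  # estimate
    retries: int
    minutes: float      # wall-clock time
    success: bool


def summarize(records: list[TaskRecord]) -> dict[str, float]:
    """Aggregate per-task records into totals and a success rate."""
    n = len(records)
    return {
        "tasks": n,
        "total_tokens": sum(r.input_tokens + r.output_tokens for r in records),
        "total_retries": sum(r.retries for r in records),
        "success_rate": sum(r.success for r in records) / n if n else 0.0,
    }


# The example tracking entry, with the token estimates written as integers.
login = TaskRecord("Implement user login", "Sonnet", 5000, 2000, 1, 8, True)
print(summarize([login]))
```

Keeping retries alongside the model tier makes it easier to see whether a tier is being under- or over-used for a given kind of task.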
When to Use This Skill
- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs
Integration with Other Skills
- tdd-workflow: Combine with eval-first loop for test-driven development
- verification-loop: Use for continuous validation during implementation
- search-first: Apply before implementation to find existing solutions
- coding-standards: Reference during code review phase