---
name: agentic-engineering
description: >
  Operate as an agentic engineer using eval-first execution, decomposition,
  and cost-aware model routing. Use when AI agents perform most implementation
  work and humans enforce quality and risk controls.
metadata:
  origin: ECC
---

# Agentic Engineering

Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.

## Operating Principles

1. Define completion criteria before execution.
2. Decompose work into agent-sized units.
3. Route model tiers by task complexity.
4. Measure with evals and regression checks.

## Eval-First Loop

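1. Define a capability eval (does the new behavior work?) and a regression eval (does existing behavior still work?).
2. Run the baseline and capture failure signatures.
3. Execute the implementation.
4. Re-run the evals and compare deltas against the baseline.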

**Example workflow:**
```
1. Write test that captures desired behavior (eval)
2. Run test → capture baseline failures
3. Implement feature
4. Re-run test → verify improvements
5. Check for regressions in other tests
```
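The loop above can be sketched in Python. The sketch assumes each eval run is recorded as a mapping from test name to pass/fail. `compare_runs`, `EvalDelta`, and `is_done` are hypothetical helpers, not part of any tool. The point is that "done" means the capability evals pass *and* nothing that passed at baseline now fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalDelta:
    """Outcome of comparing two eval runs (hypothetical helper)."""
    fixed: list[str]          # failed at baseline, pass now
    regressed: list[str]      # passed at baseline, fail now
    still_failing: list[str]  # failed both times


def compare_runs(baseline: dict[str, bool], current: dict[str, bool]) -> EvalDelta:
    """Compare per-test pass/fail results keyed by test name.

    A test missing from a run counts as a failure in that run, so a
    deleted or skipped test cannot hide a regression.
    """
    fixed, regressed, still_failing = [], [], []
    for name in sorted(baseline.keys() | current.keys()):
        before = baseline.get(name, False)
        after = current.get(name, False)
        if not before and after:
            fixed.append(name)
        elif before and not after:
            regressed.append(name)
        elif not before and not after:
            still_failing.append(name)
    return EvalDelta(fixed, regressed, still_failing)


def is_done(current: dict[str, bool], delta: EvalDelta, capability_evals: set[str]) -> bool:
    """Done only when every capability eval passes and nothing regressed."""
    return not delta.regressed and all(current.get(n, False) for n in capability_evals)
```

For example, if `test_login_ok` goes from failing to passing but `test_signup` goes from passing to failing, the capability eval is satisfied yet `is_done` still returns `False` because of the regression.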

## Task Decomposition

Apply the 15-minute unit rule, sizing each unit at roughly 15 minutes of agent work:
- Each unit should be independently verifiable
- Each unit should have a single dominant risk
- Each unit should expose a clear done condition

**Good decomposition:**
```
Task: Add user authentication
├─ Unit 1: Add password hashing (15 min, security risk)
├─ Unit 2: Create login endpoint (15 min, API contract risk)
├─ Unit 3: Add session management (15 min, state risk)
└─ Unit 4: Protect routes with middleware (15 min, auth logic risk)
```

**Bad decomposition:**
```
Task: Add user authentication (2 hours, multiple risks)
```
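One way to make the rule checkable is to describe each unit as data and lint the plan before dispatching agents. This is a minimal sketch. `WorkUnit`, `unit_problems`, and the hard 15-minute ceiling are illustrative assumptions. A non-empty done condition is only a proxy for "independently verifiable".

```python
from dataclasses import dataclass

# Assumption: treat the 15-minute rule as a hard ceiling.
MAX_UNIT_MIN = 15


@dataclass
class WorkUnit:
    """One agent-sized unit of work (hypothetical schema)."""
    name: str
    estimate_min: int     # rough wall-clock estimate in minutes
    risks: list[str]      # risk categories this unit touches
    done_condition: str   # e.g. "hash/verify round-trip test passes"


def unit_problems(unit: WorkUnit) -> list[str]:
    """Return the reasons a unit violates the decomposition rules (empty if OK)."""
    problems = []
    if unit.estimate_min > MAX_UNIT_MIN:
        problems.append(f"{unit.name}: {unit.estimate_min} min exceeds {MAX_UNIT_MIN}")
    if len(unit.risks) != 1:
        problems.append(f"{unit.name}: expected one dominant risk, got {len(unit.risks)}")
    if not unit.done_condition.strip():
        problems.append(f"{unit.name}: no done condition")
    return problems
```

Run against the two plans above, "Add password hashing" passes cleanly, while the single two-hour "Add user authentication" unit is flagged on all three checks.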

## Model Routing

Choose model tier based on task complexity:

- **Haiku**: Classification, boilerplate transforms, narrow edits
  - Example: Rename a variable, add a type annotation, format code

- **Sonnet**: Implementation and refactors
  - Example: Implement a feature, refactor a module, write tests

- **Opus**: Architecture, root-cause analysis, multi-file invariants
  - Example: Design a system, debug a complex issue, review architecture

**Cost discipline:** Escalate to a higher model tier only when the lower tier fails with a clear reasoning gap.
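The routing and escalation policy can be sketched as a loop over tiers. The task-kind table, `run_with_escalation`, and the `attempt` callback are hypothetical. The callback stands in for whatever actually invokes a model and judges the result, so no real model API is shown.

```python
from typing import Callable

# Tiers from cheapest to most capable, as listed above.
TIERS = ["haiku", "sonnet", "opus"]

# Hypothetical mapping from task kind to starting tier.
START_TIER = {
    "classification": "haiku",
    "boilerplate": "haiku",
    "narrow_edit": "haiku",
    "implementation": "sonnet",
    "refactor": "sonnet",
    "architecture": "opus",
    "root_cause": "opus",
}


def run_with_escalation(
    task_kind: str,
    attempt: Callable[[str], tuple[bool, bool]],
) -> tuple[str, bool]:
    """Start at the cheapest suitable tier; escalate only on a reasoning gap.

    `attempt(tier)` is caller-supplied (hypothetical) and returns
    (succeeded, reasoning_gap). Failures without a reasoning gap, such as
    a missing file or a flaky test, are not escalated.
    Returns (last tier tried, succeeded).
    """
    index = TIERS.index(START_TIER.get(task_kind, "sonnet"))  # assumed default
    while True:
        tier = TIERS[index]
        succeeded, reasoning_gap = attempt(tier)
        if succeeded:
            return tier, True
        if not reasoning_gap or index == len(TIERS) - 1:
            return tier, False
        index += 1
```

Stopping on failures without a reasoning gap is the cost-discipline rule in code: a more expensive model will not fix an environment problem.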

## Session Strategy

- **Continue session** for closely coupled units
  - Example: Implementing related functions in the same module

- **Start fresh session** after major phase transitions
  - Example: Moving from implementation to testing

- **Compact after milestone completion**, not during active debugging
  - Example: After feature complete, before starting next feature
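These rules reduce to a small decision function. This is a hypothetical encoding: the flag names are assumptions, and treating an uncoupled unit as a reason to start fresh goes slightly beyond the rules above.

```python
from enum import Enum


class SessionAction(Enum):
    CONTINUE = "continue"
    START_FRESH = "start_fresh"
    COMPACT = "compact"  # compact context, then keep going


def next_session_action(
    coupled_to_previous: bool,
    phase_changed: bool,
    milestone_done: bool,
    debugging: bool,
) -> SessionAction:
    """Hypothetical encoding of the session rules; flag names are assumptions."""
    if debugging:
        # Never compact mid-debug: keep the failure context intact.
        return SessionAction.CONTINUE
    if phase_changed:
        return SessionAction.START_FRESH
    if milestone_done:
        return SessionAction.COMPACT
    if coupled_to_previous:
        return SessionAction.CONTINUE
    # Assumption: unrelated work gets a clean context.
    return SessionAction.START_FRESH
```

Checking `debugging` first encodes "not during active debugging": a finished milestone does not trigger compaction while a failure is still being investigated.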

## Review Focus for AI-Generated Code

Prioritize:
- Invariants and edge cases
- Error boundaries
- Security and auth assumptions
- Hidden coupling and rollout risk

Do not waste review cycles on style-only disagreements when automated formatters and linters already enforce style.

**Review checklist:**
- [ ] Edge cases handled (null, empty, boundary values)
- [ ] Error handling comprehensive
- [ ] Security assumptions validated
- [ ] No hidden coupling between modules
- [ ] Rollout risk assessed (breaking changes, migrations)

## Cost Discipline

Track per task:
- Model tier used
- Token estimate
- Retries needed
- Wall-clock time
- Success/failure outcome

**Example tracking:**
```
Task: Implement user login
Model: Sonnet
Tokens: ~5k input, ~2k output
Retries: 1 (initial implementation had auth bug)
Time: 8 minutes
Outcome: Success
```
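A minimal sketch of per-task tracking in Python, assuming records are kept in memory. `TaskCost` and `summarize` are hypothetical names. Aggregating by model tier makes it easy to see whether a tier's retry and failure rates justify routing that kind of task differently.

```python
from dataclasses import dataclass


@dataclass
class TaskCost:
    """One row of per-task cost tracking (hypothetical schema; fields mirror the list above)."""
    task: str
    model: str
    input_tokens: int
    output_tokens: int
    retries: int
    wall_clock_min: float
    success: bool


def summarize(records: list[TaskCost]) -> dict[str, dict[str, float]]:
    """Aggregate per model tier: task count, success rate, mean retries, total tokens."""
    by_model: dict[str, list[TaskCost]] = {}
    for r in records:
        by_model.setdefault(r.model, []).append(r)
    summary = {}
    for model, rows in by_model.items():
        n = len(rows)
        summary[model] = {
            "tasks": n,
            "success_rate": sum(r.success for r in rows) / n,
            "mean_retries": sum(r.retries for r in rows) / n,
            "total_tokens": sum(r.input_tokens + r.output_tokens for r in rows),
        }
    return summary
```

The example above would be recorded as `TaskCost("Implement user login", "sonnet", 5000, 2000, 1, 8.0, True)`, with the token counts approximated from the "~5k" and "~2k" estimates.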

## When to Use This Skill

- Managing AI-driven development workflows
- Planning agent task decomposition
- Optimizing model tier selection
- Implementing eval-first development
- Reviewing AI-generated code
- Tracking development costs

## Integration with Other Skills

- **tdd-workflow**: Combine with eval-first loop for test-driven development
- **verification-loop**: Use for continuous validation during implementation
- **search-first**: Apply before implementation to find existing solutions
- **coding-standards**: Reference during code review phase