feat: deliver v1.8.0 harness reliability and parity updates

2026-06-14 12:11:27 +08:00 · 2026-03-04 14:48:06 -08:00
parent 32e9c293f0
commit 48b883d741
84 changed files with 2990 additions and 725 deletions
@@ -0,0 +1,63 @@
+---
+name: agentic-engineering
+description: Operate as an agentic engineer using eval-first execution, decomposition, and cost-aware model routing.
+origin: ECC
+---
+
+# Agentic Engineering
+
+Use this skill for engineering workflows where AI agents perform most implementation work and humans enforce quality and risk controls.
+
+## Operating Principles
+
+1. Define completion criteria before execution.
+2. Decompose work into agent-sized units.
+3. Route model tiers by task complexity.
+4. Measure with evals and regression checks.
+
+## Eval-First Loop
+
+1. Define capability eval and regression eval.
+2. Run baseline and capture failure signatures.
+3. Execute implementation.
+4. Re-run evals and compare deltas.
+
+## Task Decomposition
+
+Apply the 15-minute unit rule:
+- each unit should be independently verifiable
+- each unit should have a single dominant risk
+- each unit should expose a clear done condition
+
+## Model Routing
+
+- Haiku: classification, boilerplate transforms, narrow edits
+- Sonnet: implementation and refactors
+- Opus: architecture, root-cause analysis, multi-file invariants
+
+## Session Strategy
+
+- Continue session for closely-coupled units.
+- Start fresh session after major phase transitions.
+- Compact after milestone completion, not during active debugging.
+
+## Review Focus for AI-Generated Code
+
+Prioritize:
+- invariants and edge cases
+- error boundaries
+- security and auth assumptions
+- hidden coupling and rollout risk
+
+Do not waste review cycles on style-only disagreements when automated format/lint already enforce style.
+
+## Cost Discipline
+
+Track per task:
+- model
+- token estimate
+- retries
+- wall-clock time
+- success/failure
+
+Escalate model tier only when lower tier fails with a clear reasoning gap.