feat: add 6 gap-closing skills — browser QA, design system, product lens, canary watch, benchmark, safety guard

Closes competitive gaps with gstack: - browser-qa: automated visual testing via browser MCP - design-system: generate, audit, and detect AI slop in UI - product-lens: product diagnostic, founder review, feature prioritization - canary-watch: post-deploy monitoring with alert thresholds - benchmark: performance baseline and regression detection - safety-guard: prevent destructive operations in autonomous sessions
2026-05-19 23:33:07 +08:00 · 2026-03-23 04:31:10 -07:00
parent bacc585b87
commit df4f2df297
6 changed files with 485 additions and 0 deletions
--- a/skills/benchmark/SKILL.md
+++ b/skills/benchmark/SKILL.md
@@ -0,0 +1,87 @@
+# Benchmark — Performance Baseline & Regression Detection
+
+## When to Use
+
+- Before and after a PR to measure performance impact
+- Setting up performance baselines for a project
+- When users report "it feels slow"
+- Before a launch — ensure you meet performance targets
+- Comparing your stack against alternatives
+
+## How It Works
+
+### Mode 1: Page Performance
+
+Measures real browser metrics via browser MCP:
+
+```
+1. Navigate to each target URL
+2. Measure Core Web Vitals:
+   - LCP (Largest Contentful Paint) — target < 2.5s
+   - CLS (Cumulative Layout Shift) — target < 0.1
+   - INP (Interaction to Next Paint) — target < 200ms
+   - FCP (First Contentful Paint) — target < 1.8s
+   - TTFB (Time to First Byte) — target < 800ms
+3. Measure resource sizes:
+   - Total page weight (target < 1MB)
+   - JS bundle size (target < 200KB gzipped)
+   - CSS size
+   - Image weight
+   - Third-party script weight
+4. Count network requests
+5. Check for render-blocking resources
+```
+
+### Mode 2: API Performance
+
+Benchmarks API endpoints:
+
+```
+1. Hit each endpoint 100 times
+2. Measure: p50, p95, p99 latency
+3. Track: response size, status codes
+4. Test under load: 10 concurrent requests
+5. Compare against SLA targets
+```
+
+### Mode 3: Build Performance
+
+Measures development feedback loop:
+
+```
+1. Cold build time
+2. Hot reload time (HMR)
+3. Test suite duration
+4. TypeScript check time
+5. Lint time
+6. Docker build time
+```
+
+### Mode 4: Before/After Comparison
+
+Run before and after a change to measure impact:
+
+```
+/benchmark baseline    # saves current metrics
+# ... make changes ...
+/benchmark compare     # compares against baseline
+```
+
+Output:
+```
+| Metric | Before | After | Delta | Verdict |
+|--------|--------|-------|-------|---------|
+| LCP | 1.2s | 1.4s | +200ms | ⚠ WARN |
+| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
+| Build | 12s | 14s | +2s | ⚠ WARN |
+```
+
+## Output
+
+Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.
+
+## Integration
+
+- CI: run `/benchmark compare` on every PR
+- Pair with `/canary-watch` for post-deploy monitoring
+- Pair with `/browser-qa` for full pre-ship checklist