feat: add 6 gap-closing skills — browser QA, design system, product lens, canary watch, benchmark, safety guard

Closes competitive gaps with gstack:
- browser-qa: automated visual testing via browser MCP
- design-system: generate, audit, and detect AI slop in UI
- product-lens: product diagnostic, founder review, feature prioritization
- canary-watch: post-deploy monitoring with alert thresholds
- benchmark: performance baseline and regression detection
- safety-guard: prevent destructive operations in autonomous sessions
This commit is contained in:
Affaan Mustafa
2026-03-23 04:31:10 -07:00
parent bacc585b87
commit df4f2df297
6 changed files with 485 additions and 0 deletions

87
skills/benchmark/SKILL.md Normal file
View File

@@ -0,0 +1,87 @@
# Benchmark — Performance Baseline & Regression Detection
## When to Use
- Before and after a PR to measure performance impact
- Setting up performance baselines for a project
- When users report "it feels slow"
- Before a launch — ensure you meet performance targets
- Comparing your stack against alternatives
## How It Works
### Mode 1: Page Performance
Measures real browser metrics via browser MCP:
```
1. Navigate to each target URL
2. Measure Core Web Vitals:
- LCP (Largest Contentful Paint) — target < 2.5s
- CLS (Cumulative Layout Shift) — target < 0.1
- INP (Interaction to Next Paint) — target < 200ms
- FCP (First Contentful Paint) — target < 1.8s
- TTFB (Time to First Byte) — target < 800ms
3. Measure resource sizes:
- Total page weight (target < 1MB)
- JS bundle size (target < 200KB gzipped)
- CSS size
- Image weight
- Third-party script weight
4. Count network requests
5. Check for render-blocking resources
```
### Mode 2: API Performance
Benchmarks API endpoints:
```
1. Hit each endpoint 100 times
2. Measure: p50, p95, p99 latency
3. Track: response size, status codes
4. Test under load: 10 concurrent requests
5. Compare against SLA targets
```
### Mode 3: Build Performance
Measures development feedback loop:
```
1. Cold build time
2. Hot reload time (HMR)
3. Test suite duration
4. TypeScript check time
5. Lint time
6. Docker build time
```
### Mode 4: Before/After Comparison
Run before and after a change to measure impact:
```
/benchmark baseline # saves current metrics
# ... make changes ...
/benchmark compare # compares against baseline
```
Output:
```
| Metric | Before | After | Delta | Verdict |
|--------|--------|-------|-------|---------|
| LCP | 1.2s | 1.4s | +200ms | ⚠ WARN |
| Bundle | 180KB | 175KB | -5KB | ✓ BETTER |
| Build | 12s | 14s | +2s | ⚠ WARN |
```
## Output
Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.
## Integration
- CI: run `/benchmark compare` on every PR
- Pair with `/canary-watch` for post-deploy monitoring
- Pair with `/browser-qa` for full pre-ship checklist