mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-03-30 13:43:26 +08:00
feat: add 6 gap-closing skills — browser QA, design system, product lens, canary watch, benchmark, safety guard
Closes competitive gaps with gstack:

- browser-qa: automated visual testing via browser MCP
- design-system: generate, audit, and detect AI slop in UI
- product-lens: product diagnostic, founder review, feature prioritization
- canary-watch: post-deploy monitoring with alert thresholds
- benchmark: performance baseline and regression detection
- safety-guard: prevent destructive operations in autonomous sessions
87
skills/benchmark/SKILL.md
Normal file
@@ -0,0 +1,87 @@

# Benchmark — Performance Baseline & Regression Detection

## When to Use

- Before and after a PR to measure performance impact
- Setting up performance baselines for a project
- When users report "it feels slow"
- Before a launch — ensure you meet performance targets
- Comparing your stack against alternatives

## How It Works

### Mode 1: Page Performance

Measures real browser metrics via browser MCP:

```
1. Navigate to each target URL
2. Measure Core Web Vitals:
   - LCP (Largest Contentful Paint) — target < 2.5s
   - CLS (Cumulative Layout Shift) — target < 0.1
   - INP (Interaction to Next Paint) — target < 200ms
   - FCP (First Contentful Paint) — target < 1.8s
   - TTFB (Time to First Byte) — target < 800ms
3. Measure resource sizes:
   - Total page weight (target < 1MB)
   - JS bundle size (target < 200KB gzipped)
   - CSS size
   - Image weight
   - Third-party script weight
4. Count network requests
5. Check for render-blocking resources
```

### Mode 2: API Performance

Benchmarks API endpoints:

```
1. Hit each endpoint 100 times
2. Measure: p50, p95, p99 latency
3. Track: response size, status codes
4. Test under load: 10 concurrent requests
5. Compare against SLA targets
```
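
The latency measurement in steps 1 and 2 can be sketched as follows. This is a minimal sketch: `measure_latency` times any callable (for example, a closure doing an HTTP GET against the endpoint), and the nearest-rank percentile method is an implementation assumption, not something the skill prescribes.

```python
import math
import time

def percentiles(samples):
    """Nearest-rank p50/p95/p99 over a list of millisecond samples."""
    xs = sorted(samples)

    def pct(p):
        return xs[max(0, math.ceil(p / 100 * len(xs)) - 1)]

    return {"p50": pct(50), "p95": pct(95), "p99": pct(99)}

def measure_latency(call, n=100):
    """Time n invocations of `call` and summarize latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()  # e.g. lambda: requests.get(endpoint)
        samples.append((time.perf_counter() - start) * 1000)
    return percentiles(samples)
```

For step 4, the same `call` can be submitted to a `ThreadPoolExecutor` with 10 workers to approximate concurrent load.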

### Mode 3: Build Performance

Measures the development feedback loop:

```
1. Cold build time
2. Hot reload time (HMR)
3. Test suite duration
4. TypeScript check time
5. Lint time
6. Docker build time
```

### Mode 4: Before/After Comparison

Run before and after a change to measure impact:

```
/benchmark baseline   # saves current metrics
# ... make changes ...
/benchmark compare    # compares against baseline
```

Output:

```
| Metric | Before | After | Delta  | Verdict  |
|--------|--------|-------|--------|----------|
| LCP    | 1.2s   | 1.4s  | +200ms | ⚠ WARN   |
| Bundle | 180KB  | 175KB | -5KB   | ✓ BETTER |
| Build  | 12s    | 14s   | +2s    | ⚠ WARN   |
```
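
A minimal sketch of the baseline/compare mechanics, assuming numeric metrics stored as JSON under `.ecc/benchmarks/`; the 10% warn threshold is an illustrative assumption, not the skill's actual policy:

```python
import json
from pathlib import Path

BASE_DIR = Path(".ecc/benchmarks")  # git-tracked baseline directory

def save_baseline(name: str, metrics: dict) -> None:
    """Persist the current metrics as the shared baseline."""
    BASE_DIR.mkdir(parents=True, exist_ok=True)
    (BASE_DIR / f"{name}.json").write_text(json.dumps(metrics, indent=2))

def compare(name: str, current: dict, warn_pct: float = 10.0) -> dict:
    """Diff current metrics against the saved baseline.

    Returns {metric: (before, after, verdict)}; a regression worse than
    warn_pct percent is flagged WARN, improvements are flagged BETTER.
    """
    baseline = json.loads((BASE_DIR / f"{name}.json").read_text())
    report = {}
    for key, before in baseline.items():
        after = current.get(key)
        if after is None:
            continue
        if after < before:
            verdict = "BETTER"
        elif after == before:
            verdict = "SAME"
        elif (after - before) / before * 100 > warn_pct:
            verdict = "WARN"
        else:
            verdict = "OK"
        report[key] = (before, after, verdict)
    return report
```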

## Output

Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.

## Integration

- CI: run `/benchmark compare` on every PR
- Pair with `/canary-watch` for post-deploy monitoring
- Pair with `/browser-qa` for a full pre-ship checklist
81
skills/browser-qa/SKILL.md
Normal file
@@ -0,0 +1,81 @@

# Browser QA — Automated Visual Testing & Interaction

## When to Use

- After deploying a feature to staging/preview
- When you need to verify UI behavior across pages
- Before shipping — confirm layouts, forms, and interactions actually work
- When reviewing PRs that touch frontend code
- Accessibility audits and responsive testing

## How It Works

Uses the browser automation MCP (claude-in-chrome, Playwright, or Puppeteer) to interact with live pages like a real user.

### Phase 1: Smoke Test

```
1. Navigate to the target URL
2. Check for console errors (filter noise: analytics, third-party)
3. Verify no 4xx/5xx in network requests
4. Screenshot above-the-fold on desktop + mobile viewports
5. Check Core Web Vitals: LCP < 2.5s, CLS < 0.1, INP < 200ms
```
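
Step 2's noise filtering can be sketched as a plain classifier over captured console messages. The `(level, source, text)` tuple shape and the noise host list are assumptions for illustration; the real capture comes from whichever browser MCP is driving the page.

```python
# Hosts whose console errors count as third-party noise (assumed list).
NOISE_SOURCES = ("googletagmanager.com", "google-analytics.com", "doubleclick.net")

def triage_console(messages):
    """Split captured console messages into ship-blocking errors vs noise.

    messages: iterable of (level, source_url, text) tuples.
    Only non-noise error-level messages count against the smoke test.
    """
    critical, noise = [], []
    for level, source, text in messages:
        if level != "error":
            continue
        bucket = noise if any(host in source for host in NOISE_SOURCES) else critical
        bucket.append(text)
    return {"critical": critical, "noise": noise}
```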

### Phase 2: Interaction Test

```
1. Click every nav link — verify no dead links
2. Submit forms with valid data — verify success state
3. Submit forms with invalid data — verify error state
4. Test auth flow: login → protected page → logout
5. Test critical user journeys (checkout, onboarding, search)
```

### Phase 3: Visual Regression

```
1. Screenshot key pages at 3 breakpoints (375px, 768px, 1440px)
2. Compare against baseline screenshots (if stored)
3. Flag layout shifts > 5px, missing elements, overflow
4. Check dark mode if applicable
```
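
Step 3's comparison can be sketched over element positions extracted from the two runs. The `{name: (x, y)}` shape is a simplifying assumption (extraction itself is left to the browser tooling); the 5px tolerance follows the list above.

```python
def flag_layout_shifts(baseline, current, tol=5):
    """Flag elements that moved more than tol pixels or disappeared.

    baseline/current: {element_name: (x, y)} top-left positions in pixels.
    """
    issues = []
    for name, (x0, y0) in baseline.items():
        if name not in current:
            issues.append(f"{name}: missing")
            continue
        x1, y1 = current[name]
        if abs(x1 - x0) > tol or abs(y1 - y0) > tol:
            issues.append(f"{name}: shifted ({x1 - x0:+d}px, {y1 - y0:+d}px)")
    return issues
```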

### Phase 4: Accessibility

```
1. Run axe-core or equivalent on each page
2. Flag WCAG AA violations (contrast, labels, focus order)
3. Verify keyboard navigation works end-to-end
4. Check screen reader landmarks
```

## Output Format

```markdown
## QA Report — [URL] — [timestamp]

### Smoke Test
- Console errors: 0 critical, 2 warnings (analytics noise)
- Network: all 200/304, no failures
- Core Web Vitals: LCP 1.2s ✓, CLS 0.02 ✓, INP 89ms ✓

### Interactions
- [✓] Nav links: 12/12 working
- [✗] Contact form: missing error state for invalid email
- [✓] Auth flow: login/logout working

### Visual
- [✗] Hero section overflows on 375px viewport
- [✓] Dark mode: all pages consistent

### Accessibility
- 2 AA violations: missing alt text on hero image, low contrast on footer links

### Verdict: SHIP WITH FIXES (2 issues, 0 blockers)
```

## Integration

Works with any browser MCP:
- `mcp__claude-in-chrome__*` tools (preferred — uses your actual Chrome)
- Playwright via `mcp__browserbase__*`
- Direct Puppeteer scripts

Pair with `/canary-watch` for post-deploy monitoring.
93
skills/canary-watch/SKILL.md
Normal file
@@ -0,0 +1,93 @@

# Canary Watch — Post-Deploy Monitoring

## When to Use

- After deploying to production or staging
- After merging a risky PR
- When you want to verify a fix actually fixed it
- Continuous monitoring during a launch window
- After dependency upgrades

## How It Works

Monitors a deployed URL for regressions. Runs in a loop until stopped or until the watch window expires.

### What It Watches

```
1. HTTP Status — is the page returning 200?
2. Console Errors — new errors that weren't there before?
3. Network Failures — failed API calls, 5xx responses?
4. Performance — LCP/CLS/INP regression vs baseline?
5. Content — did key elements disappear? (h1, nav, footer, CTA)
6. API Health — are critical endpoints responding within SLA?
```

### Watch Modes

**Quick check** (default): single pass, report results
```
/canary-watch https://myapp.com
```

**Sustained watch**: check every N minutes for M hours
```
/canary-watch https://myapp.com --interval 5m --duration 2h
```

**Diff mode**: compare staging vs production
```
/canary-watch --compare https://staging.myapp.com https://myapp.com
```
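
The sustained-watch loop reduces to a timed polling loop. A minimal sketch, where `check` stands in for one full canary pass and returns a list of alert strings:

```python
import time

def canary_loop(check, interval_s=300, duration_s=7200):
    """Run one canary pass every interval_s seconds until duration_s expires.

    check: callable performing a single pass and returning a list of alerts.
    Returns every alert collected over the watch window.
    """
    deadline = time.monotonic() + duration_s
    alerts = []
    while True:
        alerts.extend(check())
        if time.monotonic() + interval_s > deadline:
            return alerts  # next pass would land past the window
        time.sleep(interval_s)
```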

### Alert Thresholds

```yaml
critical:   # immediate alert
  - HTTP status != 200
  - Console error count > 5 (new errors only)
  - LCP > 4s
  - API endpoint returns 5xx

warning:    # flag in report
  - LCP increased > 500ms from baseline
  - CLS > 0.1
  - New console warnings
  - Response time > 2x baseline

info:       # log only
  - Minor performance variance
  - New network requests (third-party scripts added?)
```
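
A sketch of how one measurement maps onto these tiers. The flat dict shape of `check` is an assumption for illustration; the keys mirror the thresholds above.

```python
def classify(check: dict) -> str:
    """Map one canary measurement onto the alert tiers above.

    Assumed keys: status, new_console_errors, lcp_ms, baseline_lcp_ms,
    api_5xx, cls, response_ratio (current / baseline response time).
    """
    if (check.get("status", 200) != 200
            or check.get("new_console_errors", 0) > 5
            or check.get("lcp_ms", 0) > 4000
            or check.get("api_5xx", False)):
        return "critical"
    lcp = check.get("lcp_ms", 0)
    if (lcp - check.get("baseline_lcp_ms", lcp) > 500
            or check.get("cls", 0) > 0.1
            or check.get("response_ratio", 1.0) > 2.0):
        return "warning"
    return "info"
```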

### Notifications

When a critical threshold is crossed:
- Desktop notification (macOS/Linux)
- Optional: Slack/Discord webhook
- Log to `~/.claude/canary-watch.log`

## Output

```markdown
## Canary Report — myapp.com — 2026-03-23 03:15 PST

### Status: HEALTHY ✓

| Check          | Result  | Baseline | Delta  |
|----------------|---------|----------|--------|
| HTTP           | 200 ✓   | 200      | —      |
| Console errors | 0 ✓     | 0        | —      |
| LCP            | 1.8s ✓  | 1.6s     | +200ms |
| CLS            | 0.01 ✓  | 0.01     | —      |
| API /health    | 145ms ✓ | 120ms    | +25ms  |

### No regressions detected. Deploy is clean.
```

## Integration

Pair with:
- `/browser-qa` for pre-deploy verification
- Hooks: add as a PostToolUse hook on `git push` to auto-check after deploys
- CI: run in GitHub Actions after the deploy step
76
skills/design-system/SKILL.md
Normal file
@@ -0,0 +1,76 @@

# Design System — Generate & Audit Visual Systems

## When to Use

- Starting a new project that needs a design system
- Auditing an existing codebase for visual consistency
- Before a redesign — understand what you have
- When the UI looks "off" but you can't pinpoint why
- Reviewing PRs that touch styling

## How It Works

### Mode 1: Generate Design System

Analyzes your codebase and generates a cohesive design system:

```
1. Scan CSS/Tailwind/styled-components for existing patterns
2. Extract: colors, typography, spacing, border-radius, shadows, breakpoints
3. Research 3 competitor sites for inspiration (via browser MCP)
4. Propose a design token set (JSON + CSS custom properties)
5. Generate DESIGN.md with rationale for each decision
6. Create an interactive HTML preview page (self-contained, no deps)
```

Output: `DESIGN.md` + `design-tokens.json` + `design-preview.html`
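
Step 4's "JSON + CSS custom properties" pairing can be sketched as a small transform from the token tree to a `:root` block. The token values here are placeholders, not a recommended palette:

```python
# Hypothetical token set; real values come from the generated design-tokens.json.
tokens = {
    "color": {"primary": "#1a4731", "surface": "#faf7f2"},
    "space": {"sm": "8px", "md": "16px", "lg": "32px"},
    "radius": {"card": "6px"},
}

def to_css_custom_properties(tokens: dict) -> str:
    """Flatten a two-level token tree into a :root block of custom properties."""
    lines = [":root {"]
    for group, values in tokens.items():
        for name, value in values.items():
            lines.append(f"  --{group}-{name}: {value};")
    lines.append("}")
    return "\n".join(lines)
```

Components then reference `var(--color-primary)` and friends, so changing a token updates the whole UI.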

### Mode 2: Visual Audit

Scores your UI across 10 dimensions (0-10 each):

```
1. Color consistency — are you using your palette or random hex values?
2. Typography hierarchy — clear h1 > h2 > h3 > body > caption?
3. Spacing rhythm — consistent scale (4px/8px/16px) or arbitrary?
4. Component consistency — do similar elements look similar?
5. Responsive behavior — fluid or broken at breakpoints?
6. Dark mode — complete or half-done?
7. Animation — purposeful or gratuitous?
8. Accessibility — contrast ratios, focus states, touch targets
9. Information density — cluttered or clean?
10. Polish — hover states, transitions, loading states, empty states
```

Each dimension gets a score, specific examples, and a fix with an exact file:line.

### Mode 3: AI Slop Detection

Identifies generic AI-generated design patterns:

```
- Gratuitous gradients on everything
- Purple-to-blue defaults
- "Glass morphism" cards with no purpose
- Rounded corners on things that shouldn't be rounded
- Excessive animations on scroll
- Generic hero with centered text over a stock gradient
- Sans-serif font stack with no personality
```

## Examples

**Generate for a SaaS app:**
```
/design-system generate --style minimal --palette earth-tones
```

**Audit existing UI:**
```
/design-system audit --url http://localhost:3000 --pages / /pricing /docs
```

**Check for AI slop:**
```
/design-system slop-check
```
79
skills/product-lens/SKILL.md
Normal file
@@ -0,0 +1,79 @@

# Product Lens — Think Before You Build

## When to Use

- Before starting any feature — validate the "why"
- Weekly product review — are we building the right thing?
- When stuck choosing between features
- Before a launch — sanity check the user journey
- When converting a vague idea into a spec

## How It Works

### Mode 1: Product Diagnostic

Like YC office hours but automated. Asks the hard questions:

```
1. Who is this for? (specific person, not "developers")
2. What's the pain? (quantify: how often, how bad, what do they do today?)
3. Why now? (what changed that makes this possible/necessary?)
4. What's the 10-star version? (if money/time were unlimited)
5. What's the MVP? (smallest thing that proves the thesis)
6. What's the anti-goal? (what are you explicitly NOT building?)
7. How do you know it's working? (metric, not vibes)
```

Output: a `PRODUCT-BRIEF.md` with answers, risks, and a go/no-go recommendation.

### Mode 2: Founder Review

Reviews your current project through a founder lens:

```
1. Read README, CLAUDE.md, package.json, recent commits
2. Infer: what is this trying to be?
3. Score: product-market fit signals (0-10)
   - Usage growth trajectory
   - Retention indicators (repeat contributors, return users)
   - Revenue signals (pricing page, billing code, Stripe integration)
   - Competitive moat (what's hard to copy?)
4. Identify: the one thing that would 10x this
5. Flag: things you're building that don't matter
```

### Mode 3: User Journey Audit

Maps the actual user experience:

```
1. Clone/install the product as a new user
2. Document every friction point (confusing steps, errors, missing docs)
3. Time each step
4. Compare to competitor onboarding
5. Score: time-to-value (how long until the user gets their first win?)
6. Recommend: top 3 fixes for onboarding
```

### Mode 4: Feature Prioritization

When you have 10 ideas and need to pick 2:

```
1. List all candidate features
2. Score each on: impact (1-5) × confidence (1-5) ÷ effort (1-5)
3. Rank by ICE score
4. Apply constraints: runway, team size, dependencies
5. Output: prioritized roadmap with rationale
```
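
The scoring in steps 2 and 3 is plain arithmetic and can be sketched as:

```python
def ice_score(impact: int, confidence: int, effort: int) -> float:
    """ICE as defined above: impact (1-5) x confidence (1-5) / effort (1-5)."""
    return impact * confidence / effort

def prioritize(features):
    """Rank (name, impact, confidence, effort) tuples, highest ICE first."""
    return sorted(features, key=lambda f: ice_score(*f[1:]), reverse=True)
```

The constraints in step 4 are applied after ranking, as a manual filter over the sorted list.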

## Output

All modes output actionable docs, not essays. Every recommendation has a specific next step.

## Integration

Pair with:
- `/browser-qa` to verify the user journey audit findings
- `/design-system audit` for visual polish assessment
- `/canary-watch` for post-launch monitoring
69
skills/safety-guard/SKILL.md
Normal file
@@ -0,0 +1,69 @@

# Safety Guard — Prevent Destructive Operations

## When to Use

- When working on production systems
- When agents are running autonomously (full-auto mode)
- When you want to restrict edits to a specific directory
- During sensitive operations (migrations, deploys, data changes)

## How It Works

Three modes of protection:

### Mode 1: Careful Mode

Intercepts destructive commands before execution and warns:

```
Watched patterns:
- rm -rf (especially /, ~, or the project root)
- git push --force
- git reset --hard
- git checkout . (discards all changes)
- DROP TABLE / DROP DATABASE
- docker system prune
- kubectl delete
- chmod 777
- sudo rm
- npm publish (accidental publishes)
- Any command with --no-verify
```

When detected: shows what the command does, asks for confirmation, and suggests a safer alternative.
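
The interception check can be sketched as a pattern table. This is a minimal sketch covering a few of the watched patterns above; the real hook also explains the command and prompts before blocking, and the regexes here are illustrative rather than exhaustive.

```python
import re

# (pattern, why it is dangerous) -- a subset of the watched list above.
DESTRUCTIVE = [
    (r"\brm\s+-[a-z]*r[a-z]*f", "recursive force delete"),
    (r"\bgit\s+push\b.*--force", "force push rewrites remote history"),
    (r"\bgit\s+reset\s+--hard", "discards local commits and changes"),
    (r"\bdrop\s+(table|database)\b", "irreversible schema deletion"),
    (r"--no-verify\b", "skips commit/push hooks"),
]

def check_command(cmd: str):
    """Return the reason a command is dangerous, or None if it looks safe."""
    for pattern, reason in DESTRUCTIVE:
        if re.search(pattern, cmd, re.IGNORECASE):
            return reason
    return None
```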

### Mode 2: Freeze Mode

Locks file edits to a specific directory tree:

```
/safety-guard freeze src/components/
```

Any Write/Edit outside `src/components/` is blocked with an explanation. Useful when you want an agent to focus on one area without touching unrelated code.
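
The freeze check itself reduces to a path-containment test. A sketch, assuming enforcement happens inside the PreToolUse hook described under Implementation:

```python
from pathlib import Path

def is_write_allowed(target: str, frozen_root: str) -> bool:
    """Freeze Mode: a write is allowed only inside the frozen directory tree."""
    t = Path(target).resolve()
    root = Path(frozen_root).resolve()
    return t == root or root in t.parents
```

Resolving both paths first prevents `../` tricks from escaping the frozen tree.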

### Mode 3: Guard Mode (Careful + Freeze combined)

Both protections active. Maximum safety for autonomous agents.

```
/safety-guard guard --dir src/api/ --allow-read-all
```

Agents can read anything but only write to `src/api/`. Destructive commands are blocked everywhere.

### Unlock

```
/safety-guard off
```

## Implementation

Uses PreToolUse hooks to intercept Bash, Write, Edit, and MultiEdit tool calls. Checks the command/path against the active rules before allowing execution.

## Integration

- Enable by default for `codex -a never` sessions
- Pair with observability risk scoring in ECC 2.0
- Logs all blocked actions to `~/.claude/safety-guard.log`