From df4f2df297847687bd8835c440803de36966b2c9 Mon Sep 17 00:00:00 2001
From: Affaan Mustafa
Date: Mon, 23 Mar 2026 04:31:10 -0700
Subject: [PATCH] =?UTF-8?q?feat:=20add=206=20gap-closing=20skills=20?=
 =?UTF-8?q?=E2=80=94=20browser=20QA,=20design=20system,=20product=20lens,?=
 =?UTF-8?q?=20canary=20watch,=20benchmark,=20safety=20guard?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes competitive gaps with gstack:
- browser-qa: automated visual testing via browser MCP
- design-system: generate, audit, and detect AI slop in UI
- product-lens: product diagnostic, founder review, feature prioritization
- canary-watch: post-deploy monitoring with alert thresholds
- benchmark: performance baseline and regression detection
- safety-guard: prevent destructive operations in autonomous sessions
---
 skills/benchmark/SKILL.md     | 87 ++++++++++++++++++++++++++++++++
 skills/browser-qa/SKILL.md    | 81 ++++++++++++++++++++++++++++++
 skills/canary-watch/SKILL.md  | 93 +++++++++++++++++++++++++++++++++++
 skills/design-system/SKILL.md | 76 ++++++++++++++++++++++++++++
 skills/product-lens/SKILL.md  | 79 +++++++++++++++++++++++++++++
 skills/safety-guard/SKILL.md  | 69 ++++++++++++++++++++++++++
 6 files changed, 485 insertions(+)
 create mode 100644 skills/benchmark/SKILL.md
 create mode 100644 skills/browser-qa/SKILL.md
 create mode 100644 skills/canary-watch/SKILL.md
 create mode 100644 skills/design-system/SKILL.md
 create mode 100644 skills/product-lens/SKILL.md
 create mode 100644 skills/safety-guard/SKILL.md

diff --git a/skills/benchmark/SKILL.md b/skills/benchmark/SKILL.md
new file mode 100644
index 00000000..51ccce8f
--- /dev/null
+++ b/skills/benchmark/SKILL.md
@@ -0,0 +1,87 @@
+# Benchmark — Performance Baseline & Regression Detection
+
+## When to Use
+
+- Before and after a PR to measure performance impact
+- Setting up performance baselines for a project
+- When users report "it feels slow"
+- Before a launch — ensure you meet performance targets
+- Comparing your stack against alternatives
+
+## How It Works
+
+### Mode 1: Page Performance
+
+Measures real browser metrics via browser MCP:
+
+```
+1. Navigate to each target URL
+2. Measure Core Web Vitals:
+   - LCP (Largest Contentful Paint) — target < 2.5s
+   - CLS (Cumulative Layout Shift) — target < 0.1
+   - INP (Interaction to Next Paint) — target < 200ms
+   - FCP (First Contentful Paint) — target < 1.8s
+   - TTFB (Time to First Byte) — target < 800ms
+3. Measure resource sizes:
+   - Total page weight (target < 1MB)
+   - JS bundle size (target < 200KB gzipped)
+   - CSS size
+   - Image weight
+   - Third-party script weight
+4. Count network requests
+5. Check for render-blocking resources
+```
+
+### Mode 2: API Performance
+
+Benchmarks API endpoints:
+
+```
+1. Hit each endpoint 100 times
+2. Measure: p50, p95, p99 latency
+3. Track: response size, status codes
+4. Test under load: 10 concurrent requests
+5. Compare against SLA targets
+```
+
+### Mode 3: Build Performance
+
+Measures development feedback loop:
+
+```
+1. Cold build time
+2. Hot reload time (HMR)
+3. Test suite duration
+4. TypeScript check time
+5. Lint time
+6. Docker build time
+```
+
+### Mode 4: Before/After Comparison
+
+Run before and after a change to measure impact:
+
+```
+/benchmark baseline   # saves current metrics
+# ... make changes ...
+/benchmark compare    # compares against baseline
+```
+
+Output:
+```
+| Metric | Before | After | Delta  | Verdict  |
+|--------|--------|-------|--------|----------|
+| LCP    | 1.2s   | 1.4s  | +200ms | ⚠ WARN   |
+| Bundle | 180KB  | 175KB | -5KB   | ✓ BETTER |
+| Build  | 12s    | 14s   | +2s    | ⚠ WARN   |
+```
+
+## Output
+
+Stores baselines in `.ecc/benchmarks/` as JSON. Git-tracked so the team shares baselines.
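The before/after comparison logic can be sketched in a few lines. This is a minimal sketch, not the skill's actual implementation: the flat metric→number baseline shape and the 10% warn threshold are assumptions for illustration (the skill does not fix these values).

```python
# Sketch of the baseline/compare verdict logic. Assumed: baselines are flat
# {"metric_name": number} dicts, e.g. loaded from .ecc/benchmarks/*.json,
# and a regression of more than 10% earns a WARN (illustrative threshold).
WARN_RATIO = 0.10

def verdict(before: float, after: float, lower_is_better: bool = True) -> str:
    """Classify a single metric delta, as in the comparison table."""
    delta = after - before
    if not lower_is_better:
        delta = -delta
    if delta < 0:
        return "BETTER"
    if delta == 0:
        return "SAME"
    return "WARN" if delta / before > WARN_RATIO else "OK"

def compare(baseline: dict, current: dict) -> list[tuple[str, str]]:
    """Compare every metric present in both snapshots."""
    return [(name, verdict(baseline[name], current[name]))
            for name in baseline if name in current]

baseline = {"lcp_ms": 1200, "bundle_kb": 180, "build_s": 12}
current = {"lcp_ms": 1400, "bundle_kb": 175, "build_s": 14}
for name, v in compare(baseline, current):
    print(name, v)
```

Run on the example table above, this reproduces its verdicts: LCP and build time regress past the threshold, the bundle shrinks.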
+
+## Integration
+
+- CI: run `/benchmark compare` on every PR
+- Pair with `/canary-watch` for post-deploy monitoring
+- Pair with `/browser-qa` for full pre-ship checklist

diff --git a/skills/browser-qa/SKILL.md b/skills/browser-qa/SKILL.md
new file mode 100644
index 00000000..d71a0068
--- /dev/null
+++ b/skills/browser-qa/SKILL.md
@@ -0,0 +1,81 @@
+# Browser QA — Automated Visual Testing & Interaction
+
+## When to Use
+
+- After deploying a feature to staging/preview
+- When you need to verify UI behavior across pages
+- Before shipping — confirm layouts, forms, and interactions actually work
+- When reviewing PRs that touch frontend code
+- Accessibility audits and responsive testing
+
+## How It Works
+
+Uses the browser automation MCP (claude-in-chrome, Playwright, or Puppeteer) to interact with live pages like a real user.
+
+### Phase 1: Smoke Test
+```
+1. Navigate to target URL
+2. Check for console errors (filter noise: analytics, third-party)
+3. Verify no 4xx/5xx in network requests
+4. Screenshot above-the-fold on desktop + mobile viewports
+5. Check Core Web Vitals: LCP < 2.5s, CLS < 0.1, INP < 200ms
+```
+
+### Phase 2: Interaction Test
+```
+1. Click every nav link — verify no dead links
+2. Submit forms with valid data — verify success state
+3. Submit forms with invalid data — verify error state
+4. Test auth flow: login → protected page → logout
+5. Test critical user journeys (checkout, onboarding, search)
+```
+
+### Phase 3: Visual Regression
+```
+1. Screenshot key pages at 3 breakpoints (375px, 768px, 1440px)
+2. Compare against baseline screenshots (if stored)
+3. Flag layout shifts > 5px, missing elements, overflow
+4. Check dark mode if applicable
+```
+
+### Phase 4: Accessibility
+```
+1. Run axe-core or equivalent on each page
+2. Flag WCAG AA violations (contrast, labels, focus order)
+3. Verify keyboard navigation works end-to-end
+4. Check screen reader landmarks
+```
+
+## Output Format
+
+```markdown
+## QA Report — [URL] — [timestamp]
+
+### Smoke Test
+- Console errors: 0 critical, 2 warnings (analytics noise)
+- Network: all 200/304, no failures
+- Core Web Vitals: LCP 1.2s ✓, CLS 0.02 ✓, INP 89ms ✓
+
+### Interactions
+- [✓] Nav links: 12/12 working
+- [✗] Contact form: missing error state for invalid email
+- [✓] Auth flow: login/logout working
+
+### Visual
+- [✗] Hero section overflows on 375px viewport
+- [✓] Dark mode: all pages consistent
+
+### Accessibility
+- 2 AA violations: missing alt text on hero image, low contrast on footer links
+
+### Verdict: SHIP WITH FIXES (2 issues, 0 blockers)
+```
+
+## Integration
+
+Works with any browser MCP:
+- `mcp__claude-in-chrome__*` tools (preferred — uses your actual Chrome)
+- Playwright via `mcp__browserbase__*`
+- Direct Puppeteer scripts
+
+Pair with `/canary-watch` for post-deploy monitoring.

diff --git a/skills/canary-watch/SKILL.md b/skills/canary-watch/SKILL.md
new file mode 100644
index 00000000..88ff5d6a
--- /dev/null
+++ b/skills/canary-watch/SKILL.md
@@ -0,0 +1,93 @@
+# Canary Watch — Post-Deploy Monitoring
+
+## When to Use
+
+- After deploying to production or staging
+- After merging a risky PR
+- When you want to verify a fix actually fixed it
+- Continuous monitoring during a launch window
+- After dependency upgrades
+
+## How It Works
+
+Monitors a deployed URL for regressions. Runs in a loop until stopped or until the watch window expires.
+
+### What It Watches
+
+```
+1. HTTP Status — is the page returning 200?
+2. Console Errors — new errors that weren't there before?
+3. Network Failures — failed API calls, 5xx responses?
+4. Performance — LCP/CLS/INP regression vs baseline?
+5. Content — did key elements disappear? (h1, nav, footer, CTA)
+6. API Health — are critical endpoints responding within SLA?
+```
+
+### Watch Modes
+
+**Quick check** (default): single pass, report results
+```
+/canary-watch https://myapp.com
+```
+
+**Sustained watch**: check every N minutes for M hours
+```
+/canary-watch https://myapp.com --interval 5m --duration 2h
+```
+
+**Diff mode**: compare staging vs production
+```
+/canary-watch --compare https://staging.myapp.com https://myapp.com
+```
+
+### Alert Thresholds
+
+```yaml
+critical:   # immediate alert
+  - HTTP status != 200
+  - Console error count > 5 (new errors only)
+  - LCP > 4s
+  - API endpoint returns 5xx
+
+warning:    # flag in report
+  - LCP increased > 500ms from baseline
+  - CLS > 0.1
+  - New console warnings
+  - Response time > 2x baseline
+
+info:       # log only
+  - Minor performance variance
+  - New network requests (third-party scripts added?)
+```
+
+### Notifications
+
+When a critical threshold is crossed:
+- Desktop notification (macOS/Linux)
+- Optional: Slack/Discord webhook
+- Log to `~/.claude/canary-watch.log`
+
+## Output
+
+```markdown
+## Canary Report — myapp.com — 2026-03-23 03:15 PST
+
+### Status: HEALTHY ✓
+
+| Check          | Result | Baseline | Delta  |
+|----------------|--------|----------|--------|
+| HTTP           | 200 ✓  | 200      | —      |
+| Console errors | 0 ✓    | 0        | —      |
+| LCP            | 1.8s ✓ | 1.6s     | +200ms |
+| CLS            | 0.01 ✓ | 0.01     | —      |
+| API /health    | 145ms ✓| 120ms    | +25ms  |
+
+### No regressions detected. Deploy is clean.
+```
+
+## Integration
+
+Pair with:
+- `/browser-qa` for pre-deploy verification
+- Hooks: add as a PostToolUse hook on `git push` to auto-check after deploys
+- CI: run in GitHub Actions after deploy step

diff --git a/skills/design-system/SKILL.md b/skills/design-system/SKILL.md
new file mode 100644
index 00000000..3bf06a41
--- /dev/null
+++ b/skills/design-system/SKILL.md
@@ -0,0 +1,76 @@
+# Design System — Generate & Audit Visual Systems
+
+## When to Use
+
+- Starting a new project that needs a design system
+- Auditing an existing codebase for visual consistency
+- Before a redesign — understand what you have
+- When the UI looks "off" but you can't pinpoint why
+- Reviewing PRs that touch styling
+
+## How It Works
+
+### Mode 1: Generate Design System
+
+Analyzes your codebase and generates a cohesive design system:
+
+```
+1. Scan CSS/Tailwind/styled-components for existing patterns
+2. Extract: colors, typography, spacing, border-radius, shadows, breakpoints
+3. Research 3 competitor sites for inspiration (via browser MCP)
+4. Propose a design token set (JSON + CSS custom properties)
+5. Generate DESIGN.md with rationale for each decision
+6. Create an interactive HTML preview page (self-contained, no deps)
+```
+
+Output: `DESIGN.md` + `design-tokens.json` + `design-preview.html`
+
+### Mode 2: Visual Audit
+
+Scores your UI across 10 dimensions (0-10 each):
+
+```
+1. Color consistency — are you using your palette or random hex values?
+2. Typography hierarchy — clear h1 > h2 > h3 > body > caption?
+3. Spacing rhythm — consistent scale (4px/8px/16px) or arbitrary?
+4. Component consistency — do similar elements look similar?
+5. Responsive behavior — fluid or broken at breakpoints?
+6. Dark mode — complete or half-done?
+7. Animation — purposeful or gratuitous?
+8. Accessibility — contrast ratios, focus states, touch targets
+9. Information density — cluttered or clean?
+10. Polish — hover states, transitions, loading states, empty states
+```
+
+Each dimension gets a score, specific examples, and a fix with exact file:line.
+
+### Mode 3: AI Slop Detection
+
+Identifies generic AI-generated design patterns:
+
+```
+- Gratuitous gradients on everything
+- Purple-to-blue defaults
+- "Glass morphism" cards with no purpose
+- Rounded corners on things that shouldn't be rounded
+- Excessive animations on scroll
+- Generic hero with centered text over stock gradient
+- Sans-serif font stack with no personality
+```
+
+## Examples
+
+**Generate for a SaaS app:**
+```
+/design-system generate --style minimal --palette earth-tones
+```
+
+**Audit existing UI:**
+```
+/design-system audit --url http://localhost:3000 --pages / /pricing /docs
+```
+
+**Check for AI slop:**
+```
+/design-system slop-check
+```

diff --git a/skills/product-lens/SKILL.md b/skills/product-lens/SKILL.md
new file mode 100644
index 00000000..63ff7b27
--- /dev/null
+++ b/skills/product-lens/SKILL.md
@@ -0,0 +1,79 @@
+# Product Lens — Think Before You Build
+
+## When to Use
+
+- Before starting any feature — validate the "why"
+- Weekly product review — are we building the right thing?
+- When stuck choosing between features
+- Before a launch — sanity check the user journey
+- When converting a vague idea into a spec
+
+## How It Works
+
+### Mode 1: Product Diagnostic
+
+Like YC office hours but automated. Asks the hard questions:
+
+```
+1. Who is this for? (specific person, not "developers")
+2. What's the pain? (quantify: how often, how bad, what do they do today?)
+3. Why now? (what changed that makes this possible/necessary?)
+4. What's the 10-star version? (if money/time were unlimited)
+5. What's the MVP? (smallest thing that proves the thesis)
+6. What's the anti-goal? (what are you explicitly NOT building?)
+7. How do you know it's working? (metric, not vibes)
+```
+
+Output: a `PRODUCT-BRIEF.md` with answers, risks, and a go/no-go recommendation.
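As one possible shape for that brief, the seven questions can be rendered into a `PRODUCT-BRIEF.md` scaffold; a minimal sketch, assuming a flat heading-per-question layout (the skill does not prescribe this exact structure):

```python
# The seven diagnostic questions from Mode 1, in order.
QUESTIONS = [
    "Who is this for?",
    "What's the pain?",
    "Why now?",
    "What's the 10-star version?",
    "What's the MVP?",
    "What's the anti-goal?",
    "How do you know it's working?",
]

def render_brief(answers: dict[str, str]) -> str:
    """Render answers into a PRODUCT-BRIEF.md skeleton (assumed layout)."""
    lines = ["# Product Brief", ""]
    for q in QUESTIONS:
        lines.append(f"## {q}")
        # Unanswered questions stay visible so gaps are obvious in review.
        lines.append(answers.get(q, "_unanswered_"))
        lines.append("")
    return "\n".join(lines)

print(render_brief({"Why now?": "Browser MCP support just landed."}))
```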
+
+### Mode 2: Founder Review
+
+Reviews your current project through a founder lens:
+
+```
+1. Read README, CLAUDE.md, package.json, recent commits
+2. Infer: what is this trying to be?
+3. Score: product-market fit signals (0-10)
+   - Usage growth trajectory
+   - Retention indicators (repeat contributors, return users)
+   - Revenue signals (pricing page, billing code, Stripe integration)
+   - Competitive moat (what's hard to copy?)
+4. Identify: the one thing that would 10x this
+5. Flag: things you're building that don't matter
+```
+
+### Mode 3: User Journey Audit
+
+Maps the actual user experience:
+
+```
+1. Clone/install the product as a new user
+2. Document every friction point (confusing steps, errors, missing docs)
+3. Time each step
+4. Compare to competitor onboarding
+5. Score: time-to-value (how long until the user gets their first win?)
+6. Recommend: top 3 fixes for onboarding
+```
+
+### Mode 4: Feature Prioritization
+
+When you have 10 ideas and need to pick 2:
+
+```
+1. List all candidate features
+2. Score each on: impact (1-5) × confidence (1-5) ÷ effort (1-5)
+3. Rank by ICE score
+4. Apply constraints: runway, team size, dependencies
+5. Output: prioritized roadmap with rationale
+```
+
+## Output
+
+All modes output actionable docs, not essays. Every recommendation has a specific next step.
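The ICE arithmetic in Mode 4 is small enough to pin down in code. A minimal sketch, with made-up feature names and scores for illustration:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    name: str
    impact: int      # 1-5
    confidence: int  # 1-5
    effort: int      # 1-5

    @property
    def ice(self) -> float:
        # ICE score as defined in Mode 4: impact × confidence ÷ effort.
        return self.impact * self.confidence / self.effort

# Hypothetical candidates — the scores would come from the diagnostic.
features = [
    Feature("dark-mode", impact=3, confidence=4, effort=2),
    Feature("sso", impact=5, confidence=3, effort=5),
    Feature("export-csv", impact=4, confidence=5, effort=2),
]

for f in sorted(features, key=lambda f: f.ice, reverse=True):
    print(f"{f.name}: {f.ice:.1f}")
```

Here `export-csv` (20 ÷ 2 = 10.0) outranks `sso` (15 ÷ 5 = 3.0) despite the lower impact score, which is the point of dividing by effort.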
+
+## Integration
+
+Pair with:
+- `/browser-qa` to verify the user journey audit findings
+- `/design-system audit` for visual polish assessment
+- `/canary-watch` for post-launch monitoring

diff --git a/skills/safety-guard/SKILL.md b/skills/safety-guard/SKILL.md
new file mode 100644
index 00000000..6b347c02
--- /dev/null
+++ b/skills/safety-guard/SKILL.md
@@ -0,0 +1,69 @@
+# Safety Guard — Prevent Destructive Operations
+
+## When to Use
+
+- When working on production systems
+- When agents are running autonomously (full-auto mode)
+- When you want to restrict edits to a specific directory
+- During sensitive operations (migrations, deploys, data changes)
+
+## How It Works
+
+Three modes of protection:
+
+### Mode 1: Careful Mode
+
+Intercepts destructive commands before execution and warns:
+
+```
+Watched patterns:
+- rm -rf (especially /, ~, or project root)
+- git push --force
+- git reset --hard
+- git checkout . (discard all changes)
+- DROP TABLE / DROP DATABASE
+- docker system prune
+- kubectl delete
+- chmod 777
+- sudo rm
+- npm publish (accidental publishes)
+- Any command with --no-verify
+```
+
+When detected: shows what the command does, asks for confirmation, and suggests a safer alternative.
+
+### Mode 2: Freeze Mode
+
+Locks file edits to a specific directory tree:
+
+```
+/safety-guard freeze src/components/
+```
+
+Any Write/Edit outside `src/components/` is blocked with an explanation. Useful when you want an agent to focus on one area without touching unrelated code.
+
+### Mode 3: Guard Mode (Careful + Freeze combined)
+
+Both protections active. Maximum safety for autonomous agents.
+
+```
+/safety-guard guard --dir src/api/ --allow-read-all
+```
+
+Agents can read anything but only write to `src/api/`. Destructive commands are blocked everywhere.
+
+### Unlock
+
+```
+/safety-guard off
+```
+
+## Implementation
+
+Uses PreToolUse hooks to intercept Bash, Write, Edit, and MultiEdit tool calls. Checks the command/path against the active rules before allowing execution.
+
+## Integration
+
+- Enable by default for `codex -a never` sessions
+- Pair with observability risk scoring in ECC 2.0
+- Logs all blocked actions to `~/.claude/safety-guard.log`
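The Careful Mode check reduces to pattern matching on the proposed command. A minimal sketch covering a subset of the watched patterns above — the hook wiring, confirmation prompt, and full pattern list are left out, and the regexes here are illustrative rather than the skill's actual rules:

```python
import re

# Illustrative subset of Careful Mode's watched patterns (not exhaustive).
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bgit\s+push\s+.*--force\b",
    r"\bgit\s+reset\s+--hard\b",
    r"\bDROP\s+(TABLE|DATABASE)\b",
    r"\bdocker\s+system\s+prune\b",
    r"\bchmod\s+777\b",
    r"--no-verify\b",
]

def is_destructive(command: str) -> bool:
    """Return True if the command matches any watched destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE)
               for p in DESTRUCTIVE_PATTERNS)
```

A PreToolUse hook would run a check like this against the Bash tool's command string and refuse (or ask for confirmation) on a match, while Write/Edit calls would instead be checked against the frozen directory prefix.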