feat: add ai-regression-testing skill (#433)

Patterns for catching regressions introduced by AI coding agents. Covers sandbox/production parity testing, API response shape verification, and integration with bug-check workflows. Based on real-world experience where an AI agent (Claude Code) introduced the same bug 4 times because the same model wrote and reviewed the code; only automated tests caught it.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-15 13:23:13 +08:00 · 2026-03-17 05:35:31 +09:00
parent 113119dc6f
commit c2f2f9517c
1 changed files with 385 additions and 0 deletions
--- a/skills/ai-regression-testing/SKILL.md
+++ b/skills/ai-regression-testing/SKILL.md
@@ -0,0 +1,385 @@
+---
+name: ai-regression-testing
+description: Regression testing strategies for AI-assisted development. Sandbox-mode API testing without database dependencies, automated bug-check workflows, and patterns to catch AI blind spots where the same model writes and reviews code.
+origin: ECC
+---
+
+# AI Regression Testing
+
+Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
+
+## When to Activate
+
+- AI agent (Claude Code, Cursor, Codex) has modified API routes or backend logic
+- A bug was found and fixed — need to prevent re-introduction
+- Project has a sandbox/mock mode that can be leveraged for DB-free testing
+- Running `/bug-check` or similar review commands after code changes
+- Multiple code paths exist (sandbox vs production, feature flags, etc.)
+
+## The Core Problem
+
+When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
+
+```
+AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
+```
+
+**Real-world example** (observed in production):
+
+```
+Fix 1: Added notification_settings to API response
+  → Forgot to add it to the SELECT query
+  → AI reviewed and missed it (same blind spot)
+
+Fix 2: Added it to SELECT query
+  → TypeScript build error (column not in generated types)
+  → AI reviewed Fix 1 but didn't catch the SELECT issue
+
+Fix 3: Changed to SELECT *
+  → Fixed production path, forgot sandbox path
+  → AI reviewed and missed it AGAIN (4th occurrence)
+
+Fix 4: Test caught it instantly on first run ✅
+```
+
+The pattern: **sandbox/production path inconsistency** was the most frequent AI-introduced regression in this project.
+
+## Sandbox-Mode API Testing
+
+If a project already has a sandbox/mock mode, that mode is the key to fast, DB-free API testing.
+
+### Setup (Vitest + Next.js App Router)
+
+```typescript
+// vitest.config.ts
+import { defineConfig } from "vitest/config";
+import path from "path";
+
+export default defineConfig({
+  test: {
+    environment: "node",
+    globals: true,
+    include: ["__tests__/**/*.test.ts"],
+    setupFiles: ["__tests__/setup.ts"],
+  },
+  resolve: {
+    alias: {
+      "@": path.resolve(__dirname, "."),
+    },
+  },
+});
+```
+
+```typescript
+// __tests__/setup.ts
+// Force sandbox mode — no database needed
+process.env.SANDBOX_MODE = "true";
+process.env.NEXT_PUBLIC_SUPABASE_URL = "";
+process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
+```
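+
+Pattern 1 below calls an `isSandboxMode()` guard that this skill never defines. Here is a minimal sketch. It assumes the flag is the `SANDBOX_MODE` variable forced on above. `isSandboxMode` and `hasDatabaseConfig` are hypothetical helpers. The env record is passed in explicitly so both are trivial to unit-test:
+
+```typescript
+// Hypothetical guards; names and signatures are illustrative, not a project API.
+type Env = Record<string, string | undefined>;
+
+// Only the exact string "true" enables sandbox mode; "", "false", or unset do not.
+export function isSandboxMode(env: Env): boolean {
+  return env.SANDBOX_MODE === "true";
+}
+
+// With the blank Supabase values from setup.ts this returns false, so a route
+// can refuse to create a database client during tests.
+export function hasDatabaseConfig(env: Env): boolean {
+  return Boolean(env.NEXT_PUBLIC_SUPABASE_URL && env.NEXT_PUBLIC_SUPABASE_ANON_KEY);
+}
+```
+
+In a route handler this would be called as `isSandboxMode(process.env)`.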
+
+### Test Helper for Next.js API Routes
+
+```typescript
+// __tests__/helpers.ts
+import { NextRequest } from "next/server";
+
+export function createTestRequest(
+  url: string,
+  options?: {
+    method?: string;
+    body?: Record<string, unknown>;
+    headers?: Record<string, string>;
+    sandboxUserId?: string;
+  },
+): NextRequest {
+  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
+  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
+  const reqHeaders: Record<string, string> = { ...headers };
+
+  if (sandboxUserId) {
+    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
+  }
+
+  const init: { method: string; headers: Record<string, string>; body?: string } = {
+    method,
+    headers: reqHeaders,
+  };
+
+  if (body) {
+    // init.headers is the same object as reqHeaders, so this header is sent too
+    reqHeaders["content-type"] = "application/json";
+    init.body = JSON.stringify(body);
+  }
+
+  return new NextRequest(fullUrl, init);
+}
+
+export async function parseResponse(response: Response) {
+  const json = await response.json();
+  return { status: response.status, json };
+}
+```
+
+### Writing Regression Tests
+
+The key principle: **write tests for bugs that were found, not for code that works**.
+
+```typescript
+// __tests__/api/user/profile.test.ts
+import { describe, it, expect } from "vitest";
+import { createTestRequest, parseResponse } from "../../helpers";
+import { GET } from "@/app/api/user/profile/route";
+
+// Define the contract — what fields MUST be in the response
+const REQUIRED_FIELDS = [
+  "id",
+  "email",
+  "full_name",
+  "phone",
+  "role",
+  "created_at",
+  "avatar_url",
+  "notification_settings",  // ← Added after bug found it missing
+];
+
+describe("GET /api/user/profile", () => {
+  it("returns all required fields", async () => {
+    const req = createTestRequest("/api/user/profile");
+    const res = await GET(req);
+    const { status, json } = await parseResponse(res);
+
+    expect(status).toBe(200);
+    for (const field of REQUIRED_FIELDS) {
+      expect(json.data).toHaveProperty(field);
+    }
+  });
+
+  // Regression test — this exact bug was introduced by AI 4 times
+  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
+    const req = createTestRequest("/api/user/profile");
+    const res = await GET(req);
+    const { json } = await parseResponse(res);
+
+    expect("notification_settings" in json.data).toBe(true);
+    const ns = json.data.notification_settings;
+    expect(ns === null || typeof ns === "object").toBe(true);
+  });
+});
+```
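+
+Looping over `toHaveProperty` stops at the first missing field. A helper that reports every missing field at once gives a more useful failure message. The sketch below also shows that `JSON.stringify` drops properties whose value is `undefined`. That is why the regression test checks key presence with `in` rather than comparing against `undefined`. `missingFields` is a hypothetical helper name:
+
+```typescript
+// Hypothetical contract helper: returns every required key absent from a payload.
+export function missingFields(
+  payload: Record<string, unknown>,
+  required: readonly string[],
+): string[] {
+  return required.filter((field) => !(field in payload));
+}
+
+// A route that copies a column the row never had serializes without the key at all.
+const serialized = JSON.stringify({ id: "u1", notification_settings: undefined });
+const parsed = JSON.parse(serialized) as Record<string, unknown>;
+
+console.log(serialized); // {"id":"u1"}
+console.log(missingFields(parsed, ["id", "notification_settings"])); // [ 'notification_settings' ]
+```
+
+Inside a Vitest case this becomes `expect(missingFields(json.data, REQUIRED_FIELDS)).toEqual([])`, which lists all missing fields in the diff output.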
+
+### Testing Sandbox/Production Parity
+
+The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
+
+```typescript
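+// __tests__/api/user/messages.test.ts
+import { describe, it, expect } from "vitest";
+import { createTestRequest, parseResponse } from "../../helpers";
+// Route path assumed to follow the same App Router layout as the profile route
+import { GET } from "@/app/api/user/messages/route";
+
+// Test that sandbox responses match the expected contract
+describe("GET /api/user/messages (conversation list)", () => {
+  it("includes partner_name in sandbox mode", async () => {
+    const req = createTestRequest("/api/user/messages", {
+      sandboxUserId: "user-001",
+    });
+    const res = await GET(req);
+    const { json } = await parseResponse(res);
+
+    // Fail loudly if the sandbox fixture is empty; otherwise the loop
+    // below would pass without checking anything
+    expect(json.data.length).toBeGreaterThan(0);
+
+    // This caught a bug where partner_name was added
+    // to production path but not sandbox path
+    for (const conv of json.data) {
+      expect("partner_name" in conv).toBe(true);
+    }
+  });
+});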
+```
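+
+The test above checks the sandbox path against the expected fields. Sometimes both paths can be exercised in-process, for example by stubbing the database client (not covered by this skill). In that case, comparing their key sets directly catches drift in either direction. A minimal sketch; `shapeDiff` is a hypothetical helper:
+
+```typescript
+// Hypothetical parity helper: compares the top-level keys of two payloads.
+export interface ShapeDiff {
+  onlyInSandbox: string[];
+  onlyInProduction: string[];
+}
+
+export function shapeDiff(
+  sandbox: Record<string, unknown>,
+  production: Record<string, unknown>,
+): ShapeDiff {
+  const sandboxKeys = Object.keys(sandbox);
+  const productionKeys = Object.keys(production);
+  return {
+    onlyInSandbox: sandboxKeys.filter((k) => productionKeys.indexOf(k) === -1).sort(),
+    onlyInProduction: productionKeys.filter((k) => sandboxKeys.indexOf(k) === -1).sort(),
+  };
+}
+
+// The messages bug described above: production gained partner_name, sandbox did not.
+const diff = shapeDiff(
+  { id: "c1", last_message: "hi" },
+  { id: "c1", last_message: "hi", partner_name: "Partner A" },
+);
+console.log(diff); // { onlyInSandbox: [], onlyInProduction: [ 'partner_name' ] }
+```
+
+A parity test would then assert that both arrays are empty.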
+
+## Integrating Tests into Bug-Check Workflow
+
+### Custom Command Definition
+
+```markdown
+<!-- .claude/commands/bug-check.md -->
+# Bug Check
+
+## Step 1: Automated Tests (mandatory, cannot skip)
+
+Run these commands FIRST before any code review:
+
+    npm run test       # Vitest test suite
+    npm run build      # TypeScript type check + build
+
+- If tests fail → report as highest priority bug
+- If build fails → report type errors as highest priority
+- Only proceed to Step 2 if both pass
+
+## Step 2: Code Review (AI review)
+
+1. Sandbox / production path consistency
+2. API response shape matches frontend expectations
+3. SELECT clause completeness
+4. Error handling with rollback
+5. Optimistic update race conditions
+
+## Step 3: For each bug fixed, propose a regression test
+```
+
+### The Workflow
+
+```
+User: "Run a bug check" (or "/bug-check")
+  │
+  ├─ Step 1: npm run test
+  │   ├─ FAIL → Bug found mechanically (no AI judgment needed)
+  │   └─ PASS → Continue
+  │
+  ├─ Step 2: npm run build
+  │   ├─ FAIL → Type error found mechanically
+  │   └─ PASS → Continue
+  │
+  ├─ Step 3: AI code review (with known blind spots in mind)
+  │   └─ Findings reported
+  │
+  └─ Step 4: For each fix, write a regression test
+      └─ Next bug-check catches if fix breaks
+```
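+
+The ordering above is a gate. Mechanical checks run first and short-circuit, and AI review starts only after they pass. A minimal sketch of that control flow, with the step runners injected (in a real project they might spawn `npm run test` and `npm run build`). The `Step` type and `runGate` name are hypothetical:
+
+```typescript
+// Hypothetical gate for the bug-check workflow: run mechanical steps in order
+// and stop at the first failure, so AI review never runs on a red build.
+export interface Step {
+  name: string;
+  run: () => boolean; // true = pass
+}
+
+export interface GateResult {
+  passed: boolean;
+  failedStep: string | null;
+  ran: string[];
+}
+
+export function runGate(steps: Step[]): GateResult {
+  const ran: string[] = [];
+  for (const step of steps) {
+    ran.push(step.name);
+    if (!step.run()) {
+      return { passed: false, failedStep: step.name, ran };
+    }
+  }
+  return { passed: true, failedStep: null, ran };
+}
+
+const result = runGate([
+  { name: "npm run test", run: () => false }, // a regression test fails
+  { name: "npm run build", run: () => true },
+]);
+console.log(result.failedStep); // npm run test
+console.log(result.ran.length); // 1
+```
+
+Only when `passed` is true would the AI code review step begin.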
+
+## Common AI Regression Patterns
+
+### Pattern 1: Sandbox/Production Path Mismatch
+
+**Frequency**: Most common (observed in 3 out of 4 regressions)
+
+```typescript
+// ❌ AI adds field to production path only
+if (isSandboxMode()) {
+  return { data: { id, email, name } };  // Missing new field
+}
+// Production path
+return { data: { id, email, name, notification_settings } };
+
+// ✅ Both paths must return the same shape
+if (isSandboxMode()) {
+  return { data: { id, email, name, notification_settings: null } };
+}
+return { data: { id, email, name, notification_settings } };
+```
+
+**Test to catch it**:
+
+```typescript
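+it("sandbox path returns every contract field", async () => {
+  // In the test env, sandbox mode is forced ON, so this exercises the sandbox branch
+  // against REQUIRED_FIELDS, the same contract the production path must satisfy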
+  const res = await GET(createTestRequest("/api/user/profile"));
+  const { json } = await parseResponse(res);
+
+  for (const field of REQUIRED_FIELDS) {
+    expect(json.data).toHaveProperty(field);
+  }
+});
+```
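+
+One way to make this mismatch structurally harder is to build the response in a single function that both branches call. A new field is then added in one place, and the type checker flags any caller that cannot supply the required inputs. The sketch below works under that assumption. `ProfileResponse`, `ProfileRow`, and `toProfileResponse` are hypothetical:
+
+```typescript
+// Hypothetical shared response builder used by both sandbox and production paths.
+export interface ProfileResponse {
+  id: string;
+  email: string;
+  name: string;
+  notification_settings: Record<string, boolean> | null;
+}
+
+// Rows from either source may lack optional columns; normalize them here once.
+type ProfileRow = Pick<ProfileResponse, "id" | "email" | "name"> &
+  Partial<Pick<ProfileResponse, "notification_settings">>;
+
+export function toProfileResponse(row: ProfileRow): { data: ProfileResponse } {
+  return {
+    data: {
+      id: row.id,
+      email: row.email,
+      name: row.name,
+      // Explicit null keeps the key present after JSON serialization.
+      notification_settings: row.notification_settings ?? null,
+    },
+  };
+}
+
+// A sandbox fixture without the new column still yields the full shape.
+const sandbox = toProfileResponse({ id: "u1", email: "a@example.com", name: "A" });
+console.log(JSON.stringify(sandbox));
+// {"data":{"id":"u1","email":"a@example.com","name":"A","notification_settings":null}}
+```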
+
+### Pattern 2: SELECT Clause Omission
+
+**Frequency**: Common with Supabase/Prisma when adding new columns
+
+```typescript
+// ❌ New column added to response but not to SELECT
+const { data } = await supabase
+  .from("users")
+  .select("id, email, name")  // notification_settings not here
+  .eq("id", userId)            // userId: the signed-in user's id
+  .single();
+
+return { data: { ...data, notification_settings: data.notification_settings } };
+// → notification_settings is always undefined
+
+// ✅ Use SELECT * or explicitly include new columns
+const { data } = await supabase
+  .from("users")
+  .select("*")
+  .eq("id", userId)
+  .single();
+```
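+
+A project may prefer explicit column lists over `select("*")`. In that case, a cheap guard is to keep the selected columns next to the response contract and assert that every contract field is selected. The sketch below handles only plain comma-separated column lists, not embedded relations or aliases. `selectedColumns` and `missingFromSelect` are hypothetical helpers, not part of supabase-js:
+
+```typescript
+// Hypothetical guard: checks a PostgREST-style select string against a contract.
+// Handles plain "a, b, c" lists and "*" only; relations and aliases are out of scope.
+export function selectedColumns(select: string): string[] {
+  return select
+    .split(",")
+    .map((col) => col.trim())
+    .filter((col) => col.length > 0);
+}
+
+export function missingFromSelect(select: string, required: readonly string[]): string[] {
+  const cols = selectedColumns(select);
+  if (cols.indexOf("*") !== -1) return []; // "*" returns every column
+  return required.filter((field) => cols.indexOf(field) === -1);
+}
+
+const PROFILE_SELECT = "id, email, name";
+console.log(missingFromSelect(PROFILE_SELECT, ["id", "email", "name", "notification_settings"]));
+// [ 'notification_settings' ]
+```
+
+Passing the same `PROFILE_SELECT` constant to `.select()` in the route keeps the tested string and the real query from drifting apart.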
+
+### Pattern 3: Error State Leakage
+
+**Frequency**: Moderate — when adding error handling to existing components
+
+```typescript
+// ❌ Error state set but old data not cleared
+catch (err) {
+  setError("Failed to load");
+  // reservations still shows data from previous tab!
+}
+
+// ✅ Clear related state on error
+catch (err) {
+  setReservations([]);  // Clear stale data
+  setError("Failed to load");
+}
+```
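+
+The Quick Reference table recommends asserting state cleanup on error. That is easiest when the transition is a pure function that can be tested without rendering a component. A sketch, assuming a reducer-style state shape; `ReservationsState`, `loadSucceeded`, and `loadFailed` are hypothetical names:
+
+```typescript
+// Hypothetical reducer-style transitions for a tabbed reservations list.
+export interface Reservation {
+  id: string;
+  tab: string;
+}
+
+export interface ReservationsState {
+  reservations: Reservation[];
+  error: string | null;
+}
+
+export function loadSucceeded(state: ReservationsState, rows: Reservation[]): ReservationsState {
+  return { ...state, reservations: rows, error: null };
+}
+
+// On failure, clear the previous tab's data in the same transition that sets the error.
+export function loadFailed(state: ReservationsState, message: string): ReservationsState {
+  return { ...state, reservations: [], error: message };
+}
+
+const afterTab1 = loadSucceeded({ reservations: [], error: null }, [{ id: "r1", tab: "upcoming" }]);
+const afterTab2 = loadFailed(afterTab1, "Failed to load");
+console.log(afterTab2); // { reservations: [], error: 'Failed to load' }
+```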
+
+### Pattern 4: Optimistic Update Without Proper Rollback
+
+```typescript
+// ❌ No rollback on failure
+const handleRemove = async (id: string) => {
+  setItems(prev => prev.filter(i => i.id !== id));
+  await fetch(`/api/items/${id}`, { method: "DELETE" });
+  // If API fails, item is gone from UI but still in DB
+};
+
+// ✅ Capture previous state and rollback on failure
+const handleRemove = async (id: string) => {
+  const prevItems = [...items];
+  setItems(prev => prev.filter(i => i.id !== id));
+  try {
+    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
+    if (!res.ok) throw new Error("API error");
+  } catch {
+    setItems(prevItems);  // Rollback
+    alert("Failed to delete");
+  }
+};
+```
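+
+The rollback logic can be pulled out of the component and tested with a failing fake request. The sketch below takes the delete call and the state setter as parameters. Both are hypothetical stand-ins for `fetch` and React's `setItems`. Like the `res.ok` check above, it treats a non-OK status as a failure:
+
+```typescript
+// Hypothetical, framework-free version of the optimistic delete with rollback.
+export interface Item {
+  id: string;
+}
+
+export async function removeWithRollback(
+  items: Item[],
+  id: string,
+  deleteItem: (id: string) => Promise<{ ok: boolean }>,
+  setItems: (next: Item[]) => void,
+): Promise<boolean> {
+  const previous = items.slice(); // snapshot for rollback
+  setItems(items.filter((i) => i.id !== id)); // optimistic update
+  try {
+    const res = await deleteItem(id);
+    if (!res.ok) throw new Error("API error");
+    return true;
+  } catch {
+    setItems(previous); // rollback: the item reappears in the UI
+    return false;
+  }
+}
+```
+
+Restoring a snapshot is simple but can overwrite edits made while the request was in flight; for single-item deletes that is usually acceptable.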
+
+## Strategy: Test Where Bugs Were Found
+
+Don't aim for 100% coverage. Instead:
+
+```
+Bug found in /api/user/profile     → Write test for profile API
+Bug found in /api/user/messages    → Write test for messages API
+Bug found in /api/user/favorites   → Write test for favorites API
+No bug in /api/user/notifications  → Don't write test (yet)
+```
+
+**Why this works with AI development:**
+
+1. AI tends to make the **same category of mistake** repeatedly
+2. Bugs cluster in complex areas (auth, multi-path logic, state management)
+3. Once a test covers it, that exact regression **cannot slip through unnoticed** again, as long as the suite runs on every bug-check
+4. Test count grows organically with bug fixes — no wasted effort
+
+## Quick Reference
+
+| AI Regression Pattern | Test Strategy | Priority |
+|---|---|---|
+| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
+| SELECT clause omission | Assert all required fields in response | 🔴 High |
+| Error state leakage | Assert state cleanup on error | 🟡 Medium |
+| Missing rollback | Assert state restored on API failure | 🟡 Medium |
+| Type cast masking null | Assert field is not undefined | 🟡 Medium |
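+
+The last row, a type cast masking a missing value, is easy to reproduce. `as` tells the compiler that a parsed payload has a field without checking it at runtime. The sketch below shows the failure and a runtime guard that a regression test can assert on. `Profile` and `isProfile` are hypothetical:
+
+```typescript
+// Hypothetical profile type matching the API contract.
+interface Profile {
+  id: string;
+  notification_settings: Record<string, boolean> | null;
+}
+
+// The cast compiles, but the field is undefined at runtime.
+const raw: unknown = JSON.parse('{"id":"u1"}');
+const cast = raw as Profile;
+console.log(cast.notification_settings); // undefined
+
+// Runtime guard: the key must be present and hold an object or null.
+function isProfile(value: unknown): value is Profile {
+  if (typeof value !== "object" || value === null) return false;
+  const v = value as Record<string, unknown>;
+  if (typeof v.id !== "string" || !("notification_settings" in v)) return false;
+  const ns = v.notification_settings;
+  return ns === null || typeof ns === "object";
+}
+
+console.log(isProfile(raw)); // false
+console.log(isProfile({ id: "u1", notification_settings: null })); // true
+```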
+
+## DO / DON'T
+
+**DO:**
+- Write tests immediately after finding a bug (before fixing it if possible)
+- Test the API response shape, not the implementation
+- Run tests as the first step of every bug-check
+- Keep tests fast (< 1 second total with sandbox mode)
+- Name tests after the bug they prevent (e.g., "BUG-R1 regression")
+
+**DON'T:**
+- Write tests for code that has never had a bug
+- Trust AI self-review as a substitute for automated tests
+- Skip sandbox path testing because "it's just mock data"
+- Write integration tests when unit tests suffice
+- Aim for coverage percentage — aim for regression prevention