everything-claude-code/commands/gan-build.md at d0e5caebd4fb0afe867cdc2e35c27f80f6aef6b4

mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-04-01 14:43:28 +08:00


haochen806 4cdfe709ab feat: add GAN-style generator-evaluator harness (#1029)

Implements Anthropic's March 2026 harness design pattern — a multi-agent
architecture that separates generation from evaluation, creating an
adversarial feedback loop that produces production-quality applications.

Components:
- 3 agent definitions (planner, generator, evaluator)
- 1 skill with full documentation (skills/gan-style-harness/)
- 2 commands (gan-build for full apps, gan-design for frontend)
- 1 shell orchestrator (scripts/gan-harness.sh)
- Examples and configuration reference

Based on: https://www.anthropic.com/engineering/harness-design-long-running-apps

Co-authored-by: Hao Chen <haochen806@gmail.com>

2026-03-31 14:06:20 -07:00

3.1 KiB


Parse the following from $ARGUMENTS:

- `brief`: the user's one-line description of what to build
- `--max-iterations N`: (optional, default 15) maximum number of generator-evaluator cycles
- `--pass-threshold N`: (optional, default 7.0) minimum weighted total score, out of 10, required to pass
- `--skip-planner`: (optional) skip the planner and assume `gan-harness/spec.md` already exists
- `--eval-mode MODE`: (optional, default `playwright`) one of `playwright`, `screenshot`, or `code-only`
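The argument rules can be sketched as an ordinary parser. The command itself is a prompt that Claude interprets, and the repository's own orchestrator is a shell script, so this is a hypothetical Python illustration of the documented flags and defaults. `BuildConfig` and `parse_gan_build_args` are invented names. `parse_intermixed_args` lets flags appear before, after, or between the words of an unquoted brief.

```python
import argparse
import shlex
from dataclasses import dataclass


@dataclass
class BuildConfig:
    # Hypothetical container for the parsed $ARGUMENTS.
    brief: str
    max_iterations: int
    pass_threshold: float
    skip_planner: bool
    eval_mode: str


def parse_gan_build_args(raw: str) -> BuildConfig:
    """Parse a raw argument string using the defaults documented for gan-build."""
    parser = argparse.ArgumentParser(prog="gan-build")
    parser.add_argument("brief", nargs="*")  # the one-line brief, quoted or not
    parser.add_argument("--max-iterations", type=int, default=15)
    parser.add_argument("--pass-threshold", type=float, default=7.0)
    parser.add_argument("--skip-planner", action="store_true")
    parser.add_argument(
        "--eval-mode",
        choices=["playwright", "screenshot", "code-only"],
        default="playwright",
    )
    # shlex.split honours quotes, so "a todo app" stays a single token.
    ns = parser.parse_intermixed_args(shlex.split(raw))
    return BuildConfig(
        brief=" ".join(ns.brief),
        max_iterations=ns.max_iterations,
        pass_threshold=ns.pass_threshold,
        skip_planner=ns.skip_planner,
        eval_mode=ns.eval_mode,
    )
```

For example, `parse_gan_build_args('"a recipe sharing app" --max-iterations 5')` keeps the default threshold of 7.0 and the `playwright` evaluation mode.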

## GAN-Style Harness Build

This command orchestrates a three-agent build loop inspired by Anthropic's March 2026 harness design paper.

### Phase 0: Setup

- Create the `gan-harness/` directory in the project root
- Create the subdirectories `gan-harness/feedback/` and `gan-harness/screenshots/`
- Initialize a git repository if one does not already exist
- Log the start time and configuration
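A minimal sketch of the setup steps, assuming Python 3.8+ and a `git` binary on `PATH`. The log file name `harness.log` and its JSON-lines format are hypothetical, because the command only says to log the start time and configuration. "Already initialized" is read here as "already inside a git work tree", which also covers a project nested in a parent repository.

```python
import json
import shutil
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def create_harness_dirs(root: Path) -> Path:
    """Create gan-harness/ plus its feedback/ and screenshots/ subdirectories."""
    harness = root / "gan-harness"
    for sub in ("feedback", "screenshots"):
        (harness / sub).mkdir(parents=True, exist_ok=True)
    return harness


def ensure_git_repo(root: Path) -> None:
    """Run `git init` only when root is not already inside a git work tree."""
    if shutil.which("git") is None:
        raise RuntimeError("git is not installed")
    probe = subprocess.run(
        ["git", "rev-parse", "--is-inside-work-tree"],
        cwd=root, capture_output=True, text=True,
    )
    if probe.returncode != 0:
        subprocess.run(["git", "init"], cwd=root, check=True)


def log_start(harness: Path, config: dict) -> dict:
    """Append one JSON line with the start time and config (hypothetical harness.log)."""
    entry = {
        "start_time": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "config": config,
    }
    with open(harness / "harness.log", "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Keeping the git step in its own function means the directory and logging steps can be exercised without touching version control.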

### Phase 1: Planning (Planner Agent)

Unless `--skip-planner` is set:

- Launch the `gan-planner` agent via the Task tool with the user's brief
- Wait for it to produce `gan-harness/spec.md` and `gan-harness/eval-rubric.md`
- Display the spec summary to the user
- Proceed to Phase 2
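Before Phase 2 starts, it can help to confirm that the planning artifacts are actually on disk. The following is a hypothetical pre-flight check, not part of the command. With `--skip-planner` it requires only `spec.md`, as documented. Otherwise it requires both files the planner is expected to write.

```python
from pathlib import Path
from typing import List


def missing_planning_files(harness: Path, skip_planner: bool) -> List[str]:
    """Return the planning files that are absent from the gan-harness directory."""
    # --skip-planner only promises spec.md; a planner run should produce both files.
    required = ["spec.md"] if skip_planner else ["spec.md", "eval-rubric.md"]
    return [name for name in required if not (harness / name).is_file()]
```

Note that the evaluator in Phase 2 also reads `eval-rubric.md`. A skipped-planner run therefore presumably needs a rubric from some earlier run, although the command does not say so explicitly.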

### Phase 2: Generator-Evaluator Loop

```
iteration = 1
while iteration <= max_iterations:

    # GENERATE
    # Feedback files use zero-padded, three-digit iteration numbers
    # (feedback-001.md, feedback-002.md, ...).
    Launch the gan-generator agent via the Task tool:
    - Read spec.md
    - If iteration > 1: read feedback/feedback-{iteration-1}.md
    - Build or improve the application
    - Ensure the dev server is running
    - Commit changes

    # Wait for the generator to finish

    # EVALUATE
    Launch the gan-evaluator agent via the Task tool:
    - Read eval-rubric.md and spec.md
    - Test the application using the configured --eval-mode (playwright, screenshot, or code-only)
    - Score it against the rubric
    - Write feedback to feedback/feedback-{iteration}.md

    # Wait for the evaluator to finish

    # CHECK SCORE
    Read feedback/feedback-{iteration}.md
    Extract the weighted total score

    if score >= pass_threshold:
        Log "PASSED at iteration {iteration} with score {score}"
        break

    if iteration >= 3 and score has not improved in the last 2 iterations:
        Log "PLATEAU detected: stopping early"
        break

    iteration += 1
```
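The loop's stopping logic can be sketched in plain Python. The agent launches are replaced by two hypothetical callables, `generate` and `evaluate`. The plateau rule is ambiguous as written, so this sketch adopts one reading: stop when neither of the two most recent scores beats the best score recorded before them. This is an assumption, not necessarily the harness's exact rule.

```python
from typing import Callable, List, Tuple


def plateaued(scores: List[float]) -> bool:
    """Assumed plateau rule: the last two scores do not beat the earlier best."""
    if len(scores) < 3:
        return False
    return max(scores[-2:]) <= max(scores[:-2])


def run_gan_loop(
    max_iterations: int,
    pass_threshold: float,
    generate: Callable[[int], None],   # stand-in for launching gan-generator
    evaluate: Callable[[int], float],  # stand-in for gan-evaluator plus score extraction
) -> Tuple[List[float], str]:
    """Run generate/evaluate cycles and report why the loop stopped."""
    scores: List[float] = []
    for iteration in range(1, max_iterations + 1):
        generate(iteration)
        score = evaluate(iteration)
        scores.append(score)
        if score >= pass_threshold:
            print(f"PASSED at iteration {iteration} with score {score}")
            return scores, "PASS"
        if iteration >= 3 and plateaued(scores):
            print("PLATEAU detected: stopping early")
            return scores, "PLATEAU"
    return scores, "MAX_ITERATIONS"
```

For example, the scores 5.0, 6.0, 5.5, 5.9 stop at iteration 4 as a plateau, because neither 5.5 nor 5.9 exceeds the earlier best of 6.0.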

### Phase 3: Summary

- Read all feedback files
- Display the final scores and iteration history
- Show the score progression, e.g. `iteration 1: 4.2 → iteration 2: 5.8 → ... → iteration N: 7.5`
- List any remaining issues from the final evaluation
- Report the total elapsed time and estimated cost
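The summary steps can be sketched as below. This command does not define the layout of a feedback file, so the sketch assumes a hypothetical line such as `Weighted total: 6.4 / 10`; adjust `SCORE_RE` to whatever `gan-evaluator` actually writes. The three-digit zero padding matters here because it makes a plain lexical sort of the file names match iteration order.

```python
import re
from pathlib import Path
from typing import List, Optional

# ASSUMPTION: the evaluator writes a line like "Weighted total: 6.4 / 10".
SCORE_RE = re.compile(r"weighted\s+total\D*?(\d+(?:\.\d+)?)", re.IGNORECASE)


def feedback_path(harness: Path, iteration: int) -> Path:
    """Zero-padded name, e.g. feedback-007.md, so lexical order equals numeric order."""
    return harness / "feedback" / f"feedback-{iteration:03d}.md"


def extract_score(text: str) -> Optional[float]:
    """Return the first weighted-total score in a feedback file, or None."""
    match = SCORE_RE.search(text)
    return float(match.group(1)) if match else None


def collect_scores(harness: Path) -> List[float]:
    """Read every feedback file in iteration order; fail loudly on a missing score."""
    scores: List[float] = []
    for path in sorted((harness / "feedback").glob("feedback-*.md")):
        score = extract_score(path.read_text(encoding="utf-8"))
        if score is None:
            raise ValueError(f"no weighted total found in {path.name}")
        scores.append(score)
    return scores


def score_progression(scores: List[float]) -> str:
    """Format scores as 'iteration 1: 4.2 → iteration 2: 5.8 → ...'."""
    return " → ".join(f"iteration {i}: {s:.1f}" for i, s in enumerate(scores, start=1))
```

Raising on an unparseable file, rather than skipping it, keeps the iteration numbers in the progression aligned with the files they came from.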

### Output

```markdown
## GAN Harness Build Report

**Brief:** [original prompt]
**Result:** PASS/FAIL
**Iterations:** N / max
**Final Score:** X.X / 10

### Score Progression
| Iter | Design | Originality | Craft | Functionality | Total |
|------|--------|-------------|-------|---------------|-------|
| 1 | ... | ... | ... | ... | X.X |
| 2 | ... | ... | ... | ... | X.X |
| N | ... | ... | ... | ... | X.X |

### Remaining Issues
- [Any issues from final evaluation]

### Files Created
- gan-harness/spec.md
- gan-harness/eval-rubric.md
- gan-harness/feedback/feedback-001.md through feedback-NNN.md
- gan-harness/generator-state.md
- gan-harness/build-report.md
```
Write the full report to gan-harness/build-report.md.
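Filling in the report template is mechanical once the per-iteration scores are known. The sketch below is a hypothetical renderer. `IterationScore` and `render_report` are invented names, the four criteria are taken from the table columns, and writing `- None` when there are no remaining issues is a formatting choice made here. It assumes at least one iteration ran.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationScore:
    design: float
    originality: float
    craft: float
    functionality: float
    total: float  # weighted total as reported by the evaluator


def render_report(brief: str, passed: bool, max_iterations: int,
                  history: List[IterationScore], issues: List[str]) -> str:
    """Fill in the build-report template from the iteration history."""
    final = history[-1]
    lines = [
        "## GAN Harness Build Report",
        "",
        f"**Brief:** {brief}",
        f"**Result:** {'PASS' if passed else 'FAIL'}",
        f"**Iterations:** {len(history)} / {max_iterations}",
        f"**Final Score:** {final.total:.1f} / 10",
        "",
        "### Score Progression",
        "| Iter | Design | Originality | Craft | Functionality | Total |",
        "|------|--------|-------------|-------|---------------|-------|",
    ]
    for i, s in enumerate(history, start=1):
        lines.append(
            f"| {i} | {s.design:.1f} | {s.originality:.1f} | {s.craft:.1f} "
            f"| {s.functionality:.1f} | {s.total:.1f} |"
        )
    lines += ["", "### Remaining Issues"]
    lines += [f"- {issue}" for issue in issues] if issues else ["- None"]
    lines += [
        "",
        "### Files Created",
        "- gan-harness/spec.md",
        "- gan-harness/eval-rubric.md",
        f"- gan-harness/feedback/feedback-001.md through feedback-{len(history):03d}.md",
        "- gan-harness/generator-state.md",
        "- gan-harness/build-report.md",
    ]
    return "\n".join(lines) + "\n"
```

The returned string can then be saved with `Path("gan-harness/build-report.md").write_text(report, encoding="utf-8")`.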
