From 4209421349cde0adc821c9c18ad71abb76bcf5a6 Mon Sep 17 00:00:00 2001
From: Affaan Mustafa <affaan@dcube.ai>
Date: Thu, 12 Feb 2026 15:37:48 -0800
Subject: [PATCH] docs: add token optimization guide with recommended settings
 (#175)

Adds a comprehensive Token Optimization section to the README with:
- Recommended settings (model, MAX_THINKING_TOKENS, AUTOCOMPACT_PCT)
- Daily workflow commands table (/model, /clear, /compact, /cost)
- Strategic compaction guidance (when to compact vs not)
- Context window management (MCP tool description costs)
- Agent Teams cost warning
---
 README.md | 70 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 37b10bb4..062e86c7 100644
--- a/README.md
+++ b/README.md
@@ -325,6 +325,7 @@ everything-claude-code/
 |   |-- saas-nextjs-CLAUDE.md   # Real-world SaaS (Next.js + Supabase + Stripe)
 |   |-- go-microservice-CLAUDE.md # Real-world Go microservice (gRPC + PostgreSQL)
 |   |-- django-api-CLAUDE.md      # Real-world Django REST API (DRF + Celery)
+|   |-- rust-api-CLAUDE.md        # Real-world Rust API (Axum + SQLx + PostgreSQL) (NEW)
 |
 |-- mcp-configs/      # MCP server configurations
 |   |-- mcp-servers.json    # GitHub, Supabase, Vercel, Railway, etc.
@@ -883,18 +884,73 @@ These configs are battle-tested across multiple production applications.
 
 ---
 
-## ⚠️ Important Notes
+## Token Optimization
+
+Claude Code usage can get expensive if you don't manage token consumption. The settings below can cut costs substantially, usually with little loss in quality.
+
+### Recommended Settings
+
+Add to `~/.claude/settings.json`:
+
+```json
+{
+  "model": "sonnet",
+  "env": {
+    "MAX_THINKING_TOKENS": "10000",
+    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "50"
+  }
+}
+```
+
+| Setting | Default | Recommended | Impact |
+|---------|---------|-------------|--------|
+| `model` | opus | **sonnet** | ~60% cost reduction; handles 80%+ of coding tasks |
+| `MAX_THINKING_TOKENS` | 31,999 | **10,000** | ~70% lower cap on hidden thinking tokens per request |
+| `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE` | 95 | **50** | Compacts earlier, which tends to preserve quality in long sessions |
+
+Switch to Opus only when you need deep architectural reasoning:
+```
+/model opus
+```
+
+### Daily Workflow Commands
+
+| Command | When to Use |
+|---------|-------------|
+| `/model sonnet` | Default for most tasks |
+| `/model opus` | Complex architecture, debugging, deep reasoning |
+| `/clear` | Between unrelated tasks (free, instant reset) |
+| `/compact` | At logical task breakpoints (research done, milestone complete) |
+| `/cost` | Monitor token spending during session |
+
+### Strategic Compaction
+
+The `strategic-compact` skill (included in this plugin) suggests running `/compact` at logical breakpoints instead of waiting for auto-compaction at the default threshold of 95% context usage. See `skills/strategic-compact/SKILL.md` for the full decision guide.
+
+**When to compact:**
+- After research/exploration, before implementation
+- After completing a milestone, before starting the next
+- After debugging, before continuing feature work
+- After a failed approach, before trying a new one
+
+**When NOT to compact:**
+- Mid-implementation (you'll lose variable names, file paths, partial state)
 
 ### Context Window Management
 
-**Critical:** Don't enable all MCPs at once. Your 200k context window can shrink to 70k with too many tools enabled.
+**Critical:** Don't enable all MCPs at once. Each MCP tool description consumes tokens from your 200k context window, so enabling too many tools can shrink the usable window to ~70k.
 
-Rule of thumb:
-- Have 20-30 MCPs configured
-- Keep under 10 enabled per project
-- Under 80 tools active
+- Keep under 10 MCPs enabled per project
+- Keep under 80 tools active
+- Use `disabledMcpServers` in project config to disable unused ones
 
-Use `disabledMcpServers` in project config to disable unused ones.
+### Agent Teams Cost Warning
+
+Agent Teams spawns multiple context windows, and each teammate consumes tokens independently. Use it only for tasks where parallelism provides clear value (multi-module work, parallel reviews). For simple sequential tasks, subagents are more token-efficient.
+
+---
+
+## ⚠️ Important Notes
 
 ### Customization