chore: update claude-api skill (#956)

Add shared/model-migration.md and refresh model references, managed agents docs, and skill description across all language guides.
2026-04-19 08:33:36 +08:00 · 2026-04-16 12:12:57 -07:00
parent 0f7c287eaf
commit 2c7ec5e78b
36 changed files with 1049 additions and 211 deletions
--- a/skills/claude-api/SKILL.md
+++ b/skills/claude-api/SKILL.md
@@ -1,10 +1,9 @@
 ---
 name: claude-api
-description: "Build, debug, and optimize Claude API / Anthropic SDK apps. Apps built with this skill should include prompt caching. TRIGGER when: code imports anthropic/@anthropic-ai/sdk; user asks to use the Claude API, Anthropic SDKs, or Managed Agents (/v1/agents, /v1/sessions, /v1/environments). DO NOT TRIGGER when: code imports `openai`/other AI SDK, general programming, or ML/data-science tasks."
+description: "Build, debug, and optimize Claude API / Anthropic SDK apps. Apps built with this skill should include prompt caching. Also handles migrating existing Claude API code between Claude model versions (4.5 → 4.6, 4.6 → 4.7, retired-model replacements). TRIGGER when: code imports `anthropic`/`@anthropic-ai/sdk`; user asks for the Claude API, Anthropic SDK, or Managed Agents; user adds/modifies/tunes a Claude feature (caching, thinking, compaction, tool use, batch, files, citations, memory) or model (Opus/Sonnet/Haiku) in a file; questions about prompt caching / cache hit rate in an Anthropic SDK project. SKIP: file imports `openai`/other-provider SDK, filename like `*-openai.py`/`*-generic.py`, provider-neutral code, general programming/ML."
 license: Complete terms in LICENSE.txt
 ---

-
 # Building LLM-Powered Applications with Claude

 This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
@@ -28,7 +27,7 @@ Never mix the two — don't reach for `requests`/`fetch` in a Python or TypeScri

 Unless the user requests otherwise:

-For the Claude model version, please use Claude Opus 4.6, which you can access via the exact model string `claude-opus-4-6`. Please default to using adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events
+For the Claude model version, please use Claude Opus 4.7, which you can access via the exact model string `claude-opus-4-7`. Please default to using adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events

 ---

@@ -36,7 +35,6 @@ For the Claude model version, please use Claude Opus 4.6, which you can access v

 If the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every **Subcommands** table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.

-<!-- Subcommand tables are defined per-section below; this header block contains only the dispatch rule so that feature-gated sections can add their own tables without leaking strings into ungated builds. -->

 ---

@@ -163,15 +161,16 @@ Everything goes through `POST /v1/messages`. Tools and output constraints are fe

 ---

-## Current Models (cached: 2026-02-17)
+## Current Models (cached: 2026-04-15)

 | Model             | Model ID            | Context        | Input $/1M | Output $/1M |
 | ----------------- | ------------------- | -------------- | ---------- | ----------- |
-| Claude Opus 4.6   | `claude-opus-4-6`   | 200K (1M beta) | $5.00      | $25.00      |
-| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 200K (1M beta) | $3.00      | $15.00      |
+| Claude Opus 4.7   | `claude-opus-4-7`   | 1M             | $5.00      | $25.00      |
+| Claude Opus 4.6   | `claude-opus-4-6`   | 1M             | $5.00      | $25.00      |
+| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M             | $3.00      | $15.00      |
 | Claude Haiku 4.5  | `claude-haiku-4-5`  | 200K           | $1.00      | $5.00       |

-**ALWAYS use `claude-opus-4-6` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
+**ALWAYS use `claude-opus-4-7` unless the user explicitly names a different model.** This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.

 **CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-5`, never `claude-sonnet-4-5-20250514` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.

@@ -183,19 +182,23 @@ A note: if any of the model strings above look unfamiliar to you, that's to be e

 ## Thinking & Effort (Quick Reference)

-**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and must not be used. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Opus 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` and do NOT switch to an older model.**
+**Opus 4.7 — Adaptive thinking only:** Use `thinking: {type: "adaptive"}`. `thinking: {type: "enabled", budget_tokens: N}` returns a 400 on Opus 4.7 — adaptive is the only on-mode. `{type: "disabled"}` and omitting `thinking` both work. Sampling parameters (`temperature`, `top_p`, `top_k`) are also removed and will 400. See `shared/model-migration.md` → Migrating to Opus 4.7 for the full breaking-change list.
+**Opus 4.6 — Adaptive thinking (recommended):** Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and should not be used for new code. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). **When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Opus 4.7 or 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` for new 4.6/4.7 code and do NOT switch to an older model.** *Gradual-migration carve-out:* `budget_tokens` is still functional on Opus 4.6 and Sonnet 4.6 as a transitional escape hatch — if you're migrating existing code and need a hard token ceiling before you've tuned `effort`, see `shared/model-migration.md` → Transitional escape hatch. Note: this carve-out does **not** apply to Opus 4.7 — `budget_tokens` is fully removed there.
+**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Opus 4.7 adds `"xhigh"` (between `high` and `max`) — the best setting for most coding and agentic use cases on 4.7, and the default in Claude Code; use a minimum of `high` for most intelligence-sensitive work. Works on Opus 4.5, Opus 4.6, Opus 4.7, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. On Opus 4.7, effort matters more than on any prior Opus — re-tune it when migrating. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `high` is often the sweet spot balancing quality and token efficiency; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.

-**Effort parameter (GA, no beta header):** Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus 4.6 only. Works on Opus 4.5, Opus 4.6, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `medium` is often a favorable balance; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.
+**Opus 4.7 — thinking content omitted by default:** `thinking` blocks still stream but their text is empty unless you opt in with `thinking: {type: "adaptive", display: "summarized"}` (default is `"omitted"`). Silent change — no error. If you stream reasoning to users, the default looks like a long pause before output; set `"summarized"` to restore visible progress.
+
+**Task Budgets (beta, Opus 4.7):** `output_config: {task_budget: {type: "tokens", total: N}}` tells the model how many tokens it has for a full agentic loop — it sees a running countdown and self-moderates (minimum 20,000; beta header `task-budgets-2026-03-13`). Distinct from `max_tokens`, which is an enforced per-response ceiling the model is not aware of. See `shared/model-migration.md` → Task Budgets.

 **Sonnet 4.6:** Supports adaptive thinking (`thinking: {type: "adaptive"}`). `budget_tokens` is deprecated on Sonnet 4.6 — use adaptive thinking instead.

-**Older models (only if explicitly requested):** If the user specifically asks for Sonnet 4.5 or another older model, use `thinking: {type: "enabled", budget_tokens: N}`. `budget_tokens` must be less than `max_tokens` (minimum 1024). Never choose an older model just because the user mentions `budget_tokens` — use Opus 4.6 with adaptive thinking instead.
+**Older models (only if explicitly requested):** If the user specifically asks for Sonnet 4.5 or another older model, use `thinking: {type: "enabled", budget_tokens: N}`. `budget_tokens` must be less than `max_tokens` (minimum 1024). Never choose an older model just because the user mentions `budget_tokens` — use Opus 4.7 with adaptive thinking instead.

 ---

 ## Compaction (Quick Reference)

-**Beta, Opus 4.6 and Sonnet 4.6.** For long-running conversations that may exceed the 200K context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.
+**Beta, Opus 4.7, Opus 4.6, and Sonnet 4.6.** For long-running conversations that may exceed the 1M context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires beta header `compact-2026-01-12`.

 **Critical:** Append `response.content` (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.

@@ -253,7 +256,8 @@ After detecting the language, read the relevant files based on what the user nee

 **Long-running conversations (may exceed context window):**
 → Read `{lang}/claude-api/README.md` — see Compaction section
-
+**Migrating to a newer model (Opus 4.7 / Opus 4.6 / Sonnet 4.6) or replacing a retired model:**
+→ Read `shared/model-migration.md`
 **Prompt caching / optimize caching / "why is my cache hit rate low":**
 → Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)

@@ -285,7 +289,8 @@ Read the **language-specific Claude API folder** (`{language}/claude-api/`):
 7. **`{language}/claude-api/files-api.md`** — Read when sending the same file across multiple requests without re-uploading.
 8. **`shared/prompt-caching.md`** — Read when adding or optimizing prompt caching. Covers prefix-stability design, breakpoint placement, and anti-patterns that silently invalidate cache.
 9. **`shared/error-codes.md`** — Read when debugging HTTP errors or implementing error handling.
-10. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.
+10. **`shared/model-migration.md`** — Read when upgrading to newer models, replacing retired models, or translating `budget_tokens` / prefill patterns to the current API.
+11. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.

 > **Note:** For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus `shared/tool-use-concepts.md` and `shared/error-codes.md` as needed.

@@ -306,11 +311,13 @@ Live documentation URLs are in `shared/live-sources.md`.
 ## Common Pitfalls

 - Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` (deprecated on both Opus 4.6 and Sonnet 4.6). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
- **Opus 4.6 prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.
+- **Opus 4.7 thinking:** Adaptive only. `thinking: {type: "enabled", budget_tokens: N}` returns 400 on Opus 4.7 — `budget_tokens` is fully removed there (along with `temperature`, `top_p`, `top_k`). Use `thinking: {type: "adaptive"}`.
+- **Opus 4.6 / Sonnet 4.6 thinking:** Use `thinking: {type: "adaptive"}` — do NOT use `budget_tokens` for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in `shared/model-migration.md` — note this carve-out does not apply to Opus 4.7). For older models, `budget_tokens` must be less than `max_tokens` (minimum 1024). This will throw an error if you get it wrong.
+- **4.6/4.7 family prefill removed:** Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6, Opus 4.7, and Sonnet 4.6. Use structured outputs (`output_config.format`) or system prompt instructions to control response format instead.
+- **Confirm migration scope before editing:** When a user asks to migrate code to a newer Claude model without naming a specific file, directory, or file list, **ask which scope to apply first** — the entire working directory, a specific subdirectory, or a specific set of files. Do not start editing until the user confirms. Imperative phrasings like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.7" are **still ambiguous** — they tell you what to do but not where, so ask. Proceed without asking only when the prompt names an exact file, a specific directory, or an explicit file list ("migrate `app.py`", "migrate everything under `services/`", "update `a.py` and `b.py`"). See `shared/model-migration.md` Step 0.
 - **`max_tokens` defaults:** Don't lowball `max_tokens` — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to `~16000` (keeps responses under SDK HTTP timeouts). For streaming requests, default to `~64000` (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (`~256`), cost caps, or deliberately short outputs.
- **128K output tokens:** Opus 4.6 supports up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
- **Tool call JSON parsing (Opus 4.6):** Opus 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
+- **128K output tokens:** Opus 4.6 and Opus 4.7 support up to 128K `max_tokens`, but the SDKs require streaming for values that large to avoid HTTP timeouts. Use `.stream()` with `.get_final_message()` / `.finalMessage()`.
+- **Tool call JSON parsing (4.6/4.7 family):** Opus 4.6, Opus 4.7, and Sonnet 4.6 may produce different JSON string escaping in tool call `input` fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with `json.loads()` / `JSON.parse()` — never do raw string matching on the serialized input.
 - **Structured outputs (all models):** Use `output_config: {format: {...}}` instead of the deprecated `output_format` parameter on `messages.create()`. This is a general API change, not 4.6-specific.
 - **Don't reimplement SDK functionality:** The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use `stream.finalMessage()` instead of wrapping `.on()` events in `new Promise()`; use typed exception classes (`Anthropic.RateLimitError`, etc.) instead of string-matching error messages; use SDK types (`Anthropic.MessageParam`, `Anthropic.Tool`, `Anthropic.Message`, etc.) instead of redefining equivalent interfaces.
 - **Don't define custom types for SDK data structures:** The SDK exports types for all API objects. Use `Anthropic.MessageParam` for messages, `Anthropic.Tool` for tool definitions, `Anthropic.ToolUseBlock` / `Anthropic.ToolResultBlockParam` for tool results, `Anthropic.Message` for responses. Defining your own `interface ChatMessage { role: string; content: unknown }` duplicates what the SDK already provides and loses type safety.
--- a/skills/claude-api/curl/examples.md
+++ b/skills/claude-api/curl/examples.md
@@ -18,7 +18,7 @@ curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
@@ -38,7 +38,7 @@ response=$(curl -s https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
-  -d '{"model":"claude-opus-4-6","max_tokens":16000,"messages":[{"role":"user","content":"Hello"}]}')
+  -d '{"model":"claude-opus-4-7","max_tokens":16000,"messages":[{"role":"user","content":"Hello"}]}')

 # Print the first text block (-r strips the JSON quotes)
 echo "$response" | jq -r '.content[0].text'
@@ -65,7 +65,7 @@ curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 64000,
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku"}]
@@ -104,7 +104,7 @@ curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "tools": [{
      "name": "get_weather",
@@ -129,7 +129,7 @@ curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "tools": [{
      "name": "get_weather",
@@ -167,7 +167,7 @@ curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "system": [
      {"type": "text", "text": "<large shared prompt...>", "cache_control": {"type": "ephemeral"}}
@@ -182,17 +182,17 @@ For 1-hour TTL: `"cache_control": {"type": "ephemeral", "ttl": "1h"}`. Top-level

 ## Extended Thinking

-> **Opus 4.6 and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is deprecated on both Opus 4.6 and Sonnet 4.6.
+> **Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
 > **Older models:** Use `"type": "enabled"` with `"budget_tokens": N` (must be < `max_tokens`, min 1024).

 ```bash
-# Opus 4.6: adaptive thinking (recommended)
+# Opus 4.7 / 4.6: adaptive thinking (recommended)
 curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "max_tokens": 16000,
    "thinking": {
      "type": "adaptive"
--- a/skills/claude-api/curl/managed-agents.md
+++ b/skills/claude-api/curl/managed-agents.md
@@ -63,7 +63,7 @@ curl -X POST https://api.anthropic.com/v1/agents \
  "${HEADERS[@]}" \
  -d '{
    "name": "Coding Assistant",
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "tools": [{ "type": "agent_toolset_20260401" }]
  }'
 # → { "id": "agent_abc123", ... }
@@ -85,7 +85,7 @@ curl -X POST https://api.anthropic.com/v1/agents \
  "${HEADERS[@]}" \
  -d '{
    "name": "Code Reviewer",
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "system": "You are a senior code reviewer. Be thorough and constructive.",
    "tools": [
      { "type": "agent_toolset_20260401" },
@@ -260,12 +260,16 @@ List files the agent wrote to `/mnt/session/outputs/` during a session, then dow

 ```bash
 # List files associated with a session
-curl "https://api.anthropic.com/v1/files?scope=$SESSION_ID" \
-  "${HEADERS[@]}"
+curl "https://api.anthropic.com/v1/files?scope_id=$SESSION_ID" \
+  -H "x-api-key: $ANTHROPIC_API_KEY" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "anthropic-beta: files-api-2025-04-14,managed-agents-2026-04-01"

 # Download a specific file
 curl "https://api.anthropic.com/v1/files/$FILE_ID/content" \
-  "${HEADERS[@]}" \
+  -H "x-api-key: $ANTHROPIC_API_KEY" \
+  -H "anthropic-version: 2023-06-01" \
+  -H "anthropic-beta: files-api-2025-04-14,managed-agents-2026-04-01" \
  -o downloaded_file.txt
 ```

@@ -288,7 +292,7 @@ curl -X POST https://api.anthropic.com/v1/agents \
  "${HEADERS[@]}" \
  -d '{
    "name": "MCP Agent",
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "mcp_servers": [
      { "type": "url", "name": "my-tools", "url": "https://my-mcp-server.example.com/sse" }
    ],
@@ -319,7 +323,7 @@ curl -X POST https://api.anthropic.com/v1/agents \
  "${HEADERS[@]}" \
  -d '{
    "name": "Restricted Agent",
-    "model": "claude-opus-4-6",
+    "model": "claude-opus-4-7",
    "tools": [
      {
        "type": "agent_toolset_20260401",
--- a/skills/claude-api/go/managed-agents/README.md
+++ b/skills/claude-api/go/managed-agents/README.md
@@ -63,7 +63,7 @@ fmt.Println(environment.ID) // env_...
 agent, err := client.Beta.Agents.New(ctx, anthropic.BetaAgentNewParams{
    Name: "Coding Assistant",
    Model: anthropic.BetaManagedAgentsModelConfigParams{
-        ID:   "claude-opus-4-6",
+        ID:   "claude-opus-4-7",
        Type: anthropic.BetaManagedAgentsModelConfigParamsTypeModelConfig,
    },
    System: anthropic.String("You are a helpful coding assistant."),
@@ -380,7 +380,7 @@ if err != nil {
 agent, err := client.Beta.Agents.New(ctx, anthropic.BetaAgentNewParams{
    Name: "GitHub Assistant",
    Model: anthropic.BetaManagedAgentsModelConfigParams{
-        ID:   "claude-opus-4-6",
+        ID:   "claude-opus-4-7",
        Type: anthropic.BetaManagedAgentsModelConfigParamsTypeModelConfig,
    },
    MCPServers: []anthropic.BetaManagedAgentsUrlmcpServerParams{{
--- a/skills/claude-api/java/claude-api.md
+++ b/skills/claude-api/java/claude-api.md
@@ -136,7 +136,7 @@ static class GetWeather implements Supplier<String> {

 BetaToolRunner toolRunner = client.beta().messages().toolRunner(
    MessageCreateParams.builder()
-        .model("claude-opus-4-6")
+        .model("claude-opus-4-7")
        .maxTokens(16000L)
        .putAdditionalHeader("anthropic-beta", "structured-outputs-2025-11-13")
        .addTool(GetWeather.class)
@@ -164,7 +164,7 @@ import com.anthropic.models.beta.messages.ToolRunnerCreateParams;
 BetaMemoryToolHandler memoryHandler = new FileSystemMemoryToolHandler(sandboxRoot);

 MessageCreateParams createParams = MessageCreateParams.builder()
-    .model("claude-opus-4-6")
+    .model("claude-opus-4-7")
    .maxTokens(4096L)
    .addTool(BetaMemoryTool20250818.builder().build())
    .addUserMessage("Remember that my favorite color is blue")
--- a/skills/claude-api/java/managed-agents/README.md
+++ b/skills/claude-api/java/managed-agents/README.md
@@ -57,7 +57,7 @@ import com.anthropic.models.beta.sessions.SessionCreateParams;
 // 1. Create the agent (reusable, versioned)
 var agent = client.beta().agents().create(AgentCreateParams.builder()
    .name("Coding Assistant")
-    .model("claude-opus-4-6")
+    .model("claude-opus-4-7")
    .system("You are a helpful coding assistant.")
    .addTool(BetaManagedAgentsAgentToolset20260401Params.builder()
        .type(BetaManagedAgentsAgentToolset20260401Params.Type.AGENT_TOOLSET_20260401)
@@ -295,7 +295,7 @@ import com.anthropic.models.beta.agents.BetaManagedAgentsUrlmcpServerParams;
 // Agent declares MCP server (no auth here — auth goes in a vault)
 var agent = client.beta().agents().create(AgentCreateParams.builder()
    .name("GitHub Assistant")
-    .model("claude-opus-4-6")
+    .model("claude-opus-4-7")
    .addMcpServer(BetaManagedAgentsUrlmcpServerParams.builder()
        .type(BetaManagedAgentsUrlmcpServerParams.Type.URL)
        .name("github")
--- a/skills/claude-api/php/claude-api.md
+++ b/skills/claude-api/php/claude-api.md
@@ -56,7 +56,7 @@ $client = Foundry\Client::withCredentials(

 ```php
 $message = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    messages: [
        ['role' => 'user', 'content' => 'What is the capital of France?'],
@@ -96,7 +96,7 @@ use Anthropic\Messages\RawContentBlockDeltaEvent;
 use Anthropic\Messages\TextDelta;

 $stream = $client->messages->createStream(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 64000,
    messages: [
        ['role' => 'user', 'content' => 'Write a haiku'],
@@ -141,7 +141,7 @@ $weatherTool = new BetaRunnableTool(
 $runner = $client->beta->messages->toolRunner(
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'What is the weather in Paris?']],
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    tools: [$weatherTool],
 );

@@ -178,7 +178,7 @@ $tools = [
 $messages = [['role' => 'user', 'content' => 'What is the weather in SF?']];

 $response = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    tools: $tools,
    messages: $messages,
@@ -205,7 +205,7 @@ while ($response->stopReason === 'tool_use') {  // camelCase property
    $messages[] = ['role' => 'user', 'content' => $toolResults];

    $response = $client->messages->create(
-        model: 'claude-opus-4-6',
+        model: 'claude-opus-4-7',
        maxTokens: 16000,
        tools: $tools,
        messages: $messages,
@@ -233,7 +233,7 @@ foreach ($response->content as $block) {
 use Anthropic\Messages\ThinkingBlock;

 $message = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    thinking: ['type' => 'adaptive'],
    messages: [
@@ -265,7 +265,7 @@ foreach ($message->content as $block) {

 ```php
 $message = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    system: [
        ['type' => 'text', 'text' => $longSystemPrompt, 'cacheControl' => ['type' => 'ephemeral']],
@@ -304,7 +304,7 @@ class Person implements StructuredOutputModel
 }

 $message = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'Generate a profile for Alice, age 30']],
    outputConfig: ['format' => Person::class],
@@ -320,7 +320,7 @@ Types are inferred from PHP type hints. Use `#[Constrained(description: '...')]`

 ```php
 $message = $client->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'Extract: John (john@co.com), Enterprise plan']],
    outputConfig: [
@@ -359,7 +359,7 @@ foreach ($message->content as $block) {
 use Anthropic\Beta\Messages\BetaRequestMCPServerURLDefinition;

 $response = $client->beta->messages->create(
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    maxTokens: 16000,
    mcpServers: [
        BetaRequestMCPServerURLDefinition::with(
--- a/skills/claude-api/php/managed-agents/README.md
+++ b/skills/claude-api/php/managed-agents/README.md
@@ -48,7 +48,7 @@ use Anthropic\Beta\Agents\BetaManagedAgentsAgentToolset20260401Params;
 // 1. Create the agent (reusable, versioned)
 $agent = $client->beta->agents->create(
    name: 'Coding Assistant',
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    system: 'You are a helpful coding assistant.',
    tools: [
        BetaManagedAgentsAgentToolset20260401Params::with(
@@ -299,7 +299,7 @@ use Anthropic\Beta\Sessions\BetaManagedAgentsAgentParams;
 // Agent declares MCP server (no auth here — auth goes in a vault)
 $agent = $client->beta->agents->create(
    name: 'GitHub Assistant',
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    mcpServers: [
        BetaManagedAgentsUrlmcpServerParams::with(
            type: 'url',
--- a/skills/claude-api/python/claude-api/README.md
+++ b/skills/claude-api/python/claude-api/README.md
@@ -27,7 +27,7 @@ async_client = anthropic.AsyncAnthropic()

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
@@ -46,7 +46,7 @@ for block in response.content:

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    system="You are a helpful coding assistant. Always provide examples in Python.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}]
@@ -66,7 +66,7 @@ with open("image.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -89,7 +89,7 @@ response = client.messages.create(

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -119,7 +119,7 @@ Use top-level `cache_control` to automatically cache the last cacheable block in

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},  # auto-caches the last cacheable block
    system="You are an expert on this large document...",
@@ -133,7 +133,7 @@ For fine-grained control, add `cache_control` to specific content blocks:

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
@@ -145,7 +145,7 @@ response = client.messages.create(

 # With explicit TTL (time-to-live)
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    system=[{
        "type": "text",
@@ -170,13 +170,13 @@ If `cache_read_input_tokens` is zero across repeated identical-prefix requests,

 ## Extended Thinking

-> **Opus 4.6 and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is deprecated on both Opus 4.6 and Sonnet 4.6.
+> **Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
 > **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024).

 ```python
-# Opus 4.6: adaptive thinking (recommended)
+# Opus 4.7 / 4.6: adaptive thinking (recommended)
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # low | medium | high | max
@@ -258,7 +258,7 @@ class ConversationManager:
 # Usage
 conversation = ConversationManager(
    client=anthropic.Anthropic(),
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    system="You are a helpful assistant."
 )

@@ -275,7 +275,7 @@ response2 = conversation.send("What's my name?")  # Claude remembers "Alice"

 ### Compaction (long conversations)

-> **Beta, Opus 4.6 and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
+> **Beta, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.

 ```python
 import anthropic
@@ -288,7 +288,7 @@ def chat(user_message: str) -> str:

    response = client.beta.messages.create(
        betas=["compact-2026-01-12"],
-        model="claude-opus-4-6",
+        model="claude-opus-4-7",
        max_tokens=16000,
        messages=messages,
        context_management={
@@ -331,7 +331,7 @@ The `stop_reason` field in the response indicates why the model stopped generati
 ```python
 # Automatic caching (simplest — caches the last cacheable block)
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    cache_control={"type": "ephemeral"},
    system=large_document_text,  # e.g., 50KB of context
@@ -347,7 +347,7 @@ response = client.messages.create(
 ```python
 # Default to Opus for most tasks
 response = client.messages.create(
-    model="claude-opus-4-6",  # $5.00/$25.00 per 1M tokens
+    model="claude-opus-4-7",  # $5.00/$25.00 per 1M tokens
    max_tokens=16000,
    messages=[{"role": "user", "content": "Explain quantum computing"}]
 )
@@ -371,7 +371,7 @@ simple_response = client.messages.create(

 ```python
 count_response = client.messages.count_tokens(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    messages=messages,
    system=system
 )
--- a/skills/claude-api/python/claude-api/batches.md
+++ b/skills/claude-api/python/claude-api/batches.md
@@ -26,7 +26,7 @@ message_batch = client.messages.batches.create(
        Request(
            custom_id="request-1",
            params=MessageCreateParamsNonStreaming(
-                model="claude-opus-4-6",
+                model="claude-opus-4-7",
                max_tokens=16000,
                messages=[{"role": "user", "content": "Summarize climate change impacts"}]
            )
@@ -34,7 +34,7 @@ message_batch = client.messages.batches.create(
        Request(
            custom_id="request-2",
            params=MessageCreateParamsNonStreaming(
-                model="claude-opus-4-6",
+                model="claude-opus-4-7",
                max_tokens=16000,
                messages=[{"role": "user", "content": "Explain quantum computing basics"}]
            )
@@ -117,7 +117,7 @@ message_batch = client.messages.batches.create(
        Request(
            custom_id=f"analysis-{i}",
            params=MessageCreateParamsNonStreaming(
-                model="claude-opus-4-6",
+                model="claude-opus-4-7",
                max_tokens=16000,
                system=shared_system,
                messages=[{"role": "user", "content": question}]
--- a/skills/claude-api/python/claude-api/files-api.md
+++ b/skills/claude-api/python/claude-api/files-api.md
@@ -36,7 +36,7 @@ print(f"Size: {uploaded.size_bytes} bytes")

 ```python
 response = client.beta.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -65,7 +65,7 @@ image_file = client.beta.files.upload(
 )

 response = client.beta.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -142,7 +142,7 @@ questions = [

 for question in questions:
    response = client.beta.messages.create(
-        model="claude-opus-4-6",
+        model="claude-opus-4-7",
        max_tokens=16000,
        messages=[{
            "role": "user",
--- a/skills/claude-api/python/claude-api/streaming.md
+++ b/skills/claude-api/python/claude-api/streaming.md
@@ -4,7 +4,7 @@

 ```python
 with client.messages.stream(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Write a story"}]
 ) as stream:
@@ -16,7 +16,7 @@ with client.messages.stream(

 ```python
 async with async_client.messages.stream(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Write a story"}]
 ) as stream:
@@ -30,11 +30,11 @@ async with async_client.messages.stream(

 Claude may return text, thinking blocks, or tool use. Handle each appropriately:

-> **Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
+> **Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.

 ```python
 with client.messages.stream(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=64000,
    thinking={"type": "adaptive"},
    messages=[{"role": "user", "content": "Analyze this problem"}]
@@ -61,7 +61,7 @@ The Python tool runner currently returns complete messages. Use streaming for in

 ```python
 with client.messages.stream(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=64000,
    tools=tools,
    messages=messages
@@ -79,7 +79,7 @@ with client.messages.stream(

 ```python
 with client.messages.stream(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=64000,
    messages=[{"role": "user", "content": "Hello"}]
 ) as stream:
@@ -126,7 +126,7 @@ def stream_with_progress(client, **kwargs):
 ```python
 try:
    with client.messages.stream(
-        model="claude-opus-4-6",
+        model="claude-opus-4-7",
        max_tokens=64000,
        messages=[{"role": "user", "content": "Write a story"}]
    ) as stream:
--- a/skills/claude-api/python/claude-api/tool-use.md
+++ b/skills/claude-api/python/claude-api/tool-use.md
@@ -27,7 +27,7 @@ def get_weather(location: str, unit: str = "celsius") -> str:

 # The tool runner handles the agentic loop automatically
 runner = client.beta.messages.tool_runner(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    tools=[get_weather],
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
@@ -72,7 +72,7 @@ async with stdio_client(StdioServerParameters(command="mcp-server")) as (read, w
        tools_result = await mcp_client.list_tools()
        # tool_runner is sync — returns the runner, not a coroutine
        runner = client.beta.messages.tool_runner(
-            model="claude-opus-4-6",
+            model="claude-opus-4-7",
            max_tokens=16000,
            messages=[{"role": "user", "content": "Use the available tools"}],
            tools=[async_mcp_tool(t, mcp_client) for t in tools_result.tools],
@@ -90,7 +90,7 @@ from anthropic.lib.tools.mcp import mcp_message

 prompt = await mcp_client.get_prompt(name="my-prompt")
 response = await client.beta.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[mcp_message(m) for m in prompt.messages],
 )
@@ -103,7 +103,7 @@ from anthropic.lib.tools.mcp import mcp_resource_to_content

 resource = await mcp_client.read_resource(uri="file:///path/to/doc.txt")
 response = await client.beta.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -142,7 +142,7 @@ messages = [{"role": "user", "content": user_input}]
 # Agentic loop: keep going until Claude stops calling tools
 while True:
    response = client.messages.create(
-        model="claude-opus-4-6",
+        model="claude-opus-4-7",
        max_tokens=16000,
        tools=tools,
        messages=messages
@@ -189,7 +189,7 @@ final_text = next(b.text for b in response.content if b.type == "text")

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    tools=tools,
    messages=[{"role": "user", "content": "What's the weather in Paris?"}]
@@ -204,7 +204,7 @@ for block in response.content:
        result = execute_tool(tool_name, tool_input)

        followup = client.messages.create(
-            model="claude-opus-4-6",
+            model="claude-opus-4-7",
            max_tokens=16000,
            tools=tools,
            messages=[
@@ -241,7 +241,7 @@ for block in response.content:
 # Send all results back at once
 if tool_results:
    followup = client.messages.create(
-        model="claude-opus-4-6",
+        model="claude-opus-4-7",
        max_tokens=16000,
        tools=tools,
        messages=[
@@ -271,7 +271,7 @@ tool_result = {

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    tools=tools,
    tool_choice={"type": "tool", "name": "get_weather"},  # Force specific tool
@@ -291,7 +291,7 @@ import anthropic
 client = anthropic.Anthropic()

 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -319,7 +319,7 @@ uploaded = client.beta.files.upload(file=open("sales_data.csv", "rb"))
 # 2. Pass to code execution via container_upload block
 # Code execution is GA; Files API is still beta (pass via extra_headers)
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    extra_headers={"anthropic-beta": "files-api-2025-04-14"},
    messages=[{
@@ -364,7 +364,7 @@ for block in response.content:
 ```python
 # First request: set up environment
 response1 = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Install tabulate and create data.json with sample data"}],
    tools=[{"type": "code_execution_20260120", "name": "code_execution"}]
@@ -376,7 +376,7 @@ container_id = response1.container.id
 # Second request: reuse the same container
 response2 = client.messages.create(
    container=container_id,
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Read data.json and display as a formatted table"}],
    tools=[{"type": "code_execution_20260120", "name": "code_execution"}]
@@ -416,7 +416,7 @@ import anthropic
 client = anthropic.Anthropic()

 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Remember that my preferred language is Python."}],
    tools=[{"type": "memory_20250818", "name": "memory"}],
@@ -442,7 +442,7 @@ memory = MyMemoryTool()

 # Use with tool runner
 runner = client.beta.messages.tool_runner(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    tools=[memory],
    messages=[{"role": "user", "content": "Remember my preferences"}],
@@ -477,7 +477,7 @@ class ContactInfo(BaseModel):
 client = anthropic.Anthropic()

 response = client.messages.parse(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -496,7 +496,7 @@ print(contact.interests)      # ["API", "SDKs"]

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{
        "role": "user",
@@ -530,7 +530,7 @@ data = json.loads(text)

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Book a flight to Tokyo for 2 passengers on March 15"}],
    tools=[{
@@ -555,7 +555,7 @@ response = client.messages.create(

 ```python
 response = client.messages.create(
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    max_tokens=16000,
    messages=[{"role": "user", "content": "Plan a trip to Paris next month"}],
    output_config={
--- a/skills/claude-api/python/managed-agents/README.md
+++ b/skills/claude-api/python/managed-agents/README.md
@@ -49,7 +49,7 @@ print(environment.id)  # env_...
 # 1. Create the agent (reusable, versioned)
 agent = client.beta.agents.create(
    name="Coding Assistant",
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    tools=[{"type": "agent_toolset_20260401", "default_config": {"enabled": True}}],
 )

@@ -68,7 +68,7 @@ import os

 agent = client.beta.agents.create(
    name="Code Reviewer",
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    system="You are a senior code reviewer.",
    tools=[
        {"type": "agent_toolset_20260401"},
@@ -271,7 +271,10 @@ List files the agent wrote to `/mnt/session/outputs/` during a session, then dow

 ```python
 # List files associated with a session
-files = client.beta.files.list(session_id=session.id)
+files = client.beta.files.list(
+    scope_id=session.id,
+    betas=["managed-agents-2026-04-01"],
+)
 for f in files.data:
    print(f.filename, f.size_bytes)
    # Download each file and save to disk
@@ -279,7 +282,7 @@ for f in files.data:
    file_content.write_to_file(f.filename)
 ```

-> 💡 There's a brief indexing lag (~1–3s) between `session.status_idle` and output files appearing in `files.list` (with `scope=session_id` as a query param). Retry once or twice if the list is empty.
+> 💡 There's a brief indexing lag (~1–3s) between `session.status_idle` and output files appearing in `files.list`. Retry once or twice if the list is empty.

 ---

@@ -287,17 +290,17 @@ for f in files.data:

 ```python
 # Get session details
-session = client.beta.sessions.retrieve(session_id="sess_abc123")
+session = client.beta.sessions.retrieve(session_id="sesn_011CZxAbc123Def456")
 print(session.status, session.usage)

 # List sessions
 sessions = client.beta.sessions.list()

 # Delete a session
-client.beta.sessions.delete(session_id="sess_abc123")
+client.beta.sessions.delete(session_id="sesn_011CZxAbc123Def456")

 # Archive a session
-client.beta.sessions.archive(session_id="sess_abc123")
+client.beta.sessions.archive(session_id="sesn_011CZxAbc123Def456")
 ```

 ---
@@ -308,7 +311,7 @@ client.beta.sessions.archive(session_id="sess_abc123")
 # Agent declares MCP server (no auth here — auth goes in a vault)
 agent = client.beta.agents.create(
    name="MCP Agent",
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    mcp_servers=[
        {"type": "url", "name": "my-tools", "url": "https://my-mcp-server.example.com/sse"},
    ],
--- a/skills/claude-api/ruby/claude-api.md
+++ b/skills/claude-api/ruby/claude-api.md
@@ -26,7 +26,7 @@ client = Anthropic::Client.new(api_key: "your-api-key")

 ```ruby
 message = client.messages.create(
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    { role: "user", content: "What is the capital of France?" }
@@ -46,7 +46,7 @@ end

 ```ruby
 stream = client.messages.stream(
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  max_tokens: 64000,
  messages: [{ role: "user", content: "Write a haiku" }]
 )
@@ -78,7 +78,7 @@ class GetWeather < Anthropic::BaseTool
 end

 client.beta.messages.tool_runner(
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  max_tokens: 16000,
  tools: [GetWeather.new],
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }]
@@ -99,7 +99,7 @@ See the [shared tool use concepts](../shared/tool-use-concepts.md) for the tool

 ```ruby
 message = client.messages.create(
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  max_tokens: 16000,
  system_: [
    { type: "text", text: long_system_prompt, cache_control: { type: "ephemeral" } }
--- a/skills/claude-api/ruby/managed-agents/README.md
+++ b/skills/claude-api/ruby/managed-agents/README.md
@@ -51,7 +51,7 @@ puts "Environment ID: #{environment.id}" # env_...
 # 1. Create the agent (reusable, versioned)
 agent = client.beta.agents.create(
  name: "Coding Assistant",
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  system_: "You are a helpful coding assistant.",
  tools: [{type: "agent_toolset_20260401"}]
 )
@@ -260,7 +260,7 @@ client.beta.sessions.delete(session.id)
 # Agent declares MCP server (no auth here — auth goes in a vault)
 agent = client.beta.agents.create(
  name: "GitHub Assistant",
-  model: :"claude-opus-4-6",
+  model: :"claude-opus-4-7",
  mcp_servers: [
    {
      type: "url",
--- a/skills/claude-api/shared/error-codes.md
+++ b/skills/claude-api/shared/error-codes.md
@@ -80,7 +80,7 @@ This file documents HTTP error codes returned by the Claude API, their common ca
 - Using deprecated model ID
 - Invalid API endpoint

-**Fix:** Use exact model IDs from the models documentation. You can use aliases (e.g., `claude-opus-4-6`).
+**Fix:** Use exact model IDs from the models documentation. You can use aliases (e.g., `claude-opus-4-7`).

 ---

@@ -105,7 +105,12 @@ Some 400 errors are specifically related to parameter validation:
 - `budget_tokens` >= `max_tokens` in extended thinking
 - Invalid tool definition schema

-**Common mistake with extended thinking:**
+**Model-specific 400s on Opus 4.7:**
+
+- `temperature`, `top_p`, `top_k` are removed — sending any of them returns 400. Delete the parameter; see `shared/model-migration.md` → Per-SDK Syntax Reference.
+- `thinking: {type: "enabled", budget_tokens: N}` is removed — sending it returns 400. Use `thinking: {type: "adaptive"}` instead.
+
+**Common mistake with extended thinking on older models (Opus 4.6 and earlier):**

 ```
 # Wrong: budget_tokens must be < max_tokens
@@ -161,8 +166,10 @@ thinking: budget_tokens=10000, max_tokens=16000

 | Mistake                         | Error            | Fix                                                     |
 | ------------------------------- | ---------------- | ------------------------------------------------------- |
-| `budget_tokens` >= `max_tokens` | 400              | Ensure `budget_tokens` < `max_tokens`                   |
-| Typo in model ID                | 404              | Use valid model ID like `claude-opus-4-6`               |
+| `temperature`/`top_p`/`top_k` on Opus 4.7 | 400    | Remove the parameter (see `shared/model-migration.md`)  |
+| `budget_tokens` on Opus 4.7     | 400              | Use `thinking: {type: "adaptive"}`                      |
+| `budget_tokens` >= `max_tokens` (older models) | 400 | Ensure `budget_tokens` < `max_tokens`                  |
+| Typo in model ID                | 404              | Use valid model ID like `claude-opus-4-7`               |
 | First message is `assistant`    | 400              | First message must be `user`                            |
 | Consecutive same-role messages  | 400              | Alternate `user` and `assistant`                        |
 | API key in code                 | 401 (leaked key) | Use environment variable                                |
--- a/skills/claude-api/shared/live-sources.md
+++ b/skills/claude-api/shared/live-sources.md
@@ -13,17 +13,18 @@ This file contains WebFetch URLs for fetching current information from platform.

 ### Models & Pricing

-| Topic           | URL                                                                   | Extraction Prompt                                                               |
-| --------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
-| Models Overview | `https://platform.claude.com/docs/en/about-claude/models/overview.md` | "Extract current model IDs, context windows, and pricing for all Claude models" |
-| Pricing         | `https://platform.claude.com/docs/en/pricing.md`                      | "Extract current pricing per million tokens for input and output"               |
+| Topic           | URL                                                                          | Extraction Prompt                                                               |
+| --------------- | ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
+| Models Overview | `https://platform.claude.com/docs/en/about-claude/models/overview.md`        | "Extract current model IDs, context windows, and pricing for all Claude models" |
+| Migration Guide | `https://platform.claude.com/docs/en/about-claude/models/migration-guide.md` | "Extract breaking changes, deprecated parameters, and per-model migration steps when moving to a newer Claude model" |
+| Pricing         | `https://platform.claude.com/docs/en/pricing.md`                             | "Extract current pricing per million tokens for input and output"               |

 ### Core Features

 | Topic             | URL                                                                          | Extraction Prompt                                                                      |
 | ----------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
 | Extended Thinking | `https://platform.claude.com/docs/en/build-with-claude/extended-thinking.md` | "Extract extended thinking parameters, budget_tokens requirements, and usage examples" |
-| Adaptive Thinking | `https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking.md` | "Extract adaptive thinking setup, effort levels, and Claude Opus 4.6 usage examples"         |
+| Adaptive Thinking | `https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking.md` | "Extract adaptive thinking setup, effort levels, and Claude Opus 4.7 usage examples"         |
 | Effort Parameter  | `https://platform.claude.com/docs/en/build-with-claude/effort.md`            | "Extract effort levels, cost-quality tradeoffs, and interaction with thinking"        |
 | Tool Use          | `https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview.md`  | "Extract tool definition schema, tool_choice options, and handling tool results"       |
 | Streaming         | `https://platform.claude.com/docs/en/build-with-claude/streaming.md`         | "Extract streaming event types, SDK examples, and best practices"                      |
--- a/skills/claude-api/shared/managed-agents-api-reference.md
+++ b/skills/claude-api/shared/managed-agents-api-reference.md
@@ -28,13 +28,13 @@ All resources are under the `beta` namespace. Python and TypeScript share identi
 | Credentials | `vaults.credentials.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Vaults.Credentials.New` / `Get` / `Update` / `List` / `Delete` / `Archive` |

 **Naming quirks to watch for:**
- Agents have **no delete** — only `archive`. Other resources have both.
+- Agents have **no delete** — only `archive`. Archive is **permanent**: the agent becomes read-only, new sessions cannot reference it, and there is no unarchive. Confirm with the user before archiving a production agent. Environments, Sessions, Vaults, and Credentials have both `delete` and `archive`; Session Resources, Files, and Skills are `delete`-only.
 - Session resources use `add` (not `create`).
 - Go's event stream is `StreamEvents` (not `Stream`).

 **Agent shorthand:** `agent` on session create accepts either a bare string (`agent="agent_abc123"` — uses latest version) or the full reference object (`{type: "agent", id: "agent_abc123", version: 123}`).

-**Model shorthand:** `model` on agent create accepts either a bare string (`model="claude-opus-4-6"` — uses `standard` speed) or the full config object (`{type: "model_config", id: "claude-opus-4-6", speed: "fast"}`).
+**Model shorthand:** `model` on agent create accepts either a bare string (`model="claude-opus-4-7"` — uses `standard` speed) or the full config object (`{type: "model_config", id: "claude-opus-4-6", speed: "fast"}`). Note: `speed: "fast"` is only supported on Opus 4.6.

 ---

@@ -48,7 +48,7 @@ All resources are under the `beta` namespace. Python and TypeScript share identi
 | `POST` | `/v1/agents` | CreateAgent | Create a saved agent configuration |
 | `GET` | `/v1/agents/{agent_id}` | GetAgent | Get agent details |
 | `POST` | `/v1/agents/{agent_id}` | UpdateAgent | Update agent configuration |
-| `POST` | `/v1/agents/{agent_id}/archive` | ArchiveAgent | Archive an agent (no hard delete for agents) |
+| `POST` | `/v1/agents/{agent_id}/archive` | ArchiveAgent | Archive an agent. Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — this is the terminal state. |
 | `GET` | `/v1/agents/{agent_id}/versions` | ListAgentVersions | List agent versions |

 ## Sessions
@@ -89,7 +89,7 @@ All resources are under the `beta` namespace. Python and TypeScript share identi
 | `GET`    | `/v1/environments/{environment_id}`                    | GetEnvironment       | Get environment details             |
 | `POST`   | `/v1/environments/{environment_id}`                    | UpdateEnvironment    | Update environment                  |
 | `DELETE` | `/v1/environments/{environment_id}`                    | DeleteEnvironment    | Delete environment. Returns 204. |
-| `POST`   | `/v1/environments/{environment_id}/archive`            | ArchiveEnvironment   | Archive environment (read-only; existing sessions continue) |
+| `POST`   | `/v1/environments/{environment_id}/archive`            | ArchiveEnvironment   | Archive environment. Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — this is the terminal state. |

 ## Vaults

@@ -151,7 +151,7 @@ Credentials are individual secrets stored inside a vault.
 ```json
 {
  "name": "string (required, 1-256 chars)",
-  "model": "claude-opus-4-6 (required — bare string, or {id, speed} object)",
+  "model": "claude-opus-4-7 (required — bare string, or {id, speed} object)",
  "description": "string (optional, up to 2048 chars)",
  "system": "string (optional, up to 100,000 chars)",
  "tools": [
--- a/skills/claude-api/shared/managed-agents-client-patterns.md
+++ b/skills/claude-api/shared/managed-agents-client-patterns.md
@@ -181,11 +181,11 @@ Delete the original via `files.delete(uploaded.id)`; the session-scoped copy is

 ---

-## 9. Keep credentials host-side via custom tools
+## 9. Secrets for non-MCP APIs and CLIs — keep them host-side via custom tools

-**Problem:** putting a third-party API key in the agent's vault or environment means the sandbox holds the secret. For keys tied to a human (Linear personal keys, `gh` CLI auth) or keys you'd rather not ship into a container, that's undesirable.
+**Problem:** you want the agent to call a third-party API or run a CLI that needs a secret (API key, token, service-account credential), but there is currently no way to set environment variables inside the session container, and vaults currently hold MCP credentials only — they are not exposed to the container's shell. So `curl`, installed CLIs, or SDK clients running via the `bash` tool have no first-class place to read a secret from.

-**Solution:** expose the operation as a custom tool. The agent emits `agent.custom_tool_use`; your orchestrator executes the call with its own credentials and responds with `user.custom_tool_result`. The container never sees the key.
+**Solution:** move the authenticated call to your side. Declare a custom tool on the agent; when the agent emits `agent.custom_tool_use`, your orchestrator (the process reading the SSE stream) executes the call with its own credentials and responds with `user.custom_tool_result`. The container never sees the key.

 ```ts
 // Agent template: declare the tool, no credentials
@@ -202,4 +202,8 @@ for await (const event of stream) {
 }
 ```

-Same shape works for `gh` CLI, local eval scripts, or anything else that needs host-only auth or binaries.
+Same shape works for `gh` CLI, local eval scripts, or anything else that needs host-side auth or binaries.
+
+**Security note:** this does not expose a public endpoint. `agent.custom_tool_use` arrives on the SSE stream your orchestrator already holds open with your Anthropic API key, and `user.custom_tool_result` goes back via `events.send()` under the same key. Your orchestrator is a client, not a server — nothing unauthenticated is listening.
+
+**Do not embed API keys in the system prompt or user messages as a workaround.** Prompts and messages are stored in the session's event history, returned by `events.list()`, and included in compaction summaries — a secret placed there is durably persisted and readable via the API for the life of the session.
--- a/skills/claude-api/shared/managed-agents-core.md
+++ b/skills/claude-api/shared/managed-agents-core.md
@@ -96,7 +96,7 @@ Key fields returned by the API:
 const agent = await client.beta.agents.create(
  {
    name: "Coding Assistant",
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    system: "You are a helpful coding agent.",
    tools: [{ type: "agent_toolset_20260401"}],
  },
@@ -196,6 +196,8 @@ Each `POST /v1/agents/{id}` (update) creates a new immutable version (numeric ti
 | Update           | `POST`   | `/v1/agents/{id}`                     |
 | Archive          | `POST`   | `/v1/agents/{id}/archive`             |

+> ⚠️ **Archive is permanent.** Archiving makes the agent read-only: existing sessions continue to run, but **new sessions cannot reference it**, and there is no unarchive. Since agents have no `delete`, this is the terminal lifecycle state. Never archive a production agent as routine cleanup — confirm with the user first.
+
 ### Using an Agent in a Session

 Reference the agent by string ID (latest version) or by object with an explicit version:
--- a/skills/claude-api/shared/managed-agents-environments.md
+++ b/skills/claude-api/shared/managed-agents-environments.md
@@ -47,7 +47,7 @@ const env = await client.beta.environments.create({
 | Get              | `GET`    | `/v1/environments/{id}`                    | |
 | Update           | `POST`   | `/v1/environments/{id}`                    | Changes apply only to **new** containers; existing sessions keep their original config |
 | Delete           | `DELETE` | `/v1/environments/{id}`                    | Returns 204. |
-| Archive          | `POST`   | `/v1/environments/{id}/archive`            | Read-only. New sessions can't be created; existing ones continue. |
+| Archive          | `POST`   | `/v1/environments/{id}/archive`            | Makes it **read-only**; existing sessions continue, new sessions cannot reference it. No unarchive — terminal state. |

 ---

@@ -84,7 +84,10 @@ The agent can write files to `/mnt/session/outputs/` during a session. These are

 ```ts
 // After the turn completes, list output files scoped to this session:
-for await (const f of client.beta.files.list({ scope: session.id })) {
+for await (const f of client.beta.files.list({
+  scope_id: session.id,
+  betas: ["managed-agents-2026-04-01"],
+})) {
  console.log(f.filename, f.size_bytes);
  const resp = await client.beta.files.download(f.id);
  const text = await resp.text();
@@ -94,14 +97,19 @@ for await (const f of client.beta.files.list({ scope: session.id })) {
 **Requirements:**
 - The `write` tool (or `bash`) must be enabled for the agent to create output files.
 - Session-scoped `files.list` / `files.download` captures outputs written to `/mnt/session/outputs/`.
- `session_id` is a query filter on `files.list` (not yet in SDK types — cast or spread through).
+- The filter parameter is **`scope_id`** (REST query param `?scope_id=<session_id>`). The SDK's files resource auto-adds only the `files-api-2025-04-14` header, so pass `betas: ["managed-agents-2026-04-01"]` explicitly (or both headers on raw HTTP) — without it the API may reject `scope_id` as an unknown field. Requires `@anthropic-ai/sdk` ≥ 0.88.0 / `anthropic` (Python) ≥ 0.92.0 — older versions don't type `scope_id`. The `ant` CLI does **not** expose this flag yet; use the SDK or curl.
+- Pass the session ID returned by `sessions.create()` verbatim (e.g. `sesn_011CZx...`) — the API validates the prefix.
 - There's a brief indexing lag (~1–3s) between `session.status_idle` and output files appearing in `files.list`. Retry once or twice if empty.

+> **Fallback when `scope_id` filtering is unavailable** (older SDK, or endpoint returns an error): send a follow-up `user.message` asking the agent to `read` each file under `/mnt/session/outputs/` and return the contents. The agent streams the file bodies back as `agent.message` text. This works for text files only and costs output tokens — use it to unblock, not as the primary path.
+
 This gives you a bidirectional file bridge: upload reference data in, download agent artifacts out.

 ### GitHub Repositories

-Clones a GitHub repository into the session container during initialization, before the agent begins execution. The agent can read, edit, commit, and push via `bash` (`git`). Multiple repositories per session are supported — add one `resources` entry per repo.
+Clones a GitHub repository into the session container during initialization, before the agent begins execution. The agent can read, edit, commit, and push via `bash` (`git`). Multiple repositories per session are supported — add one `resources` entry per repo. Repositories are cached, so future sessions that use the same repository start faster.
+
+Repositories are attached for the lifetime of the session — to change which repositories are mounted, create a new session. You **can** rotate a repository's `authorization_token` on a running session via `client.beta.sessions.resources.update(resource_id, {session_id, authorization_token})`; the resource `id` is returned at session creation and by `resources.list()`.

 **Fields:**

@@ -117,7 +125,9 @@ Clones a GitHub repository into the session container during initialization, bef
 - `Contents: Read` — clone only
 - `Contents: Read and write` — push changes and create pull requests

-> ‼️ **To generate pull requests** you also need GitHub **MCP server** access — the `github_repository` resource gives filesystem access only. See `shared/managed-agents-tools.md` → MCP Servers. The PR workflow is: edit files in the mounted repo → push branch via `bash` → create PR via MCP `create_pull_request` tool.
+**How auth works:** `authorization_token` is never placed inside the container. `git pull` / `git push` and GitHub REST calls against the attached repository are routed through an Anthropic-side git proxy that injects the token after the request leaves the sandbox. Code running in the container — including anything the agent writes — cannot read or exfiltrate it.
+
+> ‼️ **To generate pull requests** you also need GitHub **MCP server** access — the `github_repository` resource gives filesystem + git access only. See `shared/managed-agents-tools.md` → MCP Servers. The PR workflow is: edit files in the mounted repo → push branch via `bash` (authenticated via the git proxy using `authorization_token`) → create PR via the MCP `create_pull_request` tool (authenticated via the vault).

 **TypeScript:**

@@ -126,7 +136,7 @@ Clones a GitHub repository into the session container during initialization, bef
 const agent = await client.beta.agents.create(
  {
    name: 'GitHub Agent',
-    model: 'claude-opus-4-6',
+    model: 'claude-opus-4-7',
    mcp_servers: [
      { type: 'url', name: 'github', url: 'https://api.githubcopilot.com/mcp/' },
    ],
@@ -160,7 +170,7 @@ import os

 agent = client.beta.agents.create(
    name="GitHub Agent",
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    mcp_servers=[{
        "type": "url",
        "name": "github",
@@ -194,9 +204,9 @@ Upload and manage files for use as session resources, and download files the age
 | Operation        | Method   | Path                                  | SDK |
 | ---------------- | -------- | ------------------------------------- | --- |
 | Upload           | `POST`   | `/v1/files`                           | `client.beta.files.upload({ file })` |
-| List             | `GET`    | `/v1/files?session_id=...`            | `client.beta.files.list({ session_id })` |
+| List             | `GET`    | `/v1/files?scope_id=...`              | `client.beta.files.list({ scope_id, betas: ["managed-agents-2026-04-01"] })` |
 | Get Metadata     | `GET`    | `/v1/files/{id}`                      | `client.beta.files.retrieveMetadata(id)` |
 | Download         | `GET`    | `/v1/files/{id}/content`              | `client.beta.files.download(id)` → `Response` |
 | Delete           | `DELETE` | `/v1/files/{id}`                      | `client.beta.files.delete(id)` |

-The `session_id` filter on List scopes the results to files written to `/mnt/session/outputs/` by that session. Without the filter, you get all files uploaded to your account.
+The `scope_id` filter on List scopes the results to files written to `/mnt/session/outputs/` by that session. Without the filter, you get all files uploaded to your account.
--- a/skills/claude-api/shared/managed-agents-events.md
+++ b/skills/claude-api/shared/managed-agents-events.md
@@ -184,4 +184,6 @@ When done with a session, archive it to free resources:
 await client.beta.sessions.archive(sessionId);
 ```

+> Archiving a **session** is routine cleanup — sessions are per-run and disposable. **Do not generalize this to agents or environments**: those are persistent, reusable resources, and archiving them is permanent (no unarchive; new sessions cannot reference them). See `shared/managed-agents-overview.md` → Common Pitfalls.
+

--- a/skills/claude-api/shared/managed-agents-onboarding.md
+++ b/skills/claude-api/shared/managed-agents-onboarding.md
@@ -74,7 +74,7 @@ Emit as `resources: [{type: "file", file_id, mount_path}]`. Max 999 file resourc
 - [ ] Networking: unrestricted internet from the container, or lock egress to specific hosts? (If locked, MCP server domains must be in `allowed_hosts` or tools silently fail.)
 - [ ] Name?
 - [ ] Job (one or two sentences — becomes the system prompt)?
- [ ] Model? (default `claude-opus-4-6`)
+- [ ] Model? (default `claude-opus-4-7`)

 ---

@@ -90,7 +90,7 @@ Credentials are write-only, matched to MCP servers by URL, auto-refreshed. See `
 **Kickoff:**
 - [ ] First message to the agent?

-Session creation blocks until all resources mount. Open the event stream before sending the kickoff. Stream is SSE; break on `session.status_terminated`, or on `session.status_idle` with a terminal `stop_reason` — i.e. anything except `requires_action`, which fires transiently while the session waits on a tool confirmation or custom-tool result (see `shared/managed-agents-client-patterns.md` Pattern 5). Usage lands on `span.model_request_end`. Agent-written artifacts end up in `/mnt/session/outputs/` — download via `files.list({scope: session_id})`.
+Session creation blocks until all resources mount. Open the event stream before sending the kickoff. Stream is SSE; break on `session.status_terminated`, or on `session.status_idle` with a terminal `stop_reason` — i.e. anything except `requires_action`, which fires transiently while the session waits on a tool confirmation or custom-tool result (see `shared/managed-agents-client-patterns.md` Pattern 5). Usage lands on `span.model_request_end`. Agent-written artifacts end up in `/mnt/session/outputs/` — download via `files.list({scope_id: session.id, betas: ["managed-agents-2026-04-01"]})`.

 ---

--- a/skills/claude-api/shared/managed-agents-overview.md
+++ b/skills/claude-api/shared/managed-agents-overview.md
@@ -29,7 +29,7 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
 | `skills-2025-10-02`            | Skills API (for managing custom skill definitions)   |
 | `files-api-2025-04-14`         | Files API for file uploads                           |

-**Note: do not intermix beta headers** — If you need to upload a skill or file via the Skills API or Files API you will need to use the appropriate beta header as listed above. However, you do NOT need to inlude either the Skills or Files beta header when using any of the Managed Agents endpints listed in row 1 above. Do NOT include intermix beta headers and prefer to use the Skills or Files beta headers when using their specific endpoints.
+**Which beta header goes where:** The SDK sets `managed-agents-2026-04-01` automatically on `client.beta.{agents,environments,sessions,vaults}.*` calls, and `files-api-2025-04-14` / `skills-2025-10-02` automatically on `client.beta.files.*` / `client.beta.skills.*` calls. You do NOT need to add the Skills or Files beta header when calling Managed Agents endpoints. **Exception — session-scoped file listing:** `client.beta.files.list({scope_id: session.id})` is a Files endpoint that takes a Managed Agents parameter, so it needs **both** headers. Pass `betas: ["managed-agents-2026-04-01"]` explicitly on that call (the SDK adds the Files header; you add the Managed Agents one). See `shared/managed-agents-environments.md` → Session outputs.


 ## Reading Guide
@@ -48,6 +48,7 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
 | Set up environments                    | `shared/managed-agents-environments.md` + language file |
 | Upload files / attach repos            | `shared/managed-agents-environments.md` (Resources)     |
 | Store MCP credentials                  | `shared/managed-agents-tools.md` (Vaults section)       |
+| Call a non-MCP API / CLI that needs a secret | `shared/managed-agents-client-patterns.md` Pattern 9 — no container env vars; vaults are MCP-only; keep the secret host-side via a custom tool |

 ## Common Pitfalls

@@ -59,3 +60,4 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
 - **Don't trust HTTP-library timeouts as wall-clock caps** — `requests` `timeout=(c, r)` and `httpx.Timeout(n)` are *per-chunk* read timeouts; they reset every byte, so a trickling connection can block indefinitely. For a hard deadline on raw-HTTP polling, track `time.monotonic()` at the loop level and bail explicitly. Prefer the SDK's `sessions.events.stream()` / `session.events.list()` over hand-rolled HTTP. See `shared/managed-agents-events.md` → Receiving Events.
 - **Messages queue** — you can send events while the session is `running` or `idle`; they're processed in order. No need to wait for a response before sending the next message.
 - **Cloud environments only** — `config.type: "cloud"` is the only supported environment type.
+- **Archive is permanent on every resource** — archiving an agent, environment, session, vault, or credential makes it read-only with no unarchive. For agents and environments specifically, archived resources cannot be referenced by new sessions (existing sessions continue). Do not call `.archive()` on a production agent or environment as cleanup — **always confirm with the user before archiving**.
--- a/skills/claude-api/shared/managed-agents-tools.md
+++ b/skills/claude-api/shared/managed-agents-tools.md
@@ -188,6 +188,20 @@ This keeps secrets out of reusable agent definitions. Each vault credential is t

 **Vaults** store OAuth credentials (access token + refresh token) that Anthropic auto-refreshes on your behalf via standard OAuth 2.0 `refresh_token` grant. This is the only way to authenticate MCP servers in the launch SDK.

+#### Credentials and the sandbox
+
+Vaults store credentials; those credentials **never enter the sandbox**. This is a deliberate security boundary — code running in the sandbox (including anything the agent writes) cannot read or exfiltrate a vaulted credential, even under prompt injection. Instead, credentials are injected by Anthropic-side proxies **after** a request leaves the sandbox:
+
+- **MCP tool calls** are routed through an Anthropic-side proxy that fetches the credential from the vault and adds it to the outbound request.
+- **Git operations on attached GitHub repositories** (`git pull`, `git push`, GitHub REST calls) are routed through a git proxy that injects the `github_repository` resource's `authorization_token` the same way.
+
+**Not yet supported:** running other authenticated CLIs (e.g. `aws`, `gcloud`, `stripe`) directly inside the sandbox. There is currently no way to set container environment variables or expose vault credentials to arbitrary processes. If you need one of these today:
+
+- **Prefer an MCP server** for that service if one exists — it gets the same vault-backed injection.
+- **Otherwise, register a custom tool:** the agent emits `agent.custom_tool_use`, your orchestrator (which already holds the credential) executes the call and returns `user.custom_tool_result` over the same authenticated event stream. No public endpoint is exposed; the sandbox never sees the secret. See `shared/managed-agents-client-patterns.md` → Pattern 9.
+
+**Do not put API keys in the system prompt or user messages as a workaround** — they persist in the session's event history.
+
 > Formerly known internally as TATs (Tool/Tenant Access Tokens).

 **Flow:**
@@ -254,7 +268,7 @@ Skills are attached to the **agent** definition via `agents.create()`:
 const agent = await client.beta.agents.create(
  {
    name: "Financial Agent",
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    system: "You are a financial analysis agent.",
    skills: [
      { type: "anthropic", skill_id: "xlsx" },
@@ -269,7 +283,7 @@ Python:
 ```python
 agent = client.beta.agents.create(
    name="Financial Agent",
-    model="claude-opus-4-6",
+    model="claude-opus-4-7",
    system="You are a financial analysis agent.",
    skills=[
        {"type": "anthropic", "skill_id": "xlsx"},
--- a/skills/claude-api/shared/model-migration.md
+++ b/skills/claude-api/shared/model-migration.md
@@ -0,0 +1,779 @@
+# Model Migration Guide
+
+How to move existing code to newer Claude models. Covers breaking changes, deprecated parameters, and drop-in replacements for retired models.
+
+For the latest, authoritative version (with code samples in every supported language), WebFetch the **Migration Guide** URL from `shared/live-sources.md`. Use this file for the consolidated, skill-resident reference; fall back to the live docs whenever a model launch or breaking change may have shifted the picture.
+
+**This file is large.** Use the section names below to jump (or `Grep` this file for the heading text). Read Step 0 and Step 1 first — they apply to every migration. Then read only the per-target section for the model you are migrating to.
+
+| Section | When you need it |
+|---|---|
+| Step 0: Confirm the migration scope | Always — before any edits |
+| Step 1: Classify each file | Always — decides whether to swap, add-alongside, or skip |
+| Per-SDK Syntax Reference | Translate the Python examples in this guide to TypeScript / Go / Ruby / Java / C# / PHP |
+| Destination Models / Retired Model Replacements | Picking a target model |
+| Breaking Changes by Source Model | Migrating to Opus 4.6 / Sonnet 4.6 |
+| Migrating to Opus 4.7 | Migrating to Opus 4.7 (breaking changes, silent defaults, behavioral shifts) |
+| Opus 4.7 Migration Checklist | The required vs optional items for 4.7, tagged `[BLOCKS]` / `[TUNE]` |
+| Verify the Migration | After edits — runtime spot-check |
+
+**TL;DR:** Change the model ID string. If you were using `budget_tokens`, switch to `thinking: {type: "adaptive"}`. If you were using assistant prefills, they 400 on both Opus 4.6 and Sonnet 4.6 — switch to one of the prefill replacements (most often `output_config.format`; see the table in Breaking Changes by Source Model). If you're moving from Sonnet 4.5 to Sonnet 4.6, set `effort` explicitly — 4.6 defaults to `high`. Remove the `effort-2025-11-24` and `fine-grained-tool-streaming-2025-05-14` beta headers (GA on 4.6); remove `interleaved-thinking-2025-05-14` once you're on adaptive thinking (keep it only while using the transitional `budget_tokens` escape hatch). Then drop back from `client.beta.messages.create` to `client.messages.create`. Dial back any aggressive "CRITICAL: YOU MUST" tool instructions; 4.6 follows the system prompt much more closely.
+
+---
+
+## Step 0: Confirm the migration scope
+
+**Before any Write, Edit, or MultiEdit call, confirm the scope.** If the user's request does not explicitly name a single file, a specific directory, or an explicit file list, **ask first — do not start editing**. This is non-negotiable: even imperative-sounding requests like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.7" leave the scope ambiguous and require a clarifying question. Phrases like "my project", "my code", "my codebase", "the whole thing", "everywhere", or "across the repo" are **ambiguous, not directive** — they tell you *what* to do but not *where*. Ask before doing.
+
+Offer the common scopes explicitly and wait for the answer before touching any file:
+
+1. The entire working directory
+2. A specific subdirectory (e.g. `src/`, `app/`, `services/billing/`)
+3. A specific file or a list of files
+
+Surface this as a single clarifying question so the user can answer in one turn. **Proceed without asking only when the scope is already unambiguous** — the user named an exact file ("migrate `extract.py` to Sonnet 4.6"), pointed at a specific directory ("migrate everything under `services/billing/` to Opus 4.6"), listed specific files ("update `a.py` and `b.py`"), or already answered the scope question in an earlier turn. If you can answer the question "which files is this change going to touch?" with a precise list from the prompt alone, proceed. If not, ask.
+
+**Worked example.** If the user says *"Move my project to Opus 4.6. I want adaptive thinking everywhere it makes sense."* you do not know whether "my project" means the whole working directory, just `src/`, just the production code, or something else — the `everywhere` makes the intent clear (update every call site *within scope*) but the scope itself is still not defined. Do not start editing. Respond with:
+
+> Before I start editing, can you confirm the scope? I can migrate:
+> 1. Every `.py` file in the working directory
+> 2. Just the files under `src/` (production code)
+> 3. A specific subdirectory or list of files you name
+>
+> Which one?
+
+Then wait for the answer. The same applies to *"Migrate to Opus 4.7"* and bare *"Help me upgrade to Sonnet 4.6"* — ask before editing.
+
+**Sizing the scope question (large repos).** Before asking, get a per-directory count so the user can pick concretely:
+
+```sh
+rg -l "<old-model-id>" --type-not md | cut -d/ -f1 | sort | uniq -c | sort -rn
+```
+
+Present the breakdown in your scope question (e.g. *"Found 217 references across 3 directories: api/ (130), api-go/ (62), routing/ (25). Which to migrate?"*). Also confirm `git status` is clean before surveying — unexpected modifications mean a concurrent process; stop and investigate before proceeding.
+
+---
+
+## Step 1: Classify each file
+
+Not every file that contains the old model ID is a **caller** of the API. Before editing, classify each file into one of these buckets — the right action differs:
+
+| # | Bucket | What it looks like | Action |
+|---|---|---|---|
+| 1 | **Calls the API/SDK** | `client.messages.create(model=…)`, `anthropic.Anthropic()`, request payloads | Swap the model ID **and** apply the breaking-change checklist for the target version (below). |
+| 2 | **Defines or serves the model** | Model registries, OpenAPI specs, routing/queue configs, model-policy enums, generated catalogs | The old entry **stays** (the model is still served). Ask whether to (a) add the new model alongside, (b) leave alone, or (c) retire the old model — never blind-replace. **If you can't ask, default to (a): add the new model alongside and flag it** — replacing would de-register a model that's still in production. |
+| 3 | **References the ID as an opaque string** | UI fallback constants, capability-gate substring checks, generic test fixtures, label parsers, env defaults | Usually swap the string and verify any parser/regex/substring match handles the new ID — but check the sub-cases below first. |
+| 4 | **Suffixed variant ID** | `claude-<model>-<suffix>` like `-fast`, `-1024k`, `-200k`, `[1m]`, dated snapshots | These are deployment/routing identifiers, not the public model ID. **Do not assume a new-model equivalent exists.** Verify in the registry first; if absent, leave the string alone and flag it. |
+
+**Bucket 3 sub-cases — before swapping a string reference, check:**
+
+- **Capability gate** (e.g. `if 'opus-4-6' in model_id:` enables a feature) → **add the new ID alongside**, don't replace. The old model is still served and still has the capability, so replacing would silently disable the feature for any old-model traffic that still flows through. If you know no old-model traffic will hit this gate (single-caller codebase fully migrating), replacing is fine; if unsure, add alongside.
+- **Registry-assert test** (e.g. `assert "claude-X" in supported_models`, `test_X_has_N_clusters`) → **add an assertion for the new model alongside; keep the old one.** The old model is still served, so its assertion stays valid — but the registry should also include the new model, so assert that too. Heuristic: if the test references multiple model versions in a list, it's a registry test; if one model in a struct compared only to itself, it's a generic fixture.
+- **Frozen / generated snapshot** → **regenerate**, don't hand-edit.
+- **Coupled to a definer** (e.g. an integration test that passes model authorization via a shared `conftest` seed list, or asserts on a billing-tier / rate-limit-group enum or a generated SKU/pricing catalog) → **verify the definer has a new-model entry first.** If not, add a seed entry (reusing the nearest existing tier as a placeholder); if you can't confidently do that, ask the user how to populate the definer. **Do not skip the test.** Swapping without populating the definer will make the test fail at runtime.
+
+When migrating tests specifically: breaking parameters (`temperature`, `top_p`, `budget_tokens`) are usually absent — test fixtures rarely set sampling params on placeholder models. The breaking-change scan is still required, but expect mostly clean results.
+
+**Find intentionally-flagged sync points first.** Many codebases tag spots that must change at every model launch with comment markers like `MODEL LAUNCH`, `KEEP IN SYNC`, `@model-update`, or similar. Grep for whatever convention the repo uses *before* the broad model-ID grep — those markers point at the load-bearing changes.
+
+---
+
+## Per-SDK Syntax Reference
+
+Code examples in this guide are Python. **The same fields exist in every official Anthropic SDK** — Stainless generates all 7 from the same OpenAPI spec, so JSON field names map 1:1 with only case-convention differences. Use the rows below to translate the Python examples to the SDK you are migrating.
+
+> **Verify type and method names against the SDK source before writing them into customer code.** WebFetch the relevant repository from the SDK source-code table in `shared/live-sources.md` (one row per SDK) and confirm the exact symbol — particularly for typed SDKs (Go, Java, C#) where union/builder names can differ from the JSON shape. Do not guess type names that aren't in the table below or in `<lang>/claude-api/README.md`.
+
+
+### `thinking` — `budget_tokens` → adaptive
+
+| SDK | Before | After |
+|---|---|---|
+| Python | `thinking={"type": "enabled", "budget_tokens": N}` | `thinking={"type": "adaptive"}` |
+| TypeScript | `thinking: { type: 'enabled', budget_tokens: N }` | `thinking: { type: 'adaptive' }` |
+| Go | `Thinking: anthropic.ThinkingConfigParamOfEnabled(N)` | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{}}` |
+| Ruby | `thinking: { type: "enabled", budget_tokens: N }` | `thinking: { type: "adaptive" }` |
+| Java | `.thinking(ThinkingConfigEnabled.builder().budgetTokens(N).build())` | `.thinking(ThinkingConfigAdaptive.builder().build())` |
+| C# | `Thinking = new ThinkingConfigEnabled { BudgetTokens = N }` | `Thinking = new ThinkingConfigAdaptive()` |
+| PHP | `thinking: ['type' => 'enabled', 'budget_tokens' => N]` | `thinking: ['type' => 'adaptive']` |
+
+### Sampling parameters — `temperature` / `top_p` / `top_k`
+
+(Remove the field entirely on Opus 4.7; on Claude 4.x keep at most one of `temperature` or `top_p`.)
+
+| SDK | Field(s) to remove |
+|---|---|
+| Python | `temperature=…`, `top_p=…`, `top_k=…` |
+| TypeScript | `temperature: …`, `top_p: …`, `top_k: …` |
+| Go | `Temperature: anthropic.Float(…)`, `TopP: anthropic.Float(…)`, `TopK: anthropic.Int(…)` |
+| Ruby | `temperature: …`, `top_p: …`, `top_k: …` |
+| Java | `.temperature(…)`, `.topP(…)`, `.topK(…)` |
+| C# | `Temperature = …`, `TopP = …`, `TopK = …` |
+| PHP | `temperature: …`, `topP: …`, `topK: …` |
+
+### Prefill replacement — structured outputs via `output_config.format`
+
+| SDK | Remove (last assistant turn) | Add |
+|---|---|---|
+| Python | `{"role": "assistant", "content": "…"}` | `output_config={"format": {"type": "json_schema", "schema": SCHEMA}}` |
+| TypeScript | `{ role: 'assistant', content: '…' }` | `output_config: { format: { type: 'json_schema', schema: SCHEMA } }` |
+| Go | trailing `anthropic.MessageParam{Role: "assistant", …}` | `OutputConfig: anthropic.OutputConfigParam{Format: anthropic.JSONOutputFormatParam{…}}` |
+| Ruby | `{ role: "assistant", content: "…" }` | `output_config: { format: { type: "json_schema", schema: SCHEMA } }` |
+| Java | trailing `Message.builder().role(ASSISTANT)…` | `.outputConfig(OutputConfig.builder().format(JsonOutputFormat.builder()…build()).build())` |
+| C# | trailing `new Message { Role = "assistant", … }` | `OutputConfig = new OutputConfig { Format = new JsonOutputFormat { … } }` |
+| PHP | trailing `['role' => 'assistant', 'content' => '…']` | `outputConfig: ['format' => ['type' => 'json_schema', 'schema' => $SCHEMA]]` |
+
+### `thinking.display` — opt back into summarized reasoning (Opus 4.7)
+
+| SDK | Add |
+|---|---|
+| Python | `thinking={"type": "adaptive", "display": "summarized"}` |
+| TypeScript | `thinking: { type: 'adaptive', display: 'summarized' }` |
+| Go | `Thinking: anthropic.ThinkingConfigParamUnion{OfAdaptive: &anthropic.ThinkingConfigAdaptiveParam{Display: anthropic.ThinkingConfigAdaptiveDisplaySummarized}}` |
+| Ruby | `thinking: { type: "adaptive", display: "summarized" }` (or `display_:` when constructing the model class directly) |
+| Java | `.thinking(ThinkingConfigAdaptive.builder().display(ThinkingConfigAdaptive.Display.SUMMARIZED).build())` |
+| C# | `Thinking = new ThinkingConfigAdaptive { Display = Display.Summarized }` |
+| PHP | `thinking: ['type' => 'adaptive', 'display' => 'summarized']` |
+
+For any field not in these tables, the JSON key in the Python example translates directly: `snake_case` for Python/TypeScript/Ruby, `camelCase` named args for PHP, `PascalCase` struct fields for Go/C#, `camelCase` builder methods for Java.
+
+---
+
+## Explain every change you make
+
+Migration edits often look arbitrary to a user who hasn't read the release notes — a removed `temperature`, a deleted prefill, a rewritten system-prompt sentence. **For each edit, tell the user what you changed and why**, tied to the specific API or behavioral change that motivates it. Do this in your summary as you work, not just at the end.
+
+Be especially explicit about **system-prompt edits**. Users are rightly protective of their prompts, and prompt-tuning changes are judgment calls (not hard API requirements). For any prompt edit:
+
+- Quote the before and after text.
+- State the behavioral shift that motivates it (e.g. *"Opus 4.7 calibrates response length to task complexity, so I added an explicit length instruction"*, or *"4.6 follows instructions more literally, so 'CRITICAL: YOU MUST use the search tool' will now overtrigger — softened to 'Use the search tool when…'"*).
+- Make clear which prompt edits are **optional tuning** (tone, length, subagent guidance) versus which code edits are **required to avoid a 400** (sampling params, `budget_tokens`, prefills). Never present an optional prompt change as mandatory.
+
+If you're applying several prompt-tuning edits at once, offer them as a short list the user can accept or decline item-by-item rather than silently rewriting their system prompt.
+
+---
+
+## Before You Migrate
+
+1. **Confirm the target model ID.** Use only the exact strings from `shared/models.md` — do not append date suffixes to aliases (`claude-opus-4-6`, not `claude-opus-4-6-20251101`). Guessing an ID will 404.
+2. **Check which features your code uses** with this checklist:
+   - `thinking: {type: "enabled", budget_tokens: N}` → migrate to adaptive thinking on Opus 4.6 / Sonnet 4.6 (still functional but deprecated)
+   - Assistant-turn prefills (`messages` ending with `role: "assistant"`) → must change on Opus 4.6 / Sonnet 4.6 (returns 400)
+   - `output_format` parameter on `messages.create()` → must change on all models (deprecated API-wide)
+   - `max_tokens > ~16000` → must stream on any model (above ~16K risks SDK HTTP timeouts). When streaming, Sonnet 4.6 / Haiku 4.5 cap at 64K and Opus 4.6 caps at 128K
+   - Beta headers `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `interleaved-thinking-2025-05-14` → GA on 4.6, remove them and switch from `client.beta.messages.create` to `client.messages.create`
+   - Moving Sonnet 4.5 → Sonnet 4.6 with no `effort` set → 4.6 defaults to `high`, which may change your latency/cost profile
+   - System prompts with `CRITICAL`, `MUST`, `If in doubt, use X` language → likely to overtrigger on 4.6 (see Prompt-Behavior Changes)
+   - Coming from 3.x / 4.0 / 4.1: also check sampling params (`temperature` + `top_p`), tool versions (`text_editor_20250728`), `refusal` + `model_context_window_exceeded` stop reasons, trailing-newline tool-param handling
+3. **Test on a single request first.** Run one call against the new model, inspect the response, then roll out.
+
+---
+
+## Destination Models (recommended targets)
+
+| If you're on…                         | Migrate to         | Why                                               |
+| ------------------------------------- | ------------------ | ------------------------------------------------- |
+| Opus 4.6                              | `claude-opus-4-7`  | Most capable model; adaptive thinking only; high-res vision; see Migrating to Opus 4.7 |
+| Opus 4.0 / 4.1 / 4.5 / Opus 3         | `claude-opus-4-6`  | Most intelligent 4.x before 4.7; adaptive thinking; 128K output |
+| Sonnet 4.0 / 4.5 / 3.7 / 3.5          | `claude-sonnet-4-6`| Best speed / intelligence balance; adaptive thinking; 64K output |
+| Haiku 3 / 3.5                         | `claude-haiku-4-5` | Fastest and most cost-effective                   |
+
+Default to the latest Opus for the caller's tier unless they explicitly chose otherwise. If you're moving from Opus 4.5 or older directly to Opus 4.7, apply the 4.6 migration first, then layer the Opus 4.7 changes on top (see Migrating to Opus 4.7 below).
+
+---
+
+## Retired Model Replacements
+
+These models return 404 — update immediately:
+
+| Retired model                 | Retired       | Drop-in replacement  |
+| ----------------------------- | ------------- | -------------------- |
+| `claude-3-7-sonnet-20250219`  | Feb 19, 2026  | `claude-sonnet-4-6`  |
+| `claude-3-5-haiku-20241022`   | Feb 19, 2026  | `claude-haiku-4-5`   |
+| `claude-3-opus-20240229`      | Jan 5, 2026   | `claude-opus-4-7`    |
+| `claude-3-5-sonnet-20241022`  | Oct 28, 2025  | `claude-sonnet-4-6`  |
+| `claude-3-5-sonnet-20240620`  | Oct 28, 2025  | `claude-sonnet-4-6`  |
+| `claude-3-sonnet-20240229`    | Jul 21, 2025  | `claude-sonnet-4-6`  |
+| `claude-2.1`, `claude-2.0`    | Jul 21, 2025  | `claude-sonnet-4-6`  |
+
+## Deprecated Models (retiring soon)
+
+| Model                         | Retires       | Replacement          |
+| ----------------------------- | ------------- | -------------------- |
+| `claude-3-haiku-20240307`     | Apr 19, 2026  | `claude-haiku-4-5`   |
+| `claude-opus-4-20250514`      | June 15, 2026 | `claude-opus-4-7`    |
+| `claude-sonnet-4-20250514`    | June 15, 2026 | `claude-sonnet-4-6`  |
+
+---
+
+## Breaking Changes by Source Model
+
+### Migrating from Sonnet 4.5 to Sonnet 4.6 (effort default change)
+
+Sonnet 4.5 had no `effort` parameter; Sonnet 4.6 defaults to `high`. If you just switch the model string and do nothing else, you may see noticeably higher latency and token usage. Set `effort` explicitly.
+
+**Recommended starting points:**
+
+| Workload                                          | Start at       | Notes                                                                                                    |
+| ------------------------------------------------- | -------------- | -------------------------------------------------------------------------------------------------------- |
+| Chat, classification, content generation          | `low`          | With `thinking: {"type": "disabled"}` you'll see similar or better performance vs. Sonnet 4.5 no-thinking |
+| Most applications (balanced)                      | `medium`       | The default sweet spot for quality vs. cost                                                              |
+| Agentic coding, tool-heavy workflows              | `medium`       | Pair with adaptive thinking and a generous `max_tokens` (up to 64K with streaming — Sonnet 4.6's ceiling) |
+| Autonomous multi-step agents, long-horizon loops  | `high`         | Scale down to `medium` if latency/tokens become a concern                                                 |
+| Computer-use agents                               | `high` + adaptive | Sonnet 4.6's best computer-use accuracy is on adaptive + high                                          |
+
+For non-thinking chat workloads specifically:
+
+```python
+client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=8192,
+    thinking={"type": "disabled"},
+    output_config={"effort": "low"},
+    messages=[{"role": "user", "content": "..."}],
+)
+```
+
+**When to use Opus 4.6 instead:** hardest and longest-horizon problems — large code migrations, deep research, extended autonomous work. Sonnet 4.6 wins on fast turnaround and cost efficiency.
+
+### Migrating to Opus 4.6 / Sonnet 4.6 (from any older model)
+
+**1. Manual extended thinking is deprecated — use adaptive thinking.**
+
+`thinking: {type: "enabled", budget_tokens: N}` (manual extended thinking with a fixed token budget) is deprecated on Opus 4.6 and Sonnet 4.6. Replace it with `thinking: {type: "adaptive"}`, which lets Claude decide when and how much to think. Adaptive thinking also enables interleaved thinking automatically (no beta header needed).
+
+```python
+# Old (still works on older models, deprecated on 4.6)
+response = client.messages.create(
+    model="claude-sonnet-4-5",
+    max_tokens=16000,
+    thinking={"type": "enabled", "budget_tokens": 8000},
+    messages=[...]
+)
+
+# New (Opus 4.6 / Sonnet 4.6)
+response = client.messages.create(
+    model="claude-opus-4-6",  # or "claude-sonnet-4-6"
+    max_tokens=16000,
+    thinking={"type": "adaptive"},
+    output_config={"effort": "high"},  # optional: low | medium | high | max
+    messages=[...]
+)
+```
+
+Adaptive thinking is the long-term target, and on internal evaluations it outperforms manual extended thinking. Move when you can.
+
+**Transitional escape hatch:** manual extended thinking is still *functional* on Opus 4.6 and Sonnet 4.6 (deprecated, will be removed in a future release). If you need a hard ceiling while migrating — for example, to bound token spend on a runaway workload before you've tuned `effort` — you can keep `budget_tokens` around alongside an explicit `effort` value, then remove it in a follow-up. `budget_tokens` must be strictly less than `max_tokens`:
+
+```python
+# Transitional only — deprecated, plan to remove
+client.messages.create(
+    model="claude-sonnet-4-6",
+    max_tokens=16384,
+    thinking={"type": "enabled", "budget_tokens": 8192},  # must be < max_tokens
+    output_config={"effort": "medium"},
+    messages=[...],
+)
+```
+
+If the user asks for a "thinking budget" on 4.6, the preferred answer is `effort` — use `low`, `medium`, `high`, or `max` (Opus-tier only — not Sonnet or Haiku) rather than a token count.
+
+**2. Effort parameter (Opus 4.5, Opus 4.6, Sonnet 4.6 only).**
+
+Controls thinking depth and overall token spend. Goes inside `output_config`, not top-level. Default is `high`. `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Errors on Sonnet 4.5 and Haiku 4.5.
+
+```python
+output_config={"effort": "medium"}  # often the best cost / quality balance
+```
+
+### Migrating to the 4.6 family (Opus 4.6 and Sonnet 4.6)
+
+**3. Assistant-turn prefills return 400 (Opus 4.6 and Sonnet 4.6).**
+
+Prefilled responses on the final assistant turn are no longer supported on either Opus 4.6 or Sonnet 4.6 — both return a 400. Adding assistant messages *elsewhere* in the conversation (e.g., for few-shot examples) still works. Pick the replacement that matches what the prefill was doing:
+
+| Prefill was used for                               | Replacement                                                                                                                               |
+| -------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
+| Forcing JSON / YAML / schema output                | `output_config.format` with a `json_schema` — see example below                                                                           |
+| Forcing a classification label                     | Tool with an enum field containing valid labels, or structured outputs                                                                    |
+| Skipping preambles (`Here is the summary:\n`)      | System prompt instruction: *"Respond directly without preamble. Do not start with phrases like 'Here is...' or 'Based on...'."*           |
+| Steering around bad refusals                       | Usually no longer needed — 4.6 refuses far more appropriately. Plain user-turn prompting is sufficient.                                   |
+| Continuing an interrupted response                 | Move continuation into the user turn: *"Your previous response was interrupted and ended with `[last text]`. Continue from there."*     |
+| Injecting reminders / context hydration            | Inject into the user turn instead. For complex agent harnesses, expose context via a tool call or during compaction.                      |
+
+```python
+# Old (fails on Opus 4.6 / Sonnet 4.6) — prefill forcing JSON shape
+messages=[
+    {"role": "user", "content": "Extract the name."},
+    {"role": "assistant", "content": "{\"name\": \""},
+]
+
+# New — structured outputs replace the prefill
+response = client.messages.create(
+    model="claude-opus-4-6",
+    max_tokens=1024,
+    output_config={"format": {"type": "json_schema", "schema": {...}}},
+    messages=[{"role": "user", "content": "Extract the name."}],
+)
+```
+
+**4. Stream for `max_tokens > ~16K` (all models); Opus 4.6 alone reaches 128K.**
+
+Non-streaming requests hit SDK HTTP timeouts at high `max_tokens`, regardless of model — stream for anything above ~16K output. The streamable ceiling differs by model: Sonnet 4.6 and Haiku 4.5 cap at 64K, and Opus 4.6 alone goes up to 128K.
+
+```python
+with client.messages.stream(model="claude-opus-4-6", max_tokens=64000, ...) as stream:
+    message = stream.get_final_message()
+```
+
+**5. Tool-call JSON escaping may differ (Opus 4.6 and Sonnet 4.6).**
+
+Both 4.6 models can produce tool call `input` fields with Unicode or forward-slash escaping. Always parse with `json.loads()` / `JSON.parse()` — never raw-string-match the serialized input.
+
+### All models
+
+**6. `output_format` → `output_config.format` (API-wide).**
+
+The old top-level `output_format` parameter on `messages.create()` is deprecated. Use `output_config.format` instead. This is not 4.6-specific — applies to every model.
+
+---
+
+## Beta Headers to Remove on 4.6
+
+Several beta headers that were required on 4.5 are now GA on 4.6 and should be removed. Leaving them in is harmless but misleading; removing them also lets you move from `client.beta.messages.create(...)` back to `client.messages.create(...)`.
+
+| Header                                    | Status on 4.6                                              | Action                                                  |
+| ----------------------------------------- | ---------------------------------------------------------- | ------------------------------------------------------- |
+| `effort-2025-11-24`                       | Effort parameter is GA                                     | Remove                                                  |
+| `fine-grained-tool-streaming-2025-05-14`  | GA                                                         | Remove                                                  |
+| `interleaved-thinking-2025-05-14`         | Adaptive thinking enables interleaved thinking automatically | Remove when using adaptive thinking; still functional on Sonnet 4.6 *with* manual extended thinking, but that path is deprecated |
+| `token-efficient-tools-2025-02-19`        | Built in to all Claude 4+ models                           | Remove (no effect)                                      |
+| `output-128k-2025-02-19`                  | Built in to Claude 4+ models                               | Remove (no effect)                                      |
+
+Once you remove all of these and finish moving to adaptive thinking, you can switch the SDK call site from the beta namespace back to the regular one:
+
+```python
+# Before
+response = client.beta.messages.create(
+    model="claude-opus-4-5",
+    betas=["interleaved-thinking-2025-05-14", "effort-2025-11-24"],
+    ...
+)
+
+# After
+response = client.messages.create(
+    model="claude-opus-4-6",
+    thinking={"type": "adaptive"},
+    output_config={"effort": "high"},
+    ...
+)
+```
+
+---
+
+## Additional Changes When Coming from 3.x / 4.0 / 4.1 → 4.6
+
+If you're jumping from Opus 4.1, Sonnet 4, Sonnet 3.7, or an older Claude 3.x model directly to 4.6, apply everything above *plus* the items in this section. Users already on Opus 4.5 / Sonnet 4.5 can skip this.
+
+**1. Sampling parameters: `temperature` OR `top_p`, not both.**
+
+Passing both will error on every Claude 4+ model:
+
+```python
+# Old (3.x only — errors on 4+)
+client.messages.create(temperature=0.7, top_p=0.9, ...)
+
+# New
+client.messages.create(temperature=0.7, ...)  # or top_p, not both
+```
+
+**2. Update tool versions.**
+
+Legacy tool versions are not supported on 4+. **Both the `type` and the `name` field change** — `text_editor_20250728` and `str_replace_based_edit_tool` are a pair; updating one without the other 400s. Also remove the `undo_edit` command from your text-editor integration:
+
+| Old                                               | New                                                     |
+| ------------------------------------------------- | ------------------------------------------------------- |
+| `text_editor_20250124` + `str_replace_editor`     | `text_editor_20250728` + `str_replace_based_edit_tool`  |
+| `code_execution_*` (earlier versions)             | `code_execution_20250825`                               |
+| `undo_edit` command                               | *(no longer supported — delete call sites)*             |
+
+```python
+# Before
+tools = [{"type": "text_editor_20250124", "name": "str_replace_editor"}]
+
+# After — BOTH fields change
+tools = [{"type": "text_editor_20250728", "name": "str_replace_based_edit_tool"}]
+```
+
+**3. Handle the `refusal` stop reason.**
+
+Claude 4+ can return `stop_reason: "refusal"` on the response. If your code only handles `end_turn` / `tool_use` / `max_tokens`, add a branch:
+
+```python
+if response.stop_reason == "refusal":
+    # Surface the refusal to the user; do not retry with the same prompt
+    ...
+```
+
+**4. Handle the `model_context_window_exceeded` stop reason (4.5+).**
+
+Distinct from `max_tokens`: it means the model hit the *context window* limit, not the requested output cap. Handle both:
+
+```python
+if response.stop_reason == "model_context_window_exceeded":
+    # Context window exhausted — compact or split the conversation
+    ...
+elif response.stop_reason == "max_tokens":
+    # Requested output cap hit — retry with higher max_tokens or stream
+    ...
+```
+
+**5. Trailing newlines preserved in tool call string parameters (4.5+).**
+
+4.5 and 4.6 preserve trailing newlines that older models stripped. If your tool implementations do exact string matching against tool-call `input` values (e.g., `if name == "foo"`), verify they still match when the model sends `"foo\n"`. Normalizing with `.rstrip()` on the receiving side is usually the simplest fix.
+
+**6. Haiku: rate limits reset between generations.**
+
+Haiku 4.5 has its own rate-limit pool separate from Haiku 3 / 3.5. If you're ramping traffic as you migrate, check your tier's Haiku 4.5 limits at [API rate limits](https://platform.claude.com/docs/en/api/rate-limits) — a quota that comfortably served Haiku 3.5 traffic may need a tier bump for the same volume on 4.5.
+
+---
+
+## Prompt-Behavior Changes (Opus 4.5 / 4.6, Sonnet 4.6)
+
+These don't break your code, but prompts that worked on 4.5-and-earlier may over- or under-trigger on 4.6. Tune as needed.
+
+**1. Aggressive instructions cause overtriggering.** Opus 4.5 and 4.6 follow the system prompt much more closely than earlier models. Prompts written to *overcome* the old reluctance are now too aggressive:
+
+| Before (worked on 4.0 / 4.5)                | After (use on 4.6)                        |
+| ------------------------------------------- | ----------------------------------------- |
+| `CRITICAL: You MUST use this tool when...`  | `Use this tool when...`                   |
+| `Default to using [tool]`                   | `Use [tool] when it would improve X`      |
+| `If in doubt, use [tool]`                   | *(delete — no longer needed)*             |
+
+If the model is now overtriggering a tool or skill, the fix is almost always to dial back the language, not to add more guardrails.
+
+**2. Overthinking and excessive exploration (Opus 4.6).** At higher `effort` settings, Opus 4.6 explores more before answering. If that burns too many thinking tokens, lower `effort` first (`medium` is often the sweet spot) before adding prose instructions to constrain reasoning.
+
+**3. Overeager subagent spawning (Opus 4.6).** Opus 4.6 has a strong preference for delegating to subagents. If you see it spawning a subagent for something a direct `grep` or `read` would solve, add guidance: *"Use subagents only for parallel or independent workstreams. For single-file reads or sequential operations, work directly."*
+
+**4. Overengineering (Opus 4.5 / 4.6).** Both models may add extra files, abstractions, or defensive error handling beyond what was asked. If you want minimal changes, prompt for it explicitly: *"Only make changes directly requested. Don't add helpers, abstractions, or error handling for scenarios that can't happen."*
+
+**5. LaTeX math output (Opus 4.6).** Opus 4.6 defaults to LaTeX (`\frac{}{}`, `$...$`) for math and technical content. If you need plain text, instruct it explicitly: *"Format all math as plain text — no LaTeX, no `$`, no `\frac{}{}`. Use `/` for division and `^` for exponents."*
+
+**6. Skipped verbal summaries (4.6 family).** The 4.6 models are more concise and may skip the summary paragraph after a tool call, jumping straight to the next action. If you rely on those summaries for visibility, add: *"After completing a task that involves tool use, provide a brief summary of what you did."*
+
+**7. "Think" as a trigger word (Opus 4.5 with thinking disabled).** When `thinking` is off, Opus 4.5 is particularly sensitive to the word *think* and may reason more than you want. Use `consider`, `evaluate`, or `reason through` instead.
+
+---
+
+## Model-ID Rename Quick Reference
+
+| Old string (migration source)  | New string         |
+| ------------------------------ | ------------------ |
+| `claude-opus-4-6`              | `claude-opus-4-7`  |
+| `claude-opus-4-5`              | `claude-opus-4-7`  |
+| `claude-opus-4-1`              | `claude-opus-4-7`  |
+| `claude-opus-4-0`              | `claude-opus-4-7`  |
+| `claude-sonnet-4-5`            | `claude-sonnet-4-6`|
+| `claude-sonnet-4-0`            | `claude-sonnet-4-6`|
+
+Older aliases (`claude-opus-4-5`, `claude-sonnet-4-5`, `claude-opus-4-1`, etc.) are still active and can be pinned if you need time before upgrading — see `shared/models.md` for the full legacy list.
+
+---
+
+## Migration Checklist
+
+Every item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent timeout, or wrong tool selection if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments.
+
+For each file that calls `messages.create()` / equivalent SDK method:
+
+- [ ] **[BLOCKS]** Update the `model=` string to the new alias
+- [ ] **[BLOCKS]** Replace `budget_tokens` with `thinking={"type": "adaptive"}` (deprecated on Opus 4.6 / Sonnet 4.6)
+- [ ] **[BLOCKS]** Move `format` from top-level `output_format` into `output_config.format`
+- [ ] **[BLOCKS]** Remove any assistant-turn prefills if targeting Opus 4.6 or Sonnet 4.6 (see the prefill replacement table)
+- [ ] **[BLOCKS]** Switch to streaming if `max_tokens > ~16000` (otherwise SDK HTTP timeout)
+- [ ] **[TUNE]** Set `output_config={"effort": "..."}` explicitly — especially when moving Sonnet 4.5 → Sonnet 4.6 (4.6 defaults to `high`)
+- [ ] **[TUNE]** Remove GA beta headers: `effort-2025-11-24`, `fine-grained-tool-streaming-2025-05-14`, `token-efficient-tools-2025-02-19`, `output-128k-2025-02-19`; remove `interleaved-thinking-2025-05-14` once on adaptive thinking
+- [ ] **[TUNE]** Switch `client.beta.messages.create(...)` → `client.messages.create(...)` once all betas are removed
+- [ ] **[TUNE]** Review system prompt for aggressive tool language (`CRITICAL:`, `MUST`, `If in doubt`) and dial it back
+
+**Extra items when coming from 3.x / 4.0 / 4.1:**
+- [ ] **[BLOCKS]** Remove either `temperature` or `top_p` (passing both 400s on Claude 4+)
+- [ ] **[BLOCKS]** Update text-editor tool `type` to `text_editor_20250728`
+- [ ] **[BLOCKS]** Update text-editor tool `name` to `str_replace_based_edit_tool` — **changing only the `type` and keeping `name: "str_replace_editor"` returns a 400**
+- [ ] **[BLOCKS]** Update code-execution tool to `code_execution_20250825`
+- [ ] **[BLOCKS]** Delete any `undo_edit` command call sites
+- [ ] **[TUNE]** Add handling for `stop_reason == "refusal"`
+- [ ] **[TUNE]** Add handling for `stop_reason == "model_context_window_exceeded"` (4.5+)
+- [ ] **[TUNE]** Verify tool-param string matching tolerates trailing newlines (preserved on 4.5+)
+- [ ] **[TUNE]** If moving to Haiku 4.5: review rate-limit tier (separate pool from Haiku 3.x)
+
+**Verification:**
+- [ ] Run one test request and inspect `response.stop_reason`, `response.usage`, and whether tool-use / thinking behavior matches expectations
+
+For cached prompts: the render order and hash inputs did not change, so existing `cache_control` breakpoints keep working. However, **changing the model string invalidates the existing cache** — the first request on the new model will write the cache fresh.
+
+---
+
+## Migrating to Opus 4.7
+
+> **Model ID `claude-opus-4-7` is authoritative as written here.** When the user asks to migrate to Opus 4.7, write `model="claude-opus-4-7"` exactly. Do **not** WebFetch to verify — this guide is the source of truth for migration target IDs. The corresponding entry exists in `shared/models.md`.
+
+Claude Opus 4.7 is our most capable generally available model to date. It is highly autonomous and performs exceptionally well on long-horizon agentic work, knowledge work, vision tasks, and memory tasks. This section summarizes everything new at launch. It is layered on top of the 4.6 migration above — if the caller is jumping from Opus 4.5 or older, apply the 4.6 changes first, then apply this section.
+
+**TL;DR for someone already on Opus 4.6:** update the model ID to `claude-opus-4-7`, strip any remaining `budget_tokens` and sampling parameters (both 400 on Opus 4.7), give `max_tokens` extra headroom and re-baseline with `count_tokens()` against the new model, opt back into `thinking.display: "summarized"` if reasoning is surfaced to users, and re-tune `effort` — it matters more on 4.7 than on any prior Opus.
+
+### Breaking changes (will 400 on Opus 4.7)
+
+**Extended thinking removed.**
+
+`thinking: {type: "enabled", budget_tokens: N}` is no longer supported on Claude Opus 4.7 or later models and returns a 400 error. Switch to adaptive thinking (`thinking: {type: "adaptive"}`) and use the effort parameter to control thinking depth. Adaptive thinking is **off by default** on Claude Opus 4.7: requests with no `thinking` field run without thinking, matching Opus 4.6 behavior. Set `thinking: {type: "adaptive"}` explicitly to enable it.
+
+```python
+# Before (Opus 4.6)
+client.messages.create(
+    model="claude-opus-4-6",
+    max_tokens=64000,
+    thinking={"type": "enabled", "budget_tokens": 32000},
+    messages=[{"role": "user", "content": "..."}],
+)
+
+# After (Opus 4.7)
+client.messages.create(
+    model="claude-opus-4-7",
+    max_tokens=64000,
+    thinking={"type": "adaptive"},
+    output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
+    messages=[{"role": "user", "content": "..."}],
+)
+```
+
+If the caller wasn't using extended thinking, no change is required — thinking is off by default, or can be set explicitly with `thinking={"type": "disabled"}`.
+
+Delete `budget_tokens` plumbing entirely. For the replacement `effort` value, see **Choosing an effort level on Opus 4.7** below — there is no exact 1:1 mapping from `budget_tokens`.
+
+**Sampling parameters removed.**
+
+The `temperature`, `top_p`, and `top_k` parameters are no longer accepted on Claude Opus 4.7. Requests that include them return a 400 error. Remove these fields from your request payloads. Prompting is the recommended way to guide model behavior on Claude Opus 4.7. If you were using `temperature = 0` for determinism, note that it never guaranteed identical outputs on prior models.
+
+```python
+# Before — errors on Opus 4.7
+client.messages.create(temperature=0.7, top_p=0.9, ...)
+
+# After
+client.messages.create(...)  # no sampling params
+```
+
+- **If the intent was determinism** — use `effort: "low"` with a tighter prompt.
+- **If the intent was creative variance** — the prompt replacement depends on the use case; **ask the user** how they want variance elicited. If you can't ask, add a use-case-appropriate instruction along the lines of *"choose something off-distribution and interesting"* — e.g. for text generation, *"Vary your phrasing and structure across responses"*; for frontend/design, use the propose-4-directions approach under **Design and frontend coding** below.
+
+### Choosing an effort level on Opus 4.7
+
+`budget_tokens` controlled how much to *think*; `effort` controls how much to think *and* act, so there is no exact 1:1 mapping. **Use `xhigh` for best results in coding and agentic use cases, and a minimum of `high` for most intelligence-sensitive use cases.** Experiment with other levels to further tune token usage and intelligence:
+
+| Level | Use when | Notes |
+| --- | --- | --- |
+| `max` | Intelligence-demanding tasks worth testing at the ceiling | Can deliver gains in some use cases but may show diminishing returns from increased token usage; can be prone to overthinking |
+| `xhigh` | **Most coding and agentic use cases** | The best setting for these; used as the default in Claude Code |
+| `high` | Intelligence-sensitive use cases generally | Balances token usage and intelligence; recommended minimum for most intelligence-sensitive work |
+| `medium` | Cost-sensitive use cases that need to reduce token usage while trading off intelligence | |
+| `low` | Short, scoped tasks and latency-sensitive workloads that are not intelligence-sensitive | |
+
+### Silent default changes (no error, but behavior differs)
+
+**Thinking content omitted by default.**
+
+Thinking blocks still appear in the response stream on Claude Opus 4.7, but their `thinking` field is empty unless you explicitly opt in. This is a silent change from Claude Opus 4.6, where the default was to return summarized thinking text. To restore summarized thinking content on Claude Opus 4.7, set `thinking.display` to `"summarized"`. **The block-field name is unchanged** — it is still `block.thinking` on a `thinking`-type block; do not rename it.
+
+**Detect this:** any code that reads `block.thinking` (or equivalent) from a `thinking`-type block and renders it in a UI, log, or trace. **The fix is the request parameter, not the response handling** — add `display: "summarized"` to the `thinking` parameter:
+
+```python
+thinking={"type": "adaptive", "display": "summarized"}  # "display" is new on Opus 4.7; values: "omitted" (default) | "summarized"
+```
+
+The default is `"omitted"` on Claude Opus 4.7. If thinking content was never surfaced anywhere, no change needed. If your product streams reasoning to users, the new default appears as a long pause before output begins; set `display: "summarized"` to restore visible progress during thinking.
+
+**Updated token counting.**
+
+Claude Opus 4.7 and Claude Opus 4.6 count tokens differently. The same input text produces a higher token count on Claude Opus 4.7 than on Claude Opus 4.6, and `/v1/messages/count_tokens` will return a different number of tokens for Claude Opus 4.7 than it did for Claude Opus 4.6. The token efficiency of Claude Opus 4.7 can vary by workload shape. Prompting interventions, `task_budget`, and `effort` can help control costs and ensure appropriate token usage. Keep in mind that these controls may trade off model intelligence. **Update your `max_tokens` parameters to give additional headroom, including compaction triggers.** Claude Opus 4.7 provides a 1M context window at standard API pricing with no long-context premium.
+
+What else to check:
+
+- Client-side token estimators (tiktoken-style approximations) calibrated against 4.6
+- Cost calculators that multiply tokens by a fixed per-token rate
+- Rate-limit retry thresholds keyed to measured token counts
+
+Re-baseline by re-running `client.messages.count_tokens()` against `claude-opus-4-7` on a representative sample of the caller's prompts. Do not apply a blanket multiplier. For cost-sensitive workloads, consider reducing `effort` by one level (e.g. `high` → `medium`). For agentic loops, consider adopting Task Budgets (below).
+
+### New feature: Task Budgets (beta)
+
+Opus 4.7 introduces **task budgets** — tell Claude how many tokens it has for a full agentic loop (thinking + tool calls + final output). The model sees a running countdown and uses it to prioritize work and wrap up gracefully as the budget is consumed.
+
+This is a **suggestion the model is aware of**, not a hard cap. It is distinct from `max_tokens`, which remains the enforced per-response limit and is *not* surfaced to the model. Use `task_budget` when you want the model to self-moderate; use `max_tokens` as a hard ceiling to cap usage.
+
+Requires beta header `task-budgets-2026-03-13`:
+
+```python
+client.beta.messages.create(
+    betas=["task-budgets-2026-03-13"],
+    model="claude-opus-4-7",
+    max_tokens=64000,
+    thinking={"type": "adaptive"},
+    output_config={
+        "effort": "high",
+        "task_budget": {"type": "tokens", "total": 128000},
+    },
+    messages=[...],
+)
+```
+
+Set a generous budget for open-ended agentic tasks and tighten it for latency-sensitive ones. **Minimum `task_budget.total` is 20,000 tokens.** If the budget is too restrictive for the task, the model may complete it less thoroughly, referencing its budget as the constraint. **Do not add `task_budget` during a migration unless you are sure the budget value is right** — if you can run the workload and measure, do so; otherwise ask the user for the value rather than guessing. This is the primary lever for offsetting the token-counting shift on agentic workloads.
+
+### Capability improvements
+
+**High-resolution vision.** Opus 4.7 is the first Claude model with high-resolution image support. Maximum image resolution is **2576 pixels on the long edge** (up from 1568px on Opus 4.6 and prior). This unlocks gains on vision-heavy workloads, especially computer use and screenshot/artifact/document understanding. Coordinates returned by the model now map 1:1 to actual image pixels, so no scale-factor math is needed.
+
+High-res support is **automatic on Opus 4.7** — no beta header, no client-side opt-in required. The model accepts larger inputs and returns pixel-accurate coordinates out of the box.
+
+**Token cost.** Full-resolution images on Opus 4.7 can use up to ~3× more image tokens than on prior models (up to ~4784 tokens per image, vs. the previous ~1,600-token cap). If the extra fidelity isn't needed, downsample client-side before sending to control cost — but **do not add downsampling by default during a migration**. If you're not sure whether the pipeline needs the fidelity, ask the user rather than guessing. Use `count_tokens()` on representative images on Opus 4.7 to re-baseline before reacting to any measured cost shift.
+
+Beyond resolution, Opus 4.7 also improves on low-level perception (pointing, measuring, counting) and natural-image bounding-box localization and detection.
+
+**Knowledge work.** Meaningful gains on tasks where the model visually verifies its own output — `.docx` redlining, `.pptx` editing, and programmatic chart/figure analysis (e.g. pixel-level data transcription via image-processing libraries). If prompts have scaffolding like *"double-check the slide layout before returning"*, try removing it and re-baselining.
+
+**Memory.** Opus 4.7 is better at writing and using file-system-based memory. If an agent maintains a scratchpad, notes file, or structured memory store across turns, that agent should improve at jotting down notes to itself and leveraging its notes in future tasks.
+
+**User-facing progress updates.** Opus 4.7 provides more regular, higher-quality interim updates during long agentic traces. If the system prompt has scaffolding like *"After every 3 tool calls, summarize progress"*, try removing it to avoid excessive user-facing text. If the length or contents of Opus 4.7's updates are not well-calibrated to your use case, explicitly describe what these updates should look like in the prompt and provide examples.
+
+### Real-time cybersecurity safeguards
+
+Requests that involve prohibited or high-risk topics may lead to refusals.
+
+### Fast Mode: not available on Opus 4.7
+
+Opus 4.7 does not have a Fast Mode variant. **Opus 4.6 Fast remains supported**. Only surface this if the caller's code actually uses a Fast Mode model string (e.g. `claude-opus-4-6-fast`); if the word "fast" does not appear in the code, say nothing about Fast Mode.
+
+When you see `model="claude-opus-4-6-fast"` (or similar), **the migration edit is**:
+
+```python
+# Opus 4.7 has no Fast Mode — keeping on 4.6 Fast (caller's choice to switch to standard Opus 4.7).
+model="claude-opus-4-6-fast",
+```
+
+That is: leave the model string **unchanged**, add the comment above it, and tell the user their two options — (a) stay on Opus 4.6 Fast, which remains supported, or (b) move latency-tolerant traffic to standard Opus 4.7 for the intelligence gain. Do **not** rewrite the model string to `claude-opus-4-7` yourself; that silently trades latency for intelligence, which is the caller's decision.
+
+### Behavioral shifts (prompt-tunable)
+
+These don't break anything, but prompts tuned for Opus 4.6 may land differently. Opus 4.7 is more steerable than 4.6, so small prompt nudges usually close the gap.
+
+**More literal instruction following.** Claude Opus 4.7 interprets prompts more literally and explicitly than Claude Opus 4.6, particularly at lower effort levels. It will not silently generalize an instruction from one item to another, and it will not infer requests you didn't make. The upside of this literalism is precision and less thrash. It generally performs better for API use cases with carefully tuned prompts, structured extraction, and pipelines where you want predictable behavior. A prompt and harness review may be especially helpful for migration to Claude Opus 4.7.
+
+**Verbosity calibrates to task complexity.** Opus 4.7 scales response length to how complex it judges the task to be, rather than defaulting to a fixed verbosity — shorter answers on simple lookups, much longer on open-ended analysis. If the product depends on a particular length or style, tune the prompt explicitly. To reduce verbosity:
+
+> *"Provide concise, focused responses. Skip non-essential context, and keep examples minimal."*
+
+If you see specific kinds of over-verbosity (e.g. over-explaining), add instructions targeting those. Positive examples showing the desired level of concision tend to be more effective than negative examples or instructions telling the model what not to do. Do **not** assume existing "be concise" instructions should be removed — test first.
+
+**Tone and writing style.** Opus 4.7 is more direct and opinionated, with less validation-forward phrasing and fewer emoji than Opus 4.6's warmer style. As with any new model, prose style on long-form writing may shift. If the product relies on a specific voice, re-evaluate style prompts against the new baseline. If a warmer or more conversational voice is wanted, specify it:
+
+> *"Use a warm, collaborative tone. Acknowledge the user's framing before answering."*
+
+**`effort` matters more than on any prior Opus.** Opus 4.7 respects `effort` levels more strictly, especially at the low end. At `low` and `medium` it scopes work to what was asked rather than going above and beyond — good for latency and cost, but on moderate tasks at `low` there is some risk of under-thinking.
+
+- If shallow reasoning shows up on complex problems, raise `effort` to `high` or `xhigh` rather than prompting around it.
+- If `effort` must stay `low` for latency, add targeted guidance: *"This task involves multi-step reasoning. Think carefully through the problem before responding."*
+- **At `xhigh` or `max`, set a large `max_tokens`** so the model has room to think and act across tool calls and subagents. Start at 64K and tune from there. (`xhigh` is a new effort level on Opus 4.7, between `high` and `max`.)
+
+Adaptive-thinking triggering is also steerable. If the model thinks more often than wanted — which can happen with large or complex system prompts — add: *"Thinking adds latency and should only be used when it will meaningfully improve answer quality — typically for problems that require multi-step reasoning. When in doubt, respond directly."*
+
+**Uses tools less often by default.** Opus 4.7 tends to use tools less often than 4.6 and to use reasoning more. This produces better results in most cases, but for products that rely on tools (search/retrieval, function-calling, computer-use steps), it can drop tool-use rate. Two levers:
+
+- **Raise `effort`** — `high` or `xhigh` show substantially more tool usage in agentic search and coding, and are especially useful for knowledge work.
+- **Prompt for it** — be explicit in tool descriptions or the system prompt about when and how to use the tool, and encourage the model to err on the side of using it more often:
+
+> *"When the answer depends on information not present in the conversation, you MUST call the `search` tool before answering — do not answer from prior knowledge."*
+
+**Fewer subagents by default.** Opus 4.7 tends to spawn fewer subagents than 4.6. This is steerable — give explicit guidance on when delegation is desirable. For a coding agent, for example:
+
+> *"Do NOT spawn a subagent for work you can complete directly in a single response (e.g. refactoring a function you can already see). Spawn multiple subagents in the same turn when fanning out across items or reading multiple files."*
+
+**Design and frontend coding.** Opus 4.7 has stronger design instincts than 4.6, with a consistent default house style: warm cream/off-white backgrounds (around `#F4F1EA`), serif display type (Georgia, Fraunces, Playfair), italic word-accents, and a terracotta/amber accent. This reads well for editorial, hospitality, and portfolio briefs, but will feel off for dashboards, dev tools, fintech, healthcare, or enterprise apps — and it appears in slide decks as well as web UIs.
+
+The default is persistent. Generic instructions ("don't use cream," "make it clean and minimal") tend to shift the model to a different fixed palette rather than producing variety. Two approaches work reliably:
+
+1. **Specify a concrete alternative.** The model follows explicit specs precisely — give exact hex values, typefaces, and layout constraints.
+2. **Have the model propose options before building.** This breaks the default and gives the user control:
+
+   > *"Before building, propose 4 distinct visual directions tailored to this brief (each as: bg hex / accent hex / typeface — one-line rationale). Ask the user to pick one, then implement only that direction."*
+
+If the caller previously relied on `temperature` for design variety, use approach (2) — it produces meaningfully different directions across runs.
+
+Opus 4.7 also requires less frontend-design prompting than previous models to avoid generic "AI slop" aesthetics. Where earlier models needed a lengthy anti-slop snippet, Opus 4.7 generates distinctive, creative frontends with a much shorter nudge. This snippet works well alongside the variety approaches above:
+
+> *"NEVER use generic AI-generated aesthetics like overused font families (Inter, Roboto, Arial, system fonts), cliched color schemes (particularly purple gradients on white or dark backgrounds), predictable layouts and component patterns, and cookie-cutter design that lacks context-specific character. Use unique fonts, cohesive colors and themes, and animations for effects and micro-interactions."*
+
+**Interactive coding products.** Opus 4.7's token usage and behavior can differ between autonomous, asynchronous coding agents with a single user turn and interactive, synchronous coding agents with multiple user turns. Specifically, it tends to use more tokens in interactive settings, primarily because it reasons more after user turns. This can improve long-horizon coherence, instruction following, and coding capabilities in long interactive coding sessions, but also comes with more token usage. To maximize both performance and token efficiency in coding products, use `effort: "xhigh"` or `"high"`, add autonomous features (like an auto mode), and reduce the number of human interactions required from users.
+
+When limiting required user interactions, specify the task, intent, and relevant constraints upfront in the first human turn. Well-specified, clear, and accurate task descriptions upfront help maximize autonomy and intelligence while minimizing extra token usage after user turns — because Opus 4.7 is more autonomous than prior models, this usage pattern helps to maximize performance. In contrast, ambiguous or underspecified prompts conveyed progressively over multiple user turns tend to reduce token efficiency and sometimes performance.
+
+**Code review.** Opus 4.7 is meaningfully better at finding bugs than prior models, with both higher recall and precision. However, if a code-review harness was tuned for an earlier model, it may initially show *lower* recall — this is likely a harness effect, not a capability regression. When a review prompt says "only report high-severity issues," "be conservative," or "don't nitpick," Opus 4.7 follows that instruction more faithfully than earlier models did: it investigates just as thoroughly, identifies the bugs, and then declines to report findings it judges to be below the stated bar. Precision rises, but measured recall can fall even though underlying bug-finding has improved.
+
+Recommended prompt language:
+
+> *"Report every issue you find, including ones you are uncertain about or consider low-severity. Do not filter for importance or confidence at this stage — a separate verification step will do that. Your goal here is coverage: it is better to surface a finding that later gets filtered out than to silently drop a bug. For each finding, include your confidence level and an estimated severity so a downstream filter can rank them."*
+
+This can be used without an actual second step, but moving confidence filtering out of the finding step often helps. If the harness has a separate verification/dedup/ranking stage, tell the model explicitly that its job at the finding stage is coverage, not filtering. If single-pass self-filtering is wanted, be concrete about the bar rather than using qualitative terms like "important" — e.g. *"report any bugs that could cause incorrect behavior, a test failure, or a misleading result; only omit nits like pure style or naming preferences."* Iterate on prompts against a subset of evals to validate recall or F1 gains.
+
+**Computer use.** Computer use works across resolutions up to the new 2576px / 3.75MP maximum. Sending images at **1080p** provides a good balance of performance and cost. For particularly cost-sensitive workloads, **720p** or **1366×768** are lower-cost options with strong performance. Test to find the ideal settings for the use case; experimenting with `effort` can also help tune behavior.
+
+---
+
+## Opus 4.7 Migration Checklist
+
+Every item is tagged: **`[BLOCKS]`** items cause a 400 error, infinite loop, silent truncation, or empty output if missed — apply these as code edits, not as suggestions. **`[TUNE]`** items are quality/cost adjustments — surface them to the user as recommendations.
+
+`[BLOCKS]` items prefixed with **"If…"** or **"At…"** are conditional. Before working through the list, **scan the file** for the conditions: does it surface thinking text to a UI/log? Does it set `output_config.effort` to `"x-high"` or `"max"`? Is it a security workload? Is it a multi-turn agentic loop? Apply only the items whose condition matches.
+
+- [ ] **[BLOCKS]** Replace `thinking: {type: "enabled", budget_tokens: N}` with `thinking: {type: "adaptive"}` + `output_config.effort`; delete `budget_tokens` plumbing entirely
+- [ ] **[BLOCKS]** Strip `temperature`, `top_p`, `top_k` from request construction
+- [ ] **[BLOCKS]** If thinking content is surfaced to users or stored in logs: add `thinking.display: "summarized"` (otherwise the rendered text is empty)
+- [ ] **[BLOCKS]** At `output_config.effort` of `xhigh` or `max`: set `max_tokens` ≥ 64000 (otherwise output truncates mid-thought)
+- [ ] **[TUNE]** Give `max_tokens` and compaction triggers extra headroom; re-run `count_tokens()` against `claude-opus-4-7` on representative prompts to re-baseline (no blanket multiplier)
+- [ ] **[TUNE]** Re-baseline cost and rate-limit dashboards *before* reacting to measured shifts
+- [ ] **[TUNE]** Re-evaluate `effort` per route — use `xhigh` for coding/agentic and a minimum of `high` for most intelligence-sensitive work; it matters more on 4.7 than any prior Opus
+- [ ] **[TUNE]** Multi-turn agentic loops: adopt the API-native Task Budgets (`output_config.task_budget`, beta `task-budgets-2026-03-13`, minimum 20k tokens) — this is for capping *cumulative* spend across a loop; per-turn depth is `effort`
+- [ ] **[TUNE]** Check for ambiguous or underspecified instructions that relied on 4.6 generalizing intent, and update them to be clearer or more precise — 4.7 follows them literally
+- [ ] **[TUNE]** Tool-use workloads: add explicit when/how-to-use guidance to tool descriptions (4.7 reaches for tools less often)
+- [ ] **[TUNE]** Verbosity: test existing length instructions before changing them — 4.7 calibrates length to task complexity, so tune for the desired output rather than assuming a direction
+- [ ] **[TUNE]** Remove forced-progress-update scaffolding (*"after every N tool calls…"*)
+- [ ] **[TUNE]** Remove knowledge-work verification scaffolding (*"double-check the slide layout…"*) and re-baseline
+- [ ] **[TUNE]** Add tone instruction if a warmer / more conversational voice is needed; re-evaluate style prompts on writing-heavy routes
+- [ ] **[TUNE]** Subagent tool present: add explicit spawn / don't-spawn guidance
+- [ ] **[TUNE]** Frontend/design output: specify a concrete palette/typeface, or have the model propose 4 visual directions before building (the default cream/serif house style is persistent)
+- [ ] **[TUNE]** Interactive coding products: use `effort: "xhigh"` or `"high"`, add autonomous features (e.g. an auto mode) to reduce human interactions, and specify task/intent/constraints upfront in the first turn
+- [ ] **[TUNE]** Code-review harnesses: remove or loosen "only report high-severity" / "be conservative" filters and have the model report every finding with confidence + severity; move filtering to a downstream step (4.7 follows severity filters more literally, which can depress measured recall)
+- [ ] **[TUNE]** Vision-heavy pipelines (screenshots, charts, document understanding): leave images at native resolution up to 2576px long edge for the accuracy gain; remove any scale-factor math from coordinate handling (coords are now 1:1 with pixels). No beta header / opt-in needed — high-res is automatic on Opus 4.7.
+- [ ] **[TUNE]** Computer-use pipelines: send screenshots at 1080p for a good performance/cost balance (720p or 1366×768 for cost-sensitive workloads); experiment with `effort` to tune behavior
+- [ ] **[TUNE]** Cost-sensitive image pipelines: full-res images on 4.7 use up to ~4784 tokens vs ~1,600 on prior models (~3×). Downsampling client-side before upload avoids the increase, but **do not downsample by default** — if you're unsure whether fidelity is needed, ask the user. Re-baseline with `count_tokens()` on representative images before reacting to cost shifts.
+
+---
+
+## Verify the Migration
+
+After updating, spot-check that the new model is actually being used. Replace `YOUR_TARGET_MODEL` with the model string you migrated to (e.g. `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6`, `claude-haiku-4-5`) and keep the assertion prefix in sync:
+
+```python
+YOUR_TARGET_MODEL = "claude-opus-4-7"  # or "claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5"
+response = client.messages.create(model=YOUR_TARGET_MODEL, max_tokens=64, messages=[...])
+assert response.model.startswith(YOUR_TARGET_MODEL), response.model
+```
+
+For rate-limit headroom changes, pricing, or capability deltas (vision, structured outputs, effort support), query the Models API:
+
+```python
+m = client.models.retrieve(YOUR_TARGET_MODEL)
+m.max_input_tokens, m.max_tokens
+m.capabilities["effort"]["max"]["supported"]
+```
+
+See `shared/models.md` for the full capability lookup pattern.
--- a/skills/claude-api/shared/models.md
+++ b/skills/claude-api/shared/models.md
@@ -7,9 +7,9 @@
 For **live** capability data — context window, max output tokens, feature support (thinking, vision, effort, structured outputs, etc.) — query the Models API instead of relying on the cached tables below. Use this when the user asks "what's the context window for X", "does model X support vision/thinking/effort", "which models support feature Y", or wants to select a model by capability at runtime.

 ```python
-m = client.models.retrieve("claude-opus-4-6")
-m.id                 # "claude-opus-4-6"
-m.display_name       # "Claude Opus 4.6"
+m = client.models.retrieve("claude-opus-4-7")
+m.id                 # "claude-opus-4-7"
+m.display_name       # "Claude Opus 4.7"
 m.max_input_tokens   # context window (int)
 m.max_tokens         # max output tokens (int)

@@ -32,21 +32,21 @@ Top-level fields (`id`, `display_name`, `max_input_tokens`, `max_tokens`) are ty
 ### Raw HTTP

 ```bash
-curl https://api.anthropic.com/v1/models/claude-opus-4-6 \
+curl https://api.anthropic.com/v1/models/claude-opus-4-7 \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01"
 ```

 ```json
 {
-  "id": "claude-opus-4-6",
-  "display_name": "Claude Opus 4.6",
-  "max_input_tokens": 1000000,
+  "id": "claude-opus-4-7",
+  "display_name": "Claude Opus 4.7",
+  "max_input_tokens": 200000,
  "max_tokens": 128000,
  "capabilities": {
    "image_input": {"supported": true},
    "structured_outputs": {"supported": true},
-    "thinking": {"supported": true, "types": {"enabled": {"supported": true}, "adaptive": {"supported": true}}},
+    "thinking": {"supported": true, "types": {"enabled": {"supported": false}, "adaptive": {"supported": true}}},
    "effort": {"supported": true, "low": {"supported": true}, …, "max": {"supported": true}},
    …
  }
@@ -57,14 +57,15 @@ curl https://api.anthropic.com/v1/models/claude-opus-4-6 \

 | Friendly Name     | Alias (use this)    | Full ID                       | Context        | Max Output | Status |
 |-------------------|---------------------|-------------------------------|----------------|------------|--------|
-| Claude Opus 4.6   | `claude-opus-4-6`   | —                             | 200K (1M beta) | 128K       | Active |
-| Claude Sonnet 4.6 | `claude-sonnet-4-6` | -                             | 200K (1M beta) | 64K        | Active |
+| Claude Opus 4.7   | `claude-opus-4-7`   | —                             | 1M             | 128K       | Active |
+| Claude Opus 4.6   | `claude-opus-4-6`   | —                             | 1M             | 128K       | Active |
+| Claude Sonnet 4.6 | `claude-sonnet-4-6` | -                             | 1M             | 64K        | Active |
 | Claude Haiku 4.5  | `claude-haiku-4-5`  | `claude-haiku-4-5-20251001`   | 200K           | 64K        | Active |

 ### Model Descriptions
-
- **Claude Opus 4.6** — Our most intelligent model for building agents and coding. Supports adaptive thinking (recommended), 128K max output tokens (requires streaming for large outputs). 1M context window available in beta via `context-1m-2025-08-07` header.
- **Claude Sonnet 4.6** — Our best combination of speed and intelligence. Supports adaptive thinking (recommended). 1M context window available in beta via `context-1m-2025-08-07` header. 64K max output tokens.
+- **Claude Opus 4.7** — The most capable Claude model to date — highly autonomous, strong on long-horizon agentic work, knowledge work, vision, and memory. Adaptive thinking only; sampling parameters and `budget_tokens` are removed. 1M context window at standard API pricing (no long-context premium) — see `shared/model-migration.md` → Migrating to Opus 4.7 for breaking changes.
+- **Claude Opus 4.6** — Previous-generation Opus. Supports adaptive thinking (recommended), 128K max output tokens (requires streaming for large outputs). 1M context window.
+- **Claude Sonnet 4.6** — Our best combination of speed and intelligence. Supports adaptive thinking (recommended). 1M context window. 64K max output tokens.
 - **Claude Haiku 4.5** — Fastest and most cost-effective model for simple tasks.

 ## Legacy Models (still active)
@@ -102,7 +103,8 @@ When a user asks for a model by name, use this table to find the correct model I

 | User says...                              | Use this model ID              |
 |-------------------------------------------|--------------------------------|
-| "opus", "most powerful"                   | `claude-opus-4-6`              |
+| "opus", "most powerful"                   | `claude-opus-4-7`              |
+| "opus 4.7"                                | `claude-opus-4-7`              |
 | "opus 4.6"                                | `claude-opus-4-6`              |
 | "opus 4.5"                                | `claude-opus-4-5`              |
 | "opus 4.1"                                | `claude-opus-4-1`              |
--- a/skills/claude-api/shared/prompt-caching.md
+++ b/skills/claude-api/shared/prompt-caching.md
@@ -111,11 +111,11 @@ Fix by moving the dynamic piece after the last breakpoint, making it determinist

 | Model | Minimum |
 |---|---:|
-| Opus 4.6, Opus 4.5, Haiku 4.5 | 4096 tokens |
+| Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 | 4096 tokens |
 | Sonnet 4.6, Haiku 3.5, Haiku 3 | 2048 tokens |
 | Sonnet 4.5, Sonnet 4.1, Sonnet 4, Sonnet 3.7 | 1024 tokens |

-A 3K-token prompt caches on Sonnet 4.5 but silently won't on Opus 4.6.
+A 3K-token prompt caches on Sonnet 4.5 but silently won't on Opus 4.7.

 **Economics:** Cache reads cost ~0.1× base input price. Cache writes cost **1.25× for 5-minute TTL, 2× for 1-hour TTL**. Break-even depends on TTL: with 5-minute TTL, two requests break even (1.25× + 0.1× = 1.35× vs 2× uncached); with 1-hour TTL, you need at least three requests (2× + 0.2× = 2.2× vs 3× uncached). The 1-hour TTL keeps entries alive across gaps in bursty traffic, but the doubled write cost means it needs more reads to pay off.

--- a/skills/claude-api/shared/tool-use-concepts.md
+++ b/skills/claude-api/shared/tool-use-concepts.md
@@ -74,7 +74,7 @@ if response.stop_reason == "pause_turn":
    ]
    # Make another API request — server resumes automatically
    response = client.messages.create(
-        model="claude-opus-4-6", messages=messages, tools=tools
+        model="claude-opus-4-7", messages=messages, tools=tools
    )
 ```

@@ -171,7 +171,7 @@ Web search and web fetch let Claude search the web and retrieve page content. Th
 ]
 ```

-### Dynamic Filtering (Opus 4.6 / Sonnet 4.6)
+### Dynamic Filtering (Opus 4.7 / Opus 4.6 / Sonnet 4.6)

 The `web_search_20260209` and `web_fetch_20260209` versions support **dynamic filtering** — Claude writes and executes code to filter search results before they reach the context window, improving accuracy and token efficiency. Dynamic filtering is built into these tool versions and activates automatically; you do not need to separately declare the `code_execution` tool or pass any beta header.

@@ -280,7 +280,7 @@ Two features are available:
 - **JSON outputs** (`output_config.format`): Control Claude's response format
 - **Strict tool use** (`strict: true`): Guarantee valid tool parameter schemas

-**Supported models:** Claude Opus 4.6, Claude Sonnet 4.6, and Claude Haiku 4.5. Legacy models (Claude Opus 4.5, Claude Opus 4.1) also support structured outputs.
+**Supported models:** Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5. Legacy models (Claude Opus 4.5, Claude Opus 4.1) also support structured outputs.

 > **Recommended:** Use `client.messages.parse()` which automatically validates responses against your schema. When using `messages.create()` directly, use `output_config: {format: {...}}`. The `output_format` convenience parameter is also accepted by some SDK methods (e.g., `.parse()`), but `output_config.format` is the canonical API-level parameter.

--- a/skills/claude-api/typescript/claude-api/README.md
+++ b/skills/claude-api/typescript/claude-api/README.md
@@ -24,7 +24,7 @@ const client = new Anthropic({ apiKey: "your-api-key" });

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [{ role: "user", content: "What is the capital of France?" }],
 });
@@ -43,7 +43,7 @@ for (const block of response.content) {

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  system:
    "You are a helpful coding assistant. Always provide examples in Python.",
@@ -59,7 +59,7 @@ const response = await client.messages.create({

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -84,7 +84,7 @@ import fs from "fs";
 const imageData = fs.readFileSync("image.png").toString("base64");

 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -113,7 +113,7 @@ Use top-level `cache_control` to automatically cache the last cacheable block in

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  cache_control: { type: "ephemeral" }, // auto-caches the last cacheable block
  system: "You are an expert on this large document...",
@@ -127,7 +127,7 @@ For fine-grained control, add `cache_control` to specific content blocks:

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  system: [
    {
@@ -141,7 +141,7 @@ const response = await client.messages.create({

 // With explicit TTL (time-to-live)
 const response2 = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  system: [
    {
@@ -168,13 +168,13 @@ If `cache_read_input_tokens` is zero across repeated identical-prefix requests,

 ## Extended Thinking

-> **Opus 4.6 and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is deprecated on both Opus 4.6 and Sonnet 4.6.
+> **Opus 4.7, Opus 4.6, and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is removed on Opus 4.7 (400 if sent); deprecated on Opus 4.6 and Sonnet 4.6.
 > **Older models:** Use `thinking: {type: "enabled", budget_tokens: N}` (must be < `max_tokens`, min 1024).

 ```typescript
-// Opus 4.6: adaptive thinking (recommended)
+// Opus 4.7 / 4.6: adaptive thinking (recommended)
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  output_config: { effort: "high" }, // low | medium | high | max
@@ -232,7 +232,7 @@ const messages: Anthropic.MessageParam[] = [
 ];

 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: messages,
 });
@@ -248,7 +248,7 @@ const response = await client.messages.create({

 ### Compaction (long conversations)

-> **Beta, Opus 4.6 and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.
+> **Beta, Opus 4.7, Opus 4.6, and Sonnet 4.6.** When conversations approach the 200K context window, compaction automatically summarizes earlier context server-side. The API returns a `compaction` block; you must pass it back on subsequent requests — append `response.content`, not just the text.

 ```typescript
 import Anthropic from "@anthropic-ai/sdk";
@@ -261,7 +261,7 @@ async function chat(userMessage: string): Promise<string> {

  const response = await client.beta.messages.create({
    betas: ["compact-2026-01-12"],
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    max_tokens: 16000,
    messages,
    context_management: {
@@ -308,7 +308,7 @@ The `stop_reason` field in the response indicates why the model stopped generati
 ```typescript
 // Automatic caching (simplest — caches the last cacheable block)
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  cache_control: { type: "ephemeral" },
  system: largeDocumentText, // e.g., 50KB of context
@@ -323,7 +323,7 @@ const response = await client.messages.create({

 ```typescript
 const countResponse = await client.messages.countTokens({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  messages: messages,
  system: system,
 });
--- a/skills/claude-api/typescript/claude-api/batches.md
+++ b/skills/claude-api/typescript/claude-api/batches.md
@@ -24,7 +24,7 @@ const messageBatch = await client.messages.batches.create({
    {
      custom_id: "request-1",
      params: {
-        model: "claude-opus-4-6",
+        model: "claude-opus-4-7",
        max_tokens: 16000,
        messages: [
          { role: "user", content: "Summarize climate change impacts" },
@@ -34,7 +34,7 @@ const messageBatch = await client.messages.batches.create({
    {
      custom_id: "request-2",
      params: {
-        model: "claude-opus-4-6",
+        model: "claude-opus-4-7",
        max_tokens: 16000,
        messages: [
          { role: "user", content: "Explain quantum computing basics" },
--- a/skills/claude-api/typescript/claude-api/files-api.md
+++ b/skills/claude-api/typescript/claude-api/files-api.md
@@ -41,7 +41,7 @@ console.log(`Size: ${uploaded.size_bytes} bytes`);

 ```typescript
 const response = await client.beta.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
--- a/skills/claude-api/typescript/claude-api/streaming.md
+++ b/skills/claude-api/typescript/claude-api/streaming.md
@@ -4,7 +4,7 @@

 ```typescript
 const stream = client.messages.stream({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 64000,
  messages: [{ role: "user", content: "Write a story" }],
 });
@@ -23,11 +23,11 @@ for await (const event of stream) {

 ## Handling Different Content Types

-> **Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.
+> **Opus 4.7 / Opus 4.6:** Use `thinking: {type: "adaptive"}`. On older models, use `thinking: {type: "enabled", budget_tokens: N}` instead.

 ```typescript
 const stream = client.messages.stream({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 64000,
  thinking: { type: "adaptive" },
  messages: [{ role: "user", content: "Analyze this problem" }],
@@ -82,7 +82,7 @@ const getWeather = betaZodTool({
 });

 const runner = client.beta.messages.toolRunner({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 64000,
  tools: [getWeather],
  messages: [
@@ -117,7 +117,7 @@ for await (const messageStream of runner) {

 ```typescript
 const stream = client.messages.stream({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 64000,
  messages: [{ role: "user", content: "Hello" }],
 });
--- a/skills/claude-api/typescript/claude-api/tool-use.md
+++ b/skills/claude-api/typescript/claude-api/tool-use.md
@@ -30,7 +30,7 @@ const getWeather = betaZodTool({

 // The tool runner handles the agentic loop and returns the final message
 const finalMessage = await client.beta.messages.toolRunner({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  tools: [getWeather],
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
@@ -61,7 +61,7 @@ let messages: Anthropic.MessageParam[] = [{ role: "user", content: userInput }];

 while (true) {
  const response = await client.messages.create({
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    max_tokens: 16000,
    tools: tools,
    messages: messages,
@@ -108,7 +108,7 @@ let messages: Anthropic.MessageParam[] = [{ role: "user", content: userInput }];

 while (true) {
  const stream = client.messages.stream({
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    max_tokens: 64000,
    tools,
    messages,
@@ -163,7 +163,7 @@ while (true) {

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  tools: tools,
  messages: [{ role: "user", content: "What's the weather in Paris?" }],
@@ -174,7 +174,7 @@ for (const block of response.content) {
    const result = await executeTool(block.name, block.input);

    const followup = await client.messages.create({
-      model: "claude-opus-4-6",
+      model: "claude-opus-4-7",
      max_tokens: 16000,
      tools: tools,
      messages: [
@@ -198,7 +198,7 @@ for (const block of response.content) {

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  tools: tools,
  tool_choice: { type: "tool", name: "get_weather" },
@@ -217,7 +217,7 @@ Version-suffixed `type` literals; `name` is fixed per interface. Pass plain obje
 ```typescript
 // ✓ let inference work — no annotation
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  tools: [
    { type: "text_editor_20250728", name: "str_replace_based_edit_tool" },
@@ -257,7 +257,7 @@ import Anthropic from "@anthropic-ai/sdk";
 const client = new Anthropic();

 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -305,7 +305,7 @@ const uploaded = await client.beta.files.upload({
 // Code execution is GA; Files API is still beta (pass via RequestOptions)
 const response = await client.messages.create(
  {
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    max_tokens: 16000,
    messages: [
      {
@@ -365,7 +365,7 @@ for (const block of response.content) {
 ```typescript
 // First request: set up environment
 const response1 = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -382,7 +382,7 @@ const containerId = response1.container!.id;

 const response2 = await client.messages.create({
  container: containerId,
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -402,7 +402,7 @@ const response2 = await client.messages.create({

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -436,7 +436,7 @@ const handlers: MemoryToolHandlers = {
 const memory = betaMemoryTool(handlers);

 const runner = client.beta.messages.toolRunner({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  tools: [memory],
  messages: [{ role: "user", content: "Remember my preferences" }],
@@ -473,7 +473,7 @@ const ContactInfoSchema = z.object({
 const client = new Anthropic();

 const response = await client.messages.parse({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
@@ -495,7 +495,7 @@ console.log(response.parsed_output!.name); // "Jane Doe"

 ```typescript
 const response = await client.messages.create({
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  max_tokens: 16000,
  messages: [
    {
--- a/skills/claude-api/typescript/managed-agents/README.md
+++ b/skills/claude-api/typescript/managed-agents/README.md
@@ -52,7 +52,7 @@ console.log(environment.id); // env_...
 const agent = await client.beta.agents.create(
  {
    name: "Coding Assistant",
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    tools: [{ type: "agent_toolset_20260401", default_config: { enabled: true } }],
  },
 );
@@ -73,7 +73,7 @@ console.log(session.id, session.status);
 const agent = await client.beta.agents.create(
  {
    name: "Code Reviewer",
-    model: "claude-opus-4-6",
+    model: "claude-opus-4-7",
    system: "You are a senior code reviewer.",
    tools: [
      { type: "agent_toolset_20260401", default_config: { enabled: true } },
@@ -297,7 +297,8 @@ import fs from "fs";

 // List files associated with a session
 const files = await client.beta.files.list({
-  scope: session.id,
+  scope_id: session.id,
+  betas: ["managed-agents-2026-04-01"],
 });
 for (const f of files.data) {
  console.log(f.filename, f.size_bytes);
@@ -317,17 +318,17 @@ for (const f of files.data) {

 ```typescript
 // Get session details
-const session = await client.beta.sessions.retrieve("sess_abc123");
+const session = await client.beta.sessions.retrieve("sesn_011CZxAbc123Def456");
 console.log(session.status, session.usage);

 // List sessions
 const sessions = await client.beta.sessions.list();

 // Delete a session
-await client.beta.sessions.delete("sess_abc123");
+await client.beta.sessions.delete("sesn_011CZxAbc123Def456");

 // Archive a session
-await client.beta.sessions.archive("sess_abc123");
+await client.beta.sessions.archive("sesn_011CZxAbc123Def456");
 ```

 ---
@@ -338,7 +339,7 @@ await client.beta.sessions.archive("sess_abc123");
 // Agent declares MCP server (no auth here — auth goes in a vault)
 const agent = await client.beta.agents.create({
  name: "MCP Agent",
-  model: "claude-opus-4-6",
+  model: "claude-opus-4-7",
  mcp_servers: [
    { type: "url", name: "my-tools", url: "https://my-mcp-server.example.com/sse" },
  ],