chore: update claude-api skill [auto-sync] (#730)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This commit is contained in:
cc-skill-sync[bot]
2026-03-25 11:10:46 -04:00
committed by GitHub
parent 887114fd09
commit 98669c11ca
13 changed files with 390 additions and 14 deletions

View File

@@ -61,7 +61,7 @@ Before reading code examples, determine which language the user is working in:
| Ruby | Yes (beta) | No | `BaseTool` + `tool_runner` in beta |
| cURL | N/A | N/A | Raw HTTP, no SDK features |
| C# | No | No | Official SDK |
| PHP | Yes (beta) | No | `BetaRunnableTool` + `toolRunner()` |
---
@@ -170,6 +170,18 @@ See `{lang}/claude-api/README.md` (Compaction section) for code examples. Full d
---
## Prompt Caching (Quick Reference)
**Prefix match.** Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools` → `system` → `messages`. Keep stable content first (frozen system prompt, deterministic tool list), put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.
**Top-level auto-caching** (`cache_control: {type: "ephemeral"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. Minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.
**Verify with `usage.cache_read_input_tokens`** — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in system prompt, unsorted JSON, varying tool set).
For placement patterns, architectural guidance, and the silent-invalidator audit checklist: read `shared/prompt-caching.md`. Language-specific syntax: `{lang}/claude-api/README.md` (Prompt Caching section).
---
## Reading Guide
After detecting the language, read the relevant files based on what the user needs:
@@ -185,6 +197,9 @@ After detecting the language, read the relevant files based on what the user nee
**Long-running conversations (may exceed context window):**
→ Read `{lang}/claude-api/README.md` — see Compaction section
**Prompt caching / optimize caching / "why is my cache hit rate low":**
→ Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)
**Function calling / tool use / agents:**
→ Read `{lang}/claude-api/README.md` + `shared/tool-use-concepts.md` + `{lang}/claude-api/tool-use.md`
@@ -207,8 +222,9 @@ Read the **language-specific Claude API folder** (`{language}/claude-api/`):
4. **`{language}/claude-api/streaming.md`** — Read when building chat UIs or interfaces that display responses incrementally.
5. **`{language}/claude-api/batches.md`** — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.
6. **`{language}/claude-api/files-api.md`** — Read when sending the same file across multiple requests without re-uploading.
7. **`shared/prompt-caching.md`** — Read when adding or optimizing prompt caching. Covers prefix-stability design, breakpoint placement, and anti-patterns that silently invalidate cache.
8. **`shared/error-codes.md`** — Read when debugging HTTP errors or implementing error handling.
9. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.
> **Note:** For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus `shared/tool-use-concepts.md` and `shared/error-codes.md` as needed.

View File

@@ -215,7 +215,7 @@ List<MessageParam> followUpMessages =
## Context Editing / Compaction (Beta)
**Beta-namespace prefix is inconsistent** (source-verified against `src/Anthropic/Models/Beta/Messages/*.cs` @ 12.9.0). No prefix: `MessageCreateParams`, `MessageCountTokensParams`, `Role`. **Everything else has the `Beta` prefix**: `BetaMessageParam`, `BetaMessage`, `BetaContentBlock`, `BetaToolUseBlock`, all block param types. The unprefixed `Role` WILL collide with `Anthropic.Models.Messages.Role` if you import both namespaces (CS0104). Safest: import only Beta; if mixing, alias the beta `Role`:
```csharp
using Anthropic.Models.Beta.Messages;
@@ -299,7 +299,7 @@ Values: `Effort.Low`, `Effort.Medium`, `Effort.High`, `Effort.Max`. Combine with
## Prompt Caching
`System` takes `MessageCreateParamsSystem?` — a union of `string` or `List<TextBlockParam>`. There is no `SystemTextBlockParam`; use plain `TextBlockParam`. The implicit conversion needs the concrete `List<TextBlockParam>` type (array literals won't convert). For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.
```csharp
System = new List<TextBlockParam> {
@@ -312,6 +312,8 @@ System = new List<TextBlockParam> {
Optional `Ttl` on `CacheControlEphemeral`: `new() { Ttl = Ttl.Ttl1h }` or `Ttl.Ttl5m`. `CacheControl` also exists on `Tool.CacheControl` and top-level `MessageCreateParams.CacheControl`.
Verify hits via `response.Usage.CacheCreationInputTokens` / `response.Usage.CacheReadInputTokens`.
---
## Token Counting

View File

@@ -157,6 +157,29 @@ curl https://api.anthropic.com/v1/messages \
---
## Prompt Caching
Put `cache_control` on the last block of the stable prefix. See `shared/prompt-caching.md` for placement patterns and the silent-invalidator audit checklist.
```bash
curl https://api.anthropic.com/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-opus-4-6",
"max_tokens": 16000,
"system": [
{"type": "text", "text": "<large shared prompt...>", "cache_control": {"type": "ephemeral"}}
],
"messages": [{"role": "user", "content": "Summarize the key points"}]
}'
```
For 1-hour TTL: `"cache_control": {"type": "ephemeral", "ttl": "1h"}`. Top-level `"cache_control"` on the request body auto-places on the last cacheable block. Verify hits via the response `usage.cache_creation_input_tokens` / `usage.cache_read_input_tokens` fields.
---
## Extended Thinking
> **Opus 4.6 and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is deprecated on both Opus 4.6 and Sonnet 4.6.

View File

@@ -315,6 +315,23 @@ To disable: `anthropic.ThinkingConfigParamUnion{OfDisabled: &anthropic.ThinkingC
---
## Prompt Caching
`System` is `[]TextBlockParam`; set `CacheControl` on the last block to cache tools + system together. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.
```go
System: []anthropic.TextBlockParam{{
Text: longSystemPrompt,
CacheControl: anthropic.NewCacheControlEphemeralParam(), // default 5m TTL
}},
```
For 1-hour TTL: `anthropic.CacheControlEphemeralParam{TTL: anthropic.CacheControlEphemeralTTLTTL1h}`. There's also a top-level `CacheControl` on `MessageNewParams` that auto-places on the last cacheable block.
Verify hits via `resp.Usage.CacheCreationInputTokens` / `resp.Usage.CacheReadInputTokens`.
---
## Server-Side Tools
Version-suffixed struct names with `Param` suffix. `Name`/`Type` are `constant.*` types — zero value marshals correctly, so `{}` works. Wrap in `ToolUnionParam` with the matching `Of*` field.

View File

@@ -10,14 +10,14 @@ Maven:
<dependency>
  <groupId>com.anthropic</groupId>
  <artifactId>anthropic-java</artifactId>
  <version>2.17.0</version>
</dependency>
```
Gradle:
```groovy
implementation("com.anthropic:anthropic-java:2.17.0")
```
## Client Initialization
@@ -254,7 +254,7 @@ Combine with `Thinking = ThinkingConfigAdaptive` for cost-quality control.
## Prompt Caching
System message as a list of `TextBlockParam` with `CacheControlEphemeral`. Use `.systemOfTextBlockParams(...)` — the plain `.system(String)` overload can't carry cache control. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.
```java
import com.anthropic.models.messages.TextBlockParam;
@@ -271,6 +271,8 @@ import com.anthropic.models.messages.CacheControlEphemeral;
There's also a top-level `.cacheControl(CacheControlEphemeral)` on `MessageCreateParams.Builder` and on `Tool.builder()`.
Verify hits via `response.usage().cacheCreationInputTokens()` / `response.usage().cacheReadInputTokens()`.
---
## Token Counting

View File

@@ -1,6 +1,6 @@
# Claude API — PHP
> **Note:** The PHP SDK is the official Anthropic SDK for PHP. A beta tool runner is available via `$client->beta->messages->toolRunner()`. Structured output helpers are supported via `StructuredOutputModel` classes. Agent SDK is not available. Bedrock, Vertex AI, and Foundry clients are supported.
## Installation
@@ -89,7 +89,7 @@ foreach ($message->content as $block) {
## Streaming
> **Requires SDK v0.5.0+.** v0.4.0 and earlier used a single `$params` array; calling with named parameters throws `Unknown named parameter $model`. Upgrade: `composer require "anthropic-ai/sdk:^0.7"`
```php
use Anthropic\Messages\RawContentBlockDeltaEvent;
@@ -112,7 +112,49 @@ foreach ($stream as $event) {
---
## Tool Use
### Tool Runner (Beta)
**Beta:** The PHP SDK provides a tool runner via `$client->beta->messages->toolRunner()`. Define tools with `BetaRunnableTool` — a definition array plus a `run` closure:
```php
use Anthropic\Lib\Tools\BetaRunnableTool;
$weatherTool = new BetaRunnableTool(
definition: [
'name' => 'get_weather',
'description' => 'Get the current weather for a location.',
'input_schema' => [
'type' => 'object',
'properties' => [
'location' => ['type' => 'string', 'description' => 'City and state'],
],
'required' => ['location'],
],
],
run: function (array $input): string {
return "The weather in {$input['location']} is sunny and 72°F.";
},
);
$runner = $client->beta->messages->toolRunner(
maxTokens: 16000,
messages: [['role' => 'user', 'content' => 'What is the weather in Paris?']],
model: 'claude-opus-4-6',
tools: [$weatherTool],
);
foreach ($runner as $message) {
foreach ($message->content as $block) {
if ($block->type === 'text') {
echo $block->text;
}
}
}
```
### Manual Loop
Tools are passed as arrays. **The SDK uses camelCase keys** (`inputSchema`, `toolUseID`, `stopReason`) and auto-maps to the API's snake_case on the wire — since v0.5.0. See [shared tool use concepts](../shared/tool-use-concepts.md) for the loop pattern.
@@ -217,6 +259,98 @@ foreach ($message->content as $block) {
---
## Prompt Caching
`system:` takes an array of text blocks; set `cacheControl` on the last block. Array-shape syntax (camelCase keys) is idiomatic. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.
```php
$message = $client->messages->create(
model: 'claude-opus-4-6',
maxTokens: 16000,
system: [
['type' => 'text', 'text' => $longSystemPrompt, 'cacheControl' => ['type' => 'ephemeral']],
],
messages: [['role' => 'user', 'content' => 'Summarize the key points']],
);
```
For 1-hour TTL: `'cacheControl' => ['type' => 'ephemeral', 'ttl' => '1h']`. There's also a top-level `cacheControl:` on `messages->create(...)` that auto-places on the last cacheable block.
Verify hits via `$message->usage->cacheCreationInputTokens` / `$message->usage->cacheReadInputTokens`.
---
## Structured Outputs
### Using StructuredOutputModel (Recommended)
Define a PHP class implementing `StructuredOutputModel` and pass it as `outputConfig`:
```php
use Anthropic\Lib\Contracts\StructuredOutputModel;
use Anthropic\Lib\Concerns\StructuredOutputModelTrait;
use Anthropic\Lib\Attributes\Constrained;
class Person implements StructuredOutputModel
{
use StructuredOutputModelTrait;
#[Constrained(description: 'Full name')]
public string $name;
public int $age;
public ?string $email = null; // nullable = optional field
}
$message = $client->messages->create(
model: 'claude-opus-4-6',
maxTokens: 16000,
messages: [['role' => 'user', 'content' => 'Generate a profile for Alice, age 30']],
outputConfig: ['format' => Person::class],
);
$person = $message->parsedOutput(); // Person instance
echo $person->name;
```
Types are inferred from PHP type hints. Use `#[Constrained(description: '...')]` to add descriptions. Nullable properties (`?string`) become optional fields.
### Raw Schema
```php
$message = $client->messages->create(
model: 'claude-opus-4-6',
maxTokens: 16000,
messages: [['role' => 'user', 'content' => 'Extract: John (john@co.com), Enterprise plan']],
outputConfig: [
'format' => [
'type' => 'json_schema',
'schema' => [
'type' => 'object',
'properties' => [
'name' => ['type' => 'string'],
'email' => ['type' => 'string'],
'plan' => ['type' => 'string'],
],
'required' => ['name', 'email', 'plan'],
'additionalProperties' => false,
],
],
],
);
// First text block contains valid JSON
foreach ($message->content as $block) {
if ($block->type === 'text') {
$data = json_decode($block->text, true);
break;
}
}
```
---
## Beta Features & Server-Side Tools
**`betas:` is NOT a param on `$client->messages->create()`** — it only exists on the beta namespace. Use it for features that need an explicit opt-in header:

View File

@@ -215,6 +215,16 @@ async for message in query(
session_id = message.data.get("session_id")  # Capture for resuming later
```
`AssistantMessage` includes per-turn `usage` data (a dict matching the Anthropic API usage shape) for tracking costs:
```python
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage
async for message in query(prompt="...", options=ClaudeAgentOptions()):
if isinstance(message, AssistantMessage) and message.usage:
print(f"Input: {message.usage['input_tokens']}, Output: {message.usage['output_tokens']}")
```
Typed task message subclasses are available for better type safety when handling subagent task events:
- `TaskStartedMessage` — emitted when a subagent task is registered - `TaskStartedMessage` — emitted when a subagent task is registered
- `TaskProgressMessage` — real-time progress updates with cumulative usage metrics - `TaskProgressMessage` — real-time progress updates with cumulative usage metrics

View File

@@ -111,7 +111,7 @@ response = client.messages.create(
## Prompt Caching
Cache large context to reduce costs (up to 90% savings). **Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.
### Automatic Caching (Recommended)
@@ -156,6 +156,16 @@ response = client.messages.create(
)
```
### Verifying Cache Hits
```python
print(response.usage.cache_creation_input_tokens) # tokens written to cache (~1.25x cost)
print(response.usage.cache_read_input_tokens) # tokens served from cache (~0.1x cost)
print(response.usage.input_tokens) # uncached tokens (full cost)
```
If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `datetime.now()` or a UUID in the system prompt, unsorted `json.dumps()`, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.
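The `datetime.now()` case has a mechanical fix. A minimal before/after sketch (hypothetical prompt-assembly code, not SDK API):

```python
from datetime import datetime, timezone

# BAD: a per-request timestamp at the front of the prefix means the
# rendered bytes differ on every call, so nothing after it ever caches.
def bad_system() -> str:
    now = datetime.now(timezone.utc)
    return f"Current date: {now:%Y-%m-%d %H:%M:%S}. You are a helpful assistant."

# GOOD: freeze the system prompt; ship the date as a late message,
# which invalidates nothing rendered before it.
FROZEN_SYSTEM = "You are a helpful assistant."

def date_message() -> dict:
    now = datetime.now(timezone.utc)
    return {"role": "user", "content": f"(context) Current date: {now:%Y-%m-%d}"}
```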
---
## Extended Thinking

View File

@@ -90,3 +90,24 @@ end
### Manual Loop
See the [shared tool use concepts](../shared/tool-use-concepts.md) for the tool definition format and agentic loop pattern.
---
## Prompt Caching
`system_:` (trailing underscore — avoids shadowing `Kernel#system`) takes an array of text blocks; set `cache_control` on the last block. Plain hashes work via the `OrHash` type alias. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.
```ruby
message = client.messages.create(
model: :"claude-opus-4-6",
max_tokens: 16000,
system_: [
{ type: "text", text: long_system_prompt, cache_control: { type: "ephemeral" } }
],
messages: [{ role: "user", content: "Summarize the key points" }]
)
```
For 1-hour TTL: `cache_control: { type: "ephemeral", ttl: "1h" }`. There's also a top-level `cache_control:` on `messages.create` that auto-places on the last cacheable block.
Verify hits via `message.usage.cache_creation_input_tokens` / `message.usage.cache_read_input_tokens`.

View File

@@ -0,0 +1,128 @@
# Prompt Caching — Design & Optimization
This file covers how to design prompt-building code for effective caching. For language-specific syntax, see the `## Prompt Caching` section in each language's README or single-file doc.
## The one invariant everything follows from
**Prompt caching is a prefix match. Any change anywhere in the prefix invalidates everything after it.**
The cache key is derived from the exact bytes of the rendered prompt up to each `cache_control` breakpoint. A single byte difference at position N — a timestamp, a reordered JSON key, a different tool in the list — invalidates the cache for all breakpoints at positions ≥ N.
Render order is: `tools` → `system` → `messages`. A breakpoint on the last system block caches both tools and system together.
Design the prompt-building path around this constraint. Get the ordering right and most caching works for free. Get it wrong and no amount of `cache_control` markers will help.
---
## Workflow for optimizing existing code
When asked to add or optimize caching:
1. **Trace the prompt assembly path.** Find where `system`, `tools`, and `messages` are constructed. Identify every input that flows into them.
2. **Classify each input by stability:**
- Never changes → belongs early in the prompt, before any breakpoint
- Changes per-session → belongs after the global prefix, cache per-session
- Changes per-turn → belongs at the end, after the last breakpoint
- Changes per-request (timestamps, UUIDs, random IDs) → **eliminate or move to the very end**
3. **Check rendered order matches stability order.** Stable content must physically precede volatile content. If a timestamp is interpolated into the system prompt header, everything after it is uncacheable regardless of markers.
4. **Place breakpoints at stability boundaries.** See placement patterns below.
5. **Audit for silent invalidators.** See anti-patterns table.
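The stability ordering in steps 2–3 can be sketched as a prompt builder that renders stable content first and keeps volatile content after the last breakpoint. `build_request` and `prefix_fingerprint` are hypothetical helpers, not SDK API:

```python
import hashlib
import json

def build_request(frozen_system: str, session_context: str, question: str) -> dict:
    """Assemble a request so stable content precedes volatile content.

    Stability order: frozen system prompt (never changes) -> session
    context (changes per session) -> question (changes per turn).
    """
    return {
        "system": [
            # Never changes: safe to cache across all sessions.
            {"type": "text", "text": frozen_system,
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": [
            # Per-session: breakpoint here caches frozen + session prefix.
            {"type": "text", "text": session_context,
             "cache_control": {"type": "ephemeral"}},
            # Per-turn: after the last breakpoint, free to vary.
            {"type": "text", "text": question},
        ]}],
    }

def prefix_fingerprint(req: dict) -> str:
    """Hash everything up to and including the last breakpoint."""
    stable = [req["system"], req["messages"][0]["content"][:-1]]
    return hashlib.sha256(json.dumps(stable, sort_keys=True).encode()).hexdigest()

a = build_request("You are a helpful analyst.", "Docs for session 42", "Q1?")
b = build_request("You are a helpful analyst.", "Docs for session 42", "Q2?")
# Different questions, identical cacheable prefix:
assert prefix_fingerprint(a) == prefix_fingerprint(b)
```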
---
## Placement patterns
### Large system prompt shared across many requests
Put a breakpoint on the last system text block. If there are tools, they render before system — the marker on the last system block caches tools + system together.
```json
"system": [
{"type": "text", "text": "<large shared prompt>", "cache_control": {"type": "ephemeral"}}
]
```
### Multi-turn conversations
Put a breakpoint on the last content block of the most-recently-appended turn. Each subsequent request reuses the entire prior conversation prefix. Earlier breakpoints remain valid read points, so hits accrue incrementally as the conversation grows.
```json
// Last content block of the last user turn
messages[-1].content[-1].cache_control = {"type": "ephemeral"}
```
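One way to sketch this as a helper (hypothetical code; assumes message content is already in block-list form, and recycles a single marker rather than accumulating past the 4-breakpoint limit):

```python
def mark_last_block(messages: list[dict]) -> list[dict]:
    """Move the cache breakpoint to the last content block of the
    most recent turn, clearing any marker set on an earlier turn."""
    for msg in messages:
        for block in msg["content"]:
            block.pop("cache_control", None)
    messages[-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return messages

history = [
    {"role": "user", "content": [{"type": "text", "text": "First question"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "First answer"}]},
    {"role": "user", "content": [{"type": "text", "text": "Follow-up"}]},
]
mark_last_block(history)
```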
### Shared prefix, varying suffix
Many requests share a large fixed preamble (few-shot examples, retrieved docs, instructions) but differ in the final question. Put the breakpoint at the end of the **shared** portion, not at the end of the whole prompt — otherwise every request writes a distinct cache entry and nothing is ever read.
```json
"messages": [{"role": "user", "content": [
{"type": "text", "text": "<shared context>", "cache_control": {"type": "ephemeral"}},
{"type": "text", "text": "<varying question>"} // no marker — differs every time
]}]
```
### Prompts that change from the beginning every time
Don't cache. If the first 1K tokens differ per request, there is no reusable prefix. Adding `cache_control` only pays the cache-write premium with zero reads. Leave it off.
---
## Architectural guidance
These are the decisions that matter more than marker placement. Fix these first.
**Keep the system prompt frozen.** Don't interpolate "current date: X", "mode: Y", "user name: Z" into the system prompt — those sit at the front of the prefix and invalidate everything downstream. Inject dynamic context as a user or assistant message later in `messages`. A message at turn 5 invalidates nothing before turn 5.
**Don't change tools or model mid-conversation.** Tools render at position 0; adding, removing, or reordering a tool invalidates the entire cache. Same for switching models (caches are model-scoped). If you need "modes", don't swap the tool set — give Claude a tool that records the mode transition, or pass the mode as message content. Serialize tools deterministically (sort by name).
**Fork operations must reuse the parent's exact prefix.** Side computations (summarization, compaction, sub-agents) often spin up a separate API call. If the fork rebuilds `system` / `tools` / `model` with any difference, it misses the parent's cache entirely. Copy the parent's `system`, `tools`, and `model` verbatim, then append fork-specific content at the end.
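The fork rule reduces to copying the cache-relevant fields verbatim. A sketch with hypothetical names (`parent_request` is whatever dict you would pass to `messages.create`):

```python
def fork_request(parent_request: dict, fork_messages: list[dict]) -> dict:
    """Build a side-call (summarization, sub-agent) that reuses the
    parent's cached prefix: model, tools, and system copied verbatim,
    fork-specific content appended at the end."""
    return {
        "model": parent_request["model"],          # caches are model-scoped
        "tools": parent_request.get("tools", []),  # same set, same order, same bytes
        "system": parent_request["system"],        # untouched prefix
        "max_tokens": parent_request.get("max_tokens", 16000),
        "messages": parent_request["messages"] + fork_messages,
    }

parent = {
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "system": [{"type": "text", "text": "long shared prompt",
                "cache_control": {"type": "ephemeral"}}],
    "messages": [{"role": "user", "content": "hi"}],
}
fork = fork_request(parent, [{"role": "user", "content": "Summarize so far."}])
assert fork["system"] is parent["system"] and fork["model"] == parent["model"]
```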
---
## Silent invalidators
When reviewing code, grep for these inside anything that feeds the prompt prefix:
| Pattern | Why it breaks caching |
|---|---|
| `datetime.now()` / `Date.now()` / `time.time()` in system prompt | Prefix changes every request |
| `uuid4()` / `crypto.randomUUID()` / request IDs early in content | Same — every request is unique |
| `json.dumps(d)` without `sort_keys=True` / iterating a `set` | Non-deterministic serialization → prefix bytes differ |
| f-string interpolating session/user ID into system prompt | Per-user prefix; no cross-user sharing |
| Conditional system sections (`if flag: system += ...`) | Every flag combination is a distinct prefix |
| `tools=build_tools(user)` where set varies per user | Tools render at position 0; nothing caches across users |
Fix by moving the dynamic piece after the last breakpoint, making it deterministic, or deleting it if it's not load-bearing.
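The serialization rows are easy to demonstrate without any SDK; `sort_keys=True` pins the rendered bytes regardless of how the dict was built, and sorting set-shaped data does the same:

```python
import json

tool_a = {"name": "get_weather", "description": "..."}
tool_b = {"description": "...", "name": "get_weather"}  # same data, built in a different order

# Without sort_keys, the rendered bytes differ -> distinct cache prefixes:
assert json.dumps(tool_a) != json.dumps(tool_b)

# With sort_keys=True, the bytes are identical regardless of build order:
assert json.dumps(tool_a, sort_keys=True) == json.dumps(tool_b, sort_keys=True)

# Sets are unordered; sort before serializing anything set-shaped:
allowed = {"read", "write", "admin"}
stable = json.dumps(sorted(allowed))
assert stable == '["admin", "read", "write"]'
```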
---
## API reference
```json
"cache_control": {"type": "ephemeral"} // 5-minute TTL (default)
"cache_control": {"type": "ephemeral", "ttl": "1h"} // 1-hour TTL
```
- Max **4** `cache_control` breakpoints per request.
- Goes on any content block: system text blocks, tool definitions, message content blocks (`text`, `image`, `tool_use`, `tool_result`, `document`).
- Top-level `cache_control` on `messages.create()` auto-places on the last cacheable block — simplest option when you don't need fine-grained placement.
- Minimum cacheable prefix is model-dependent (typically 1024–2048 tokens). Shorter prefixes silently won't cache even with a marker.
**Economics:** Cache writes cost ~1.25× base input price; reads cost ~0.1×. A prefix must be used in at least two requests within TTL to break even (one writes the cache, subsequent ones read it). For bursty traffic, the 1-hour TTL keeps entries alive across gaps.
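Back-of-envelope for those multipliers (pure arithmetic, assuming the ~1.25× write and ~0.1× read figures above):

```python
def cached_cost(prefix_tokens: int, requests: int,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Relative input cost for a cached prefix: one write, then reads."""
    return prefix_tokens * (write_mult + (requests - 1) * read_mult)

def uncached_cost(prefix_tokens: int, requests: int) -> float:
    return prefix_tokens * requests * 1.0

# One request: caching costs more (you paid the write premium for nothing).
assert cached_cost(10_000, 1) > uncached_cost(10_000, 1)
# Two requests within TTL: already cheaper (1.35x vs 2.0x base).
assert cached_cost(10_000, 2) < uncached_cost(10_000, 2)
# Ten requests: ~2.15x base vs 10x base on the prefix.
assert cached_cost(10_000, 10) / uncached_cost(10_000, 10) < 0.25
```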
---
## Verifying cache hits
The response `usage` object reports cache activity:
| Field | Meaning |
|---|---|
| `cache_creation_input_tokens` | Tokens written to cache this request (you paid the ~1.25× write premium) |
| `cache_read_input_tokens` | Tokens served from cache this request (you paid ~0.1×) |
| `input_tokens` | Tokens processed at full price (not cached) |
If `cache_read_input_tokens` is zero across repeated requests with identical prefixes, a silent invalidator is at work — diff the rendered prompt bytes between two requests to find it.
Language-specific access: `response.usage.cache_read_input_tokens` (Python/TS/Ruby), `$message->usage->cacheReadInputTokens` (PHP), `resp.Usage.CacheReadInputTokens` (Go/C#), `.usage().cacheReadInputTokens()` (Java).

View File

@@ -6,7 +6,7 @@ This file covers the conceptual foundations of tool use with the Claude API. For
### Tool Definition Structure
> **Note:** When using the Tool Runner (beta), tool schemas are generated automatically from your function signatures (Python), Zod schemas (TypeScript), annotated classes (Java), `jsonschema` struct tags (Go), or `BaseTool` subclasses (Ruby). The raw JSON schema format below is for the manual approach — including PHP's `BetaRunnableTool`, which wraps a run closure around a hand-written schema — or SDKs without tool runner support.
Each tool requires a name, description, and JSON Schema for its inputs:
@@ -59,7 +59,7 @@ Any `tool_choice` value can also include `"disable_parallel_tool_use": true` to
### Tool Runner vs Manual Loop
**Tool Runner (Recommended):** The SDK's tool runner handles the agentic loop automatically — it calls the API, detects tool use requests, executes your tool functions, feeds results back to Claude, and repeats until Claude stops calling tools. Available in Python, TypeScript, Java, Go, Ruby, and PHP SDKs (beta). The Python SDK also provides MCP conversion helpers (`anthropic.lib.tools.mcp`) to convert MCP tools, prompts, and resources for use with the tool runner — see `python/claude-api/tool-use.md` for details.
**Manual Agentic Loop:** Use when you need fine-grained control over the loop (e.g., custom logging, conditional tool execution, human-in-the-loop approval). Loop until `stop_reason == "end_turn"`, always append the full `response.content` to preserve tool_use blocks, and ensure each `tool_result` includes the matching `tool_use_id`.

View File

@@ -187,6 +187,7 @@ for await (const message of query({
description: "Expert code reviewer for quality and security reviews.",
prompt: "Analyze code quality and suggest improvements.",
tools: ["Read", "Glob", "Grep"],
// Optional: skills, mcpServers for subagent customization
},
},
},

View File

@@ -105,6 +105,8 @@ const response = await client.messages.create({
## Prompt Caching
**Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.
### Automatic Caching (Recommended)
Use top-level `cache_control` to automatically cache the last cacheable block in the request:
@@ -152,6 +154,16 @@ const response2 = await client.messages.create({
});
```
### Verifying Cache Hits
```typescript
console.log(response.usage.cache_creation_input_tokens); // tokens written to cache (~1.25x cost)
console.log(response.usage.cache_read_input_tokens); // tokens served from cache (~0.1x cost)
console.log(response.usage.input_tokens); // uncached tokens (full cost)
```
If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `Date.now()` or a UUID in the system prompt, non-deterministic key ordering, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.
---
## Extended Thinking