mirror of https://github.com/anthropics/skills.git
synced 2026-03-30 13:13:29 +08:00
chore: update claude-api skill [auto-sync]
@@ -61,7 +61,7 @@ Before reading code examples, determine which language the user is working in:

| Ruby | Yes (beta) | No | `BaseTool` + `tool_runner` in beta |
| cURL | N/A | N/A | Raw HTTP, no SDK features |
| C# | No | No | Official SDK |
-| PHP | No | No | Official SDK |
+| PHP | Yes (beta) | No | `BetaRunnableTool` + `toolRunner()` |

---
@@ -170,6 +170,18 @@ See `{lang}/claude-api/README.md` (Compaction section) for code examples. Full d

---

## Prompt Caching (Quick Reference)

**Prefix match.** Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools` → `system` → `messages`. Keep stable content first (frozen system prompt, deterministic tool list), put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.

**Top-level auto-caching** (`cache_control: {type: "ephemeral"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. Minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.

**Verify with `usage.cache_read_input_tokens`** — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in system prompt, unsorted JSON, varying tool set).
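One quick local check (a sketch — `render_prefix` is a hypothetical helper, not an SDK API): render the prefix twice and compare bytes; any difference means no cache reuse.

```python
import json
from datetime import datetime, timezone

def render_prefix(system_text: str, tools: list) -> bytes:
    # Deterministic serialization so only genuine content changes show up.
    return json.dumps({"tools": tools, "system": system_text}, sort_keys=True).encode()

tools = [{"name": "get_weather", "description": "Get current weather."}]

# Broken: a timestamp baked into the system prompt makes every prefix unique.
volatile = f"You are a helpful bot. Current time: {datetime.now(timezone.utc).isoformat()}"

# Fixed: freeze the system prompt; pass the timestamp after the last breakpoint instead.
frozen = "You are a helpful bot."

assert render_prefix(frozen, tools) == render_prefix(frozen, tools)   # stable → cacheable
assert render_prefix(frozen, tools) != render_prefix(volatile, tools) # volatile → invalidated
```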
For placement patterns, architectural guidance, and the silent-invalidator audit checklist: read `shared/prompt-caching.md`. Language-specific syntax: `{lang}/claude-api/README.md` (Prompt Caching section).

---

## Reading Guide

After detecting the language, read the relevant files based on what the user needs:
@@ -185,6 +197,9 @@ After detecting the language, read the relevant files based on what the user nee

**Long-running conversations (may exceed context window):**
→ Read `{lang}/claude-api/README.md` — see Compaction section

**Prompt caching / optimize caching / "why is my cache hit rate low":**
→ Read `shared/prompt-caching.md` + `{lang}/claude-api/README.md` (Prompt Caching section)

**Function calling / tool use / agents:**
→ Read `{lang}/claude-api/README.md` + `shared/tool-use-concepts.md` + `{lang}/claude-api/tool-use.md`
@@ -207,8 +222,9 @@ Read the **language-specific Claude API folder** (`{language}/claude-api/`):

4. **`{language}/claude-api/streaming.md`** — Read when building chat UIs or interfaces that display responses incrementally.
5. **`{language}/claude-api/batches.md`** — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.
6. **`{language}/claude-api/files-api.md`** — Read when sending the same file across multiple requests without re-uploading.
-7. **`shared/error-codes.md`** — Read when debugging HTTP errors or implementing error handling.
-8. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.
+7. **`shared/prompt-caching.md`** — Read when adding or optimizing prompt caching. Covers prefix-stability design, breakpoint placement, and anti-patterns that silently invalidate cache.
+8. **`shared/error-codes.md`** — Read when debugging HTTP errors or implementing error handling.
+9. **`shared/live-sources.md`** — WebFetch URLs for fetching the latest official documentation.

> **Note:** For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus `shared/tool-use-concepts.md` and `shared/error-codes.md` as needed.
@@ -215,7 +215,7 @@ List<MessageParam> followUpMessages =

## Context Editing / Compaction (Beta)

-**Beta-namespace prefix is inconsistent** (source-verified against `src/Anthropic/Models/Beta/Messages/*.cs` @ 12.8.0). No prefix: `MessageCreateParams`, `MessageCountTokensParams`, `Role`. **Everything else has the `Beta` prefix**: `BetaMessageParam`, `BetaMessage`, `BetaContentBlock`, `BetaToolUseBlock`, all block param types. The unprefixed `Role` WILL collide with `Anthropic.Models.Messages.Role` if you import both namespaces (CS0104). Safest: import only Beta; if mixing, alias the beta `Role`:
+**Beta-namespace prefix is inconsistent** (source-verified against `src/Anthropic/Models/Beta/Messages/*.cs` @ 12.9.0). No prefix: `MessageCreateParams`, `MessageCountTokensParams`, `Role`. **Everything else has the `Beta` prefix**: `BetaMessageParam`, `BetaMessage`, `BetaContentBlock`, `BetaToolUseBlock`, all block param types. The unprefixed `Role` WILL collide with `Anthropic.Models.Messages.Role` if you import both namespaces (CS0104). Safest: import only Beta; if mixing, alias the beta `Role`:

```csharp
using Anthropic.Models.Beta.Messages;
```
@@ -299,7 +299,7 @@ Values: `Effort.Low`, `Effort.Medium`, `Effort.High`, `Effort.Max`. Combine with

## Prompt Caching

-`System` takes `MessageCreateParamsSystem?` — a union of `string` or `List<TextBlockParam>`. There is no `SystemTextBlockParam`; use plain `TextBlockParam`. The implicit conversion needs the concrete `List<TextBlockParam>` type (array literals won't convert).
+`System` takes `MessageCreateParamsSystem?` — a union of `string` or `List<TextBlockParam>`. There is no `SystemTextBlockParam`; use plain `TextBlockParam`. The implicit conversion needs the concrete `List<TextBlockParam>` type (array literals won't convert). For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.

```csharp
System = new List<TextBlockParam> {
```

@@ -312,6 +312,8 @@ System = new List<TextBlockParam> {

Optional `Ttl` on `CacheControlEphemeral`: `new() { Ttl = Ttl.Ttl1h }` or `Ttl.Ttl5m`. `CacheControl` also exists on `Tool.CacheControl` and top-level `MessageCreateParams.CacheControl`.

Verify hits via `response.Usage.CacheCreationInputTokens` / `response.Usage.CacheReadInputTokens`.

---

## Token Counting
@@ -157,6 +157,29 @@ curl https://api.anthropic.com/v1/messages \

---

## Prompt Caching

Put `cache_control` on the last block of the stable prefix. See `shared/prompt-caching.md` for placement patterns and the silent-invalidator audit checklist.

```bash
curl https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-6",
    "max_tokens": 16000,
    "system": [
      {"type": "text", "text": "<large shared prompt...>", "cache_control": {"type": "ephemeral"}}
    ],
    "messages": [{"role": "user", "content": "Summarize the key points"}]
  }'
```

For 1-hour TTL: `"cache_control": {"type": "ephemeral", "ttl": "1h"}`. Top-level `"cache_control"` on the request body auto-places on the last cacheable block. Verify hits via the response `usage.cache_creation_input_tokens` / `usage.cache_read_input_tokens` fields.

---

## Extended Thinking

> **Opus 4.6 and Sonnet 4.6:** Use adaptive thinking. `budget_tokens` is deprecated on both models.
@@ -315,6 +315,23 @@ To disable: `anthropic.ThinkingConfigParamUnion{OfDisabled: &anthropic.ThinkingC

---

## Prompt Caching

`System` is `[]TextBlockParam`; set `CacheControl` on the last block to cache tools + system together. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.

```go
System: []anthropic.TextBlockParam{{
	Text:         longSystemPrompt,
	CacheControl: anthropic.NewCacheControlEphemeralParam(), // default 5m TTL
}},
```

For 1-hour TTL: `anthropic.CacheControlEphemeralParam{TTL: anthropic.CacheControlEphemeralTTLTTL1h}`. There's also a top-level `CacheControl` on `MessageNewParams` that auto-places on the last cacheable block.

Verify hits via `resp.Usage.CacheCreationInputTokens` / `resp.Usage.CacheReadInputTokens`.

---

## Server-Side Tools

Version-suffixed struct names with `Param` suffix. `Name`/`Type` are `constant.*` types — zero value marshals correctly, so `{}` works. Wrap in `ToolUnionParam` with the matching `Of*` field.
@@ -10,14 +10,14 @@ Maven:

<dependency>
    <groupId>com.anthropic</groupId>
    <artifactId>anthropic-java</artifactId>
-    <version>2.16.1</version>
+    <version>2.17.0</version>
</dependency>
```

Gradle:

```groovy
-implementation("com.anthropic:anthropic-java:2.16.1")
+implementation("com.anthropic:anthropic-java:2.17.0")
```

## Client Initialization
@@ -254,7 +254,7 @@ Combine with `Thinking = ThinkingConfigAdaptive` for cost-quality control.

## Prompt Caching

-System message as a list of `TextBlockParam` with `CacheControlEphemeral`. Use `.systemOfTextBlockParams(...)` — the plain `.system(String)` overload can't carry cache control.
+System message as a list of `TextBlockParam` with `CacheControlEphemeral`. Use `.systemOfTextBlockParams(...)` — the plain `.system(String)` overload can't carry cache control. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.

```java
import com.anthropic.models.messages.TextBlockParam;
```

@@ -271,6 +271,8 @@ import com.anthropic.models.messages.CacheControlEphemeral;

There's also a top-level `.cacheControl(CacheControlEphemeral)` on `MessageCreateParams.Builder` and on `Tool.builder()`.

Verify hits via `response.usage().cacheCreationInputTokens()` / `response.usage().cacheReadInputTokens()`.

---

## Token Counting
@@ -1,6 +1,6 @@

# Claude API — PHP

-> **Note:** The PHP SDK is the official Anthropic SDK for PHP. Tool runner and Agent SDK are not available. Bedrock, Vertex AI, and Foundry clients are supported.
+> **Note:** The PHP SDK is the official Anthropic SDK for PHP. A beta tool runner is available via `$client->beta->messages->toolRunner()`. Structured output helpers are supported via `StructuredOutputModel` classes. Agent SDK is not available. Bedrock, Vertex AI, and Foundry clients are supported.

## Installation

@@ -89,7 +89,7 @@ foreach ($message->content as $block) {

## Streaming

-> **Requires SDK v0.5.0+.** v0.4.0 and earlier used a single `$params` array; calling with named parameters throws `Unknown named parameter $model`. Upgrade: `composer require "anthropic-ai/sdk:^0.6"`
+> **Requires SDK v0.5.0+.** v0.4.0 and earlier used a single `$params` array; calling with named parameters throws `Unknown named parameter $model`. Upgrade: `composer require "anthropic-ai/sdk:^0.7"`

```php
use Anthropic\Messages\RawContentBlockDeltaEvent;
```
@@ -112,7 +112,49 @@ foreach ($stream as $event) {

---

-## Tool Use (Manual Loop)
+## Tool Use

### Tool Runner (Beta)

**Beta:** The PHP SDK provides a tool runner via `$client->beta->messages->toolRunner()`. Define tools with `BetaRunnableTool` — a definition array plus a `run` closure:

```php
use Anthropic\Lib\Tools\BetaRunnableTool;

$weatherTool = new BetaRunnableTool(
    definition: [
        'name' => 'get_weather',
        'description' => 'Get the current weather for a location.',
        'input_schema' => [
            'type' => 'object',
            'properties' => [
                'location' => ['type' => 'string', 'description' => 'City and state'],
            ],
            'required' => ['location'],
        ],
    ],
    run: function (array $input): string {
        return "The weather in {$input['location']} is sunny and 72°F.";
    },
);

$runner = $client->beta->messages->toolRunner(
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'What is the weather in Paris?']],
    model: 'claude-opus-4-6',
    tools: [$weatherTool],
);

foreach ($runner as $message) {
    foreach ($message->content as $block) {
        if ($block->type === 'text') {
            echo $block->text;
        }
    }
}
```

### Manual Loop

Tools are passed as arrays. **The SDK uses camelCase keys** (`inputSchema`, `toolUseID`, `stopReason`) and auto-maps to the API's snake_case on the wire — since v0.5.0. See [shared tool use concepts](../shared/tool-use-concepts.md) for the loop pattern.
@@ -217,6 +259,98 @@ foreach ($message->content as $block) {

---

## Prompt Caching

`system:` takes an array of text blocks; set `cacheControl` on the last block. Array-shape syntax (camelCase keys) is idiomatic. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.

```php
$message = $client->messages->create(
    model: 'claude-opus-4-6',
    maxTokens: 16000,
    system: [
        ['type' => 'text', 'text' => $longSystemPrompt, 'cacheControl' => ['type' => 'ephemeral']],
    ],
    messages: [['role' => 'user', 'content' => 'Summarize the key points']],
);
```

For 1-hour TTL: `'cacheControl' => ['type' => 'ephemeral', 'ttl' => '1h']`. There's also a top-level `cacheControl:` on `messages->create(...)` that auto-places on the last cacheable block.

Verify hits via `$message->usage->cacheCreationInputTokens` / `$message->usage->cacheReadInputTokens`.

---

## Structured Outputs

### Using StructuredOutputModel (Recommended)

Define a PHP class implementing `StructuredOutputModel` and pass it as `outputConfig`:

```php
use Anthropic\Lib\Contracts\StructuredOutputModel;
use Anthropic\Lib\Concerns\StructuredOutputModelTrait;
use Anthropic\Lib\Attributes\Constrained;

class Person implements StructuredOutputModel
{
    use StructuredOutputModelTrait;

    #[Constrained(description: 'Full name')]
    public string $name;

    public int $age;

    public ?string $email = null; // nullable = optional field
}

$message = $client->messages->create(
    model: 'claude-opus-4-6',
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'Generate a profile for Alice, age 30']],
    outputConfig: ['format' => Person::class],
);

$person = $message->parsedOutput(); // Person instance
echo $person->name;
```

Types are inferred from PHP type hints. Use `#[Constrained(description: '...')]` to add descriptions. Nullable properties (`?string`) become optional fields.

### Raw Schema

```php
$message = $client->messages->create(
    model: 'claude-opus-4-6',
    maxTokens: 16000,
    messages: [['role' => 'user', 'content' => 'Extract: John (john@co.com), Enterprise plan']],
    outputConfig: [
        'format' => [
            'type' => 'json_schema',
            'schema' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'email' => ['type' => 'string'],
                    'plan' => ['type' => 'string'],
                ],
                'required' => ['name', 'email', 'plan'],
                'additionalProperties' => false,
            ],
        ],
    ],
);

// First text block contains valid JSON
foreach ($message->content as $block) {
    if ($block->type === 'text') {
        $data = json_decode($block->text, true);
        break;
    }
}
```

---

## Beta Features & Server-Side Tools

**`betas:` is NOT a param on `$client->messages->create()`** — it only exists on the beta namespace. Use it for features that need an explicit opt-in header:
@@ -215,6 +215,16 @@ async for message in query(

session_id = message.data.get("session_id")  # Capture for resuming later
```

`AssistantMessage` includes per-turn `usage` data (a dict matching the Anthropic API usage shape) for tracking costs:

```python
from claude_agent_sdk import query, ClaudeAgentOptions, AssistantMessage

async for message in query(prompt="...", options=ClaudeAgentOptions()):
    if isinstance(message, AssistantMessage) and message.usage:
        print(f"Input: {message.usage['input_tokens']}, Output: {message.usage['output_tokens']}")
```

Typed task message subclasses are available for better type safety when handling subagent task events:
- `TaskStartedMessage` — emitted when a subagent task is registered
- `TaskProgressMessage` — real-time progress updates with cumulative usage metrics
@@ -111,7 +111,7 @@ response = client.messages.create(

## Prompt Caching

-Cache large context to reduce costs (up to 90% savings).
+Cache large context to reduce costs (up to 90% savings). **Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.

### Automatic Caching (Recommended)

@@ -156,6 +156,16 @@ response = client.messages.create(
)
```

### Verifying Cache Hits

```python
print(response.usage.cache_creation_input_tokens)  # tokens written to cache (~1.25x cost)
print(response.usage.cache_read_input_tokens)      # tokens served from cache (~0.1x cost)
print(response.usage.input_tokens)                 # uncached tokens (full cost)
```

If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `datetime.now()` or a UUID in the system prompt, unsorted `json.dumps()`, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.
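A small triage helper makes the three `usage` outcomes explicit (a sketch — `cache_status` is not an SDK method):

```python
def cache_status(usage) -> str:
    # Classify one response: did the prefix read from cache, write it, or neither?
    if getattr(usage, "cache_read_input_tokens", 0):
        return "hit"    # served at ~0.1x input price
    if getattr(usage, "cache_creation_input_tokens", 0):
        return "write"  # paid the ~1.25x write premium; later requests should hit
    return "miss"       # nothing cached — audit for silent invalidators
```

Logging `cache_status(response.usage)` per request helps spot the failure signature: a stream of `write`/`miss` with no `hit` means the prefix is being invalidated between requests.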

---

## Extended Thinking
@@ -90,3 +90,24 @@ end

### Manual Loop

See the [shared tool use concepts](../shared/tool-use-concepts.md) for the tool definition format and agentic loop pattern.

---

## Prompt Caching

`system_:` (trailing underscore — avoids shadowing `Kernel#system`) takes an array of text blocks; set `cache_control` on the last block. Plain hashes work via the `OrHash` type alias. For placement patterns and the silent-invalidator audit checklist, see `shared/prompt-caching.md`.

```ruby
message = client.messages.create(
  model: :"claude-opus-4-6",
  max_tokens: 16000,
  system_: [
    { type: "text", text: long_system_prompt, cache_control: { type: "ephemeral" } }
  ],
  messages: [{ role: "user", content: "Summarize the key points" }]
)
```

For 1-hour TTL: `cache_control: { type: "ephemeral", ttl: "1h" }`. There's also a top-level `cache_control:` on `messages.create` that auto-places on the last cacheable block.

Verify hits via `message.usage.cache_creation_input_tokens` / `message.usage.cache_read_input_tokens`.
skills/claude-api/shared/prompt-caching.md (new file, 128 lines)
@@ -0,0 +1,128 @@

# Prompt Caching — Design & Optimization

This file covers how to design prompt-building code for effective caching. For language-specific syntax, see the `## Prompt Caching` section in each language's README or single-file doc.

## The one invariant everything follows from

**Prompt caching is a prefix match. Any change anywhere in the prefix invalidates everything after it.**

The cache key is derived from the exact bytes of the rendered prompt up to each `cache_control` breakpoint. A single byte difference at position N — a timestamp, a reordered JSON key, a different tool in the list — invalidates the cache for all breakpoints at positions ≥ N.

Render order is: `tools` → `system` → `messages`. A breakpoint on the last system block caches both tools and system together.

Design the prompt-building path around this constraint. Get the ordering right and most caching works for free. Get it wrong and no amount of `cache_control` markers will help.

---

## Workflow for optimizing existing code

When asked to add or optimize caching:

1. **Trace the prompt assembly path.** Find where `system`, `tools`, and `messages` are constructed. Identify every input that flows into them.
2. **Classify each input by stability:**
   - Never changes → belongs early in the prompt, before any breakpoint
   - Changes per-session → belongs after the global prefix, cache per-session
   - Changes per-turn → belongs at the end, after the last breakpoint
   - Changes per-request (timestamps, UUIDs, random IDs) → **eliminate or move to the very end**
3. **Check rendered order matches stability order.** Stable content must physically precede volatile content. If a timestamp is interpolated into the system prompt header, everything after it is uncacheable regardless of markers.
4. **Place breakpoints at stability boundaries.** See placement patterns below.
5. **Audit for silent invalidators.** See anti-patterns table.
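Steps 2–4 amount to assembling the request in stability order. A Python sketch (hypothetical builder, not an SDK API; block shapes follow the Messages API):

```python
def build_request(frozen_system: str, session_context: str, question: str, now_iso: str) -> dict:
    return {
        "system": [
            # Never changes -> first, with a breakpoint; shared by every request.
            {"type": "text", "text": frozen_system,
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [{"role": "user", "content": [
            # Changes per-session -> second breakpoint; cached per session.
            {"type": "text", "text": session_context,
             "cache_control": {"type": "ephemeral"}},
            # Changes per-turn/per-request -> after the last breakpoint, never cached.
            {"type": "text", "text": f"[{now_iso}] {question}"},
        ]}],
    }

req = build_request("You are a support agent.",
                    "Account: ACME Corp, plan: enterprise.",
                    "Reset my password", "2025-06-01T12:00:00Z")
```

The volatile timestamp rides on the final block, so it can vary freely without touching either cached prefix.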

---

## Placement patterns

### Large system prompt shared across many requests

Put a breakpoint on the last system text block. If there are tools, they render before system — the marker on the last system block caches tools + system together.

```json
"system": [
  {"type": "text", "text": "<large shared prompt>", "cache_control": {"type": "ephemeral"}}
]
```

### Multi-turn conversations

Put a breakpoint on the last content block of the most-recently-appended turn. Each subsequent request reuses the entire prior conversation prefix. Earlier breakpoints remain valid read points, so hits accrue incrementally as the conversation grows.

```json
// Last content block of the last user turn
messages[-1].content[-1].cache_control = {"type": "ephemeral"}
```
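In code, that means moving the marker forward each turn. A sketch (hypothetical helper; assumes dict-shaped messages, and clearing older markers is just one simple policy for staying under the 4-breakpoint cap):

```python
def advance_breakpoint(messages: list) -> None:
    # Clear any marker set on an earlier turn (one way to stay under the
    # 4-breakpoint-per-request cap as the conversation grows)...
    for msg in messages:
        for block in msg["content"]:
            block.pop("cache_control", None)
    # ...then mark the last content block of the newest turn.
    messages[-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}

history = [
    {"role": "user", "content": [{"type": "text", "text": "First question"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "First answer"}]},
    {"role": "user", "content": [{"type": "text", "text": "Follow-up"}]},
]
advance_breakpoint(history)  # marker now sits on the "Follow-up" block
```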

### Shared prefix, varying suffix

Many requests share a large fixed preamble (few-shot examples, retrieved docs, instructions) but differ in the final question. Put the breakpoint at the end of the **shared** portion, not at the end of the whole prompt — otherwise every request writes a distinct cache entry and nothing is ever read.

```json
"messages": [{"role": "user", "content": [
  {"type": "text", "text": "<shared context>", "cache_control": {"type": "ephemeral"}},
  {"type": "text", "text": "<varying question>"}  // no marker — differs every time
]}]
```

### Prompts that change from the beginning every time

Don't cache. If the first 1K tokens differ per request, there is no reusable prefix. Adding `cache_control` only pays the cache-write premium with zero reads. Leave it off.

---

## Architectural guidance

These are the decisions that matter more than marker placement. Fix these first.

**Keep the system prompt frozen.** Don't interpolate "current date: X", "mode: Y", "user name: Z" into the system prompt — those sit at the front of the prefix and invalidate everything downstream. Inject dynamic context as a user or assistant message later in `messages`. A message at turn 5 invalidates nothing before turn 5.

**Don't change tools or model mid-conversation.** Tools render at position 0; adding, removing, or reordering a tool invalidates the entire cache. Same for switching models (caches are model-scoped). If you need "modes", don't swap the tool set — give Claude a tool that records the mode transition, or pass the mode as message content. Serialize tools deterministically (sort by name).

**Fork operations must reuse the parent's exact prefix.** Side computations (summarization, compaction, sub-agents) often spin up a separate API call. If the fork rebuilds `system` / `tools` / `model` with any difference, it misses the parent's cache entirely. Copy the parent's `system`, `tools`, and `model` verbatim, then append fork-specific content at the end.

---

## Silent invalidators

When reviewing code, grep for these inside anything that feeds the prompt prefix:

| Pattern | Why it breaks caching |
|---|---|
| `datetime.now()` / `Date.now()` / `time.time()` in system prompt | Prefix changes every request |
| `uuid4()` / `crypto.randomUUID()` / request IDs early in content | Same — every request is unique |
| `json.dumps(d)` without `sort_keys=True` / iterating a `set` | Non-deterministic serialization → prefix bytes differ |
| f-string interpolating session/user ID into system prompt | Per-user prefix; no cross-user sharing |
| Conditional system sections (`if flag: system += ...`) | Every flag combination is a distinct prefix |
| `tools=build_tools(user)` where set varies per user | Tools render at position 0; nothing caches across users |

Fix by moving the dynamic piece after the last breakpoint, making it deterministic, or deleting it if it's not load-bearing.
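The serialization row is the easiest to fix and the easiest to miss. Deterministic output in Python:

```python
import json

# The same logical dict, built in two different key orders
# (e.g. by two different code paths feeding the prompt).
a = {"plan": "enterprise", "region": "eu", "seats": 12}
b = {"seats": 12, "plan": "enterprise", "region": "eu"}

# Without sort_keys, output follows insertion order -> two different prefixes.
assert json.dumps(a) != json.dumps(b)

# With sort_keys=True (and sorted() around any sets), the bytes are stable.
assert json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)
```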

---

## API reference

```json
"cache_control": {"type": "ephemeral"}               // 5-minute TTL (default)
"cache_control": {"type": "ephemeral", "ttl": "1h"}  // 1-hour TTL
```

- Max **4** `cache_control` breakpoints per request.
- Goes on any content block: system text blocks, tool definitions, message content blocks (`text`, `image`, `tool_use`, `tool_result`, `document`).
- Top-level `cache_control` on `messages.create()` auto-places on the last cacheable block — simplest option when you don't need fine-grained placement.
- Minimum cacheable prefix is model-dependent (typically 1024–2048 tokens). Shorter prefixes silently won't cache even with a marker.

**Economics:** Cache writes cost ~1.25× base input price; reads cost ~0.1×. A prefix must be used in at least two requests within TTL to break even (one writes the cache, subsequent ones read it). For bursty traffic, the 1-hour TTL keeps entries alive across gaps.
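The break-even arithmetic, made concrete with the relative prices above (a sketch, not a billing calculator):

```python
WRITE = 1.25  # cache write, relative to base input price
READ = 0.10   # cache read
BASE = 1.00   # uncached input

def relative_cost(n_requests: int, cached: bool) -> float:
    # Cached: the first request writes, the remaining n-1 read (all within TTL).
    return WRITE + (n_requests - 1) * READ if cached else n_requests * BASE

assert relative_cost(1, cached=True) > relative_cost(1, cached=False)  # 1.25 vs 1.00
assert relative_cost(2, cached=True) < relative_cost(2, cached=False)  # 1.35 vs 2.00
```

One use pays a premium; from the second use onward caching wins, and the gap widens with every additional read.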

---

## Verifying cache hits

The response `usage` object reports cache activity:

| Field | Meaning |
|---|---|
| `cache_creation_input_tokens` | Tokens written to cache this request (you paid the ~1.25× write premium) |
| `cache_read_input_tokens` | Tokens served from cache this request (you paid ~0.1×) |
| `input_tokens` | Tokens processed at full price (not cached) |

If `cache_read_input_tokens` is zero across repeated requests with identical prefixes, a silent invalidator is at work — diff the rendered prompt bytes between two requests to find it.

Language-specific access: `response.usage.cache_read_input_tokens` (Python/TS/Ruby), `$message->usage->cacheReadInputTokens` (PHP), `resp.Usage.CacheReadInputTokens` (Go/C#), `.usage().cacheReadInputTokens()` (Java).
@@ -6,7 +6,7 @@ This file covers the conceptual foundations of tool use with the Claude API. For

### Tool Definition Structure

-> **Note:** When using the Tool Runner (beta), tool schemas are generated automatically from your function signatures (Python), Zod schemas (TypeScript), annotated classes (Java), `jsonschema` struct tags (Go), or `BaseTool` subclasses (Ruby). The raw JSON schema format below is for the manual approach or SDKs without tool runner support.
+> **Note:** When using the Tool Runner (beta), tool schemas are generated automatically from your function signatures (Python), Zod schemas (TypeScript), annotated classes (Java), `jsonschema` struct tags (Go), or `BaseTool` subclasses (Ruby). The raw JSON schema format below is for the manual approach — including PHP's `BetaRunnableTool`, which wraps a run closure around a hand-written schema — or SDKs without tool runner support.

Each tool requires a name, description, and JSON Schema for its inputs:

@@ -59,7 +59,7 @@ Any `tool_choice` value can also include `"disable_parallel_tool_use": true` to

### Tool Runner vs Manual Loop

-**Tool Runner (Recommended):** The SDK's tool runner handles the agentic loop automatically — it calls the API, detects tool use requests, executes your tool functions, feeds results back to Claude, and repeats until Claude stops calling tools. Available in Python, TypeScript, Java, Go, and Ruby SDKs (beta). The Python SDK also provides MCP conversion helpers (`anthropic.lib.tools.mcp`) to convert MCP tools, prompts, and resources for use with the tool runner — see `python/claude-api/tool-use.md` for details.
+**Tool Runner (Recommended):** The SDK's tool runner handles the agentic loop automatically — it calls the API, detects tool use requests, executes your tool functions, feeds results back to Claude, and repeats until Claude stops calling tools. Available in Python, TypeScript, Java, Go, Ruby, and PHP SDKs (beta). The Python SDK also provides MCP conversion helpers (`anthropic.lib.tools.mcp`) to convert MCP tools, prompts, and resources for use with the tool runner — see `python/claude-api/tool-use.md` for details.

**Manual Agentic Loop:** Use when you need fine-grained control over the loop (e.g., custom logging, conditional tool execution, human-in-the-loop approval). Loop until `stop_reason == "end_turn"`, always append the full `response.content` to preserve tool_use blocks, and ensure each `tool_result` includes the matching `tool_use_id`.
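A minimal sketch of that manual loop in Python, with a stub client standing in for the real SDK (the stub, `run_tool`, and the dict-shaped blocks are illustrative, not SDK API):

```python
def run_tool_loop(client, model, tools, messages, run_tool):
    """Loop while Claude requests tools; return the final response."""
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=tools, messages=messages
        )
        # Always append the full content so tool_use blocks are preserved.
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # e.g. "end_turn" — Claude is done calling tools
        # Every tool_result must echo the matching tool_use_id.
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": run_tool(b["name"], b["input"])}
            for b in response.content if b.get("type") == "tool_use"
        ]})

# --- tiny stub client so the loop can be exercised without the network ---
class _Resp:
    def __init__(self, content, stop_reason):
        self.content, self.stop_reason = content, stop_reason

class _StubClient:
    def __init__(self):
        self._turn = 0
        self.messages = self  # expose .messages.create like the real client
    def create(self, **_):
        self._turn += 1
        if self._turn == 1:
            return _Resp([{"type": "tool_use", "id": "tu_1",
                           "name": "get_time", "input": {}}], "tool_use")
        return _Resp([{"type": "text", "text": "It is noon."}], "end_turn")

history = [{"role": "user", "content": "What time is it?"}]
final = run_tool_loop(_StubClient(), "claude-opus-4-6", [], history,
                      run_tool=lambda name, inp: "12:00")
```

The loop leaves `history` as user → assistant (tool_use) → user (tool_result) → assistant, which is exactly the transcript shape the API expects on the next call.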

@@ -187,6 +187,7 @@ for await (const message of query({

description: "Expert code reviewer for quality and security reviews.",
prompt: "Analyze code quality and suggest improvements.",
tools: ["Read", "Glob", "Grep"],
// Optional: skills, mcpServers for subagent customization
},
},
},
@@ -105,6 +105,8 @@ const response = await client.messages.create({

## Prompt Caching

**Caching is a prefix match** — any byte change anywhere in the prefix invalidates everything after it. For placement patterns, architectural guidance (frozen system prompt, deterministic tool order, where to put volatile content), and the silent-invalidator audit checklist, read `shared/prompt-caching.md`.

### Automatic Caching (Recommended)

Use top-level `cache_control` to automatically cache the last cacheable block in the request:

@@ -152,6 +154,16 @@ const response2 = await client.messages.create({
});
```

### Verifying Cache Hits

```typescript
console.log(response.usage.cache_creation_input_tokens);  // tokens written to cache (~1.25x cost)
console.log(response.usage.cache_read_input_tokens);      // tokens served from cache (~0.1x cost)
console.log(response.usage.input_tokens);                 // uncached tokens (full cost)
```

If `cache_read_input_tokens` is zero across repeated identical-prefix requests, a silent invalidator is at work — `Date.now()` or a UUID in the system prompt, non-deterministic key ordering, or a varying tool set. See `shared/prompt-caching.md` for the full audit table.

---

## Extended Thinking