From d211d437443a7b2496a3dad9575e7dddd724c585 Mon Sep 17 00:00:00 2001 From: Lance Martin <122662504+rlancemartin@users.noreply.github.com> Date: Wed, 6 May 2026 09:05:49 -0700 Subject: [PATCH] Add Managed Agents outcomes, multiagent, and webhooks to claude-api skill (#1096) --- skills/claude-api/SKILL.md | 2 +- skills/claude-api/shared/live-sources.md | 1 + .../shared/managed-agents-api-reference.md | 41 ++++++- .../claude-api/shared/managed-agents-core.md | 6 +- .../shared/managed-agents-events.md | 12 +- .../shared/managed-agents-multiagent.md | 99 ++++++++++++++++ .../shared/managed-agents-onboarding.md | 2 +- .../shared/managed-agents-outcomes.md | 106 +++++++++++++++++ .../shared/managed-agents-overview.md | 5 +- .../claude-api/shared/managed-agents-tools.md | 2 +- .../shared/managed-agents-webhooks.md | 110 ++++++++++++++++++ 11 files changed, 374 insertions(+), 12 deletions(-) create mode 100644 skills/claude-api/shared/managed-agents-multiagent.md create mode 100644 skills/claude-api/shared/managed-agents-outcomes.md create mode 100644 skills/claude-api/shared/managed-agents-webhooks.md diff --git a/skills/claude-api/SKILL.md b/skills/claude-api/SKILL.md index ebff1c796..9412f082c 100644 --- a/skills/claude-api/SKILL.md +++ b/skills/claude-api/SKILL.md @@ -234,7 +234,7 @@ For placement patterns, architectural guidance, and the silent-invalidator audit |---|---| | `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. **Read `shared/managed-agents-onboarding.md` immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → emit code. Do not summarize — run the interview. | -**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, memory, client-patterns, onboarding, api-reference). 
For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in `shared/live-sources.md`). If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use raw HTTP from `curl/managed-agents.md` as a reference. +**Reading guide:** Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, outcomes, multiagent, webhooks, memory, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in `shared/live-sources.md`). If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use raw HTTP from `curl/managed-agents.md` as a reference. **When the user wants to set up a Managed Agent from scratch** (e.g. "how do I get started", "walk me through creating one", "set up a new agent"): read `shared/managed-agents-onboarding.md` and run its interview — same flow as the `managed-agents-onboard` subcommand. 
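The "create once, reference by ID" rule in the reading guide could be sketched roughly as follows. This is only an illustration: `get_or_create_agent_id`, the JSON cache file, and the injected `create_agent` callable are hypothetical names introduced here, not part of the SDK or of this skill.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_agent_id(cache_path: Path, create_agent: Callable[[], str]) -> str:
    """Return a stored agent ID, creating the agent only when none is cached.

    Hypothetical helper: `create_agent` is any zero-argument callable that
    creates the agent and returns its ID. It is injected so that the request
    path never has to call `agents.create` itself.
    """
    if cache_path.exists():
        # Reuse the ID persisted by an earlier run.
        return json.loads(cache_path.read_text())["agent_id"]

    # First run only: create the agent and persist its ID for later runs.
    agent_id = create_agent()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps({"agent_id": agent_id}))
    return agent_id
```

In practice `create_agent` would presumably wrap `client.beta.agents.create(...)` and return the resulting `.id`, and the returned value would be passed to each `sessions.create` call. Where the ID is stored (file, config, secret manager) is a deployment choice; the file cache here is just the simplest stand-in.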
diff --git a/skills/claude-api/shared/live-sources.md b/skills/claude-api/shared/live-sources.md index 53a8bbec2..d2f835519 100644 --- a/skills/claude-api/shared/live-sources.md +++ b/skills/claude-api/shared/live-sources.md @@ -88,6 +88,7 @@ Use these when a managed-agents binding, behavior, or wire-level detail isn't co | Permission Policies | `https://platform.claude.com/docs/en/managed-agents/permission-policies.md` | "Extract permission policy types (allow/deny/confirm) and per-tool config" | | Multi-Agent | `https://platform.claude.com/docs/en/managed-agents/multi-agent.md` | "Extract multi-agent composition patterns, sub-agent invocation, and result handoff" | | Observability | `https://platform.claude.com/docs/en/managed-agents/observability.md` | "Extract logging, tracing, and usage telemetry exposed by managed agents" | +| Webhooks | `https://platform.claude.com/docs/en/managed-agents/webhooks.md` | "Extract webhook endpoint registration, HMAC signature verification, supported event types, and delivery semantics" | | GitHub | `https://platform.claude.com/docs/en/managed-agents/github.md` | "Extract github_repository resource shape, multi-repo mounting, and token rotation" | | MCP Connector | `https://platform.claude.com/docs/en/managed-agents/mcp-connector.md` | "Extract MCP server declaration on agents and vault-based credential injection at session" | | Vaults | `https://platform.claude.com/docs/en/managed-agents/vaults.md` | "Extract vault create, credential add/rotate, OAuth refresh shape, and archive" | diff --git a/skills/claude-api/shared/managed-agents-api-reference.md b/skills/claude-api/shared/managed-agents-api-reference.md index 8e7b3a03b..16b1c5b8d 100644 --- a/skills/claude-api/shared/managed-agents-api-reference.md +++ b/skills/claude-api/shared/managed-agents-api-reference.md @@ -23,15 +23,16 @@ All resources are under the `beta` namespace. 
Python and TypeScript share identi | Environments | `environments.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Environments.New` / `Get` / `Update` / `List` / `Delete` / `Archive` | | Sessions | `sessions.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Sessions.New` / `Get` / `Update` / `List` / `Delete` / `Archive` | | Session Events | `sessions.events.list` / `send` / `stream` | `Sessions.Events.List` / `Send` / `StreamEvents` | +| Session Threads | `sessions.threads.list` / `retrieve` / `archive`; `sessions.threads.events.list` / `stream` | `Sessions.Threads.List` / `Get` / `Archive`; `Sessions.Threads.Events.List` / `StreamEvents` | | Session Resources | `sessions.resources.add` / `retrieve` / `update` / `list` / `delete` | `Sessions.Resources.Add` / `Get` / `Update` / `List` / `Delete` | | Vaults | `vaults.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Vaults.New` / `Get` / `Update` / `List` / `Delete` / `Archive` | -| Credentials | `vaults.credentials.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `Vaults.Credentials.New` / `Get` / `Update` / `List` / `Delete` / `Archive` | +| Credentials | `vaults.credentials.create` / `retrieve` / `update` / `list` / `delete` / `archive` / `mcp_oauth_validate` | `Vaults.Credentials.New` / `Get` / `Update` / `List` / `Delete` / `Archive` / `McpOauthValidate` | | Memory Stores | `memory_stores.create` / `retrieve` / `update` / `list` / `delete` / `archive` | `MemoryStores.New` / `Get` / `Update` / `List` / `Delete` / `Archive` | | Memories | `memory_stores.memories.create` / `retrieve` / `update` / `list` / `delete` | `MemoryStores.Memories.New` / `Get` / `Update` / `List` / `Delete` | | Memory Versions | `memory_stores.memory_versions.list` / `retrieve` / `redact` | `MemoryStores.MemoryVersions.List` / `Get` / `Redact` | **Naming quirks to watch for:** -- Agents have **no delete** — only `archive`. 
Archive is **permanent**: the agent becomes read-only, new sessions cannot reference it, and there is no unarchive. Confirm with the user before archiving a production agent. Environments, Sessions, Vaults, Credentials, and Memory Stores have both `delete` and `archive`; Session Resources, Files, Skills, and Memories are `delete`-only; Memory Versions have neither — only `redact`. +- Agents and Session Threads have **no delete** — only `archive`. Archive is **permanent**: the agent becomes read-only, new sessions cannot reference it, and there is no unarchive. Confirm with the user before archiving a production agent. Environments, Sessions, Vaults, Credentials, and Memory Stores have both `delete` and `archive`; Session Resources, Files, Skills, and Memories are `delete`-only; Memory Versions have neither — only `redact`. - Session resources use `add` (not `create`). - Go's event stream is `StreamEvents` (not `Stream`). @@ -73,6 +74,18 @@ All resources are under the `beta` namespace. Python and TypeScript share identi | `POST` | `/v1/sessions/{session_id}/events` | SendEvents | Send events (user message, tool result) | | `GET` | `/v1/sessions/{session_id}/events/stream` | StreamEvents | Stream events via SSE | +## Session Threads + +Per-subagent event streams in multiagent sessions. See `shared/managed-agents-multiagent.md`. 
+ +| Method | Path | Operation | Description | +| -------- | ------------------------------------------------ | ---------------- | ---------------------------------------- | +| `GET` | `/v1/sessions/{session_id}/threads` | ListThreads | List threads (paginated) | +| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}` | GetThread | Retrieve one thread (carries `agent` snapshot, `status`, `parent_thread_id`, `stats`, `usage`) | +| `POST` | `/v1/sessions/{session_id}/threads/{thread_id}/archive` | ArchiveThread | Archive a thread | +| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}/events` | ListThreadEvents | List past events for one thread (paginated) | +| `GET` | `/v1/sessions/{session_id}/threads/{thread_id}/stream` | StreamThreadEvents | Stream one thread via SSE (SDK: `threads.events.stream`) | + ## Session Resources | Method | Path | Operation | Description | @@ -119,6 +132,7 @@ Credentials are individual secrets stored inside a vault. | `POST` | `/v1/vaults/{vault_id}/credentials/{credential_id}` | UpdateCredential | Update credential | | `DELETE` | `/v1/vaults/{vault_id}/credentials/{credential_id}` | DeleteCredential | Delete credential | | `POST` | `/v1/vaults/{vault_id}/credentials/{credential_id}/archive` | ArchiveCredential | Archive credential | +| `POST` | `/v1/vaults/{vault_id}/credentials/{credential_id}/mcp_oauth_validate` | McpOauthValidate | Validate an MCP OAuth credential | ## Memory Stores @@ -206,13 +220,21 @@ Immutable per-mutation snapshots (`memver_...`) — the audit and rollback surfa "url": "https://api.githubcopilot.com/mcp/" } ], + "multiagent": { + "type": "coordinator", + "agents": [ + "agent_abc123", + { "type": "agent", "id": "agent_def456", "version": 4 }, + { "type": "self" } + ] + }, "metadata": { "key": "value (max 16 pairs, keys ≤64 chars, values ≤512 chars)" } } ``` -> Limits: `tools` max 50, `skills` max 64, `mcp_servers` max 20 (unique names). 
+> Limits: `tools` max 128, `skills` max 20, `mcp_servers` max 20 (unique names). `multiagent.agents` 1–20 entries (string ID | `{type:"agent",id,version?}` | `{type:"self"}`) — see `shared/managed-agents-multiagent.md`. ### CreateSession Request Body @@ -276,6 +298,19 @@ Immutable per-mutation snapshots (`memver_...`) — the audit and rollback surfa } ``` +### Define Outcome Event + +```json +{ + "type": "user.define_outcome", + "description": "Build a DCF model for Costco in .xlsx", + "rubric": { "type": "file", "file_id": "file_01..." }, + "max_iterations": 5 +} +``` + +> `rubric` is required: `{type: "text", content}` or `{type: "file", file_id}`. `max_iterations` default 3, max 20. Echoed back with `outcome_id` + `processed_at`. See `shared/managed-agents-outcomes.md`. + ### Tool Result Event ```json diff --git a/skills/claude-api/shared/managed-agents-core.md b/skills/claude-api/shared/managed-agents-core.md index ef45ab8f3..f5e0127e1 100644 --- a/skills/claude-api/shared/managed-agents-core.md +++ b/skills/claude-api/shared/managed-agents-core.md @@ -132,8 +132,9 @@ const session = await client.beta.sessions.create( | `system` | string | No | System prompt — defines the agent's behavior (up to 100K chars) | | `tools` | array | No | Encompasses three kinds: (1) pre-built Claude Agent tools (`agent_toolset_20260401`), (2) MCP tools (`mcp_toolset`), and (3) custom client-side tools. Max 128. | | `mcp_servers` | array | No | MCP server connections — standardized third-party capabilities (e.g. GitHub, Asana). Max 20, unique names. See `shared/managed-agents-tools.md` → MCP Servers. | -| `skills` | array | No | Customized "best-practices" context with progressive disclosure. Max 64. See `shared/managed-agents-tools.md` → Skills. | +| `skills` | array | No | Customized "best-practices" context with progressive disclosure. Max 20. See `shared/managed-agents-tools.md` → Skills. 
| | `description` | string | No | Description of the agent (up to 2048 chars) | +| `multiagent` | object | No | `{type: "coordinator", agents: [...]}` — roster this agent may delegate to. See `shared/managed-agents-multiagent.md`. | | `metadata` | object | No | Arbitrary key-value pairs (max 16, keys ≤64 chars, values ≤512 chars) | --- @@ -153,8 +154,9 @@ The API is **flat** — `model`, `system`, `tools` etc. are top-level fields, no | `system` | string | No | System prompt | | `tools` | array | No | Agent toolset / MCP toolset / custom tools | | `mcp_servers` | array | No | MCP server connections | -| `skills` | array | No | Skill references (max 64) | +| `skills` | array | No | Skill references (max 20) | | `description` | string | No | Description of the agent | +| `multiagent` | object | No | Coordinator roster — see `shared/managed-agents-multiagent.md` | | `metadata` | object | No | Arbitrary key-value pairs | ### Lifecycle: create once, run many, update in place diff --git a/skills/claude-api/shared/managed-agents-events.md b/skills/claude-api/shared/managed-agents-events.md index 4ee3084d4..28e3fbcb1 100644 --- a/skills/claude-api/shared/managed-agents-events.md +++ b/skills/claude-api/shared/managed-agents-events.md @@ -12,13 +12,15 @@ Send events to a session via `POST /v1/sessions/{id}/events`. | `user.interrupt` | Interrupt the agent while it's running | | `user.tool_confirmation` | Approve/deny a tool call (when `always_ask` policy) | | `user.custom_tool_result` | Provide result for a custom tool call | +| `user.define_outcome` | Start a rubric-graded iterate loop — see `shared/managed-agents-outcomes.md` | ### Receiving Events -Two methods: +Three methods: 1. **Streaming (SSE)**: `GET /v1/sessions/{id}/events/stream` — real-time Server-Sent Events. **Long-lived** — the server sends periodic heartbeats to keep the connection alive. 2. **Polling**: `GET /v1/sessions/{id}/events` — paginated event list (query params: `limit` default 1000, `page`). 
**Returns immediately** — this is a plain paginated GET, not a long-poll. +3. **Webhooks**: Anthropic POSTs session state transitions to your HTTPS endpoint — thin payloads (IDs only), HMAC-signed, Console-registered. See `shared/managed-agents-webhooks.md`. All received events carry `id`, `type`, and `processed_at` (ISO 8601; `null` if not yet processed by the agent). @@ -47,8 +49,12 @@ Event types use dot notation, grouped by namespace: | `session.error` | Error occurred during processing | | `span.model_request_start` | Model inference started | | `span.model_request_end` | Model inference completed | +| `span.outcome_evaluation_start` / `_ongoing` / `_end` | Grader progress for outcome-oriented sessions — see `shared/managed-agents-outcomes.md` | +| `session.thread_created` | Subagent thread spawned (multiagent) — see `shared/managed-agents-multiagent.md` | +| `session.thread_status_running` / `_idle` / `_rescheduled` / `_terminated` | Subagent thread status transitions (multiagent). `_idle` carries `stop_reason`. | +| `agent.thread_message_sent` / `_received` | Cross-thread message, carries `to_session_thread_id` / `from_session_thread_id` (multiagent) | -The stream also echoes back user-sent events (`user.message`, `user.interrupt`, `user.tool_confirmation`, `user.custom_tool_result`). +The stream also echoes back user-sent events (`user.message`, `user.interrupt`, `user.tool_confirmation`, `user.custom_tool_result`, `user.define_outcome`). --- @@ -125,7 +131,7 @@ await client.beta.sessions.events.send(sessionId, { }); ``` -The agent stops mid-task. It does not see the interrupt as a message — it just halts. Send a follow-up `user` event to explain what to do instead. +The agent stops mid-task. It does not see the interrupt as a message — it just halts. Send a follow-up `user` event to explain what to do instead. If an outcome is active, the interrupt also marks `span.outcome_evaluation_end.result: "interrupted"` (see `shared/managed-agents-outcomes.md`). 
> **Note**: Interrupt events may have empty IDs in the current implementation. When troubleshooting, use the `processed_at` timestamp along with surrounding event IDs. diff --git a/skills/claude-api/shared/managed-agents-multiagent.md b/skills/claude-api/shared/managed-agents-multiagent.md new file mode 100644 index 000000000..1d5872c9b --- /dev/null +++ b/skills/claude-api/shared/managed-agents-multiagent.md @@ -0,0 +1,99 @@ +# Managed Agents — Multiagent Sessions + +A coordinator agent can delegate to other agents within one session. All agents **share the container and filesystem**; each runs in its own **thread** — a context-isolated event stream with its own conversation history, model, system prompt, tools, MCP servers, and skills (from that agent's own config). Threads are persistent: the coordinator can send a follow-up to a subagent it called earlier and that subagent retains its prior turns. + +The SDK sets the `managed-agents-2026-04-01` beta header automatically on all `client.beta.{agents,sessions}.*` calls; no additional header is required for multiagent. + +--- + +## Declare the roster on the coordinator + +`multiagent` is a **top-level field** on `agents.create()` / `agents.update()` — **not** a `tools[]` entry. `agents` lists 1–20 roster entries. Nothing changes on `sessions.create()` — the roster is resolved from the coordinator's config. + +```python +orchestrator = client.beta.agents.create( + name="Engineering Lead", + model="{{OPUS_ID}}", + system="You coordinate engineering work. 
Delegate code review to the reviewer and test writing to the test agent.", + tools=[{"type": "agent_toolset_20260401"}], + multiagent={ + "type": "coordinator", + "agents": [ + reviewer.id, # bare string — latest version + {"type": "agent", "id": test_writer.id, "version": 4}, # pinned version + {"type": "self"}, # the coordinator itself + ], + }, +) + +session = client.beta.sessions.create(agent=orchestrator.id, environment_id=env.id) +``` + +| Roster entry | Shape | Notes | +|---|---|---| +| String shorthand | `"agent_abc123"` | References the latest version of a stored agent. | +| Agent reference | `{type: "agent", id, version?}` | Omit `version` to pin the latest at coordinator save time. | +| Self | `{type: "self"}` | The coordinator can spawn copies of itself. | + +Up to **20 unique agents** in the roster; the coordinator may spawn **multiple copies** of each. **One level of delegation only** — depth > 1 is ignored. + +--- + +## Threads + +The session-level event stream is the **primary thread** — it shows the coordinator's trace plus a condensed view of subagent activity (thread status transitions and cross-thread messages, not every subagent tool call). 
Drill into a specific subagent via the per-thread endpoints: + +| Operation | HTTP | SDK (`client.beta.sessions.threads.*`) | +|---|---|---| +| List threads | `GET /v1/sessions/{sid}/threads` | `.list(session_id)` | +| Retrieve one | `GET /v1/sessions/{sid}/threads/{tid}` | `.retrieve(thread_id, session_id=...)` | +| Archive | `POST /v1/sessions/{sid}/threads/{tid}/archive` | `.archive(thread_id, session_id=...)` | +| List thread events | `GET /v1/sessions/{sid}/threads/{tid}/events` | `.events.list(thread_id, session_id=...)` | +| Stream thread events | `GET /v1/sessions/{sid}/threads/{tid}/stream` | `.events.stream(thread_id, session_id=...)` | + +Each `SessionThread` carries `id`, `status` (`running` | `idle` | `rescheduling` | `terminated`), `agent` (a resolved snapshot of the agent config — `id`, `name`, `model`, `system`, `tools`, `skills`, `mcp_servers`, `version`), `parent_thread_id` (null for the primary thread, which is included in the list), `archived_at`, and optional `stats`/`usage`. **Session status aggregates thread statuses** — if any thread is `running`, `session.status` is `running`. Max **25 concurrent threads**. When draining a per-thread stream, break on `session.thread_status_idle` (and check its `stop_reason` as you would for the session-level idle). + +--- + +## Multiagent events (on the session stream) + +| Event | Payload highlights | Meaning | +|---|---|---| +| `session.thread_created` | `session_thread_id`, `agent_name` | A new thread was created. | +| `session.thread_status_running` | `session_thread_id`, `agent_name` | Thread started activity. | +| `session.thread_status_idle` | `session_thread_id`, `agent_name`, **`stop_reason`** | Thread is awaiting input. Inspect `stop_reason` (same shape as `session.status_idle.stop_reason`). | +| `session.thread_status_rescheduled` | `session_thread_id`, `agent_name` | Thread is rescheduling after a retryable error. 
| +| `session.thread_status_terminated` | `session_thread_id`, `agent_name` | Thread was archived or hit a terminal error. | +| `agent.thread_message_sent` | `to_session_thread_id`, `to_agent_name`, `content` | Coordinator sent a follow-up to another thread. | +| `agent.thread_message_received` | `from_session_thread_id`, `from_agent_name`, `content` | An agent delivered its result to the coordinator. | + +--- + +## Tool permissions and custom tools from subagent threads + +When a subagent needs your client (an `always_ask` confirmation, or a custom tool result), the request is **cross-posted to the primary thread** with `session_thread_id` identifying the originating thread — so you only need to watch the session stream. Reply with `user.tool_confirmation` (carrying `tool_use_id`) or `user.custom_tool_result` (carrying `custom_tool_use_id`), and **echo the `session_thread_id` from the originating event** (the SDK param type and docstring expect it). The server also routes by the tool-use ID, so the echo is belt-and-suspenders rather than load-bearing — but include it. + +```python +for event_id in stop.event_ids: + pending = events_by_id[event_id] + confirmation = { + "type": "user.tool_confirmation", + "tool_use_id": event_id, + "result": "allow", + } + if pending.session_thread_id is not None: + confirmation["session_thread_id"] = pending.session_thread_id + client.beta.sessions.events.send(session.id, events=[confirmation]) +``` + +The same pattern applies to `user.custom_tool_result`. + +--- + +## Pitfalls + +- **Don't put the roster on `sessions.create()` or in `tools[]`.** `multiagent` is a top-level agent field; update the coordinator, then start a session that references it. +- **Don't assume shared context.** Threads share the filesystem but not conversation history or tools. If the coordinator needs a subagent to act on something, it must say so in the delegated message (or write it to disk). 
+- **Depth > 1 is ignored.** A subagent's own `multiagent` roster (if any) doesn't cascade — only the session's coordinator delegates. + +For per-language bindings beyond Python, WebFetch `https://platform.claude.com/docs/en/managed-agents/multi-agent.md` (see `shared/live-sources.md`). diff --git a/skills/claude-api/shared/managed-agents-onboarding.md b/skills/claude-api/shared/managed-agents-onboarding.md index 912a8cec6..e6bc3416d 100644 --- a/skills/claude-api/shared/managed-agents-onboarding.md +++ b/skills/claude-api/shared/managed-agents-onboarding.md @@ -51,7 +51,7 @@ Three rounds. Batch the questions in each round; don't ask them one at a time. **Round B — Skills, files, and repos.** What the agent has on hand when it starts. -*Skills* — two types; both work the same way — Claude auto-uses them when relevant. Max 64 per agent. +*Skills* — two types; both work the same way — Claude auto-uses them when relevant. Max 20 per agent. - [ ] **Pre-built Agent Skills**: `xlsx`, `docx`, `pptx`, `pdf`. Reference by name. - [ ] **Custom Skills**: skills uploaded to the user's org via the Skills API. Reference by `skill_id` + optional `version`. If the skill doesn't exist yet, walk the user through `POST /v1/skills` + `POST /v1/skills/{id}/versions` (beta header `skills-2025-10-02`). Full detail: `shared/managed-agents-tools.md` → Skills + Skills API. diff --git a/skills/claude-api/shared/managed-agents-outcomes.md b/skills/claude-api/shared/managed-agents-outcomes.md new file mode 100644 index 000000000..aee3f4e3f --- /dev/null +++ b/skills/claude-api/shared/managed-agents-outcomes.md @@ -0,0 +1,106 @@ +# Managed Agents — Outcomes + +An **outcome** elevates a session from *conversation* to *work*: you state what "done" looks like, and the harness runs an iterate → grade → revise loop until the artifact meets the rubric, hits `max_iterations`, or is interrupted. 
A separate **grader** (independent context window) scores each iteration against your rubric and feeds per-criterion gaps back to the agent. + +The SDK sets the `managed-agents-2026-04-01` beta header automatically on all `client.beta.sessions.*` calls; no additional header is required for outcomes. + +--- + +## The `user.define_outcome` event + +Outcomes are not a field on `sessions.create()`. You create a normal session, then send a `user.define_outcome` event. The agent starts working on receipt — **do not also send a `user.message`** to kick it off. + +```python +session = client.beta.sessions.create( + agent=AGENT_ID, + environment_id=ENVIRONMENT_ID, + title="Financial analysis on Costco", +) + +client.beta.sessions.events.send( + session_id=session.id, + events=[ + { + "type": "user.define_outcome", + "description": "Build a DCF model for Costco in .xlsx", + "rubric": {"type": "text", "content": RUBRIC_MD}, + # or: "rubric": {"type": "file", "file_id": rubric.id} + "max_iterations": 5, # optional; default 3, max 20 + } + ], +) +``` + +| Field | Type | Notes | +|---|---|---| +| `type` | `"user.define_outcome"` | | +| `description` | string | The task. This is what the agent works toward — no separate `user.message` needed. | +| `rubric` | `{type: "text", content}` \| `{type: "file", file_id}` | **Required.** Markdown with explicit, independently gradeable criteria. Upload once via `client.beta.files.upload(...)` (beta `files-api-2025-04-14`) to reuse across sessions. | +| `max_iterations` | int | Optional. Default **3**, max **20**. | + +The event is echoed back on the stream with a server-assigned `outcome_id` and `processed_at`. + +> **Writing rubrics.** Use explicit, gradeable criteria ("CSV has a numeric `price` column"), not vibes ("data looks good") — the grader scores each criterion independently, so vague criteria produce noisy loops. If you don't have a rubric, have Claude analyze a known-good artifact and turn that analysis into one. 
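As a minimal sketch of the field rules in the table above, a small builder can assemble the `user.define_outcome` payload and reject obviously invalid input before it is sent. `build_define_outcome_event` is a hypothetical helper, not an SDK function, and the lower bound of 1 on `max_iterations` is an assumption (the text above states only the default of 3 and the maximum of 20).

```python
from typing import Any, Dict, Optional

MAX_ITERATIONS_LIMIT = 20  # documented maximum; the server-side default is 3


def build_define_outcome_event(
    description: str,
    *,
    rubric_text: Optional[str] = None,
    rubric_file_id: Optional[str] = None,
    max_iterations: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a `user.define_outcome` event dict (hypothetical helper)."""
    # The rubric is required and takes exactly one of the two documented shapes.
    if (rubric_text is None) == (rubric_file_id is None):
        raise ValueError("provide exactly one of rubric_text or rubric_file_id")

    if rubric_text is not None:
        rubric: Dict[str, Any] = {"type": "text", "content": rubric_text}
    else:
        rubric = {"type": "file", "file_id": rubric_file_id}

    event: Dict[str, Any] = {
        "type": "user.define_outcome",
        "description": description,
        "rubric": rubric,
    }

    # Omit max_iterations entirely to accept the server default.
    if max_iterations is not None:
        # Assumption: at least one iteration; the upper bound is documented.
        if not 1 <= max_iterations <= MAX_ITERATIONS_LIMIT:
            raise ValueError("max_iterations must be between 1 and 20")
        event["max_iterations"] = max_iterations

    return event
```

The returned dict is meant to be passed as the single element of `events=[...]` in the `client.beta.sessions.events.send(...)` call shown earlier. Validating locally only catches shape mistakes early; the API remains the authority on what it accepts.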
+ +--- + +## Outcome-specific events + +These appear on the standard event stream (`sessions.events.stream` / `.list`) alongside the usual `agent.*` / `session.*` events. + +| Event | Payload highlights | Meaning | +|---|---|---| +| `span.outcome_evaluation_start` | `outcome_id`, `iteration` (0-indexed) | Grader began scoring iteration *N*. | +| `span.outcome_evaluation_ongoing` | `outcome_id` | Heartbeat while the grader runs. Grader reasoning is opaque — you see *that* it's working, not *what* it's thinking. | +| `span.outcome_evaluation_end` | `outcome_evaluation_start_id`, `outcome_id`, `iteration`, `result`, `explanation`, `usage` | Grader finished one iteration. `result` drives what happens next (table below). | + +### `span.outcome_evaluation_end.result` + +| `result` | Next | +|---|---| +| `satisfied` | Session → `idle`. Terminal for this outcome. | +| `needs_revision` | Agent starts another iteration. | +| `max_iterations_reached` | No further grader cycles. Agent may run one final revision, then session → `idle`. | +| `failed` | Session → `idle`. Rubric fundamentally doesn't match the task (e.g. description and rubric contradict). | +| `interrupted` | Only emitted if `_start` had already fired before a `user.interrupt` arrived. 
| + +```json +{ + "type": "span.outcome_evaluation_end", + "id": "sevt_01jkl...", + "outcome_evaluation_start_id": "sevt_01def...", + "outcome_id": "outc_01a...", + "result": "satisfied", + "explanation": "All 12 criteria met: revenue projections use 5 years of historical data, ...", + "iteration": 0, + "usage": { "input_tokens": 2400, "output_tokens": 350, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1800 }, + "processed_at": "2026-03-25T14:03:00Z" +} +``` + +--- + +## Checking status & retrieving deliverables + +**Status** — either watch the stream for `span.outcome_evaluation_end`, or poll the session and read `outcome_evaluations`: + +```python +session = client.beta.sessions.retrieve(session.id) +for ev in session.outcome_evaluations: + print(f"{ev.outcome_id}: {ev.result}") # outc_01a...: satisfied +``` + +**Deliverables** — the agent writes to `/mnt/session/outputs/`. Once idle, fetch via the Files API with `scope_id=session.id`. This is the same session-outputs mechanism documented in `shared/managed-agents-environments.md` → Session outputs (including the dual-beta-header requirement on `files.list`). + +--- + +## Interaction rules & pitfalls + +- **One outcome at a time.** Chain by sending the next `user.define_outcome` only after the previous one's terminal `span.outcome_evaluation_end` (`satisfied` / `max_iterations_reached` / `failed` / `interrupted`). The session retains history across chained outcomes. +- **Steering is allowed but optional.** You *may* send `user.message` events mid-outcome to nudge direction, but the agent already knows to keep working until terminal — don't send "keep going" prompts. +- **`user.interrupt` pauses the current outcome** — it marks `result: "interrupted"` and leaves the session `idle`, ready for a new outcome or conversational turn. +- **After terminal, the session is reusable** — continue conversationally or define a new outcome. 
+- **Outcome ≠ session-create field.** Don't put `outcome`, `rubric`, or `description` on `sessions.create()` — outcomes are always sent as a `user.define_outcome` event.
+- **Idle-break gate is unchanged.** In your drain loop, keep using `event.type === 'session.status_idle' && event.stop_reason?.type !== 'requires_action'` — do **not** gate on `span.outcome_evaluation_end` alone (on `needs_revision` the session keeps running). See `shared/managed-agents-client-patterns.md` Pattern 5.
+
+For the raw HTTP shapes and per-language SDK bindings beyond Python, WebFetch `https://platform.claude.com/docs/en/managed-agents/define-outcomes.md` (see `shared/live-sources.md`).
diff --git a/skills/claude-api/shared/managed-agents-overview.md b/skills/claude-api/shared/managed-agents-overview.md
index 2c55d2f15..689f510df 100644
--- a/skills/claude-api/shared/managed-agents-overview.md
+++ b/skills/claude-api/shared/managed-agents-overview.md
@@ -25,7 +25,7 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:

 | Beta Header | What it enables |
 | ------------------------------ | ---------------------------------------------------- |
-| `managed-agents-2026-04-01` | Agents, Environments, Sessions, Events, Session Resources, Vaults, Credentials, Memory Stores |
+| `managed-agents-2026-04-01` | Agents, Environments, Sessions, Events, Session Resources, Session Threads, Outcomes, Multiagent, Vaults, Credentials, Memory Stores |
 | `skills-2025-10-02` | Skills API (for managing custom skill definitions) |
 | `files-api-2025-04-14` | Files API for file uploads |

@@ -45,6 +45,9 @@ Managed Agents is in beta. The SDK sets required beta headers automatically:
 | Configure tools and permissions | `shared/managed-agents-tools.md` |
 | Set up MCP servers | `shared/managed-agents-tools.md` (MCP Servers section) |
 | Stream events / handle tool_use | `shared/managed-agents-events.md` + language file |
+| Get notified of session state changes via webhook (no polling) | `shared/managed-agents-webhooks.md` — Console-registered endpoint, HMAC verify, thin payload + fetch |
+| Define an outcome / rubric-graded iterate loop | `shared/managed-agents-outcomes.md` — `user.define_outcome` event, grader, `span.outcome_evaluation_*` events |
+| Coordinate multiple agents / subagents / threads | `shared/managed-agents-multiagent.md` — `multiagent: {type: "coordinator", agents: [...]}` on the agent, session threads, cross-posted tool confirmations |
 | Set up environments | `shared/managed-agents-environments.md` + language file |
 | Upload files / attach repos | `shared/managed-agents-environments.md` (Resources) |
 | Give agents persistent memory across sessions | `shared/managed-agents-memory.md` — memory stores, `memory_store` session resource, preconditions, versions/redact |
diff --git a/skills/claude-api/shared/managed-agents-tools.md b/skills/claude-api/shared/managed-agents-tools.md
index de1dabb73..3a7247f6a 100644
--- a/skills/claude-api/shared/managed-agents-tools.md
+++ b/skills/claude-api/shared/managed-agents-tools.md
@@ -258,7 +258,7 @@ Two types — both work the same way; the agent automatically uses them when rel
 | **Pre-built Anthropic skills** | Common document tasks (PowerPoint, Excel, Word, PDF). Reference by name (e.g. `xlsx`). |
 | **Custom skills** | Skills you've created in your organization via the Skills API. Reference by `skill_id` + optional `version`. |

-**Max 64 skills per agent.** Agent creation uses `managed-agents-2026-04-01`; the separate Skills API (for managing custom skill definitions) uses `skills-2025-10-02`.
+**Max 20 skills per agent.** Agent creation uses `managed-agents-2026-04-01`; the separate Skills API (for managing custom skill definitions) uses `skills-2025-10-02`.

 ### Enabling skills on a session

diff --git a/skills/claude-api/shared/managed-agents-webhooks.md b/skills/claude-api/shared/managed-agents-webhooks.md
new file mode 100644
index 000000000..4d2e5e15b
--- /dev/null
+++ b/skills/claude-api/shared/managed-agents-webhooks.md
@@ -0,0 +1,110 @@
+# Managed Agents — Webhooks
+
+Anthropic can POST to your HTTPS endpoint when a Managed Agents resource changes state — an alternative to holding an SSE stream or polling. Payloads are **thin** (event type + resource IDs only); on receipt, fetch the resource for current state. Every delivery is HMAC-signed.
+
+> **Direction matters.** This page covers *Anthropic → you* notifications about session/vault state. It does **not** cover *third-party → you* webhooks that *trigger* a session (e.g. a GitHub push handler that calls `sessions.create()`) — that's ordinary application code on your side with no Anthropic-specific wire format.
+
+---
+
+## Register an endpoint (Console only)
+
+Console → **Manage → Webhooks**. There is no programmatic endpoint-management API yet. Secret rotation is supported from the same page.
+
+| Field | Constraint |
+|---|---|
+| URL | HTTPS on port 443, publicly resolvable hostname |
+| Event types | Subscribe per `data.type` — you only receive subscribed types (plus test events) |
+| Signing secret | `whsec_`-prefixed, 32 bytes, **shown once at creation** — store it |
+
+---
+
+## Verify the signature
+
+Every delivery is HMAC-signed. **Use the SDK's `client.beta.webhooks.unwrap()`** — it verifies the signature, rejects payloads more than ~5 minutes old, and returns the parsed event. It reads the `whsec_` secret from `ANTHROPIC_WEBHOOK_SIGNING_KEY`.
+
+```python
+import anthropic
+from flask import Flask, request
+
+client = anthropic.Anthropic()  # reads ANTHROPIC_WEBHOOK_SIGNING_KEY from env
+app = Flask(__name__)
+seen_event_ids: set[str] = set()  # in-memory dedupe store for this example; use a durable store in production
+
+@app.route("/webhook", methods=["POST"])
+def webhook():
+    try:
+        event = client.beta.webhooks.unwrap(
+            request.get_data(as_text=True),
+            headers=dict(request.headers),
+        )
+    except Exception:
+        return "invalid signature", 400
+
+    if event.id in seen_event_ids:  # dedupe retries — id is per-event, not per-delivery
+        return "", 204
+    seen_event_ids.add(event.id)
+
+    match event.data.type:
+        case "session.status_idled":
+            session = client.beta.sessions.retrieve(event.data.id)
+            notify_user(session)  # placeholder: your application's notification hook
+        case "vault_credential.refresh_failed":
+            alert_oncall(event.data.id)  # placeholder: your application's alerting hook
+
+    return "", 204
+```
+
+Pass the **raw request body** to `unwrap()` — frameworks that re-serialize JSON (Express `.json()`, Flask `.get_json()`) change the bytes and break the MAC. For other languages, look up the `beta.webhooks.unwrap` binding in the SDK repo (`shared/live-sources.md`); don't hand-roll verification.
+
+---
+
+## Payload envelope
+
+```json
+{
+  "type": "event",
+  "id": "event_01ABC...",
+  "created_at": "2026-03-18T14:05:22Z",
+  "data": {
+    "type": "session.status_idled",
+    "id": "session_01XYZ...",
+    "organization_id": "8a3d2f1e-...",
+    "workspace_id": "c7b0e4d9-..."
+  }
+}
+```
+
+Switch on `data.type`, fetch the resource by `data.id`, return any **2xx** to acknowledge. `created_at` is when the *state transition* happened, not when the webhook fired.
+
+---
+
+## Supported `data.type` values
+
+| `data.type` | Fires when |
+|---|---|
+| `session.status_scheduled` | Session created and ready to accept events |
+| `session.status_run_started` | Agent execution kicked off (every transition to `running`) |
+| `session.status_idled` | Agent awaiting input (tool approval, custom tool result, or next message) |
+| `session.status_terminated` | Session hit a terminal error |
+| `session.thread_created` | Multiagent: coordinator opened a new subagent thread |
+| `session.thread_idled` | Multiagent: a subagent thread is waiting for input |
+| `session.outcome_evaluation_ended` | Outcome grader finished one iteration |
+| `vault.archived` | Vault was archived |
+| `vault.created` | Vault was created |
+| `vault.deleted` | Vault was deleted |
+| `vault_credential.archived` | Vault credential was archived |
+| `vault_credential.created` | Vault credential was created |
+| `vault_credential.deleted` | Vault credential was deleted |
+| `vault_credential.refresh_failed` | MCP OAuth vault credential failed to refresh |
+
+> These are **webhook** `data.type` values — a separate namespace from SSE event types (`session.status_idle`, `span.outcome_evaluation_end`, etc. in `shared/managed-agents-events.md`). Don't reuse SSE constants in webhook handlers.
+
+---
+
+## Delivery behavior & pitfalls
+
+- **No ordering guarantee.** `session.status_idled` may arrive before `session.outcome_evaluation_ended` even if the evaluation finished first. Sort by envelope `created_at` if order matters.
+- **Retries carry the same `event.id`.** At least one retry on non-2xx. Dedupe on `event.id`.
+- **3xx is failure.** Redirects are not followed — update the URL in Console if your endpoint moves.
+- **Auto-disable** after ~20 consecutive failed deliveries, or immediately if the hostname resolves to a private IP or returns a redirect. Re-enable manually in Console.
+- **Thin payload is intentional.** Don't expect `stop_reason`, `outcome_evaluations`, credential secrets, etc. on the webhook body — fetch the resource.
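
Reviewer sketch (not part of the patch): the dedupe and ordering bullets above could be combined roughly as follows. This is a minimal illustration that works on already-verified envelopes as plain dicts and uses only the standard library. `DeliveryLog`, `parse_created_at`, and the sample IDs are hypothetical names invented for this example, not SDK bindings, and the in-memory dict is an assumption that a real deployment would replace with a durable store.

```python
from datetime import datetime


def parse_created_at(value: str) -> datetime:
    # Map the trailing "Z" to an explicit UTC offset so this also parses
    # on Python versions older than 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


class DeliveryLog:
    """Hypothetical helper: dedupes deliveries by envelope id and orders them by transition time."""

    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def record(self, envelope: dict) -> bool:
        """Store the envelope; return False if this event id was already seen (a retry)."""
        event_id = envelope["id"]
        if event_id in self._by_id:
            return False
        self._by_id[event_id] = envelope
        return True

    def ordered(self) -> list[dict]:
        """Return stored envelopes sorted by envelope `created_at`, oldest first."""
        return sorted(self._by_id.values(), key=lambda e: parse_created_at(e["created_at"]))


# Sample envelopes with made-up ids; the idle notification is delivered first
# even though the evaluation transition happened earlier.
idled = {
    "type": "event",
    "id": "event_example_2",
    "created_at": "2026-03-18T14:05:22Z",
    "data": {"type": "session.status_idled", "id": "session_example"},
}
evaluated = {
    "type": "event",
    "id": "event_example_1",
    "created_at": "2026-03-18T14:05:20Z",
    "data": {"type": "session.outcome_evaluation_ended", "id": "session_example"},
}

log = DeliveryLog()
first_seen = [log.record(idled), log.record(evaluated), log.record(idled)]
ordered_types = [e["data"]["type"] for e in log.ordered()]
```

Here the third `record` call models a retry of the same event, so it is reported as a duplicate, and `ordered()` puts the evaluation ahead of the idle transition despite the arrival order. Sorting is only as reliable as the `created_at` timestamps themselves, so fetching the resource remains the source of truth for current state.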