diff --git a/docs/observability.md b/docs/observability.md
new file mode 100644
index 0000000000..9d03fe26fc
--- /dev/null
+++ b/docs/observability.md
@@ -0,0 +1,134 @@
+# Observability
+
+MCP applications often need traces, metrics, or structured logs around tool,
+resource, and prompt activity. Transport middleware is useful for HTTP-level
+events, but MCP primitive activity is best observed at the MCP request layer
+where the protocol method and request parameters are still visible.
+
+## Where to Instrument
+
+Use the narrowest layer that has the data you need:
+
+| Layer | Use for | Notes |
+| --- | --- | --- |
+| ASGI middleware | HTTP status codes, headers, auth, reverse-proxy behavior | This sees transport requests, not every MCP primitive operation. Streamable HTTP and SSE can multiplex multiple MCP messages through a long-lived transport. |
+| `Server.middleware` | Server-side MCP requests and notifications | Wraps `initialize`, unknown methods, validation failures, and registered handlers. This is the usual place for server spans around `tools/call`, `resources/read`, and `prompts/get`. |
+| Client wrapper code | Client-side outgoing MCP requests | Wrap calls such as `client.call_tool()`, `client.read_resource()`, or `client.get_prompt()` when you want the caller-side span or metric. |
+| Handler code | Domain-specific work inside a tool, resource, or prompt | Use this for application details such as database queries, external API calls, cache hits, or business identifiers. |
+
+## Server-Side Middleware
+
+`Server.middleware` runs around every inbound MCP request before params are
+validated and before the registered handler is invoked. A middleware can record
+duration, success or failure, the protocol method, and a safe target name.
+
+```python title="server_observability.py"
+import time
+from collections.abc import Mapping
+from typing import Any
+
+from mcp.server import Server, ServerRequestContext
+from mcp.server.context import CallNext, HandlerResult
+
+
+async def observe_mcp_request(
+    ctx: ServerRequestContext[Any, Any],
+    method: str,
+    params: Mapping[str, Any] | None,
+    call_next: CallNext,
+) -> HandlerResult:
+    started = time.perf_counter()
+    target = params.get("name") if isinstance(params, Mapping) else None
+
+    try:
+        result = await call_next()
+    except Exception:
+        duration_ms = (time.perf_counter() - started) * 1000
+        print(
+            "mcp.request failed",
+            {
+                "method": method,
+                "target": target,
+                "request_id": ctx.request_id,
+                "duration_ms": round(duration_ms, 2),
+            },
+        )
+        raise
+
+    duration_ms = (time.perf_counter() - started) * 1000
+    print(
+        "mcp.request completed",
+        {
+            "method": method,
+            "target": target,
+            "request_id": ctx.request_id,
+            "duration_ms": round(duration_ms, 2),
+        },
+    )
+    return result
+
+
+server = Server("observed-server", on_call_tool=...)
+server.middleware.append(observe_mcp_request)
+```
+
+For OpenTelemetry, the same pattern can create a span around `await call_next()`
+instead of printing. Keep exported attributes small and safe: method name,
+request id, status, duration, and the prompt/resource/tool name are usually
+enough. Avoid recording tool arguments, resource contents, prompt text, tokens,
+or authentication data unless your application has explicitly classified them
+as safe to export.
+
+## Primitive Span Shape
+
+A practical span and metric shape is:
+
+| MCP method | Suggested span name | Useful attributes |
+| --- | --- | --- |
+| `tools/call` | `MCP tools/call ` | `mcp.method.name`, `mcp.tool.name`, `jsonrpc.request.id`, status |
+| `resources/read` | `MCP resources/read ` | `mcp.method.name`, a low-cardinality resource identifier, `jsonrpc.request.id`, status |
+| `prompts/get` | `MCP prompts/get ` | `mcp.method.name`, `mcp.prompt.name`, `jsonrpc.request.id`, status |
+| `*/list` | `MCP ` | `mcp.method.name`, result count when safe |
+
+Prefer low-cardinality attributes. For example, use a resource scheme or
+template name instead of the full resource URI if the URI may contain document
+ids, user ids, or file paths.
+
+## Request Tracing vs Primitive Tracing
+
+Request-level tracing answers "which MCP message was handled?" Primitive-level
+tracing answers "which tool, resource, or prompt did the application execute?"
+Most production systems need both:
+
+1. A request span around the MCP method, created in middleware.
+2. Optional child spans inside handlers for application work such as model
+   calls, database queries, network calls, or filesystem operations.
+
+Do not rely only on HTTP middleware for primitive tracing. With streamable HTTP
+or SSE, HTTP request boundaries do not always line up with MCP method
+boundaries, and headers may only be present on the transport request rather
+than each MCP message.
+
+## Client-Side Calls
+
+Client applications can use the same naming scheme around outgoing SDK calls:
+
+```python title="client_observability.py"
+import time
+
+
+async def observed_call_tool(client, name: str, arguments: dict):
+    started = time.perf_counter()
+    try:
+        return await client.call_tool(name, arguments)
+    finally:
+        duration_ms = (time.perf_counter() - started) * 1000
+        print(
+            "mcp.client.call_tool",
+            {"tool": name, "duration_ms": round(duration_ms, 2)},
+        )
+```
+
+If you propagate trace context between client and server, put it in the MCP
+request metadata rather than assuming transport headers will be available for
+each logical request.
diff --git a/mkdocs.yml b/mkdocs.yml
index cb89faf0f0..9c6ca69c5c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -18,6 +18,7 @@ nav:
   - Concepts: concepts.md
   - Low-Level Server: low-level-server.md
   - Authorization: authorization.md
+  - Observability: observability.md
   - Testing: testing.md
   - API Reference: api/