Structured Error Responses for MCP Tools

The Structured Error Responses for MCP Tools topic sits inside CCA-F Domain 2 (Tool Design and MCP Integration, 18% weight) as task statement 2.2. It is one of the highest-leverage pages in the entire Claude Certified Architect Foundations exam because every scenario cluster — Customer Support Resolution, Multi-Agent Research, Code Generation, CI/CD, Structured Data Extraction, Developer Productivity — contains at least one tool that can fail, and the exam repeatedly punishes candidates who model those failures as flat text strings instead of structured envelopes. Structured Error Responses for MCP Tools is the contract that lets an agent loop recover intelligently rather than retry blindly, give up silently, or apologize to the end user without taking any corrective action.

This study note walks the full surface of Structured Error Responses for MCP Tools that a CCA-F candidate must internalize: the isError flag that tells Claude a tool call did not succeed, the errorCategory taxonomy that classifies why it did not succeed, the isRetryable boolean that informs the recovery decision, and the human-readable content body that gives the agent something to reason about. The note also draws the bright line between an access failure (a tool that could not answer the question) and a valid empty result (a tool that answered correctly and the answer was "nothing matched") — a distinction that collapses customer-support agents when it is ignored. A final Practice Anchors section ties the vocabulary to the Multi-Agent Research scenario, where structured error context has to propagate cleanly from subagent up to coordinator.

Why Structured Error Responses Matter — Enabling Intelligent Agent Recovery vs Silent Failure

A Claude agent that calls an MCP tool has three possible observations: the tool succeeded and returned useful content, the tool succeeded and returned a valid empty result, or the tool failed. The job of Structured Error Responses for MCP Tools is to make the third case legible to the agent so that the next iteration of the loop can take a sensible action.

Generic error handling fails for two reasons. First, a uniform string like "Operation failed" gives Claude no signal to differentiate recovery paths. Second, without an isError boolean the model may treat a plain error string as valid data and continue as if everything succeeded, which is how production agents end up telling users "your refund has been processed" when in fact the refund tool raised a permission exception.

Structured error responses fix both failures by giving Claude three orthogonal axes: did it fail (isError), what kind of failure (errorCategory), and should you retry (isRetryable). Every CCA-F scenario-based question about Structured Error Responses for MCP Tools is, underneath the surface, asking you to pick the combination of those three fields that maximizes the coordinator agent's ability to recover without human intervention.

The exam is not testing whether you know that errors exist. It is testing whether you can design a tool response envelope that lets Claude decide between retry, abort, re-ask the user, escalate to a human, or pick a different tool. If your envelope is a plain string, none of those decisions are possible and the agent will either loop forever or surrender silently.

The MCP Tool Result Envelope — content, isError, errorCategory, isRetryable

Every MCP tool invocation returns a tool result envelope that Claude consumes on the next turn. In the Messages API this envelope is a tool_result content block attached to a user-role message (where the error flag is spelled is_error); in Claude Code it is surfaced as the observation that feeds back into the agentic loop. Four fields are load-bearing for Structured Error Responses for MCP Tools:

content — the human-readable payload. For success cases this is the data the tool produced. For error cases it is a description of what went wrong.
isError — a boolean. true means the tool call did not succeed; false or absent means success.
errorCategory — a classification string (TRANSIENT, VALIDATION, PERMISSION, BUSINESS, NOT_FOUND, UNKNOWN). Used only when isError is true.
isRetryable — a boolean. true means an immediate retry has a reasonable chance of succeeding; false means retrying without intervention will produce the same failure.

isError is the boolean flag on an MCP tool result envelope that tells the calling agent whether the tool execution succeeded. When isError: true, Claude treats the content payload as an error description and will not treat its contents as data. When isError is absent or false, Claude treats content as successful tool output. The field is the primary signal that drives the entire recovery decision tree; without it, Claude cannot distinguish a legitimate empty result from a silent failure.

The content body is not optional even when isError: true. Claude needs text to reason about. A tool that fails without returning any description of the failure forces the agent to guess, which is how exam distractor answers end up looking like "the agent apologizes and ends the conversation without taking action."
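As a rough illustration of the four fields working together, the sketch below checks an envelope for the rules stated above: content is always present, and errorCategory plus isRetryable accompany every isError: true. The function name validate_envelope and the plain-dict representation are assumptions made for this example; how the fields are serialized on the wire is not specified here.

```python
from typing import Any

# Category names come from the taxonomy used in this note.
CATEGORIES = {"TRANSIENT", "VALIDATION", "PERMISSION", "BUSINESS", "NOT_FOUND", "UNKNOWN"}


def validate_envelope(envelope: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the envelope is well-formed."""
    problems: list[str] = []
    content = envelope.get("content")
    if not isinstance(content, str) or not content.strip():
        problems.append("content must be a non-empty description")
    if envelope.get("isError", False):
        # Error envelopes need both classification fields set explicitly.
        if envelope.get("errorCategory") not in CATEGORIES:
            problems.append("errorCategory must be one of the known categories")
        if not isinstance(envelope.get("isRetryable"), bool):
            problems.append("isRetryable must be an explicit boolean")
    elif "errorCategory" in envelope or "isRetryable" in envelope:
        problems.append("errorCategory and isRetryable apply only when isError is true")
    return problems


# Example envelopes: a fully structured timeout and a flat, unstructured failure.
timeout = {
    "isError": True,
    "errorCategory": "TRANSIENT",
    "isRetryable": True,
    "content": "Unable to reach order database (timeout after 5s).",
}
flat = {"isError": True, "content": "Operation failed"}
```

Here validate_envelope(timeout) returns an empty list, while the flat envelope is reported as missing both its category and its retry signal.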

The isError Field — Boolean Signal That Tool Execution Did Not Succeed

isError is the single most important bit in the envelope. It is the membrane between "this is data" and "this is a failure description." A well-designed MCP tool sets isError: true whenever the invocation could not produce the semantic answer the caller requested, and sets it to false (or omits it) when the tool succeeded even if the answer is empty.

The distinction between isError: true and a merely unsuccessful outcome is subtle and frequently tested. If a customer-support agent asks a lookup_order tool for order #12345 and the order does not exist, the right response depends on the contract:

If the tool's contract is "return the order or signal that nothing matched," the correct envelope is isError: false with content: "No order found with ID 12345." That is a valid empty result.
If the tool's contract is "return the order," and the lookup cannot complete because the database is unreachable, the correct envelope is isError: true with errorCategory: "TRANSIENT" and isRetryable: true. That is an access failure.

The two can look similar from a naive HTTP perspective, because an upstream might answer the first with a 404 and the second with a 5xx and both are simply non-2xx statuses, but they have opposite implications for the agent's next move. Miss the distinction and the agent will retry queries that have already been correctly answered (wasting tokens and latency) or accept silent database failures as "no data" (producing false negatives in the conversation).

Rule of thumb: ask "did the tool do its job?" If the answer is yes (even if the result set is empty), set isError: false. If the answer is no (the tool could not fulfil its contract), set isError: true and populate errorCategory and isRetryable so the agent can decide what to do next.
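The lookup_order contrast can be sketched as a small tool function. This is a minimal sketch, assuming a contract of "return the order or signal that nothing matched"; the fetch callable, the in-memory ORDERS store, and the unreachable helper are hypothetical stand-ins for a real data layer.

```python
from typing import Callable, Optional


def lookup_order(order_id: str, fetch: Callable[[str], Optional[dict]]) -> dict:
    """Hypothetical tool: return the order, or say that nothing matched."""
    try:
        order = fetch(order_id)
    except (ConnectionError, TimeoutError) as exc:
        # The tool could not do its job: this is an access failure.
        return {
            "isError": True,
            "errorCategory": "TRANSIENT",
            "isRetryable": True,
            "content": f"Unable to look up order {order_id}: {exc}",
        }
    if order is None:
        # The tool did its job and the answer is empty: not an error.
        return {"isError": False, "content": f"No order found with ID {order_id}."}
    return {"isError": False, "content": f"Order {order_id}: status={order['status']}"}


# Hypothetical data layer used only for this example.
ORDERS = {"12345": {"status": "shipped"}}


def unreachable(_order_id: str) -> Optional[dict]:
    """Simulates a database that cannot be reached."""
    raise TimeoutError("order database timed out after 5s")
```

Calling lookup_order("99999", ORDERS.get) yields a valid empty result, whereas lookup_order("12345", unreachable) yields a retryable TRANSIENT error, even though neither call returned an order.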

errorCategory Values — TRANSIENT, VALIDATION, PERMISSION, BUSINESS, NOT_FOUND, UNKNOWN

errorCategory is the taxonomy that tells Claude why a call failed. CCA-F questions on Structured Error Responses for MCP Tools lean heavily on your ability to pick the right category for a given scenario.

TRANSIENT

Temporary conditions that may clear on retry: network timeouts, upstream rate limits, service unavailability, transient database contention. Pair with isRetryable: true (usually) and expect the agent to apply backoff.

VALIDATION

The caller supplied input that violated the tool's schema or domain constraints: a malformed date, an invalid currency code, a field that exceeds a max length, or a required parameter whose value is semantically implausible. Pair with isRetryable: false and give the agent enough information to correct the call (naming the offending parameter and why it is invalid). Retrying the same input will fail identically.

PERMISSION

The caller is authenticated but not authorized to perform the operation: an agent asked to read a record tied to a different customer, a tool call blocked by policy, an API key that lacks the required scope. Pair with isRetryable: false. The agent should either escalate to a human, switch to a tool with broader scope, or explain the limitation to the user.

BUSINESS

A domain-level rule rejects the operation even though the input is technically valid and the caller is authorized: a refund request that violates the refund-window policy, a transfer that would breach a daily limit, a cancellation that is no longer possible because fulfillment has started. Pair with isRetryable: false and a human-readable policy description. This is the category most often mislabeled as VALIDATION; the difference matters because the agent's recovery path is fundamentally different — a business error requires explaining policy, not correcting input.

NOT_FOUND

A specific resource was requested and does not exist. Many tool designs prefer to surface NOT_FOUND as an ordinary empty result (isError: false), but when the caller needs a definitive signal — for example, a write operation that requires a prior record — the explicit NOT_FOUND error is clearer than a silent empty success. Pair with isRetryable: false unless the missing resource could plausibly appear on a retry.

UNKNOWN

A reserved bucket for failures the tool cannot classify. Treat it as a failure that needs operator attention: set isRetryable: false by default and include whatever diagnostic detail exists in content. Prefer precise categories whenever the tool can distinguish them; overuse of UNKNOWN is a smell.

errorCategory is the classification field on a structured MCP error response that tells the agent why a call failed. The canonical categories are TRANSIENT (temporary, retry likely helps), VALIDATION (invalid input, must fix before retry), PERMISSION (not authorized, retry will not help), BUSINESS (policy violation, cannot be fixed by retrying or changing input), NOT_FOUND (resource missing), and UNKNOWN (unclassified). Each category implies a different recovery strategy, so conflating them — especially BUSINESS with VALIDATION — produces broken agent loops that either retry forever or apologize without action.
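One way a tool might arrive at a category is by classifying the exception it caught. The mapping below is an illustrative assumption, not a canonical rule: BusinessRuleError is a hypothetical exception class defined for this example, and a real tool would map whatever errors its own dependencies raise.

```python
class BusinessRuleError(Exception):
    """Hypothetical exception a policy engine might raise for a rule violation."""


def classify(exc: Exception) -> str:
    """Map a Python exception to one of the note's categories (illustrative only)."""
    if isinstance(exc, BusinessRuleError):
        return "BUSINESS"
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return "TRANSIENT"
    if isinstance(exc, PermissionError):
        return "PERMISSION"
    if isinstance(exc, LookupError):  # covers KeyError and IndexError
        return "NOT_FOUND"
    if isinstance(exc, (ValueError, TypeError)):
        return "VALIDATION"
    # Anything the tool cannot place lands in the reserved bucket.
    return "UNKNOWN"
```

Checking BusinessRuleError first keeps a policy rejection from being mislabeled if the class ever inherits from ValueError, which is the BUSINESS versus VALIDATION confusion the taxonomy warns about.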

The isRetryable Field — Informing the Agent Whether Immediate Retry Is Worthwhile

isRetryable is the second-order signal that collapses the policy decision "should the agent try again?" into a single boolean. The field is not a description of retry logic (backoff, jitter, max attempts); it is a statement about whether the underlying failure is the kind that could succeed on an identical re-invocation.

When `isRetryable: true` is Correct

Network timeouts, connection resets, DNS failures.
HTTP 429 (rate limit) — the rate-limit window will roll.
HTTP 503 (service unavailable) — the upstream may recover.
Transient database deadlocks.

When `isRetryable: false` is Correct

Any VALIDATION error — the input will be identical and identically rejected.
Any PERMISSION error — the caller's scope has not changed.
Any BUSINESS error — the policy engine will produce the same verdict.
Most NOT_FOUND errors (unless the resource is expected to materialize).

Why `isRetryable` Is Separate From `errorCategory`

There is no deterministic mapping from category to retryability. A TRANSIENT failure that has already exhausted an upstream retry budget should be surfaced with isRetryable: false so the agent does not keep pounding. A NOT_FOUND error for a record that is being asynchronously created might legitimately be retryable. Keep the two fields independent and set them both explicitly.

isRetryable is the boolean field on a structured MCP error response that tells the agent whether an identical retry has a reasonable probability of succeeding. true signals that the failure is environmental (timeout, rate limit, transient unavailability) and backoff-plus-retry is a valid recovery path. false signals that the failure is deterministic (invalid input, missing permission, policy violation) and retrying without changing the input or the caller's authorization will fail identically. The field is orthogonal to errorCategory; a single category can produce either value depending on context.
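The independence of the two fields can be sketched with two small builders that set isRetryable from context rather than deriving it from the category. The function names and the upstream_budget_exhausted and creation_pending parameters are hypothetical, chosen to mirror the two examples given above.

```python
def transient_error(message: str, upstream_budget_exhausted: bool) -> dict:
    """TRANSIENT error whose retryability depends on the upstream retry budget."""
    return {
        "isError": True,
        "errorCategory": "TRANSIENT",
        # Stop the agent from pounding once the upstream budget is spent.
        "isRetryable": not upstream_budget_exhausted,
        "content": message,
    }


def not_found_error(message: str, creation_pending: bool) -> dict:
    """NOT_FOUND error that is retryable only if the record may still appear."""
    return {
        "isError": True,
        "errorCategory": "NOT_FOUND",
        "isRetryable": creation_pending,
        "content": message,
    }
```

The same category therefore yields either retry value, which is why the agent should read isRetryable directly instead of inferring it from errorCategory.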

Error Message Content — Human-Readable Description in content Field

The content body of a structured error envelope is the text Claude actually reads. It should give the agent enough detail to:

Explain the failure to the end user in non-technical language.
Decide whether a different tool, parameter adjustment, or escalation is warranted.
Produce a trace that a human reviewer can follow during post-hoc debugging.

A good error content string names the operation that failed, identifies the offending input where applicable, states whether the condition is transient, and — for business rules — summarizes the policy in customer-friendly language. A bad error content string is "Error." or a raw Python stack trace.

For BUSINESS errors in customer-facing agents, the text should explicitly be written for the end user. "Refunds are only available within 30 days of purchase; this order was placed 47 days ago" is far more useful than "policy_engine_violation: rule_id=REFUND_WINDOW_EXCEEDED."

Two-audience principle: the content field has to satisfy the agent (so it can route the response) and the end user (because the agent will often quote the text directly). Write it in plain language with enough specificity that quoting it verbatim produces a helpful reply.

Recovery Strategy Matrix — isError + isRetryable + errorCategory → Agent Decision

The full decision surface is a three-field cube, but in practice only a handful of combinations appear in exam scenarios. Memorize the mapping.

isError	errorCategory	isRetryable	Canonical Agent Response
false	(n/a)	(n/a)	Proceed with the returned `content` as data (including valid empty results).
true	TRANSIENT	true	Apply exponential backoff and retry; cap at the loop's retry budget.
true	TRANSIENT	false	Retry budget exhausted — surface to human or fall back to alternative tool.
true	VALIDATION	false	Correct the offending parameter using hints in `content` and retry with the corrected input.
true	PERMISSION	false	Escalate to a tool with broader scope or ask the user for authorization.
true	BUSINESS	false	Explain the policy to the user in the language from `content`; do not retry.
true	NOT_FOUND	false	Re-ask the user for the correct identifier or treat as empty if contract allows.
true	UNKNOWN	false	Log, surface to operator, provide best-effort explanation to user.

Every row in this matrix is a plausible CCA-F distractor; the exam's job is to see whether you can pick the correct combination given a scenario's cues about timeouts, user input, or policy rules.
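The matrix can be read as a small decision function. This is a sketch under the assumption that envelopes arrive as dicts; the action labels it returns (such as "backoff_and_retry") are hypothetical names for the canonical responses in the table, not part of any API.

```python
def decide(result: dict) -> str:
    """Translate an envelope into one of the matrix's canonical agent responses."""
    if not result.get("isError", False):
        return "use_content"  # includes valid empty results
    category = result.get("errorCategory", "UNKNOWN")
    retryable = result.get("isRetryable", False)
    if category == "TRANSIENT":
        return "backoff_and_retry" if retryable else "fallback_or_escalate"
    if retryable:
        # For example, NOT_FOUND for a record that is still being created.
        return "backoff_and_retry"
    return {
        "VALIDATION": "fix_input_and_call_again",
        "PERMISSION": "escalate_or_switch_tool",
        "BUSINESS": "explain_policy",
        "NOT_FOUND": "ask_user_for_identifier",
    }.get(category, "log_and_surface_to_operator")
```

Note that isRetryable: false never maps to "stop the loop"; every branch returns a next action for the agent to take.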

Transient Error Handling — Exponential Backoff Triggered by isRetryable: true

Transient failures are the canonical retry case. When the agent sees isError: true, errorCategory: "TRANSIENT", isRetryable: true, the correct loop behavior is:

Increment a retry counter for this tool call.
Back off for an interval drawn from an exponential schedule with jitter (for example, base × 2^attempt × random(0.5, 1.5)).
Re-invoke the tool with the identical input.
If the retry budget is exhausted, surface the original error or switch to a fallback.

Retry policy lives in the agent loop, not in the tool. The tool's only job is to label the failure so the agent can make the decision. A tool that silently retries internally hides information from the agent and breaks the observability of the loop.

The CCA-F exam cares about where retry logic lives. Tools return structured error envelopes; agents apply retry policy. A tool that loops internally on transient failures looks identical to a successful call from the agent's perspective and prevents the agent from budgeting tokens, latency, or user-visible progress messages.
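The agent-side retry steps described in this section can be sketched as follows. This is a minimal sketch, assuming the tool is a zero-argument callable that returns an envelope dict; call_with_backoff, its parameters, and the flaky_tool demo are hypothetical, and sleep and jitter are injectable so the example runs without waiting.

```python
import random
import time
from typing import Callable


def call_with_backoff(
    tool: Callable[[], dict],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float, float], float] = random.uniform,
) -> dict:
    """Re-invoke a tool while it reports a retryable error, within a retry budget."""
    attempt = 0
    while True:
        result = tool()
        should_retry = bool(result.get("isError")) and bool(result.get("isRetryable"))
        if not should_retry or attempt >= max_retries:
            # Success, a non-retryable error, or an exhausted budget: surface it.
            return result
        # Exponential schedule with jitter: base * 2^attempt * random(0.5, 1.5).
        sleep(base_delay * (2 ** attempt) * jitter(0.5, 1.5))
        attempt += 1


# Demo: a fake tool that fails twice with a retryable error, then succeeds.
calls = {"count": 0}


def flaky_tool() -> dict:
    calls["count"] += 1
    if calls["count"] <= 2:
        return {"isError": True, "errorCategory": "TRANSIENT",
                "isRetryable": True, "content": "Search rate limit exceeded."}
    return {"isError": False, "content": "3 results found."}


delays: list[float] = []
final = call_with_backoff(flaky_tool, sleep=delays.append, jitter=lambda low, high: 1.0)
```

Because the loop lives outside the tool, the agent can observe every failed attempt and the delay it chose, which is the observability argument made above.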

Validation Errors — Returning VALIDATION With Parameter Correction Hints

Validation errors happen when the caller violates the tool's schema or domain constraints. Good validation errors do two things:

Name the offending parameter (for example, "end_date must be on or after start_date; received end_date=2026-01-15, start_date=2026-02-01").
State the constraint explicitly so the agent can generate a conforming retry.

Do not just echo back "Bad input." The agent has a finite number of loop iterations to converge on a valid call, and every iteration without actionable guidance is wasted.

A validation error is always isRetryable: false at the tool level because an identical retry will fail identically. The agent is expected to change the input before retrying, which is a different operation — not a retry of the same call.
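A validation check that names the offending parameter and states the constraint might look like the sketch below. The check_date_range function and its helper are hypothetical; the error text for the reversed range reuses the example wording from this section.

```python
from datetime import date
from typing import Optional


def _validation_error(message: str) -> dict:
    """Build a VALIDATION envelope; an identical retry would fail identically."""
    return {
        "isError": True,
        "errorCategory": "VALIDATION",
        "isRetryable": False,
        "content": message,
    }


def check_date_range(start_date: str, end_date: str) -> Optional[dict]:
    """Return a VALIDATION envelope if the range is invalid, otherwise None."""
    parsed = {}
    for name, raw in (("start_date", start_date), ("end_date", end_date)):
        try:
            parsed[name] = date.fromisoformat(raw)
        except ValueError:
            # Name the parameter and the expected format so the agent can fix it.
            return _validation_error(
                f"{name} must be an ISO date (YYYY-MM-DD); received {name}={raw}"
            )
    if parsed["end_date"] < parsed["start_date"]:
        return _validation_error(
            "end_date must be on or after start_date; "
            f"received end_date={end_date}, start_date={start_date}"
        )
    return None
```

A tool would call check_date_range before doing any work and return the envelope as its result whenever it is not None.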

Permission Errors — PERMISSION Signals Escalation or Alternative Path

Permission errors mean the caller is known but not authorized. Common shapes:

A customer-service agent tries to access a record belonging to a different customer.
A developer-productivity agent tries to write to a repository the user lacks commit rights to.
A multi-agent research subagent tries to call an API outside its tool allowlist.

The structured response should identify the missing permission in content so the coordinator agent can choose a remediation: re-ask the user to authenticate, escalate to a human, or switch to a different tool with appropriate scope. The isRetryable: false flag is critical — a retry with the same credentials will fail identically and waste loop iterations.

CCA-F distractors often present "retry with exponential backoff" as the correct response to a permission error. It is never correct. Permission errors do not heal with time. Set isRetryable: false and give the agent a branching signal (different tool, human escalation, user re-authentication) rather than a timer.

Business Errors — isRetryable: false With Customer-Friendly Explanations for Policy Violations

Business errors are where most production agents break. A refund tool that returns "error: REFUND_WINDOW_EXCEEDED" with no further context leaves the agent with two bad options: retry (wrong — the policy will reject it identically) or apologize generically (wrong — the user deserves a specific explanation).

The correct structured response is:

{
  "isError": true,
  "errorCategory": "BUSINESS",
  "isRetryable": false,
  "content": "Refunds are only available within 30 days of purchase. This order (#12345) was placed 47 days ago and is outside the refund window. If you believe this is an exception, please escalate to a supervisor."
}

The agent can quote the content string directly to the customer, and it will read as a courteous, specific explanation. The isRetryable: false flag prevents the agent from wasting iterations on a retry that cannot succeed. The errorCategory: "BUSINESS" lets the coordinator route the situation to an escalation queue rather than a technical-debugging queue.
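A tool that produces this envelope could be sketched as below. The process_refund signature, the REFUND_WINDOW_DAYS constant, and the success message are assumptions for illustration; the 30-day window and the error wording come from the example above.

```python
from datetime import date

REFUND_WINDOW_DAYS = 30  # taken from the example policy in this section


def process_refund(order_id: str, purchased_on: date, today: date) -> dict:
    """Hypothetical refund tool that enforces the refund-window policy."""
    days_since = (today - purchased_on).days
    if days_since > REFUND_WINDOW_DAYS:
        # Valid input, authorized caller, but the policy says no: BUSINESS.
        return {
            "isError": True,
            "errorCategory": "BUSINESS",
            "isRetryable": False,
            "content": (
                f"Refunds are only available within {REFUND_WINDOW_DAYS} days of purchase. "
                f"This order (#{order_id}) was placed {days_since} days ago and is outside "
                "the refund window. If you believe this is an exception, please escalate "
                "to a supervisor."
            ),
        }
    return {"isError": False, "content": f"Refund issued for order #{order_id}."}
```

Building the message from the actual order values keeps the explanation specific enough for the agent to quote directly.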

This is the single highest-value pattern on the CCA-F exam because it appears explicitly in the Customer Support Resolution scenario cluster and implicitly in every other scenario that touches policy. The pass reports from the community cite this pattern by name: "use structured error responses with errorCategory/isRetryable and customer-friendly descriptions for business rule violations."

Access Failures vs Valid Empty Results — The Single Most Tested Distinction

A failed tool call and a successful tool call that returned nothing look similar on the wire but require opposite agent behaviors. The Structured Error Responses for MCP Tools envelope is what draws the line.

Access Failure

The tool could not complete its job: the database was unreachable, the upstream API timed out, authentication failed, the schema rejected the input. The agent needs to make a retry / abort / escalate decision.

Envelope shape:

{
  "isError": true,
  "errorCategory": "TRANSIENT",
  "isRetryable": true,
  "content": "Unable to reach order database (timeout after 5s)."
}

Valid Empty Result

The tool completed its job and the answer is legitimately "nothing." The customer has no open orders. The search returned zero matches. There are no outstanding tickets on this account.

Envelope shape:

{
  "isError": false,
  "content": "No orders found for customer #8421."
}

The two shapes have opposite implications. The first says "I could not answer the question, do something about it." The second says "I answered the question, the answer is empty, proceed with that knowledge."

An access failure occurs when a tool cannot fulfil its contract and therefore cannot report a semantic answer (timeout, permission denied, database unreachable, malformed input). It is represented as isError: true plus an errorCategory and isRetryable signal so the agent can decide whether to retry, reroute, or escalate. A valid empty result occurs when a tool did fulfil its contract and the answer happens to be empty (no matching records, zero-length list, absent optional value). It is represented as isError: false with descriptive content. Conflating the two is one of the highest-frequency architecture mistakes on the CCA-F exam because it breaks agent recovery — either the agent retries queries that were already correctly answered, or it accepts silent infrastructure failures as legitimate "no data" responses and gives the user a wrong answer with confidence.

Wrapping External API Errors — Translating HTTP Status Codes to MCP Error Fields

Most MCP tools are thin wrappers around third-party APIs. The tool's job is to translate the upstream error surface into the structured envelope Claude understands. A rough mapping from HTTP semantics to MCP fields:

Upstream signal	isError	errorCategory	isRetryable
200 with data	false	(n/a)	(n/a)
200 with empty list	false	(n/a)	(n/a)
400 (bad request)	true	VALIDATION	false
401 (unauthenticated)	true	PERMISSION	false
403 (forbidden)	true	PERMISSION	false
404 (not found)	true	NOT_FOUND	false (usually)
409 (conflict, policy)	true	BUSINESS	false
422 (unprocessable)	true	VALIDATION	false
429 (rate limited)	true	TRANSIENT	true
500 (server error)	true	TRANSIENT	true (bounded)
502/503/504 (gateway)	true	TRANSIENT	true

The mapping is a guideline, not a rule. A tool author who understands the upstream semantics should be willing to re-classify: a 409 that represents a business-rule conflict belongs in BUSINESS; a 409 that represents a version-mismatch that a retry-with-fresh-read could solve belongs in TRANSIENT.
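The table can be sketched as a lookup with one override for the ambiguous 409 case. The function name classify_http_status and the conflict_is_stale_read flag are hypothetical, and marking the stale-read 409 as retryable is an assumption of this sketch rather than something the table states.

```python
# Default classification per upstream status, following the table above.
_STATUS_TABLE = {
    400: ("VALIDATION", False),
    401: ("PERMISSION", False),
    403: ("PERMISSION", False),
    404: ("NOT_FOUND", False),
    409: ("BUSINESS", False),
    422: ("VALIDATION", False),
    429: ("TRANSIENT", True),
    500: ("TRANSIENT", True),
    502: ("TRANSIENT", True),
    503: ("TRANSIENT", True),
    504: ("TRANSIENT", True),
}


def classify_http_status(status: int, conflict_is_stale_read: bool = False) -> dict:
    """Translate an upstream HTTP status into the structured error fields."""
    if 200 <= status < 300:
        # Success, including an empty list: not an error.
        return {"isError": False}
    if status == 409 and conflict_is_stale_read:
        # Assumption: a version mismatch can be solved by a fresh read and retry.
        category, retryable = "TRANSIENT", True
    else:
        category, retryable = _STATUS_TABLE.get(status, ("UNKNOWN", False))
    return {"isError": True, "errorCategory": category, "isRetryable": retryable}
```

The caller still has to add a content string; the status code alone rarely carries enough detail for the agent or the end user.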

Plain-Language Explanation: Structured Error Responses for MCP Tools

Abstract taxonomies stick when you anchor them to familiar systems. Three analogies cover the full sweep of Structured Error Responses for MCP Tools.

Analogy 1: The Hospital Triage Desk

Walk into a hospital emergency room. The triage nurse does not just say "something is wrong." She writes down three pieces of information on a card: whether there is a problem at all (isError), what kind of problem it is (errorCategory — cardiac, orthopedic, neurological, administrative), and whether the patient can come back later or needs immediate treatment (isRetryable). The doctor upstream reads the card and picks a specific course of action: admit, treat now, schedule for tomorrow, refer to another specialist.

If the triage nurse wrote only "patient has a problem," the doctor would have to re-diagnose every patient from scratch. That is what generic error strings do to Claude agents. Structured Error Responses for MCP Tools are the triage card that lets the agent act with the right urgency and direction.

The analogy maps cleanly:

isError = "is there actually a problem?"
errorCategory = "what kind of problem?" (TRANSIENT = minor and likely to clear, VALIDATION = patient gave wrong info, PERMISSION = not insured for this treatment, BUSINESS = not covered by policy, NOT_FOUND = no record exists, UNKNOWN = needs further workup).
isRetryable = "can the patient come back later with the same info and get seen?"
content = the human-readable notes that travel with the card so the next clinician understands context.

Analogy 2: The Postal System

A package handed to a courier has five possible fates. It arrives (isError: false, and the contents are the delivered data). It is undeliverable because the street was blocked by a storm (TRANSIENT, isRetryable: true; try again tomorrow). It is undeliverable because the address is malformed (VALIDATION, isRetryable: false; the sender must fix the address). It is undeliverable because the recipient refuses delivery (PERMISSION, isRetryable: false). It is undeliverable because customs has forbidden the item (BUSINESS, isRetryable: false; explain the policy to the sender).

A sixth case is subtle and vital: the courier successfully delivers, but the recipient has nothing to accept. The delivery slot is empty, the signature was waived, or no one was home in a drop-OK arrangement. That is a valid empty result (isError: false, content: "delivered, no signature required") and it is not a failure at all. Treating "no signature" as a delivery failure would cause the courier to retry a delivery that already completed, which is exactly how agents break when they conflate access failures with valid empty results.

Analogy 3: The Kitchen Ticket Rail

A busy restaurant kitchen runs on tickets. Each ticket is either satisfied (isError: false) or returned to the expediter with a reason. The reason matters: an ingredient that is out of stock is TRANSIENT (the chef will get more tomorrow, so try again later); a wrong item on the order is VALIDATION (the server wrote chicken when the guest said fish, so fix the ticket); a guest who refuses the dish after ordering is PERMISSION or BUSINESS, depending on the house rules (no retry possible); and a dish that is no longer on the menu is NOT_FOUND.

The expediter — the coordinator — uses the category on the returned ticket to decide whether to push it back to the line, walk it out to the guest with an apology, or route it to the manager. A ticket that just says "problem" forces the expediter to investigate every return personally, which destroys throughput. Structured error responses are the kitchen convention that keeps the line moving.

Which Analogy to Use

Questions about routing decisions (which category triggers which agent branch) → triage desk.
Questions about retryability and the access-failure-vs-empty-result distinction → postal system.
Questions about multi-agent propagation where a subagent reports back to a coordinator → kitchen ticket rail.

Common Exam Traps — Generic Errors, Retryable Confusion, Empty Result Conflation

The CCA-F exam repeatedly exploits a small set of trap patterns around Structured Error Responses for MCP Tools. Recognize them and you will pick off three to five questions that trip most candidates.

Trap 1: Uniform "Operation Failed" Text

Distractor answers propose returning a single error string like "Operation failed" or "An error occurred." These answers look safe and conservative. They are wrong because they strip the three signals (isError, errorCategory, isRetryable) the agent needs to recover. Correct answers always include structure.

Trap 2: `isRetryable: false` Means Terminate the Loop

Candidates read isRetryable: false and conclude the entire agent loop must abort. That is not what the field means. isRetryable: false means this specific call with this specific input will not succeed on a naive retry. The agent can still change the input (for VALIDATION), escalate (for PERMISSION), explain the policy to the user (for BUSINESS), or switch tools. The loop continues; the retry is what is prevented.

Trap 3: VALIDATION vs BUSINESS Conflated

A refund that is denied because the window expired is tempting to label as VALIDATION because the request is "bad." It is not. The input is syntactically valid and authorized; the domain policy rejects it. That is BUSINESS. The recovery path for BUSINESS is "explain the policy." The recovery path for VALIDATION is "correct the input and retry." Conflate them and the agent either loops forever trying to correct a valid input or apologizes without quoting the policy.

Trap 4: Treating a Valid Empty Result as an Error

A search that returns zero matches is a successful call. isError: false, content: "No matches found." CCA-F scenarios sometimes present this as isError: true with errorCategory: "NOT_FOUND" in a distractor. The test is whether you know the tool answered the question. When the contract is "return zero-or-more matches," zero is a legitimate answer.

Trap 5: Tool-Side Retry Hiding Failures from the Agent

Distractors propose "the tool retries internally up to 5 times before returning an error." This hides the retry budget from the agent loop and breaks observability. Retry logic belongs in the agent, informed by isRetryable from the tool. A tool that retries internally should do so only as an optimization under the hood of a single logical call, never as a substitute for a structured failure response.

The most common CCA-F mistake on task 2.2 is returning a uniform "Operation failed" string for every failure. Even if the tool cannot classify the failure precisely, the envelope should include isError: true, an errorCategory (UNKNOWN if nothing else fits), an isRetryable boolean, and a content body with enough detail to explain the situation to the end user. A scenario that offers "wrap everything in a generic error string" as an option is always wrong; pick the option that preserves the three structured fields.

Practice Anchors — Multi-Agent Research Scenario: Structured Error Context for Coordinator Recovery

The Multi-Agent Research scenario is the richest testing ground for Structured Error Responses for MCP Tools because errors have to propagate cleanly from a subagent up through a coordinator and potentially into a user-facing summary.

The Canonical Setup

A coordinator agent fans out research tasks to three subagents. Each subagent has its own MCP tool set: web_search, fetch_url, extract_citations, summarize_source. Subagent A's fetch_url call hits a paywall and returns isError: true, errorCategory: "PERMISSION", isRetryable: false, content: "Source returned 403 — paywall requires institutional access." Subagent B's web_search hits a 429 and returns isError: true, errorCategory: "TRANSIENT", isRetryable: true, content: "Search rate limit exceeded, retry after 60s." Subagent C's extract_citations returns isError: false, content: "[]" — a valid empty result because the source had no formal citations.

What the Coordinator Must Do

For Subagent A: skip the paywalled source, note the limitation in the final report, do not retry.
For Subagent B: apply backoff and retry the search; if the retry budget is exhausted, note the partial coverage.
For Subagent C: accept the empty citation list as legitimate and move on. Do not re-ask the subagent. Do not treat the empty list as a failure.
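The three reactions can be sketched as a coordinator-side planning step over the envelopes from the canonical setup. The plan_next_steps function, the per-subagent retries_left budget, and the action labels are hypothetical names introduced for this example.

```python
def plan_next_steps(results: dict[str, dict], retries_left: dict[str, int]) -> dict[str, str]:
    """Pick one coordinator action per subagent based on its envelope."""
    plan: dict[str, str] = {}
    for name, result in results.items():
        if not result.get("isError", False):
            # Valid result, even if empty: accept it and move on.
            plan[name] = "accept_result"
        elif result.get("isRetryable", False):
            has_budget = retries_left.get(name, 0) > 0
            plan[name] = "retry_with_backoff" if has_budget else "note_partial_coverage"
        else:
            plan[name] = "skip_and_note_limitation"
    return plan


# Envelopes from the canonical setup above.
results = {
    "A": {"isError": True, "errorCategory": "PERMISSION", "isRetryable": False,
          "content": "Source returned 403 — paywall requires institutional access."},
    "B": {"isError": True, "errorCategory": "TRANSIENT", "isRetryable": True,
          "content": "Search rate limit exceeded, retry after 60s."},
    "C": {"isError": False, "content": "[]"},
}
plan = plan_next_steps(results, {"A": 3, "B": 3, "C": 3})
```

With a full budget the plan skips A, retries B, and accepts C; once B's budget reaches zero, its action changes to noting partial coverage rather than retrying again.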

Every CCA-F scenario question tied to Structured Error Responses for MCP Tools is ultimately asking whether the candidate can keep those three reactions straight when the envelopes differ by only one or two fields.

Practice Question Template A — Category Selection

A customer-support agent calls a process_refund tool. The refund is for an order placed 45 days ago; company policy limits refunds to 30 days. Which error envelope is correct?

(A) isError: true, errorCategory: VALIDATION, isRetryable: true
(B) isError: true, errorCategory: BUSINESS, isRetryable: false, content: "Refunds are only available within 30 days of purchase."
(C) isError: true, errorCategory: UNKNOWN, isRetryable: false
(D) isError: false, content: "Refund not applicable."

Correct answer: (B). The input is valid and authorized; the domain policy rejects it. BUSINESS with isRetryable: false and a customer-friendly description is the right shape.

Practice Question Template B — Access Failure vs Empty Result

A research subagent calls list_open_tickets for customer #4421. The customer is known and authorized, and the customer has no open tickets. Which envelope is correct?

(A) isError: true, errorCategory: NOT_FOUND, isRetryable: false
(B) isError: true, errorCategory: TRANSIENT, isRetryable: true
(C) isError: false, content: "No open tickets found for customer #4421."
(D) isError: true, errorCategory: UNKNOWN, isRetryable: false

Correct answer: (C). The tool fulfilled its contract; the answer is legitimately empty. This is a valid empty result, not an access failure.

Practice Question Template C — Coordinator Propagation

Subagent A returns isError: true, errorCategory: TRANSIENT, isRetryable: true after the first attempt at fetch_url. The coordinator has a retry budget of 3 per subagent call. What should the coordinator do?

(A) Abort the entire research workflow.
(B) Re-invoke fetch_url with backoff, up to the remaining retry budget, before surfacing the failure.
(C) Retry immediately without backoff.
(D) Drop the subagent and skip the source silently.

Correct answer: (B). TRANSIENT + isRetryable: true invites a bounded retry with backoff; the retry budget lives in the coordinator, not the tool.
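
A bounded retry of this kind might look like the sketch below. It assumes the tool call is wrapped in a zero-argument callable that returns an envelope dict; `call_with_retry_budget`, `base_delay_s`, and the injectable `sleep` parameter are hypothetical names introduced for illustration, and the doubling delay is one common backoff choice rather than a prescribed one.

```python
import time
from typing import Callable


def call_with_retry_budget(
    call_tool: Callable[[], dict],
    retry_budget: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call the tool once, then retry while the envelope is a retryable error
    and the coordinator-owned budget has retries left."""
    result = call_tool()
    retries_used = 0
    while (
        result.get("isError")
        and result.get("isRetryable")
        and retries_used < retry_budget
    ):
        # Exponential backoff: base delay, then double it on each retry.
        sleep(base_delay_s * (2 ** retries_used))
        retries_used += 1
        result = call_tool()
    # Either a success, a non-retryable error, or the last error after the
    # budget ran out. The caller surfaces whatever envelope comes back.
    return result
```

Keeping the loop in the coordinator means the budget and the delays stay visible to the orchestration layer; the tool itself only reports what happened on each attempt.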

Structured Error Responses for MCP Tools Frequently Asked Questions (FAQ)

What is the difference between `isError` and `isRetryable` in an MCP tool response?

isError is a boolean that tells Claude whether the tool call succeeded. When isError: true, the content body is treated as an error description rather than data. isRetryable is a separate boolean that tells Claude whether an identical retry of the failed call has a reasonable chance of succeeding. The two are not interchangeable: every isError: true response must carry an explicit isRetryable value because category alone does not determine retryability (a TRANSIENT failure may be non-retryable if the upstream service's quota is exhausted; a NOT_FOUND may be retryable if the resource is being created asynchronously). For CCA-F, treat isError, isRetryable, and errorCategory as three separate axes that together define the decision space the agent uses to pick a recovery path.
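
One way to make the "explicit isRetryable" rule hard to forget is to enforce it where the envelope is built. The sketch below is an assumption-laden illustration: `ToolResult` is a hypothetical helper class, and its camelCase attributes simply mirror the field names used in this note rather than any SDK type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolResult:
    """Hypothetical envelope; attribute names mirror the fields in this note."""
    content: str
    isError: bool = False
    errorCategory: Optional[str] = None
    isRetryable: Optional[bool] = None

    def __post_init__(self) -> None:
        # An error must state both why it failed and whether a retry can help;
        # retryability is never inferred from the category.
        if self.isError and (self.errorCategory is None or self.isRetryable is None):
            raise ValueError(
                "isError: true requires explicit errorCategory and isRetryable"
            )


# Same category, different retryability: both are valid envelopes.
still_creating = ToolResult("Order record is still being created.",
                            isError=True, errorCategory="NOT_FOUND", isRetryable=True)
never_existed = ToolResult("No order exists with that ID.",
                           isError=True, errorCategory="NOT_FOUND", isRetryable=False)
```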

How do I decide between `errorCategory: VALIDATION` and `errorCategory: BUSINESS`?

Ask first whether the input itself was well-formed. If it violated the tool's schema or a parameter constraint (wrong type, missing required field, out-of-range value), the category is VALIDATION and the agent should correct the input before calling again. If the input was valid and the caller was authorized but a domain policy rejected the operation (refund outside the window, transfer above a daily limit, cancellation after fulfillment), the category is BUSINESS and the agent should explain the policy to the user rather than attempt a retry. The distinction matters because the recovery paths differ: VALIDATION triggers input correction; BUSINESS triggers a policy explanation. Conflating them produces agents that either loop forever trying to fix a valid request or apologize generically without quoting the actual rule.
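
The ordering of the two checks can be sketched with the refund example from the practice question. Everything here is illustrative: `process_refund`, its parameters, and the `_error` helper are hypothetical, and the 30-day window is taken from that question rather than from any real policy engine.

```python
REFUND_WINDOW_DAYS = 30  # policy window from the practice question


def _error(category: str, content: str) -> dict:
    # Neither category is fixed by repeating the identical call.
    return {"isError": True, "errorCategory": category,
            "isRetryable": False, "content": content}


def process_refund(amount: float, days_since_purchase: int) -> dict:
    # Step 1: is the input itself well-formed? If not, VALIDATION.
    if amount <= 0:
        return _error(
            "VALIDATION",
            f"Parameter 'amount' must be greater than 0; received {amount}.",
        )
    # Step 2: the input is valid, but does a domain policy reject it? BUSINESS.
    if days_since_purchase > REFUND_WINDOW_DAYS:
        return _error(
            "BUSINESS",
            f"Refunds are only available within {REFUND_WINDOW_DAYS} days of purchase.",
        )
    return {"isError": False, "content": f"Refund of {amount:.2f} issued."}
```

A request for a 45-day-old order passes step 1 and fails step 2, so the agent has nothing to correct and should quote the policy text instead.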

Should a tool retry internally before returning an error?

No — retry policy belongs in the agent loop, not inside the tool. A tool that retries internally hides the retry budget from Claude and breaks loop observability: the agent cannot budget tokens, latency, or user-visible progress messages, and cannot switch to a fallback when the overall time budget is tight. The correct pattern is to surface the failure via a structured envelope with the appropriate isRetryable signal and let the coordinator decide whether to apply backoff and retry. Internal retries may be acceptable as a performance optimization within a single logical call (for example, retrying a single underlying HTTP request once inside a larger semantic operation) but never as a substitute for returning a structured failure response when the overall operation cannot succeed.

How do I tell the difference between an access failure and a valid empty result?

Ask whether the tool fulfilled its contract. If the contract was "return matching records or signal that none matched" and the result is "no matches," the tool succeeded and the correct envelope is isError: false with descriptive content (a valid empty result). If the contract was "return the record" and the tool could not complete because of a timeout, permission issue, or upstream outage, the tool failed and the correct envelope is isError: true with an errorCategory and isRetryable signal (an access failure). The two shapes look similar on the wire but imply opposite agent behaviors: empty results are consumed as data and the loop proceeds; access failures trigger the recovery decision tree. Conflating them is one of the most frequently tested architecture mistakes on the CCA-F exam.
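
The contract test can be made concrete in a tool handler. In this sketch, `list_open_tickets` takes a hypothetical `fetch` callable standing in for the real ticket backend, and the assumption is that the backend signals outages and authorization problems by raising Python's built-in TimeoutError and PermissionError.

```python
from typing import Callable, List


def list_open_tickets(customer_id: str, fetch: Callable[[str], List[str]]) -> dict:
    try:
        tickets = fetch(customer_id)
    except TimeoutError:
        # Access failure: the tool could not answer the question at all.
        return {"isError": True, "errorCategory": "TRANSIENT", "isRetryable": True,
                "content": "Ticket service timed out before returning a result."}
    except PermissionError:
        # Access failure: the caller may not ask this question.
        return {"isError": True, "errorCategory": "PERMISSION", "isRetryable": False,
                "content": "Caller is not authorized to view tickets for this customer."}
    if not tickets:
        # Valid empty result: the tool answered, and the answer is "none".
        return {"isError": False,
                "content": f"No open tickets found for customer #{customer_id}."}
    return {"isError": False, "content": ", ".join(tickets)}
```

The empty list never reaches an error branch: only the exceptions, which mean the question went unanswered, produce isError: true.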

What should go in the `content` field when `isError: true`?

Human-readable text that satisfies two audiences: the agent (so it can route the response and choose a recovery action) and the end user (because the agent will often quote the text directly in customer-facing replies). For TRANSIENT errors, name the underlying condition and any known retry hint. For VALIDATION, name the offending parameter and the constraint it violated. For PERMISSION, state what the caller is not authorized to do. For BUSINESS, summarize the policy in plain language with enough specificity that quoting it produces a courteous reply. Avoid raw stack traces, internal rule IDs without explanation, and generic phrases like "Operation failed" — all three strip signal and force the agent to either guess or apologize generically.

How does structured error propagation work in multi-agent systems?

In a coordinator-subagent architecture, each subagent's MCP tool calls return structured error envelopes to the subagent itself. The subagent is responsible for its own first-line recovery: retrying TRANSIENT failures with backoff, correcting input on VALIDATION, and so on. Errors that the subagent cannot resolve propagate upward to the coordinator as part of the subagent's own result, ideally preserving the errorCategory and isRetryable signals so the coordinator can make an informed decision (route to a different subagent, surface to the user, abort the workflow). The Multi-Agent Research scenario explicitly tests this propagation: a subagent that hits a paywall must surface PERMISSION clearly enough that the coordinator can skip the source and document the gap in the final report, rather than crashing or silently omitting the source.
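
A minimal sketch of that hand-off follows. The result shape is an assumption made for illustration: `build_subagent_result`, `coverage_notes`, the `unresolvedErrors` key, and the `tool` field on each envelope are hypothetical names, not a defined protocol.

```python
def build_subagent_result(findings: list, tool_results: list) -> dict:
    """Subagent side: bundle findings with failures it could not resolve,
    copying the category and retryability through unchanged."""
    unresolved = [
        {
            "tool": r["tool"],
            "errorCategory": r["errorCategory"],
            "isRetryable": r["isRetryable"],
            "content": r["content"],
        }
        for r in tool_results
        if r.get("isError")
    ]
    return {"findings": findings, "unresolvedErrors": unresolved}


def coverage_notes(subagent_result: dict) -> list:
    """Coordinator side: turn the preserved error context into report notes."""
    return [
        f"{e['tool']}: {e['errorCategory']} ({e['content']})"
        for e in subagent_result["unresolvedErrors"]
    ]


result = build_subagent_result(
    findings=["Summary of the open-access source"],
    tool_results=[
        {"tool": "web_search", "isError": False, "content": "3 results"},
        {"tool": "fetch_url", "isError": True, "errorCategory": "PERMISSION",
         "isRetryable": False, "content": "Source returned 403."},
    ],
)
notes = coverage_notes(result)
```

Because the structured fields survive the hop, the coordinator can document the paywalled source in the final report without re-parsing free text.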

Does `isRetryable: false` mean the agent should terminate the loop?

No. isRetryable: false means this specific call with this specific input and this specific caller cannot succeed on a naive retry. It does not mean the agent has run out of options. For a VALIDATION error, the agent should correct the input and retry (a different call). For a PERMISSION error, the agent should escalate, re-authenticate, or switch to a tool with broader scope. For a BUSINESS error, the agent should explain the policy to the user and perhaps offer an alternative. The loop continues in all three cases; what is prevented is a blind retry of the identical failed invocation. CCA-F distractors often suggest "abort the workflow" as the response to isRetryable: false — that answer is almost always wrong.
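
The point that the loop continues can be shown as a lookup from category to next step. This is a sketch under stated assumptions: `next_step` and the action labels are hypothetical, and the fallback for categories this answer does not discuss is an illustrative choice.

```python
# Follow-up actions for non-retryable errors, per the three cases above.
NON_RETRYABLE_NEXT_STEP = {
    "VALIDATION": "correct_input_and_call_again",
    "PERMISSION": "escalate_or_switch_tool",
    "BUSINESS": "explain_policy_and_offer_alternative",
}


def next_step(envelope: dict) -> str:
    if not envelope["isError"]:
        return "continue_with_result"
    if envelope["isRetryable"]:
        return "retry_with_backoff"
    # isRetryable: false rules out repeating the identical call, nothing more.
    # Fallback for other categories is an assumption of this sketch.
    return NON_RETRYABLE_NEXT_STEP.get(envelope["errorCategory"], "surface_to_user")
```

No branch returns an "abort" action: a non-retryable error changes what the agent does next, not whether it keeps going.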
