Agent Daily
tutorial • intermediate

Classifier fallback and billing for Claude Fable 5
Jun 2026 • Responses, Safeguards, Billing
Detect safety classifier blocks on Fable 5 and fall back to Opus 4.8 with server-side or SDK-based client-side fallback, including streaming behavior and the new billing changes.

Source: cookbook

Claude Fable 5 includes safety classifiers that block requests in cybersecurity, biology, and reasoning extraction areas. Anthropic provides server-side and client-side fallback mechanisms to automatically retry blocked requests with Claude Opus 4.8, along with new billing changes that credit fallback token costs. This guide covers classifier block detection, fallback implementation strategies, streaming behavior, and billing adjustments for API customers.

Key Points

  • Fable 5 safety classifiers block three categories: offensive cybersecurity (exploits, malware), biology/life sciences (lab methods), and reasoning extraction attempts
  • Detect a classifier block via stop_reason: 'refusal', with stop_details.category set to 'cyber', 'bio', or 'reasoning_extraction'
  • Use server-side fallback (recommended): pass the fallbacks parameter naming Opus 4.8, together with the server-side-fallback-2026-06-01 beta header, and the API retries automatically
  • Server-side fallback is currently available on the Claude API and Claude Platform on AWS; sticky-served turns route directly to the fallback model and carry no additional fallback blocks
  • Detect that a fallback ran via fallback content blocks in the response, and via usage.iterations, which tracks per-model usage across conversation turns
  • The billing changes provide a fallback_credit_token so that client-side implementations have fallback requests billed as cache reads instead of at full token cost
  • When the fallback model is unavailable (rate-limited or overloaded), the API returns a refusal with stop_details.recommended_model so the caller can retry manually
  • Client-side fallback, used when the server-side feature is not, requires the SDK helpers and a manual implementation of the fallback logic
  • The classifiers are deliberately conservative and tuned for robustness, so some benign technical work triggers blocks; false-positive rates are expected to improve after launch
  • Branch fallback logic on stop_reason, not on content or stop_details; treat a null stop_details as a generic refusal
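
The detection rules in the key points can be sketched as a small pure function. This is a minimal illustration that works on a response already converted to a plain dict; the helper name classify_refusal and the "generic" label are hypothetical and not part of any SDK. Only the stop_reason, stop_details, and category fields come from the guide.

```python
from typing import Optional

# Categories named in the guide; anything else is treated as a generic refusal.
CLASSIFIER_CATEGORIES = {"cyber", "bio", "reasoning_extraction"}


def classify_refusal(response: dict) -> Optional[str]:
    """Hypothetical helper: return None when the turn was not refused,
    a classifier category when one is reported, and "generic" otherwise."""
    # Branch on stop_reason only; content and stop_details do not decide
    # whether the turn was refused.
    if response.get("stop_reason") != "refusal":
        return None
    # stop_details may be null; treat that as a generic refusal.
    details = response.get("stop_details") or {}
    category = details.get("category")
    return category if category in CLASSIFIER_CATEGORIES else "generic"
```

Keeping the refusal decision on stop_reason alone means the function still behaves sensibly if stop_details is missing or gains new categories later.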


Artifacts (6)

fallback_unavailable_response • json • config
{
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "recommended_model": "claude-opus-4-8"
  }
}
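
One way to act on the response above is to build a manual retry aimed at the recommended model. The sketch below is an assumption-laden illustration: build_manual_retry is a hypothetical helper, requests and responses are plain dicts, and dropping the fallbacks key on the retry is a design choice made here, not documented behavior.

```python
from typing import Optional


def build_manual_retry(request: dict, response: dict) -> Optional[dict]:
    """Hypothetical helper: return a copy of `request` aimed at
    stop_details.recommended_model, or None when no retry is suggested."""
    if response.get("stop_reason") != "refusal":
        return None
    details = response.get("stop_details") or {}
    recommended = details.get("recommended_model")
    if not recommended:
        return None
    # Assumption: the retry goes straight to the recommended model, so the
    # fallbacks list is dropped rather than sent again.
    retry = {key: value for key, value in request.items() if key != "fallbacks"}
    retry["model"] = recommended
    return retry
```

Because the fallback model was rate-limited or overloaded when this response was produced, a caller would probably want to wait briefly before sending the retry.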
server_side_fallback_curl • bash • command
curl https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: server-side-fallback-2026-06-01" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-fable-5",
    "max_tokens": 1024,
    "fallbacks": [
      {
        "model": "claude-opus-4-8"
      }
    ],
    "messages": [
      {
        "role": "user",
        "content": "Hello, world"
      }
    ]
  }'
fable5_fallback_setup • python • script
import os
from dotenv import load_dotenv

load_dotenv()

PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"
SERVER_SIDE_FALLBACK_BETA = "server-side-fallback-2026-06-01"
FALLBACK_CREDIT_BETA = "fallback-credit-2026-06-01"

# Anthropic() reads ANTHROPIC_API_KEY from the environment. Add it to a .env
# file (loaded above) or export it in your shell before running the live examples.
if not os.environ.get("ANTHROPIC_API_KEY"):
    print("ANTHROPIC_API_KEY is not set - add it to .env or export it.")
classifier_block_response • json • config
{
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "This request triggered restrictions on violative cyber content and was blocked under Anthropic's Usage Policy..."
  },
  "content": []
}
fallback_detection_helpers • python • script
def fallback_hops(response):
    """(from_model, to_model) for each hop that ran and blocked this turn."""
    hops = []
    for b in response.content:
        if getattr(b, "type", None) == "fallback":
            d = b.model_dump() if hasattr(b, "model_dump") else dict(b)
            hops.append((d["from"]["model"], d["to"]["model"]))
    return hops

def served_by_fallback(response):
    """True whenever a fallback model served the response, INCLUDING a sticky-served turn.
    usage.iterations is the best way to check whether a turn was served by a fallback model."""
    iters = getattr(response.usage, "iterations", None) or []
    return any(
        (i.get("type") if isinstance(i, dict) else getattr(i, "type", None)) == "fallback_message"
        for i in iters
    )
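
Since usage.iterations tracks per-model usage, the same entries could also be summed per model. The sketch below is illustrative only: the guide shows just the type field on each entry, so the model, input_tokens, and output_tokens field names are assumptions, and tokens_by_model is a hypothetical helper.

```python
from collections import defaultdict


def _field(entry, name, default=None):
    """Read a field from either a dict or an SDK-style object."""
    if isinstance(entry, dict):
        return entry.get(name, default)
    return getattr(entry, name, default)


def tokens_by_model(iterations):
    """Hypothetical helper: sum token counts per model.

    Assumes each entry exposes `model`, `input_tokens`, and `output_tokens`;
    those names are not confirmed by the guide. Missing fields count as zero.
    """
    totals = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for entry in iterations or []:
        model = _field(entry, "model", "unknown")
        totals[model]["input_tokens"] += _field(entry, "input_tokens", 0) or 0
        totals[model]["output_tokens"] += _field(entry, "output_tokens", 0) or 0
    return dict(totals)
```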
server_side_fallback_implementation • python • script
from anthropic import Anthropic

client = Anthropic()

PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"
SERVER_SIDE_FALLBACK_BETA = "server-side-fallback-2026-06-01"

def chat_turn(messages, max_tokens=1024):
    """One API call; the server handles the fallback."""
    return client.beta.messages.create(
        model=PRIMARY_MODEL,
        max_tokens=max_tokens,
        messages=messages,
        betas=[SERVER_SIDE_FALLBACK_BETA],
        fallbacks=[{"model": FALLBACK_MODEL}],
    )

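# Example usage. fallback_hops() and served_by_fallback() are defined in the
# fallback_detection_helpers artifact.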
response = chat_turn([{"role": "user", "content": "Hello, world"}])
hops = fallback_hops(response)
for from_model, to_model in hops:
    print(f"[{from_model} blocked — continued on {to_model}]")
if not hops and served_by_fallback(response):
    print(f"[sticky: served directly by {response.model}]")
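
For contrast, a client-side fallback loop might look roughly like the following when the server-side feature is not used. This is a sketch under stated assumptions, not the cookbook's SDK helper: send is an injected callable (hypothetical) that takes a request dict and returns a response dict, and handling of the fallback_credit_token is left out because the guide does not show its shape.

```python
from typing import Callable, Optional, Sequence, Tuple


def send_with_client_fallback(
    send: Callable[[dict], dict],
    messages: Sequence[dict],
    models: Sequence[str] = ("claude-fable-5", "claude-opus-4-8"),
    max_tokens: int = 1024,
) -> Tuple[str, dict]:
    """Try each model in order and return (model, response) for the first
    turn that is not a refusal, or the last refusal if every model refuses."""
    if not models:
        raise ValueError("models must contain at least one model name")
    last: Optional[dict] = None
    for model in models:
        request = {"model": model, "max_tokens": max_tokens, "messages": list(messages)}
        last = send(request)
        # Branch on stop_reason only, as the guide recommends.
        if last.get("stop_reason") != "refusal":
            return model, last
    return models[-1], last
```

Injecting send keeps the retry logic independent of any particular client, so it can be exercised with a stub before being wired to a real API call.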