Agent Daily
tutorial · intermediate

Automatic context compaction

Manage context limits in long-running agentic workflows by automatically compressing conversation history.

View original on cookbook

This cookbook demonstrates automatic context compaction for managing token limits in long-running agentic workflows. It shows how the Claude Agent Python SDK can automatically compress conversation history when token usage exceeds a threshold, enabling tasks to continue beyond the 200k token context limit. The example uses a customer service agent processing support tickets, where each ticket requires multiple tool calls that accumulate in conversation history. By implementing context compaction with the compaction_control parameter, agents can maintain focus and efficiency across many iterations without manual context management.

Key Points

  • Context compaction automatically monitors token usage per turn and injects a summary prompt when thresholds are exceeded
  • Long-running agentic workflows with tool use accumulate conversation history linearly—processing 20-30 tickets can quickly consume the entire context window
  • The compaction_control parameter clears conversation history and resumes with only a compressed summary, allowing tasks to continue beyond typical limits
  • For Claude Opus 4.6, use server-side compaction which handles context management automatically without SDK configuration
  • SDK-based compaction is useful for older models or when using a cheaper model for summarization tasks
  • Customer service workflows benefit significantly from compaction—each ticket requires 7+ tool calls (fetch, classify, search, prioritize, route, draft, complete)
  • Summaries are wrapped in <summary></summary> tags to guide the model, though these tags aren't parsed by the system
  • The beta_tool decorator makes functions accessible to Claude agents by extracting function arguments and docstrings as tool metadata
  • Without compaction, by ticket #5 the context includes complete details from all previous tickets, so per-turn input tokens climb steadily and cumulative usage grows roughly quadratically
  • Effective context engineering prevents performance degradation and 'context rot' in iterative agent workflows
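
The compaction mechanism described above can be sketched in plain Python. This is an illustrative model of the behavior, not the SDK's `compaction_control` API: the function name `compact_if_needed`, the `threshold` value, and the `summarize` callback are all hypothetical.

```python
# Hypothetical sketch of threshold-based compaction (NOT the SDK API):
# when input tokens cross a threshold, replace the message history with
# a single <summary>-wrapped user message and continue from there.

def compact_if_needed(messages, input_tokens, threshold, summarize):
    """Return (messages, compacted?) after applying the threshold check."""
    if input_tokens < threshold:
        return messages, False
    # Compress the full history into one summary message; the
    # <summary></summary> tags guide the model but are not parsed.
    summary_text = summarize(messages)
    compacted = [
        {"role": "user", "content": f"<summary>{summary_text}</summary>"}
    ]
    return compacted, True

history = [{"role": "user", "content": f"ticket step {i}"} for i in range(40)]
new_history, did_compact = compact_if_needed(
    history,
    input_tokens=180_000,
    threshold=150_000,  # assumed threshold, below the 200k context limit
    summarize=lambda msgs: f"Processed {len(msgs)} prior messages.",
)
print(did_compact, len(new_history))  # → True 1
```

The key design point is that after compaction the agent resumes with only the summary in context, so subsequent turns pay for the summary's tokens rather than the entire accumulated history.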


Workflow Diagram

[Diagram: Start Process → Step A → Step B → Step C → Complete, with a Quality node]


Artifacts (4)

customer_service_agent_setup (python script)
import anthropic
from anthropic import beta_tool
from utils.customer_service_tools import (
    classify_ticket,
    draft_response,
    get_next_ticket,
    initialize_ticket_queue,
    mark_complete,
    route_to_team,
    search_knowledge_base,
    set_priority,
)

client = anthropic.Anthropic()
tools = [
    get_next_ticket,
    classify_ticket,
    search_knowledge_base,
    set_priority,
    route_to_team,
    draft_response,
    mark_complete,
]
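
The key points note that the beta_tool decorator extracts function arguments and docstrings as tool metadata. A stdlib-only sketch of what that extraction conceptually looks like is below; this is NOT the SDK's implementation, and `describe_tool` is a hypothetical name.

```python
# Conceptual sketch (NOT the SDK's implementation) of how a decorator like
# beta_tool can derive tool metadata from a function's signature and docstring.
import inspect

def describe_tool(fn):
    """Build a minimal tool-metadata dict from a Python function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "input_schema": {
            "type": "object",
            # Simplification: treat every parameter as a string field
            "properties": {name: {"type": "string"} for name in sig.parameters},
            "required": [
                name
                for name, p in sig.parameters.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }

def classify_ticket(ticket_id: str, category: str = "general") -> str:
    """Categorize a support ticket (billing/technical/account/product/shipping)."""
    return category

meta = describe_tool(classify_ticket)
print(meta["name"], meta["input_schema"]["required"])  # → classify_ticket ['ticket_id']
```

This is why well-written docstrings and type hints matter for tool-using agents: they become the description the model sees when deciding which tool to call.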
environment_setup (python config)
from dotenv import load_dotenv
import os

load_dotenv()
MODEL = "claude-sonnet-4-6"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
ticket_processing_workflow (workflow)
For EACH ticket, complete ALL these steps:
1. **Fetch ticket**: Call get_next_ticket() to retrieve the next unprocessed ticket
2. **Classify**: Call classify_ticket() to categorize the issue (billing/technical/account/product/shipping)
3. **Research**: Call search_knowledge_base() to find relevant information for this ticket type
4. **Prioritize**: Call set_priority() to assign priority (low/medium/high/urgent) based on severity
5. **Route**: Call route_to_team() to assign to the appropriate team
6. **Draft**: Call draft_response() to create a helpful customer response using KB information
7. **Complete**: Call mark_complete() to finalize this ticket
8. **Continue**: Immediately fetch the next ticket and repeat

IMPORTANT RULES:
- Process tickets ONE AT A TIME in sequence
- Complete ALL 7 steps for each ticket before moving to the next
- Keep fetching and processing tickets until queue is empty
baseline_execution_code (python script)
from anthropic.types.beta import BetaMessageParam

num_tickets = 5
initialize_ticket_queue(num_tickets)

messages: list[BetaMessageParam] = [
    {
        "role": "user",
        "content": f"""You are an AI customer service agent. Your task is to process support tickets from a queue.
        For EACH ticket, you must complete ALL these steps:
        1. Fetch ticket: Call get_next_ticket()
        2. Classify: Call classify_ticket()
        3. Research: Call search_knowledge_base()
        4. Prioritize: Call set_priority()
        5. Route: Call route_to_team()
        6. Draft: Call draft_response()
        7. Complete: Call mark_complete()
        8. Continue: Immediately fetch the next ticket and repeat
        There are {num_tickets} tickets total - process all of them.
        Begin by fetching the first ticket.""",
    }
]

total_input = 0
total_output = 0
turn_count = 0

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

for message in runner:
    # NOTE: _params is a private SDK attribute; used here only to inspect history length
    messages_list = list(runner._params["messages"])
    turn_count += 1
    total_input += message.usage.input_tokens
    total_output += message.usage.output_tokens
    print(
        f"Turn {turn_count:2d}: Input={message.usage.input_tokens:7,} tokens | "
        f"Output={message.usage.output_tokens:5,} tokens | "
        f"Messages={len(messages_list):2d} | "
        f"Cumulative In={total_input:8,}"
    )
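
The baseline loop above makes the cost of uncompacted history concrete: each turn re-sends the whole conversation, so per-turn input grows linearly and cumulative input grows roughly quadratically with turn count. A toy model with an assumed per-turn token cost (the 500-token figure is illustrative, not measured):

```python
# Toy model (assumed numbers) of why uncompacted history is costly:
# each turn re-sends the whole history, so per-turn input grows linearly
# and cumulative input grows roughly quadratically with turn count.
TOKENS_PER_TURN = 500  # assumption: each turn appends ~500 tokens of history

def cumulative_input(turns: int, per_turn: int = TOKENS_PER_TURN) -> int:
    """Sum of input tokens across all turns when full history is re-sent."""
    return sum(t * per_turn for t in range(1, turns + 1))

# 7 tool calls per ticket -> ~35 turns for 5 tickets, ~210 turns for 30 tickets
print(cumulative_input(35))   # → 315000
print(cumulative_input(210))  # → 11077500
```

Under these assumptions, 30 tickets cost roughly 35x the cumulative input of 5 tickets, which is why compaction (or server-side context management) becomes necessary well before the queue is empty.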