Tutorial · Intermediate
Automatic context compaction

Manage context limits in long-running agentic workflows by automatically compressing conversation history.
This cookbook demonstrates automatic context compaction for managing token limits in long-running agentic workflows. It shows how the Claude Agent Python SDK can automatically compress conversation history when token usage exceeds a threshold, enabling tasks to continue beyond the 200k-token context limit.

The example uses a customer service agent processing support tickets, where each ticket requires multiple tool calls that accumulate in conversation history. By implementing context compaction with the compaction_control parameter, agents can maintain focus and efficiency across many iterations without manual context management.
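A quick back-of-envelope model shows why unbounded history is a problem: when every request re-sends the full conversation, cumulative input tokens grow quadratically in the number of turns. The per-turn figures below are illustrative assumptions, not measurements from the cookbook:

```python
# Illustrative model of context growth. The constants are assumptions,
# not measured values.
TOKENS_PER_TURN = 1_500   # assumed history added per tool-use turn
TURNS_PER_TICKET = 7      # fetch, classify, search, prioritize, route, draft, complete

def cumulative_input_tokens(num_tickets: int) -> int:
    """Sum of input tokens when every turn re-sends the full prior history."""
    total = 0
    history = 0
    for _ in range(num_tickets * TURNS_PER_TICKET):
        total += history          # each request pays for all prior history
        history += TOKENS_PER_TURN
    return total

print(f"{cumulative_input_tokens(5):,}")   # 892,500 for 5 tickets
print(f"{cumulative_input_tokens(30):,}")  # 32,917,500 for 30 tickets
```

History itself grows linearly, but the *cumulative* cost of re-sending it is quadratic, which is why a 20-30 ticket queue quickly becomes impractical without compaction.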
Key Points
- Context compaction automatically monitors token usage per turn and injects a summary prompt when a threshold is exceeded
- Long-running agentic workflows with tool use accumulate conversation history linearly; processing 20-30 tickets can quickly consume the entire context window
- The compaction_control parameter clears conversation history and resumes with only a compressed summary, allowing tasks to continue beyond typical limits
- For Claude Opus 4.6, use server-side compaction, which handles context management automatically without SDK configuration
- SDK-based compaction is useful for older models, or when you want a cheaper model to handle summarization
- Customer service workflows benefit significantly from compaction: each ticket requires 7+ tool calls (fetch, classify, search, prioritize, route, draft, complete)
- Summaries are wrapped in <summary></summary> tags to guide the model, though these tags aren't parsed by the system
- The beta_tool decorator makes functions accessible to Claude agents by extracting function arguments and docstrings as tool metadata
- Without compaction, by ticket #5 the context includes complete details from all previous tickets, so cumulative input-token usage grows quadratically rather than linearly
- Effective context engineering prevents performance degradation and "context rot" in iterative agent workflows
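A minimal sketch of the mechanism these points describe (a hypothetical maybe_compact helper, not the SDK's implementation; the SDK automates this via compaction_control):

```python
# Hypothetical helper sketching what compaction does: once a turn's input
# tokens cross a threshold, the history is replaced by one summary message.
COMPACTION_THRESHOLD = 100_000  # assumed per-turn input-token trigger

def maybe_compact(messages: list[dict], input_tokens: int, summarize) -> list[dict]:
    """Return the history unchanged, or collapse it to a <summary>-wrapped message."""
    if input_tokens < COMPACTION_THRESHOLD:
        return messages
    summary = summarize(messages)  # e.g. a call to a cheaper summarization model
    return [{
        "role": "user",
        "content": f"<summary>{summary}</summary>\nContinue the task from this summary.",
    }]

# Usage with a stub summarizer:
history = [
    {"role": "user", "content": "ticket #1 ..."},
    {"role": "assistant", "content": "routed to billing"},
]
compacted = maybe_compact(
    history, input_tokens=120_000,
    summarize=lambda msgs: f"{len(msgs)} messages processed",
)
print(len(compacted))  # history collapsed to a single summary message
```

The <summary> tags mirror the convention noted above: they guide the model but are not parsed by the system.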
Artifacts (4)
customer_service_agent_setup (python script)
```python
import anthropic
from anthropic import beta_tool
from utils.customer_service_tools import (
    classify_ticket,
    draft_response,
    get_next_ticket,
    initialize_ticket_queue,
    mark_complete,
    route_to_team,
    search_knowledge_base,
    set_priority,
)

client = anthropic.Anthropic()

tools = [
    get_next_ticket,
    classify_ticket,
    search_knowledge_base,
    set_priority,
    route_to_team,
    draft_response,
    mark_complete,
]
```
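The beta_tool decorator imported above turns plain Python functions into tools by reading their signatures and docstrings. A simplified stand-in (a hypothetical describe_tool helper, not the SDK's implementation) illustrates the mechanism:

```python
import inspect

def describe_tool(fn):
    """Simplified stand-in for a beta_tool-style decorator: it derives tool
    metadata from the function's signature and docstring (not the SDK's code)."""
    fn.tool_schema = {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": list(inspect.signature(fn).parameters),
    }
    return fn

@describe_tool
def classify_ticket(ticket_id: str) -> str:
    """Categorize a support ticket as billing/technical/account/product/shipping."""
    return "billing"  # stub body for illustration

print(classify_ticket.tool_schema["name"])        # classify_ticket
print(classify_ticket.tool_schema["parameters"])  # ['ticket_id']
```

This is why well-written docstrings matter for the real tools: they become the description the model reads when deciding which tool to call.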
environment_setup (python config)
```python
from dotenv import load_dotenv
import os

load_dotenv()

MODEL = "claude-sonnet-4-6"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
```
ticket_processing_workflow (workflow)
For EACH ticket, complete ALL these steps:
1. **Fetch ticket**: Call get_next_ticket() to retrieve the next unprocessed ticket
2. **Classify**: Call classify_ticket() to categorize the issue (billing/technical/account/product/shipping)
3. **Research**: Call search_knowledge_base() to find relevant information for this ticket type
4. **Prioritize**: Call set_priority() to assign priority (low/medium/high/urgent) based on severity
5. **Route**: Call route_to_team() to assign to the appropriate team
6. **Draft**: Call draft_response() to create a helpful customer response using KB information
7. **Complete**: Call mark_complete() to finalize this ticket
8. **Continue**: Immediately fetch the next ticket and repeat
IMPORTANT RULES:
- Process tickets ONE AT A TIME in sequence
- Complete ALL 7 steps for each ticket before moving to the next
- Keep fetching and processing tickets until the queue is empty
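The loop above can be sketched with stub steps (hypothetical process_queue helper and stub data; the real tools live in utils.customer_service_tools and are called by the model, not by a Python loop):

```python
# Stub pipeline sketching the 7-step, one-ticket-at-a-time loop.
STEPS = [
    "get_next_ticket", "classify_ticket", "search_knowledge_base",
    "set_priority", "route_to_team", "draft_response", "mark_complete",
]

def process_queue(num_tickets: int) -> list[str]:
    """Run every step for every ticket, in order, one ticket at a time."""
    transcript = []
    for ticket in range(1, num_tickets + 1):
        for step in STEPS:
            transcript.append(f"ticket {ticket}: {step}")
    return transcript

log = process_queue(2)
print(len(log))  # 2 tickets x 7 steps = 14 tool calls
```

Even this toy version makes the accumulation problem concrete: every tool call adds a request/result pair to the conversation history, so 5 tickets already mean 35+ tool-use turns.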
baseline_execution_code (python script)
```python
from anthropic.types.beta import BetaMessageParam

num_tickets = 5
initialize_ticket_queue(num_tickets)

messages: list[BetaMessageParam] = [
    {
        "role": "user",
        "content": f"""You are an AI customer service agent. Your task is to process support tickets from a queue.
For EACH ticket, you must complete ALL these steps:
1. Fetch ticket: Call get_next_ticket()
2. Classify: Call classify_ticket()
3. Research: Call search_knowledge_base()
4. Prioritize: Call set_priority()
5. Route: Call route_to_team()
6. Draft: Call draft_response()
7. Complete: Call mark_complete()
8. Continue: Immediately fetch the next ticket and repeat
There are {num_tickets} tickets total - process all of them.
Begin by fetching the first ticket.""",
    }
]

total_input = 0
total_output = 0
turn_count = 0

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

for message in runner:
    # Peek at the runner's internal message list to track how the history grows.
    messages_list = list(runner._params["messages"])
    turn_count += 1
    total_input += message.usage.input_tokens
    total_output += message.usage.output_tokens
    print(
        f"Turn {turn_count:2d}: Input={message.usage.input_tokens:7,} tokens | "
        f"Output={message.usage.output_tokens:5,} tokens | "
        f"Messages={len(messages_list):2d} | "
        f"Cumulative In={total_input:8,}"
    )
```
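For intuition about how compaction would change the per-turn numbers this baseline prints, a toy model (illustrative constants, not the SDK's accounting) compares the peak request size with and without a history reset:

```python
# Toy comparison of peak request size. All constants are assumptions.
TOKENS_PER_TURN = 1_500   # assumed history growth per turn
SUMMARY_TOKENS = 2_000    # assumed size of the compressed summary
THRESHOLD = 100_000       # assumed compaction trigger
CONTEXT_LIMIT = 200_000   # the 200k context window

def max_history(turns: int, compact: bool) -> int:
    """Largest request (history) size seen over the run."""
    history = 0
    peak = 0
    for _ in range(turns):
        history += TOKENS_PER_TURN
        peak = max(peak, history)        # a request is sent at this size
        if compact and history > THRESHOLD:
            history = SUMMARY_TOKENS     # history replaced by a summary
    return peak

print(max_history(210, compact=False))  # 315000: the baseline blows past 200k
print(max_history(210, compact=True))   # 101000: stays well under the limit
```

Under these assumptions the uncompacted run exceeds the context window long before a 30-ticket queue (210 turns) finishes, while the compacted run's peak request size stays bounded near the threshold regardless of queue length.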