Agent Daily
tutorial · intermediate

Automatic context compaction

Manage context limits in long-running agentic workflows by automatically compressing conversation history.

View original on cookbook

This cookbook demonstrates automatic context compaction for managing token limits in long-running agentic workflows. It shows how the Claude Agent Python SDK can automatically compress conversation history when token usage exceeds a threshold, enabling tasks to continue beyond the 200k token context limit. The example uses a customer service agent processing support tickets, where each ticket requires multiple tool calls that accumulate in conversation history. By implementing context compaction with the compaction_control parameter, agents can maintain focus and efficiency across many iterations without manual context management.

Key Points

  • Context compaction automatically monitors token usage per turn and injects a summary prompt when thresholds are exceeded
  • Long-running agentic workflows with tool use accumulate conversation history linearly—processing 20-30 tickets can quickly consume the entire context window
  • The compaction_control parameter clears conversation history and resumes with only a compressed summary, allowing tasks to continue beyond typical limits
  • For Claude Opus 4.6, use server-side compaction which handles context management automatically without SDK configuration
  • SDK-based compaction is useful for older models or when using a cheaper model for summarization tasks
  • Customer service workflows benefit significantly from compaction—each ticket requires 7+ tool calls (fetch, classify, search, prioritize, route, draft, complete)
  • Summaries are wrapped in <summary></summary> tags to guide the model, though these tags aren't parsed by the system
  • The beta_tool decorator makes functions accessible to Claude agents by extracting function arguments and docstrings as tool metadata
  • Without compaction, by ticket #5 the context includes complete details from all previous tickets, so per-turn input tokens climb steadily and cumulative usage grows roughly quadratically
  • Effective context engineering prevents performance degradation and 'context rot' in iterative agent workflows
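
The compaction mechanism described above can be sketched in plain Python. This is an illustrative model of the behavior, not the SDK's `compaction_control` API: the function name `compact_if_needed`, the `threshold` value, and the `summarize` callback are all hypothetical.

```python
# Hypothetical sketch of threshold-based compaction (NOT the SDK API):
# when input tokens cross a threshold, replace the message history with
# a single <summary>-wrapped user message and continue from there.

def compact_if_needed(messages, input_tokens, threshold, summarize):
    """Return (messages, compacted?) after applying the threshold check."""
    if input_tokens < threshold:
        return messages, False
    # Compress the full history into one summary message; the
    # <summary></summary> tags guide the model but are not parsed.
    summary_text = summarize(messages)
    compacted = [
        {"role": "user", "content": f"<summary>{summary_text}</summary>"}
    ]
    return compacted, True

history = [{"role": "user", "content": f"ticket step {i}"} for i in range(40)]
new_history, did_compact = compact_if_needed(
    history,
    input_tokens=180_000,
    threshold=150_000,  # assumed threshold, below the 200k context limit
    summarize=lambda msgs: f"Processed {len(msgs)} prior messages.",
)
print(did_compact, len(new_history))  # → True 1
```

The key design point is that after compaction the agent resumes with only the summary in context, so subsequent turns pay for the summary's tokens rather than the entire accumulated history.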


Workflow Diagram

[Diagram: Start Process → Step A → Step B → Step C → Complete, with a Quality node]


Artifacts (4)

customer_service_agent_setup (python script)
import anthropic
from anthropic import beta_tool
from utils.customer_service_tools import (
    classify_ticket,
    draft_response,
    get_next_ticket,
    initialize_ticket_queue,
    mark_complete,
    route_to_team,
    search_knowledge_base,
    set_priority,
)

client = anthropic.Anthropic()
tools = [
    get_next_ticket,
    classify_ticket,
    search_knowledge_base,
    set_priority,
    route_to_team,
    draft_response,
    mark_complete,
]
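
The key points note that the beta_tool decorator extracts function arguments and docstrings as tool metadata. A stdlib-only sketch of what that extraction conceptually looks like is below; this is NOT the SDK's implementation, and `describe_tool` is a hypothetical name.

```python
# Conceptual sketch (NOT the SDK's implementation) of how a decorator like
# beta_tool can derive tool metadata from a function's signature and docstring.
import inspect

def describe_tool(fn):
    """Build a minimal tool-metadata dict from a Python function."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "input_schema": {
            "type": "object",
            # Simplification: treat every parameter as a string field
            "properties": {name: {"type": "string"} for name in sig.parameters},
            "required": [
                name
                for name, p in sig.parameters.items()
                if p.default is inspect.Parameter.empty
            ],
        },
    }

def classify_ticket(ticket_id: str, category: str = "general") -> str:
    """Categorize a support ticket (billing/technical/account/product/shipping)."""
    return category

meta = describe_tool(classify_ticket)
print(meta["name"], meta["input_schema"]["required"])  # → classify_ticket ['ticket_id']
```

This is why well-written docstrings and type hints matter for tool-using agents: they become the description the model sees when deciding which tool to call.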
environment_setup (python config)
from dotenv import load_dotenv
import os

load_dotenv()
MODEL = "claude-sonnet-4-6"
ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY")
ticket_processing_workflow (workflow)
For EACH ticket, complete ALL these steps:
1. **Fetch ticket**: Call get_next_ticket() to retrieve the next unprocessed ticket
2. **Classify**: Call classify_ticket() to categorize the issue (billing/technical/account/product/shipping)
3. **Research**: Call search_knowledge_base() to find relevant information for this ticket type
4. **Prioritize**: Call set_priority() to assign priority (low/medium/high/urgent) based on severity
5. **Route**: Call route_to_team() to assign to the appropriate team
6. **Draft**: Call draft_response() to create a helpful customer response using KB information
7. **Complete**: Call mark_complete() to finalize this ticket
8. **Continue**: Immediately fetch the next ticket and repeat

IMPORTANT RULES:
- Process tickets ONE AT A TIME in sequence
- Complete ALL 7 steps for each ticket before moving to the next
- Keep fetching and processing tickets until queue is empty
baseline_execution_code (python script)
from anthropic.types.beta import BetaMessageParam

num_tickets = 5
initialize_ticket_queue(num_tickets)

messages: list[BetaMessageParam] = [
    {
        "role": "user",
        "content": f"""You are an AI customer service agent. Your task is to process support tickets from a queue.
        For EACH ticket, you must complete ALL these steps:
        1. Fetch ticket: Call get_next_ticket()
        2. Classify: Call classify_ticket()
        3. Research: Call search_knowledge_base()
        4. Prioritize: Call set_priority()
        5. Route: Call route_to_team()
        6. Draft: Call draft_response()
        7. Complete: Call mark_complete()
        8. Continue: Immediately fetch the next ticket and repeat
        There are {num_tickets} tickets total - process all of them.
        Begin by fetching the first ticket.""",
    }
]

total_input = 0
total_output = 0
turn_count = 0

runner = client.beta.messages.tool_runner(
    model=MODEL,
    max_tokens=4096,
    tools=tools,
    messages=messages,
)

for message in runner:
    # NOTE: _params is a private SDK attribute; used here only to inspect history length
    messages_list = list(runner._params["messages"])
    turn_count += 1
    total_input += message.usage.input_tokens
    total_output += message.usage.output_tokens
    print(
        f"Turn {turn_count:2d}: Input={message.usage.input_tokens:7,} tokens | "
        f"Output={message.usage.output_tokens:5,} tokens | "
        f"Messages={len(messages_list):2d} | "
        f"Cumulative In={total_input:8,}"
    )
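
The baseline loop above makes the cost of uncompacted history concrete: each turn re-sends the whole conversation, so per-turn input grows linearly and cumulative input grows roughly quadratically with turn count. A toy model with an assumed per-turn token cost (the 500-token figure is illustrative, not measured):

```python
# Toy model (assumed numbers) of why uncompacted history is costly:
# each turn re-sends the whole history, so per-turn input grows linearly
# and cumulative input grows roughly quadratically with turn count.
TOKENS_PER_TURN = 500  # assumption: each turn appends ~500 tokens of history

def cumulative_input(turns: int, per_turn: int = TOKENS_PER_TURN) -> int:
    """Sum of input tokens across all turns when full history is re-sent."""
    return sum(t * per_turn for t in range(1, turns + 1))

# 7 tool calls per ticket -> ~35 turns for 5 tickets, ~210 turns for 30 tickets
print(cumulative_input(35))   # → 315000
print(cumulative_input(210))  # → 11077500
```

Under these assumptions, 30 tickets cost roughly 35x the cumulative input of 5 tickets, which is why compaction (or server-side context management) becomes necessary well before the queue is empty.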