Context engineering: memory, compaction, and tool clearing

Mar 2026 • Tools • Agent Patterns

Compare context engineering strategies for long-running agents and learn when each applies, what it costs, and how they compose.
This cookbook teaches context engineering strategies for long-running AI agents, focusing on three key techniques: compaction (summarizing context), tool-result clearing (removing re-fetchable tool outputs), and memory (persistent external storage). The guide addresses context rot, the degradation of model performance as context windows grow, and provides practical implementations using Claude's API. Through a research agent example, it demonstrates how to combine these strategies to manage token growth, maintain conversation continuity, and persist knowledge across sessions.
Key Points
- Context rot occurs as token count increases, reducing the model's ability to recall information accurately well before hard token limits are hit
- Compaction distills context into high-fidelity summaries, allowing agents to continue long conversations with minimal performance degradation
- Tool-result clearing removes old, re-fetchable tool outputs (file reads, API responses) while keeping the call records, reducing context bloat
- Memory implements structured note-taking via persistent external storage, enabling agents to track progress across tasks and sessions without keeping everything in active context
- All three strategies have first-party API support: server-side compaction, context editing (tool-result clearing), and the memory tool
- Map workload characteristics to the right primitive: clearing for large re-fetchable results, compaction for long conversations, memory for cross-session persistence
- Claude Code uses multiple strategies in production: compaction for conversations and dual memory systems for cross-session persistence
- Context is a finite resource with diminishing marginal returns; the goal is finding the smallest set of high-signal tokens that maximizes the desired outcome
- Two complementary strategies round out the toolkit: subagents isolate work in separate contexts, and programmatic tool calling keeps large results out of the window entirely
- Test clearing configs and compaction prompts against your workload's actual tool-use patterns to diagnose which context problem needs solving
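Tool-result clearing, the second point above, can be sketched without any server-side support: walk the message history and replace stale `tool_result` blocks with a short placeholder, keeping the `tool_use` call record so the model still sees what it fetched and can re-fetch if needed. A minimal illustration (the message shape follows the Messages API; the `keep_last` threshold and placeholder wording are assumptions, not the first-party context-editing behavior):

```python
# Sketch: clear old, re-fetchable tool results from a message history.
# The most recent `keep_last` tool results stay intact; earlier ones are
# replaced with a placeholder so the call record survives but the bulk is gone.

PLACEHOLDER = "[tool result cleared; re-run the tool if this content is needed]"

def clear_tool_results(messages: list[dict], keep_last: int = 3) -> list[dict]:
    # Indices of user messages that carry tool_result blocks, oldest first.
    result_idxs = [
        i for i, m in enumerate(messages)
        if m["role"] == "user"
        and isinstance(m["content"], list)
        and any(b.get("type") == "tool_result" for b in m["content"])
    ]
    to_clear = set(result_idxs[:-keep_last] if keep_last else result_idxs)
    cleared = []
    for i, m in enumerate(messages):
        if i in to_clear:
            new_blocks = [
                {**b, "content": PLACEHOLDER} if b.get("type") == "tool_result" else b
                for b in m["content"]
            ]
            cleared.append({**m, "content": new_blocks})
        else:
            cleared.append(m)
    return cleared
```

Returning a new list rather than mutating in place makes it easy to compare token counts before and after clearing.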
Artifacts (4)
Environment Setup (config)

```
ANTHROPIC_API_KEY=your-key-here
```

Python Dependencies Installation (bash command)

```bash
pip install anthropic python-dotenv matplotlib
```

Context Engineering Setup Script (python)

```python
import json
import os
import tempfile
from collections import namedtuple
from pathlib import Path

import anthropic
import matplotlib.pyplot as plt
from dotenv import load_dotenv

load_dotenv()
if not os.environ.get("ANTHROPIC_API_KEY"):
    raise ValueError("ANTHROPIC_API_KEY not set. Add it to a .env file or export it.")

CORPUS_PATH = Path("research_corpus.py")
assert CORPUS_PATH.exists(), (
    f"research_corpus.py not found in {Path.cwd()}. It should be alongside this notebook."
)

client = anthropic.Anthropic()
MODEL = "claude-sonnet-4-6"
print(f"anthropic SDK {anthropic.__version__}, model {MODEL}")

# Token counting utility (cached so repeated lookups don't re-hit the API)
_token_cache: dict[str, int] = {}

def count_tokens(text: str) -> int:
    if text not in _token_cache:
        _token_cache[text] = client.messages.count_tokens(
            model=MODEL,
            messages=[{"role": "user", "content": text}],
        ).input_tokens
    return _token_cache[text]
```
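Compaction can be wrapped around a client and token counter like the ones above. The sketch below triggers when the history grows past a token budget and replaces the oldest turns with a single summary turn. The budget, the `keep_recent` cutoff, and the summary framing are assumptions; a real implementation would pass a `summarize` function that calls `client.messages.create` with a summarization prompt:

```python
# Sketch: compact a conversation once it exceeds a token budget.
# `token_count` and `summarize` are injected so the logic stays testable;
# in practice summarize() would be a Claude call producing a high-fidelity summary.
from typing import Callable

def compact(
    messages: list[dict],
    token_count: Callable[[str], int],
    summarize: Callable[[list[dict]], str],
    budget: int = 100_000,   # assumption: tune to your context window
    keep_recent: int = 4,    # assumption: recent turns kept verbatim
) -> list[dict]:
    total = sum(token_count(str(m["content"])) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)
    return [
        {"role": "user", "content": f"[Summary of earlier conversation]\n{summary}"}
    ] + recent
```

Keeping the most recent turns verbatim preserves local coherence while the summary carries the long tail of the conversation.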
Corpus Analysis Script (python)

```python
from research_corpus import COMPACTION_PROBES, CORPUS

print(f"CORPUS is a dict of {len(CORPUS)} synthetic documents held in Python memory.")
print("When the agent calls read_file, the content is served from this dict and")
print("lands directly in the agent's context window — no disk I/O involved.\n")

_total_tokens = 0
for path, content in CORPUS.items():
    n_tok = count_tokens(content)
    _total_tokens += n_tok
    display_name = path.removeprefix("/research/")
    print(f"{display_name:<26} ~ {n_tok:>6,} tokens")

print(f"\nTotal corpus: ~ {_total_tokens:,} tokens")
assert _total_tokens > 250_000, (
    f"Corpus is only {_total_tokens:,} tokens; expected >250K. "
    "Restart the kernel and re-run, or verify research_corpus.py is current."
)
```