Agent Daily
tutorial · intermediate

Knowledge graph construction with Claude
Mar 2026 • RAG & Retrieval Tools

Build knowledge graphs from unstructured text using Claude for entity extraction, relation mining, deduplication, and multi-hop graph querying.


This guide shows how to build knowledge graphs from unstructured text using Claude's structured outputs for entity extraction, relation mining, and entity resolution. Rather than training traditional NER and relation classifiers, Claude handles each stage via prompts, and the resulting in-memory graph supports multi-hop reasoning without a graph database. The approach uses Haiku for high-volume extraction and Sonnet for entity resolution, with techniques that transfer to production databases such as Neo4j or PostgreSQL.

Key Points

  • Use Claude's structured outputs (Pydantic models) to extract typed entities and subject-predicate-object triples without training data
  • Define entity types (PERSON, ORGANIZATION, LOCATION, EVENT, ARTIFACT) and relation predicates as schema constraints for reliable extraction
  • Implement Claude-driven entity resolution to collapse surface-form variants (e.g., 'NASA' vs 'National Aeronautics and Space Administration') into canonical nodes
  • Use Haiku for cost-effective, high-volume extraction work; use Sonnet for nuanced entity resolution and disambiguation across documents
  • Serialize subgraphs back to Claude for multi-hop reasoning queries that span multiple documents (e.g., 'who works with people who worked on project X')
  • Measure extraction quality with precision/recall metrics against a gold standard to reason about cost/quality tradeoffs between models
  • Build in-memory graphs with NetworkX; techniques transfer directly to Neo4j, Neptune, or Postgres adjacency tables for production scaling
  • Include one-sentence descriptions for each extracted entity to disambiguate entities with similar names during resolution
  • Keep predicates as short verb phrases ('commanded', 'launched from', 'part of') for clarity and consistency
  • Extract only central entities from documents; skip incidental mentions to reduce noise and improve graph quality
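
The precision/recall measurement mentioned in the key points can be sketched as a set comparison of extracted triples against a hand-labeled gold standard. This is a minimal illustration, not the cookbook's evaluation code, and the triples below are made up for the example:

```python
def triple_metrics(predicted: set[tuple[str, str, str]],
                   gold: set[tuple[str, str, str]]) -> dict[str, float]:
    """Precision/recall/F1 over exact-match (subject, predicate, object) triples."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Illustrative gold standard and model output (not from the notebook)
gold = {
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Apollo 11", "launched from", "Kennedy Space Center"),
    ("Saturn V", "part of", "Apollo program"),
}
predicted = {
    ("Neil Armstrong", "commanded", "Apollo 11"),
    ("Apollo 11", "launched from", "Kennedy Space Center"),
    ("Buzz Aldrin", "commanded", "Apollo 11"),  # incorrect triple: false positive
}
metrics = triple_metrics(predicted, gold)
```

Exact string matching is deliberately strict; running the metric before and after entity resolution shows how much of the apparent error is just surface-form variance.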



Artifacts (4)

entity_extraction_schema (python, config)
from pydantic import BaseModel
from typing import Literal

EntityType = Literal["PERSON", "ORGANIZATION", "LOCATION", "EVENT", "ARTIFACT"]

class Entity(BaseModel):
    name: str
    type: EntityType
    description: str

class Relation(BaseModel):
    source: str
    predicate: str
    target: str

class ExtractedGraph(BaseModel):
    entities: list[Entity]
    relations: list[Relation]
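
The entity-resolution step described in the key points needs a schema of its own so Sonnet can return canonical groupings as structured output. A minimal sketch follows; these model and field names are assumptions for illustration, not the cookbook's exact schema:

```python
from pydantic import BaseModel

class CanonicalEntity(BaseModel):
    canonical_name: str   # preferred node label, e.g. "NASA"
    aliases: list[str]    # surface forms collapsed into this node
    type: str             # one of the EntityType values

class ResolutionResult(BaseModel):
    entities: list[CanonicalEntity]

# A resolved group for the NASA example from the key points
example = CanonicalEntity(
    canonical_name="NASA",
    aliases=["NASA", "National Aeronautics and Space Administration"],
    type="ORGANIZATION",
)
```

The one-sentence entity descriptions from extraction would be included in the resolution prompt so the model can tell apart entities that share a name.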

entity_extraction_prompt (template)
Extract a knowledge graph from the document below.

<document>
{text}
</document>

Guidelines:
- Extract only entities that are central to what this document is about — skip incidental mentions.
- For each entity, write a one-sentence description grounded in this document. These descriptions are used later to disambiguate entities with similar names.
- Predicates should be short verb phrases ("commanded", "launched from", "part of").
- Every relation must connect two entities you extracted.
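
The extraction function in the last artifact references an EXTRACTION_PROMPT constant; wiring the template above into Python is a one-liner worth making explicit (the sample document text here is illustrative):

```python
# The prompt template above, as the constant consumed by extract()
EXTRACTION_PROMPT = """Extract a knowledge graph from the document below.

<document>
{text}
</document>

Guidelines:
- Extract only entities that are central to what this document is about — skip incidental mentions.
- For each entity, write a one-sentence description grounded in this document. These descriptions are used later to disambiguate entities with similar names.
- Predicates should be short verb phrases ("commanded", "launched from", "part of").
- Every relation must connect two entities you extracted."""

prompt = EXTRACTION_PROMPT.format(text="Apollo 11 was the first crewed Moon landing.")
```

Because the template contains no braces other than {text}, str.format is safe here; a template with literal braces would need them doubled.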

knowledge_graph_setup (python, script)
import json
from collections import defaultdict
from pathlib import Path
from typing import Literal
from urllib.parse import quote

import anthropic
import matplotlib.pyplot as plt
import networkx as nx
import requests
from dotenv import load_dotenv
from pydantic import BaseModel

load_dotenv()

client = anthropic.Anthropic()
EXTRACTION_MODEL = "claude-haiku-4-5"
SYNTHESIS_MODEL = "claude-sonnet-4-6"

ARTICLE_TITLES = [
    "Apollo program",
    "Apollo 11",
    "Neil Armstrong",
    "Saturn V",
    "Buzz Aldrin",
    "Kennedy Space Center",
]

WIKI_API = "https://en.wikipedia.org/api/rest_v1/page/summary/"
HEADERS = {
    "User-Agent": "claude-cookbooks/1.0 (https://github.com/anthropics/claude-cookbooks)"
}

def fetch_summary(title: str) -> str:
    slug = quote(title.replace(" ", "_"), safe="")
    r = requests.get(WIKI_API + slug, headers=HEADERS, timeout=10)
    r.raise_for_status()
    return r.json()["extract"]

documents = []
for i, title in enumerate(ARTICLE_TITLES):
    try:
        documents.append({
            "id": i,
            "title": title,
            "text": fetch_summary(title)
        })
    except requests.RequestException as e:
        print(f"Skipping {title}: {e}")

if not documents:
    raise RuntimeError("No documents loaded — check network and Wikipedia API availability")

print(f"Loaded {len(documents)} documents")

extraction_function (python, script)
def extract(text: str) -> ExtractedGraph:
    response = client.messages.parse(
        model=EXTRACTION_MODEL,
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": EXTRACTION_PROMPT.format(text=text)
        }],
        output_format=ExtractedGraph,
    )
    return response.parsed_output

raw_entities = []
raw_relations = []

for doc in documents:
    try:
        result = extract(doc["text"])
    except anthropic.APIError as e:
        print(f"Skipping {doc['title']}: {e}")
        continue
    
    for ent in result.entities:
        raw_entities.append({
            **ent.model_dump(),
            "source_doc": doc["title"]
        })
    
    for rel in result.relations:
        raw_relations.append({
            **rel.model_dump(),
            "source_doc": doc["title"]
        })
    
    print(f"{doc['title']:<25} {len(result.entities):>3} entities {len(result.relations):>3} relations")

print(f"\nTotal: {len(raw_entities)} raw entities, {len(raw_relations)} raw relations")
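
With raw entities and relations collected, the key points call for an in-memory NetworkX graph. A sketch of that step follows, using illustrative records shaped like raw_entities and raw_relations in place of live extraction output:

```python
import networkx as nx

def build_graph(entities: list[dict], relations: list[dict]) -> nx.MultiDiGraph:
    """Build a directed multigraph keyed by entity name."""
    g = nx.MultiDiGraph()
    for ent in entities:
        g.add_node(ent["name"], type=ent["type"],
                   description=ent["description"])
    for rel in relations:
        # Drop relations whose endpoints were not extracted as entities
        if rel["source"] in g and rel["target"] in g:
            g.add_edge(rel["source"], rel["target"],
                       predicate=rel["predicate"])
    return g

# Illustrative records, not live output from the extraction loop
entities = [
    {"name": "Neil Armstrong", "type": "PERSON",
     "description": "Commander of Apollo 11."},
    {"name": "Apollo 11", "type": "EVENT",
     "description": "First crewed Moon landing."},
]
relations = [
    {"source": "Neil Armstrong", "predicate": "commanded",
     "target": "Apollo 11"},
]
g = build_graph(entities, relations)
```

A MultiDiGraph keeps parallel edges, so two documents asserting different relations between the same pair of entities both survive; the same structure maps directly onto a Neo4j relationship set or a Postgres adjacency table.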