Agent Daily
tutorial • intermediate

The vulnerability detection agent
Apr 2026 • Claude Agent SDK • Cybersecurity

Build a vulnerability-discovery agent with the Claude Agent SDK that threat-models a C target, hunts memory-safety bugs with built-in file tools, and triages findings into a structured report.


This cookbook demonstrates how to build a vulnerability-discovery agent with the Claude Agent SDK. The agent threat-models C source code, hunts memory-safety bugs with the built-in file tools (Read, Grep, Glob), and generates structured security reports. It runs in a multi-turn session: a bootstrap phase drafts the threat model, an interview phase folds in the owner's answers, and find and triage loops then search for and vet vulnerabilities. The approach aims to reduce false positives relative to traditional static analyzers by having Claude reason about each candidate and report only high-confidence memory-safety issues. The target is treated as read-only source in a sandbox and is never executed.
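The multi-turn flow described above could be sketched roughly as follows. This is a minimal sketch, not the cookbook's own code: it assumes the `claude_agent_sdk` package is installed and that `ClaudeSDKClient` works as an async context manager with `query()` and `receive_response()`. The prompts and the helper names `session_kwargs`, `text_of`, and `threat_model_session` are hypothetical.

```python
READ_ONLY_TOOLS = ["Read", "Grep", "Glob"]


def session_kwargs(target_dir: str, engagement_context: str) -> dict:
    """Keyword arguments for ClaudeAgentOptions, limited to the file tools."""
    return {
        "system_prompt": engagement_context,
        "allowed_tools": list(READ_ONLY_TOOLS),
        "cwd": target_dir,
    }


async def text_of(client) -> str:
    """Collect the assistant text for the turn that was just sent."""
    from claude_agent_sdk import AssistantMessage, TextBlock  # lazy import

    parts = []
    async for msg in client.receive_response():
        if isinstance(msg, AssistantMessage):
            parts.extend(b.text for b in msg.content if isinstance(b, TextBlock))
    return "".join(parts)


async def threat_model_session(
    target_dir: str, engagement_context: str, owner_answers: str
) -> tuple[str, str]:
    """Bootstrap a threat model, then refine it in the same session."""
    from claude_agent_sdk import ClaudeAgentOptions, ClaudeSDKClient  # lazy import

    options = ClaudeAgentOptions(**session_kwargs(target_dir, engagement_context))
    async with ClaudeSDKClient(options=options) as client:
        # Turn 1: draft the threat model from the source alone (hypothetical prompt).
        await client.query("Bootstrap a threat model for the C sources in this directory.")
        draft = await text_of(client)
        # Turn 2: same session, so the draft is still in the conversation context.
        await client.query(
            "Owner answers to the open questions:\n" + owner_answers
            + "\nRevise the threat model accordingly."
        )
        refined = await text_of(client)
    return draft, refined
```

Calling `asyncio.run(threat_model_session(...))` would drive both turns. Whether listing only Read, Grep, and Glob in `allowed_tools` is sufficient to guarantee read-only behavior depends on the SDK's permission settings, so treat that as an assumption to verify.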

Key Points

  • Supply the engagement context as the system prompt so that every agent operation carries the authorization, sandbox isolation, and responsible-disclosure terms
  • Use the Claude Agent SDK's multi-turn ClaudeSDKClient session to bootstrap the threat model from source code, then refine it with the owner's interview answers in the same session
  • Rely on the built-in Read, Grep, and Glob tools rather than hand-rolled file access to explore the codebase safely and identify entry points and trust boundaries
  • Structure the threat model as system context, assets, entry points and trust boundaries, threats (each with an id, surface, asset, impact, and likelihood), and open questions for the owner
  • Drive the agentic find loop by having Claude reason about which inputs could corrupt memory, using the threat model to decide where to hunt
  • Chain find, triage, and report as separate query() calls, each emitting schema-conformant JSON that a reviewer can act on
  • If real-world work triggers Claude's cyber safeguards, apply for the Cyber Verification Program (CVP) to continue legitimate security research
  • Test the end-to-end workflow on the self-contained canary.c, which has planted bugs (heap overflow, stack overflow, use-after-free), before targeting production code
  • Reduce false positives by having Claude reason about threat likelihood and impact rather than relying on static-analyzer heuristics
  • Keep the threat model and findings in structured markdown/JSON so they fit the responsible-disclosure workflow and the security-team handoff
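To make the "schema-conformant JSON" point concrete, here is a minimal sketch of checking a findings payload before it reaches a reviewer. The field names (`id`, `file`, `function`, `bug_class`, `confidence`) and the confidence levels are assumptions made for illustration; the cookbook's actual schema may differ.

```python
import json

# Hypothetical finding fields; not taken from the cookbook.
REQUIRED_FIELDS = {
    "id": str,
    "file": str,
    "function": str,
    "bug_class": str,
    "confidence": str,
}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def parse_findings(raw: str) -> list[dict]:
    """Parse agent output as a JSON array of findings and check each entry."""
    findings = json.loads(raw)
    if not isinstance(findings, list):
        raise ValueError("expected a JSON array of findings")
    for i, finding in enumerate(findings):
        if not isinstance(finding, dict):
            raise ValueError(f"finding {i}: expected an object")
        for name, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(finding.get(name), expected_type):
                raise ValueError(f"finding {i}: missing or mistyped field {name!r}")
        if finding["confidence"] not in ALLOWED_CONFIDENCE:
            raise ValueError(f"finding {i}: unknown confidence {finding['confidence']!r}")
    return findings


def high_confidence(findings: list[dict]) -> list[dict]:
    """Keep only the findings the agent marked as high confidence."""
    return [f for f in findings if f["confidence"] == "high"]
```

Rejecting malformed output early keeps the triage and report steps from silently working on partial data.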


Artifacts (4)

vulnerability_detection_agent_setup.py (python, script)
import json
from collections.abc import AsyncIterator
from pathlib import Path
from dotenv import load_dotenv
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    Message,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
    query,
)

load_dotenv()

MODEL_NAME = "claude-opus-4-7"
TARGET_DIR = Path("vulnerability_detection_agent/canary").resolve()

assert TARGET_DIR.is_dir(), f"run this notebook from claude_agent_sdk/ (got cwd={Path.cwd()})"

ENGAGEMENT_CONTEXT = """\
## Engagement context
This is authorized security research conducted as a defensive security assessment on a self-contained canary target vendored in this notebook.
The target is read-only source (no execution).
Findings are collected for demonstration and responsible-disclosure workflow testing.
"""

async def collect(stream: AsyncIterator[Message]) -> str:
    """Consume an Agent SDK message stream, print tool calls, and return the assistant text."""
    final = ""
    async for msg in stream:
        if isinstance(msg, AssistantMessage):
            for block in msg.content:
                if isinstance(block, ToolUseBlock):
                    # Log each tool call, truncating long argument dumps to 120 characters.
                    args = str(block.input)
                    args = args if len(args) <= 120 else args[:120] + "...}"
                    print(f" [tool] {block.name} {args} ")
                elif isinstance(block, TextBlock) and block.text.strip():
                    # Accumulate text from every assistant message in the stream.
                    final += block.text
        elif isinstance(msg, ResultMessage) and msg.is_error:
            # Surface a failed run as an exception instead of returning partial text.
            raise RuntimeError(msg.result)
    return final

print(f"Model: {MODEL_NAME}")
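One way the find and triage steps might be chained as separate calls is sketched below. This is an illustration under stated assumptions, not the cookbook's implementation: the prompt wording and the names `find_prompt`, `triage_prompt`, and `find_then_triage` are hypothetical, and the `run` parameter stands in for whatever sends a prompt to the agent and returns its text.

```python
import asyncio
from collections.abc import Awaitable, Callable

# A function that sends one prompt to the agent and returns its text reply.
Run = Callable[[str], Awaitable[str]]


def find_prompt(threat_model: str) -> str:
    """Hypothetical prompt for the find step, seeded with the threat model."""
    return (
        "Using the threat model below, read the target sources and list "
        "candidate memory-safety bugs as a JSON array.\n\n" + threat_model
    )


def triage_prompt(candidates_json: str) -> str:
    """Hypothetical prompt for the triage step, seeded with the find output."""
    return (
        "Re-read the code for each candidate below. Keep only findings you can "
        "justify with a concrete attacker-controlled input, as a JSON array.\n\n"
        + candidates_json
    )


async def find_then_triage(threat_model: str, run: Run) -> str:
    """Run find and triage as two independent calls, passing JSON between them."""
    candidates = await run(find_prompt(threat_model))
    return await run(triage_prompt(candidates))
```

With the setup script above, `run` could plausibly be `lambda p: collect(query(prompt=p, options=options))`, where `options` is a `ClaudeAgentOptions` instance you construct yourself. Keeping the steps as separate calls means the triage pass starts from the emitted JSON rather than from the find pass's conversation history.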
canary.c (c, template)
// canary.c
// Entry: ./canary
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void parse_alpha(const unsigned char *data, size_t len) {
    unsigned char *buf = malloc(32);
    memcpy(buf, data, len);  // HEAP BUFFER OVERFLOW: no bounds check on len
    printf("alpha: %02x\n", buf[0]);
    free(buf);
}

static void parse_bravo(const unsigned char *data, size_t len) {
    char name[16];
    memcpy(name, data, len);  // STACK BUFFER OVERFLOW: no bounds check on len
    name[15] = 0;
    printf("bravo: %s\n", name);
}

static void parse_charlie(const unsigned char *data, size_t len) {
    char *p = malloc(64);
    if (len > 0 && data[0] == 0xff) {
        free(p);
    }
    memcpy(p, data, len);  // USE-AFTER-FREE: p may be freed above
    printf("charlie: %s\n", p);
    free(p);  // DOUBLE FREE when the branch above already freed p
}
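As a rough illustration of the kind of pattern search the Grep tool makes possible on a target like this, the sketch below lists `memcpy` call sites so that each one can be examined for a missing bounds check. It is not how the agent itself works (the agent reasons over the code it reads), and the helper name `memcpy_sites` is hypothetical. The regex only handles single-line calls without nested parentheses, and the comment stripping would misfire on string literals containing `//`.

```python
import re

# Matches memcpy(dst, src, n) on one line; nested parentheses are not handled.
MEMCPY = re.compile(r"\bmemcpy\s*\(\s*([^,]+),\s*([^,]+),\s*([^)]+)\)")


def memcpy_sites(source: str) -> list[dict]:
    """Return line number, destination, source, and length for each memcpy call."""
    sites = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        match = MEMCPY.search(code)
        if match:
            dst, src, length = (group.strip() for group in match.groups())
            sites.append({"line": lineno, "dst": dst, "src": src, "len": length})
    return sites
```

Each site is only a lead: whether the length argument is attacker-controlled and unchecked still has to be judged by reading the surrounding function.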
threat_model_schema.md (template)
# Threat Model: <system name>

## 1. System context
(Description of the system, its purpose, and deployment context)

## 2. Assets
| asset | description | sensitivity |
|-------|-------------|-------------|

## 3. Entry points & trust boundaries
| entry_point | description | trust_boundary | reachable_assets |
|-------------|-------------|-----------------|------------------|

## 4. Threats
| id | threat | surface | asset | impact | likelihood |
|----|--------|---------|-------|--------|------------|

## 5. Open questions
- (Things the code alone cannot answer: deployment context, which inputs are attacker-controlled in practice, blast radius)
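Because the template uses pipe tables, a filled-in threat model can be read back into structured rows for later steps. The following is a minimal sketch, assuming the section headings appear exactly as in the template; `parse_table` is a hypothetical helper, and it does not handle escaped `|` characters inside cells.

```python
def parse_table(markdown: str, heading: str) -> list[dict]:
    """Return the rows of the first pipe table under `heading`, keyed by column name."""
    lines = markdown.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == heading)
    except StopIteration:
        return []  # heading not present
    header = None
    rows = []
    for line in lines[start + 1:]:
        stripped = line.strip()
        if stripped.startswith("#"):
            break  # reached the next section
        if not stripped.startswith("|"):
            if header is not None:
                break  # the table has ended
            continue  # skip prose before the table
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |----|----|
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```

For example, `parse_table(text, "## 4. Threats")` would yield one dict per threat, which could then be used to decide where the find step should look first.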
engagement_context.txt (config)
## Engagement context
This is authorized security research conducted as a defensive security assessment on a self-contained canary target vendored in this notebook.
The target is read-only source (no execution).
Findings are collected for demonstration and responsible-disclosure workflow testing.

Key requirements:
1. Authorization: Code owner has authorized this assessment
2. Isolation: Read-only sandbox environment
3. Disclosure: Findings headed for responsible disclosure workflow
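The three key requirements could also be enforced in code before the context is handed to the agent. The sketch below is one hypothetical way to do that: the `Engagement` record, its field names, and `render_engagement_context` are all assumptions for illustration and do not appear in the cookbook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engagement:
    """Hypothetical record of the three key requirements."""
    target: str
    owner_authorized: bool
    read_only_sandbox: bool
    disclosure_channel: str


def render_engagement_context(e: Engagement) -> str:
    """Refuse to build the system prompt unless every requirement is met."""
    if not e.owner_authorized:
        raise PermissionError("code owner has not authorized this assessment")
    if not e.read_only_sandbox:
        raise ValueError("target must be in a read-only sandbox before the agent starts")
    if not e.disclosure_channel.strip():
        raise ValueError("no responsible-disclosure channel recorded")
    return "\n".join([
        "## Engagement context",
        f"This is authorized security research on {e.target}.",
        "The target is read-only source (no execution).",
        f"Findings are headed for responsible disclosure via {e.disclosure_channel}.",
    ]) + "\n"
```

Failing fast here means an unauthorized or writable target never reaches the point where a session is opened.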