Automation

Automating repetitive tasks and workflows with agents

TUT · advanced

Reproduce Claude's agentic search benchmark scores in the Messages API

Jun 2026 • Evals, Tools

Build a Messages API harness that reproduces published DeepSearchQA and BrowseComp scores, using programmatic tool calling, server-side compaction, and task budgets.

This cookbook demonstrates how to reproduce Claude's published agentic search benchmark scores (DeepSearchQA, BrowseComp) using the Messages API with programmatic tool calling, server-side compaction, and task budgets. The key is proper harness configuration: the API parameters that become critical for agents running 30+ tool calls across hundreds of thousands of tokens. By following this guide, you'll build an agentic search loop that matches Claude's official benchmark performance and understand why each configuration choice matters for long-horizon tasks.
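
The core of any such harness is a loop that keeps executing the tool calls the model requests until it returns a final answer or a budget runs out. The sketch below shows only that control flow, assuming a hypothetical `Step` shape and a scripted stand-in for the model. It does not call the Messages API, and a plain tool-call cap (`max_tool_calls`, a hypothetical name) stands in for the cookbook's task budgets; programmatic tool calling and server-side compaction are API features not modeled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One model turn: a tool request or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def run_agent(model: Callable[[list], Step],
              tools: dict,
              question: str,
              max_tool_calls: int = 100) -> tuple:
    """Loop until the model answers or the (hypothetical) tool-call cap is hit."""
    history: list = [("user", question)]
    calls = 0
    while True:
        step = model(history)
        if step.answer is not None:
            return step.answer, calls
        if calls >= max_tool_calls:
            return None, calls  # budget exhausted before a final answer
        result = tools[step.tool](**step.args)  # execute the requested tool
        calls += 1
        history.append(("tool", step.tool, result))


# Usage with a scripted stand-in for the model: two searches, then an answer.
def scripted_model(history: list) -> Step:
    done = sum(1 for entry in history if entry[0] == "tool")
    if done < 2:
        return Step(tool="search", args={"query": f"query {done}"})
    return Step(answer="final answer")


tools = {"search": lambda query: f"results for {query}"}
answer, used = run_agent(scripted_model, tools, "example question")
print(answer, used)  # final answer 2
```

Returning `None` when the cap is hit, rather than raising, lets a benchmark harness record the run as unanswered and move on.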

Jul 1, 2026

REL · intermediate

Launch HN: Propolis (YC X25) – Browser agents that QA your web app autonomously

Propolis is an autonomous QA platform that deploys swarms of browser agents to simulate user behavior, identify bugs, and generate end-to-end (e2e) tests for web applications. The agents collaboratively explore websites, flag friction points, and propose tests that integrate into CI/CD pipelines. Pricing starts at $1,000/month with flexible options. By treating agents as a canary group for quality assurance, Propolis aims to close the gap between deterministic testing and real-world usage coverage.

mpapazian · Jun 15, 2026

REL · intermediate

Launch HN: Mosaic (YC W25) – Agentic Video Editing

Mosaic is an agentic video editing platform that uses multimodal AI and a node-based canvas interface to automate video editing workflows. Built by former Tesla engineers, it addresses frustrations with traditional editors by enabling users to create reusable editing agents that can analyze video content and apply intelligent edits through natural language prompts. The platform combines visual intelligence (saliency analysis, object detection, emotion recognition) with a timeline editor and supports export to DaVinci Resolve, Premiere Pro, and Final Cut Pro.

adishj · Jun 15, 2026

REL · intermediate

Show HN: AILA – Local-first autonomous agent with zero-remote-override

AILA is a local-first autonomous agent platform built by Marco, a Berlin paramedic, that runs 100% on user hardware with zero remote override capability. The system uses a "Sovereignty Key" (physical hardware anchor) to ensure true ownership and prevent external control, even by the creator. Unlike cloud-based AI that users merely rent access to, AILA enables users to modify their agent's reasoning in plain language while maintaining complete autonomy and privacy.

marcoheigl · Jun 15, 2026

TUT · intermediate

Outcomes: agents that verify their own work

May 2026 • Agent Patterns, Evals

Build a grade-and-revise loop with Outcomes: a writer drafts a cited research brief, a stateless grader fetches every URL and checks every quote against a rubric, and feedback drives revisions until the brief passes. Covers user.define_outcome, the span.outcome_evaluation_* events, and how to write a rubric the grader can act on.

This guide teaches how to build a grade-and-revise loop using Outcomes in Claude Managed Agents, where a writer agent drafts a cited research brief and a stateless grader independently verifies every URL, quote, and claim against a detailed rubric. The grader provides structured feedback that drives revisions until the brief passes, eliminating manual review cycles. Key techniques include writing specific, actionable rubrics that force concrete evidence, using span.outcome_evaluation_* events to track the loop, and understanding when Outcomes is the right tool for quality assurance.
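
The shape of a grade-and-revise loop can be sketched without any agent framework: a writer produces a draft, a stateless grader checks it against a rubric, and the grader's feedback is passed to the next draft. In this minimal sketch a substring check stands in for the guide's grader (which fetches URLs and verifies quotes), and `make_toy_writer`, `grade`, and `revise_until_pass` are hypothetical names; `user.define_outcome` and the `span.outcome_evaluation_*` events are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Grade:
    passed: bool
    feedback: List[str]


def grade(draft: str, rubric: List[Tuple[str, str]]) -> Grade:
    """Stateless grader: sees only the draft and the rubric, never the writer's history.

    Each rubric item is (required substring, feedback message). The substring test
    is a stand-in for real verification such as fetching URLs and checking quotes.
    """
    failures = [message for required, message in rubric if required not in draft]
    return Grade(passed=not failures, feedback=failures)


def revise_until_pass(write: Callable[[List[str]], str],
                      rubric: List[Tuple[str, str]],
                      max_rounds: int = 5) -> Tuple[Optional[str], int]:
    """Draft, grade, and feed the grader's feedback back to the writer."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = write(feedback)
        result = grade(draft, rubric)
        if result.passed:
            return draft, round_no
        feedback = result.feedback
    return None, max_rounds


def make_toy_writer() -> Callable[[List[str]], str]:
    """Hypothetical writer that adds a citation once the grader asks for one."""
    parts = ["Claim A is supported by prior work."]

    def write(feedback: List[str]) -> str:
        if any("citation" in message for message in feedback):
            parts.append("[1] https://example.com/source")
        return " ".join(parts)

    return write


rubric = [("https://", "missing citation URL")]
draft, rounds = revise_until_pass(make_toy_writer(), rubric)
print(rounds)  # 2
```

Keeping the grader stateless means it cannot be talked into a pass by the writer's earlier reasoning; it judges each draft only against the rubric.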

May 7, 2026

TUT · intermediate

The vulnerability detection agent

Apr 2026 • Claude Agent SDK, Cybersecurity

Build a vulnerability-discovery agent with the Claude Agent SDK that threat-models a C target, hunts memory-safety bugs with built-in file tools, and triages findings into a structured report.

This cookbook demonstrates building a vulnerability-discovery agent using the Claude Agent SDK that automatically threat-models C source code, hunts memory-safety bugs using built-in file tools (Read, Grep, Glob), and generates structured security reports. The agent operates in a multi-turn session with a bootstrap threat-modeling phase, an interview phase for owner input, and automated vulnerability finding and triage loops. The approach reduces false positives compared to traditional static analyzers by using Claude's reasoning to identify high-confidence memory-safety issues in a read-only sandbox environment.

May 5, 2026

TUT · intermediate

Build an SRE incident response agent with Claude Managed Agents

Apr 2026 • Agent Patterns, Observability

A webhook-triggered responder that investigates logs and runbooks with a custom Skill, fixes infrastructure code, and gates the PR behind a human-approval custom tool, with the full audit trail in the Console.

This tutorial demonstrates building a webhook-triggered SRE incident response agent using Claude Managed Agents that automatically investigates production alerts, consults runbooks, proposes infrastructure fixes via pull requests, and gates merging behind human approval. The agent combines built-in sandbox tools (bash, read, edit) with custom tools for PR management and human-in-the-loop approval, providing complete audit trails in the Anthropic Console. The example uses mocked PagerDuty, GitHub, and Datadog integrations to focus on agent patterns, with guidance for swapping in real services.
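
The human-approval gate reduces to one rule: the merge happens only if a human says yes, and every request and decision is logged. This minimal sketch assumes hypothetical `PullRequest`, `AuditLog`, and `gated_merge` names, with a callback standing in for the reviewer and an in-memory list standing in for the Console audit trail; in the tutorial the gate is a custom tool the agent calls, not a local function.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PullRequest:
    number: int
    title: str
    merged: bool = False


@dataclass
class AuditLog:
    entries: List[str] = field(default_factory=list)

    def record(self, message: str) -> None:
        self.entries.append(message)


def gated_merge(pr: PullRequest,
                approve: Callable[[PullRequest], bool],
                log: AuditLog) -> bool:
    """Merge only if the human approver says yes; log every step either way."""
    log.record(f"approval requested for PR #{pr.number}: {pr.title}")
    if approve(pr):
        pr.merged = True
        log.record(f"PR #{pr.number} approved and merged")
    else:
        log.record(f"PR #{pr.number} rejected; left open")
    return pr.merged


# Usage: a lambda stands in for the human reviewer, who rejects this fix.
log = AuditLog()
pr = PullRequest(number=42, title="Raise connection pool limit")
gated_merge(pr, approve=lambda p: False, log=log)
print(log.entries)
```

Because the agent can only request approval and never merge directly, a wrong fix costs a rejected PR rather than a production change.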

Apr 9, 2026

TUT · intermediate

Build a Slack data analyst bot with Claude Managed Agents

Apr 2026 • Agent Patterns, Integrations

Mention the bot with a CSV to get an analysis report in-thread, with multi-turn follow-ups in the same session.

This cookbook demonstrates building a Slack data analyst bot using Claude Managed Agents and Bolt for Python. Users mention the bot with a CSV file to receive narrative analysis reports in-thread, with support for multi-turn follow-ups within the same session. The implementation handles file uploads, streams agent progress updates, and manages session persistence across Slack threads.

Apr 9, 2026

TUT · intermediate

Threat intelligence enrichment agent

Apr 2026 • Tools, Agent Patterns

Build an agent that autonomously investigates IOCs by querying multiple threat intel sources, cross-referencing findings, mapping to MITRE ATT&CK, and producing structured reports for SIEM and SOAR integration.

This cookbook demonstrates building a Claude-powered threat intelligence enrichment agent that autonomously investigates Indicators of Compromise (IOCs) by querying multiple threat intel sources, correlating findings, mapping to MITRE ATT&CK, and generating structured reports for SIEM/SOAR integration. The agent uses Claude's tool-use capabilities to decide which intelligence sources to query, chain tool calls based on discoveries, and convert free-text analysis into analyst-ready JSON reports. The architecture uses simulated threat intel backends that can be swapped with real APIs (VirusTotal, AbuseIPDB, Shodan, etc.) without changing orchestration logic.
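
The "swap backends without changing orchestration" idea can be illustrated with sources passed in as plain functions and a single function that merges their findings into a JSON report. This sketch is a fixed pipeline that queries every source; in the cookbook, Claude decides which sources to query and in what order. The backend functions, the verdict rule, and the tag-to-technique map are hypothetical (only the technique IDs are real: T1071 is Application Layer Protocol and T1566 is Phishing), and 203.0.113.7 is a reserved documentation address.

```python
import json
from typing import Callable, Dict


# Hypothetical simulated backends; real ones (VirusTotal, AbuseIPDB, Shodan, ...)
# would replace these functions without changing enrich().
def reputation_source(ioc: str) -> dict:
    return {"malicious_votes": 7} if ioc == "203.0.113.7" else {"malicious_votes": 0}


def tagging_source(ioc: str) -> dict:
    return {"tags": ["c2"]} if ioc == "203.0.113.7" else {"tags": []}


# Illustrative subset: tag -> MITRE ATT&CK technique ID.
ATTACK_MAP = {"c2": "T1071", "phishing": "T1566"}


def enrich(ioc: str, sources: Dict[str, Callable[[str], dict]]) -> str:
    """Query every source, merge the findings, and emit a JSON report."""
    findings = {name: query(ioc) for name, query in sources.items()}
    tags = sorted({t for f in findings.values() for t in f.get("tags", [])})
    votes = sum(f.get("malicious_votes", 0) for f in findings.values())
    report = {
        "ioc": ioc,
        # Hypothetical verdict rule: enough malicious votes or any threat tag.
        "verdict": "malicious" if votes >= 5 or tags else "unknown",
        "malicious_votes": votes,
        "tags": tags,
        "attack_techniques": sorted({ATTACK_MAP[t] for t in tags if t in ATTACK_MAP}),
        "sources": findings,
    }
    return json.dumps(report, indent=2, sort_keys=True)


sources = {"reputation": reputation_source, "tagging": tagging_source}
print(enrich("203.0.113.7", sources))
```

Keeping the raw per-source findings in the report alongside the merged verdict lets a SIEM or SOAR consumer audit how the verdict was reached.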

Apr 7, 2026