The Question
If 84% of developers now use AI coding assistants and 41% of all code is AI-generated, why do only 33% trust the accuracy of what these tools produce? The answer lies in a fundamental mismatch: we're treating autonomous agents like fancy autocomplete when they require an entirely different engineering discipline.
Simon Willison's Agentic Engineering Patterns provide that discipline—a practical framework for turning AI coding agents from productivity traps into force multipliers.
Simple Explanation
Think of AI coding agents like Claude Code and OpenAI Codex as extremely talented but unreliable interns. They can write code, run tests, debug issues, and iterate independently—but they hallucinate, forget context, and sometimes confidently produce wrong solutions.
The old workflow—write code, test, debug—assumes you are the one writing. But when an agent writes code, your job shifts from implementation to verification. As Willison notes, "writing code is now cheap"—judgment, verification, and architectural stewardship have become the scarce skills.
Agentic engineering patterns are structured habits that make this new workflow reliable: hoard proven solutions, enforce test-driven loops, build understanding systematically, and pay down the cognitive debt of opaque AI-generated code.
How It Actually Works
The Architecture Under the Hood
Understanding why patterns matter requires understanding what coding agents actually are. OpenAI's Codex, for instance, uses a model + harness + surfaces architecture:
- The Model (GPT-5.2): A statistical engine that predicts the next token based on patterns learned from vast public code repositories. It doesn't "reason"—it pattern-matches.
- The Harness: The orchestration layer that manages execution loops, tool use, and failure recovery. It runs tests, reads errors, and feeds them back to the model for iteration.
- Surfaces: The interfaces—CLI, IDE extensions, web apps—where you interact with the agent.
The critical insight: non-determinism is built into the model. Ask the same question twice and you might get different code. The harness provides structure, but without explicit patterns, you're gambling on outputs.
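The harness loop described above can be sketched in a few lines. This is a toy illustration, not any real agent's API: `fake_model` stands in for the non-deterministic model (here it deterministically returns a buggy attempt, then a fix), and `run_tests` stands in for the harness's test runner that feeds failures back for the next iteration.

```python
# Toy sketch of a model + harness loop. fake_model and run_tests are
# illustrative stand-ins, not part of Codex or any real harness.

def fake_model(prompt: str, attempt: int) -> str:
    # A real model call is non-deterministic; this stub returns a buggy
    # first attempt, then a corrected one.
    if attempt == 0:
        return "def add(a, b): return a - b"   # buggy
    return "def add(a, b): return a + b"       # fixed

def run_tests(code: str) -> tuple[bool, str]:
    """Load the candidate code and check it against a simple test."""
    ns: dict = {}
    exec(code, ns)
    try:
        assert ns["add"](2, 3) == 5
        return True, ""
    except AssertionError:
        return False, "add(2, 3) != 5"

def harness_loop(task: str, max_iters: int = 5) -> bool:
    feedback = ""
    for attempt in range(max_iters):
        code = fake_model(task + feedback, attempt)  # model: generate
        ok, error = run_tests(code)                  # harness: verify
        if ok:
            return True                              # tests validate the output
        feedback = f"\nTests failed: {error}"        # feed errors back
    return False

print(harness_loop("implement add"))  # prints True after one failed iteration
```

The division of labor is the point: the model generates, the deterministic harness verifies and feeds back errors, and the loop terminates on green or on an exhausted retry budget.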
The Four Core Patterns
1. Hoard Things You Know How to Do
Willison's "hoarding" pattern flips the traditional knowledge management problem. Instead of documenting for humans, you're building a library of proven solutions that agents can recombine.
Practically: maintain a blog, GitHub repos, or markdown files with working code examples. When you need an agent to solve a similar problem, point it to your hoard. The agent pulls proven patterns rather than inventing (and potentially breaking) new ones.
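If the hoard lives as a directory of markdown files, a small lookup helper is enough to find the entry to point the agent at. This is a minimal sketch under that assumed layout; the pattern itself prescribes no particular tooling.

```python
# Minimal sketch: treat a "hoard" as a directory of markdown files and
# find entries matching a keyword. The layout is an assumption, not part
# of Willison's pattern.
from pathlib import Path

def find_in_hoard(hoard_dir: Path, keyword: str) -> list[Path]:
    """Return hoard files whose text mentions the keyword."""
    return [p for p in sorted(hoard_dir.glob("*.md"))
            if keyword.lower() in p.read_text().lower()]
```

A matching file (say, auth-examples.md) then goes straight into the prompt: "Use the authentication pattern from auth-examples.md."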
2. First Run the Tests (Red/Green TDD)
Test-driven development becomes essential with agents. The pattern is simple: before asking an agent to write code, have it write a failing test first. Then ask it to make the test pass.
This creates a verifiable contract. If the tests pass, the code meets the behavior they specify; if they fail, the agent iterates. You're not reviewing every line; you're trusting the test harness to validate.
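In miniature, the red/green loop looks like this. The tests come first and fail (red); the implementation below them is what an agent might then produce to go green. `validate_username` and its rules are hypothetical examples, not from the source.

```python
# Red: write the tests first. With no implementation, these fail.
def test_rejects_short_usernames():
    assert validate_username("ab") is False

def test_accepts_normal_usernames():
    assert validate_username("alice") is True

# Green: an implementation an agent might produce to make them pass.
# The rules (3-30 alphanumeric characters) are illustrative assumptions.
def validate_username(name: str) -> bool:
    return 3 <= len(name) <= 30 and name.isalnum()
```

The tests, not a line-by-line review, are what you check the agent's work against.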
3. Linear Walkthroughs
When inheriting AI-generated code or onboarding to an unfamiliar codebase, ask the agent for a structured walkthrough. Instead of jumping between files, request a linear explanation: start here, then here, then here.
This builds mental models systematically rather than overwhelming you with disconnected code fragments.
4. Interactive Explanations
Cognitive debt accumulates when you ship code you don't understand. Interactive explanations are your repayment plan: ask the agent to build a demo, create a visualization, or write documentation that forces you to engage with how the code actually works.
Real-World Example
Consider a team building a new API endpoint. Without patterns, the workflow looks like this:
- Developer prompts: "Add a user registration endpoint"
- Agent generates 200 lines of code
- Developer stares at it, unsure if it's correct
- Developer manually tests, finds bugs, prompts again
- Cycle repeats, trust erodes
With agentic engineering patterns:
- Developer points agent to hoard: "Use the authentication pattern from auth-examples.md"
- Developer asks agent to write failing tests first: "Write tests for registration validation"
- Agent writes tests, they fail (red)
- Developer asks agent to make tests pass (green)
- Agent implements, tests pass
- Developer requests linear walkthrough of the flow
- Developer asks agent to build interactive demo for documentation
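The red/green steps of that workflow can be sketched in miniature. The validation rules here are illustrative assumptions; the point is that the tests written in the red step are exactly what the green step's implementation is verified against.

```python
# Red: tests for registration validation, written before any implementation.
def test_registration_requires_email():
    assert register({"password": "s3cret!"}) == {"error": "email required"}

def test_registration_succeeds_with_valid_input():
    assert register({"email": "a@example.com", "password": "s3cret!"})["ok"] is True

# Green: a candidate implementation the agent produces to pass the tests.
# Field names and the minimum password length are hypothetical.
def register(payload: dict) -> dict:
    if "email" not in payload:
        return {"error": "email required"}
    if len(payload.get("password", "")) < 6:
        return {"error": "password too short"}
    return {"ok": True}
```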
The difference: verification is built into the process, not bolted on afterward. The test harness becomes the verification layer, and the developer's role shifts from code reviewer to pattern orchestrator.
Why It Matters
The statistics reveal a painful truth: adoption has outpaced trust. According to the 2025 Stack Overflow Developer Survey, 84% of developers use AI tools (up from 76% in 2024), with 51% using them daily. Yet trust has plummeted to 33%, down from 43% in 2024.
Qodo's 2025 AI Code Quality Report found that 76% of developers fall into a "red zone" of high hallucinations and low shipping confidence. Only 3.8% experience both low hallucinations and high confidence in shipping AI-generated code.
The productivity gains are real: developers report saving 3.6 to 8 hours weekly, and GitHub Copilot counts 20 million users, with adoption across 90% of the Fortune 100. But without structured patterns, those gains evaporate into verification overhead.
Agentic engineering patterns address this directly: they transform AI from a black box you hope is correct into a reliable collaborator you can verify. The paradigm shift isn't about writing more code faster—it's about writing verifiable code systematically.
Further Reading
- Simon Willison's Agentic Engineering Patterns — The definitive guide, updated weekly with new patterns
- Agentic Engineering Patterns Newsletter — Willison's introduction and ongoing commentary
- Stack Overflow 2025 Developer Survey: AI Section — Comprehensive data on adoption, trust, and usage patterns
- Qodo State of AI Code Quality 2025 — Detailed analysis of hallucinations, confidence, and code quality
- OpenAI Codex App: A Guide to Multi-Agent AI Coding — Technical deep-dive into agent architecture