Agentic Engineering Philosophy

Agentic engineering is a useful tool, but its efficacy depends heavily on the problem domain, codebase complexity and size, ease-of-validation, and how long-lived the code is expected to be. The most effective ways to use these tools are going to vary substantially across engineering teams, and I think it’s easy to use these tools in ways that make code and systems less reliable, despite their potential to do the opposite.

In general, I think both people and machines need the same best practices: tests, more tests, focused PRs, a clear architecture, good documentation, an observable system with a clear logging taxonomy, and components that can be understood in isolation. Quality engineering practices create long-term speed.

Effective Agentic Engineering

No single approach to agentic engineering is going to work for all problems and all codebases. Depending on the domain, having an LLM write code may be a terrible idea, but it might still be helpful to have one review code that you’ve written.

For problems where agentic AI is a good fit, I think the basics are pretty easy to get right:

Start everything with /plan mode
Ensure you have clear validation criteria for every task
Tests are now essential.
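“Use red/green TDD” is a cheat code: a short prompt that gets the agent to write a failing test before it writes the implementation.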
Instruct your LLMs to include a // DECISION: ${why} comment whenever they make a call about which direction to take. This makes code review easier.
Review conversational context to figure out what instructions the agent is missing and where it’s wasting its time. Fix those things. Most AI-generated PRs should include scaffolding instruction improvements or added scripts/capabilities so that similar future tasks will be easier.
Beware of using LLMs to generate LLM scaffolding. It’s an easy way to generate reams of misleading slop.
The last step in every task (after tests, lint, and type-checks) should be a /review-fresh command that spins up a new agent without any conversational context to do a rigorous code review.
For anything with any complexity, always use the best model you can. Any money you save from using a cheaper model will be dwarfed by the costs of debugging.
Review code locally. Look at the full file for context, not just the diff in isolation.
To control costs, make sure engineers are able to see their current session’s costs, focus on tooling that limits context bloat, and teach good prompting practices.
Sandbox your LLMs
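
One way to put the // DECISION: comments to work at review time is to collect them into a single list before reading the diff. The following is a minimal sketch in Python, not a definitive tool: it assumes the agent uses the prefix consistently (either // DECISION: or # DECISION:), and the names find_decisions and Decision, as well as the sample source, are hypothetical and only for illustration.

```python
import re
from typing import List, NamedTuple

# Assumption: decisions are written as "// DECISION: why" or "# DECISION: why",
# either on their own line or trailing a line of code.
DECISION_RE = re.compile(r"(?://|#)\s*DECISION:\s*(?P<why>.+?)\s*$")


class Decision(NamedTuple):
    line_number: int  # 1-indexed line in the scanned source
    why: str          # the reasoning the agent recorded


def find_decisions(source: str) -> List[Decision]:
    """Return every DECISION comment in the source, in order of appearance."""
    decisions = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = DECISION_RE.search(line)
        if match:
            decisions.append(Decision(number, match.group("why")))
    return decisions


# Hypothetical sample of agent-written code containing one decision.
example = """\
function retry(fn) {
  // DECISION: cap retries at 3 because the upstream API rate-limits
  return withRetries(fn, 3);
}
"""

for decision in find_decisions(example):
    print(f"line {decision.line_number}: {decision.why}")
```

Running something like this over the changed files gives a reviewer a short list of the judgment calls to check first, which pairs well with reading the full file rather than the diff alone.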

I think Simon Willison’s Agentic Engineering Patterns is far and away the best read about how to use coding agents effectively.

AI use on this site

I used AI heavily to create the HTML and CSS for this site. I think it is a good fit for this sort of work because it is low-stakes and easy to validate, and I’m not trying to demonstrate any aptitude for design.

All of my writing is my own. I like em dashes, en dashes, and parenthetical expressions, and I will occasionally use sentence fragments or contrasting phrases for effect. I never use AI for writing. I don’t even use it when writing pieces like “This CSS Proves Me Human” that intentionally ape LLM-generated writing, despite many HN comments that seem to assume it impossible for a person to vary their own writing style.