Agentic Engineering Philosophy
Agentic engineering is a useful tool, but its efficacy depends heavily on the problem domain, codebase complexity and size, ease of validation, and how long-lived the code is expected to be. The most effective ways to use these tools will vary substantially across engineering teams, and I think it’s easy to use them in ways that make code and systems less reliable, despite their potential to do the opposite.
In general, I think both people and machines need the same best practices: tests, more tests, focused PRs, a clear architecture, good documentation, an observable system with a clear logging taxonomy, and components that can be understood in isolation. Quality engineering practices create long-term speed.
Effective Agentic Engineering
No one-size-fits-all approach to agentic engineering is going to work for all problems and all codebases. Depending on the domain, having an LLM write code may be a terrible idea, but it might still be helpful to have one review code that you’ve written.
For problems where agentic AI is a good fit, I think the basics are pretty easy to get right:
- Start everything with `/plan` mode
- Ensure you have clear validation criteria for every task
- Tests are now essential.
- “Use red/green TDD” is a cheat
- Instruct your LLMs to include a `// DECISION: ${why}` comment whenever they make a call about which direction to take, so that code review is easier.
- Review conversational context to figure out what instructions the agent is missing and where it’s wasting its time. Fix those things. Most AI-generated PRs should include scaffolding instruction improvements or added scripts/capabilities so that similar future tasks will be easier.
- Beware of using LLMs to generate LLM scaffolding. It’s an easy way to generate reams of misleading slop.
- The last step in every task (after tests, lint, and type-checks) should be a `/review-fresh` command that spins up a new agent without any conversational context to do a rigorous code review.
- For anything with any complexity, always use the best model you can. Any money you save from using a cheaper model will be dwarfed by the costs of debugging.
- Review code locally. Look at the full file for context, not just the diff in isolation.
- To control costs, make sure engineers are able to see their current session’s costs, focus on tooling that limits context bloat, and teach good prompting practices.
- Sandbox your LLMs
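One way to make the `// DECISION: ${why}` convention pay off during review is to collect those comments into a short list before reading the change itself. The following is a minimal sketch, not a tool from this post: the names `Decision`, `extractDecisions`, and `formatDecisions` are hypothetical, and the sample source is invented for illustration. It matches the marker with a plain regular expression, so it would also pick up marker text that happens to appear inside a string literal.

```typescript
// Hypothetical helper: collects `// DECISION: ...` comments so a reviewer
// can scan an agent's judgment calls before reading the full change.

interface Decision {
  line: number; // 1-based line number within the source text
  why: string; // the rationale written after "DECISION:"
}

function extractDecisions(source: string): Decision[] {
  const decisions: Decision[] = [];
  // Matches "// DECISION:" anywhere on a line and captures the rest of it.
  const marker = /\/\/\s*DECISION:\s*(.*)$/;

  // Split on both Unix and Windows line endings.
  source.split(/\r?\n/).forEach((text, index) => {
    const match = marker.exec(text);
    if (match !== null) {
      decisions.push({ line: index + 1, why: (match[1] ?? "").trim() });
    }
  });

  return decisions;
}

function formatDecisions(decisions: Decision[]): string {
  return decisions.map((d) => `L${d.line}: ${d.why}`).join("\n");
}

// Invented sample input, standing in for a file an agent has edited.
const sample = [
  "export function retry(attempts: number) {",
  "  // DECISION: cap attempts at 5 because the upstream API rate-limits bursts",
  "  return Math.min(attempts, 5);",
  "}",
].join("\n");

console.log(formatDecisions(extractDecisions(sample)));
```

For the sample above this prints `L2: cap attempts at 5 because the upstream API rate-limits bursts`. Something along these lines could run on the changed files of a PR, though how the file contents are gathered is left out here on purpose.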
I think Simon Willison’s Agentic Engineering Patterns is far and away the best read about how to use coding agents effectively.
AI use on this site
I used AI heavily to create the HTML and CSS for this site. I think it is a good fit for this sort of work because it is low-stakes and easy to validate, and I’m not trying to demonstrate any aptitude for design.
All of my writing is my own. I like em dashes, en dashes, and parenthetical expressions, and I will occasionally use sentence fragments or contrasting phrases for effect. I never use AI for writing. I don’t even use it when writing pieces like “this css proves me human” that intentionally ape LLM-generated writing, despite many HN comments that seem to assume it impossible for a person to vary their own writing style.