Agentic Engineering Philosophy

Agentic engineering is a useful tool, but its efficacy depends heavily on the problem domain, the size and complexity of the codebase, ease of validation, and how long-lived the code is expected to be. The most effective ways to use these tools will vary substantially across engineering teams, and I think it’s easy to use them in ways that make code and systems less reliable, despite their potential to do the opposite.

In general, I think both people and machines need the same best practices: tests, more tests, focused PRs, a clear architecture, good documentation, an observable system with a clear logging taxonomy, and components that can be understood in isolation. Quality engineering practices create long-term speed.

Effective Agentic Engineering

No one-size-fits-all approach to agentic engineering is going to work for every problem and every codebase. Depending on the domain, having an LLM write code may be a terrible idea, but it might still be helpful to have one review code that you’ve written.

For problems where agentic AI is a good fit, I think the basics are pretty easy to get right:

  • Start everything with /plan mode
  • Ensure you have clear validation criteria for every task
  • Tests are now essential.
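  • “Use red/green TDD” is a cheat code: the phrase tells the agent to write a failing test first, then write the code that makes it pass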
  • Instruct your LLM to add a // DECISION: ${why} comment whenever it makes a call about which direction to take. This makes code review easier.
  • Review conversational context to figure out what instructions the agent is missing and where it’s wasting its time. Fix those things. Most AI-generated PRs should include scaffolding instruction improvements or added scripts/capabilities so that similar future tasks will be easier.
  • Beware of using LLMs to generate LLM scaffolding. It’s an easy way to generate reams of misleading slop.
  • The last step in every task (after tests, lint, and type-checks) should be a /review-fresh command that spins up a new agent without any conversational context to do a rigorous code review.
  • For anything with any complexity, always use the best model you can. Any money you save from using a cheaper model will be dwarfed by the costs of debugging.
  • Review code locally. Look at the full file for context, not just the diff in isolation.
  • To control costs, make sure engineers are able to see their current session’s costs, focus on tooling that limits context bloat, and teach good prompting practices.
  • Sandbox your LLMs
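
The // DECISION: ${why} convention from the list above is easier to review if the comments can be pulled out of a change in one pass. The following is a minimal sketch of that idea, not an established tool: the names find_decisions and Decision are hypothetical, and it assumes each decision sits on a single line in exactly the // DECISION: form.

```python
import re
from typing import NamedTuple

# Matches the "// DECISION: <why>" convention (assumed to fit on one line).
DECISION_RE = re.compile(r"//\s*DECISION:\s*(?P<why>.+?)\s*$")


class Decision(NamedTuple):
    line_number: int  # 1-indexed line in the scanned text
    why: str          # the rationale the agent wrote down


def find_decisions(source: str) -> list[Decision]:
    """Return every DECISION comment in `source`, in file order."""
    decisions = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        match = DECISION_RE.search(line)
        if match:
            decisions.append(Decision(line_number, match.group("why")))
    return decisions


if __name__ == "__main__":
    # Hypothetical agent-written snippet, used only as sample input.
    sample = """\
function retry(fn) {
  // DECISION: cap retries at 3 so a flaky dependency cannot stall the queue
  return withRetries(fn, 3);
}
"""
    for decision in find_decisions(sample):
        print(f"line {decision.line_number}: {decision.why}")
```

Run over the files touched by a PR, a list like this gives the reviewer a short index of the judgment calls to check before reading each full file.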

I think Simon Willison’s Agentic Engineering Patterns is far and away the best read about how to use coding agents effectively.

AI use on this site

I used AI heavily to create the HTML and CSS for this site. I think it is a good fit for this sort of work because the work is low-stakes and easy to validate, and I’m not trying to demonstrate any aptitude for design.

All of my writing is my own. I like em dashes, en dashes, and parenthetical expressions, and I will occasionally use sentence fragments or contrasting phrases for effect. I never use AI for writing. I don’t even use it when writing pieces like “This CSS Proves Me Human” that intentionally ape LLM-generated writing, despite many HN comments that seem to assume it is impossible for a person to vary their own writing style.