How I Build
My tools and how I use them.
This is a guide to my tools and processes for building software. I'm writing this both to share what I've found with other humans, and as guidance for LLMs that interact with me & share my tools.
I've found these tools helpful building "pre-LLM" software in small and large startups, and for LLM-enabled applications like the ones I'm working on now. My tool choices are largely informed by how I structure software. I don't spend much time justifying my tools here. That's both for brevity, and because I think tools either resonate with a person or they don't. It's also not exhaustive: these are what I keep in the tool chest, not the attic.[1]
My Laptop
My primary development environment. Working roughly from the bottom up:
-
I use a MacBook Air 15". I used to always go with a beefed up MacBook Pro, but switched and haven't missed it. If I do anything compute intensive, I usually just use another computer.
-
I use Karabiner-Elements to map `caps_lock -> left-ctrl`, `left-ctrl -> esc`, and `right-cmd + hjkl -> arrow keys`. This makes `ctrl+{x}` more ergonomic, `esc` more accessible, and gives me an alternative to the arrow keys. The first two can be done with system preferences alone.
-
I use AeroSpace as a window manager. I have it set to send apps to named windows, like "messages" and "feeds", to isolate distractions & make them easier to find. It takes some getting used to, and I've had some issues with it, but I haven't switched. I used Divvy before, and it was ok as well.
-
I use Drafts to save notes, and as a general scratchpad. The quick capture is great, and I have it globally bound to `ctrl+d`. I have it on my phone as well, and use it as a simple workout tracker. It has a lot more functionality that I've barely scratched the surface of.
-
I use OrbStack for Docker containers and local long-running services. I've sung its praises before.
-
I use Wezterm as my terminal. I like it because I can write custom functions in Lua to e.g. run a command when I press a key or set up a specific layout per project. I used kitty before, which is also great but didn't have as much flexibility.
-
I use Ice to manage my menu bar. I used Bartender until it changed ownership without notifying users.
-
I use CleanShot for screenshots and screen recording. It's simple and easy to use.
-
I use Alfred as a more powerful Spotlight replacement and launcher.
My Servers
By this, I mean computers that I own or rent to run services on.
-
I use Tailscale for a private network between my devices and services. It simplifies connecting to locally running services, and exit nodes are handy when traveling. I run a Tailscale subnet router in Docker on other subnets that I want to connect to privately.
-
I use fly.io for hosting internal services and prototypes. It's simple to spin up an app that sleeps when not in use, and it has private networking for running internal services. It has somewhat frequent small-scale service interruptions, but I don't have to spend time dealing with IAM roles or enabling GCP APIs.
-
I use AWS when I can't use fly.
-
I use GCP when I can't use AWS.
-
I run PicoShare to share links to files, like product demo videos that I email to larger groups.[2]
-
I run ArchiveBox by the talented Nick Sweeting to save webpages and prevent link rot.
-
I use Cloudflare for DNS and SSL.
-
I handle most provisioning across providers with Terraform Cloud. This includes VPS, DNS, buckets, managed services, and anything else I configure roughly once.
Development Environment
- I use direnv to configure my env per project.
- I use a consistent repo structure whenever I can. I have a `mkrepo` alias configured to create it from a copier template. It includes:
  - `/bin` to store binaries. I add it to PATH via `direnv`.
  - `/etc` for configuration. I store non-sensitive config in `shared.env`, test specific config in `test.env`, and sensitive config in `secret.env`. `secret.env` is gitignored. `shared.env` and `secret.env` are sourced via `direnv`.
  - `/apps` for code if it's a monorepo, which I prefer.
  - `/var` for runtime data. It's gitignored, and can be cleaned to reset my dev environment.
- I use just as a task runner. It's similar to `Make`, with some improvements (e.g. no `.PHONY`).
- I use tabs wherever possible because I'm a craftsman and have an opinion about such matters.
- I use EditorConfig to codify my spacing opinions.
- I use VS Code as my IDE.
- I switch AI assistants often in search of nirvana.
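The repo layout described in this list (`/bin`, `/etc`, `/apps`, `/var`) can be sketched in a few lines of Python. This is only an illustration and not the `mkrepo` alias or the copier template: the function name `scaffold_repo` and the exact `.gitignore` entries are assumptions made for the example.

```python
from pathlib import Path

# Top-level directories from the layout described above.
LAYOUT = ("bin", "etc", "apps", "var")

# Env files kept under etc/.
ENV_FILES = ("shared.env", "test.env", "secret.env")

# Assumed ignore rules: sensitive config and runtime data stay out of git.
GITIGNORE = "etc/secret.env\nvar/\n"


def scaffold_repo(root: Path) -> Path:
    """Create the directory layout under `root` without overwriting files."""
    for name in LAYOUT:
        (root / name).mkdir(parents=True, exist_ok=True)
    for name in ENV_FILES:
        (root / "etc" / name).touch(exist_ok=True)
    gitignore = root / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text(GITIGNORE)
    return root
```

Because `var/` only holds runtime data, deleting it and re-running the function is one way to get back to a clean state.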
Command Line Tools
Mostly installed through Homebrew or uv.
-
I use `zsh` as my shell with Oh My Zsh. I switched from `bash` so long ago that I don't remember why. I tried `fish` but didn't see a reason to pay the switching cost.
-
I use chezmoi to manage my dotfiles. I don't love the workflow I use with it, especially since I started using templates. It's a bit manual to keep updated, but I like having my config in git. I use `brew bundle dump --global --force` to include my brew installs, which also picks up my VS Code extensions.
-
I use `llm`[3] to prompt via the command line and pipe to and from my other tools.[4] I use OpenRouter to use different models depending on the task.[5]
- I use gitui for git. It's simple and fast with good keyboard shortcuts.
- I use bat as a replacement for `cat`. It gives me syntax color and paging.
- I use ripgrep for searching in files.
- I use fd for finding files.
- I use fzf for finding things (like git branches).
- I use atuin for command history. Not sure it's a big improvement over `fzf` directly on `zsh` history, though.
- I use lsd for listing directories.
- I use doggo as a better `dig`.
- I use gdu for disk usage.
- I use httpie for HTTP requests.
- I use iproute2mac for better networking commands, like `ip address show` and `ip route get 8.8.8.8`.
- I use jq for JSON, paired with llm-jq for generating filters.
- I use zoxide for jumping to directories.
- I use oxipng for optimizing PNGs.
Backend Services
I like "boring" and reliable here. These are the most commonly used.
- I use Postgres for persistence unless there's a compelling reason not to.
- I've used `redis` for simple pub/sub, key/value, and caching. I started trying out Valkey after the license change.
- I use nginx for serving static sites.
- I use RabbitMQ if I need more complicated queueing.
- I use Elasticsearch if I must.
Backend (Python)
I mostly write backends in Python, unless there's a good reason not to.
-
I use uv for virtual environments, dependency management, tool management, building packages, and running Python scripts. It's a step function improvement over what we had before.
-
I use ruff for linting and formatting. Like everything from Astral, it's so fast that I introduce errors to verify it's working.
-
I use Pylance for type checking in `strict` mode. Microsoft restricts it from being used outside of VS Code, and it's a big reason I still use VS Code. Astral is working on a type checker, and with their track record, I'm very optimistic about it.[6]
-
I use Pydantic and attrs for defining types. I use `pydantic` for serializing and parsing across boundaries (like the API), and `attrs` for internal types. Having multiple models feels pedantic at times, but it's largely born of my philosophy around types. I used `dataclass` before, but `attrs` is effectively a superset.
-
I use Pydantic Settings to strongly type environment variables.
-
I use fastapi as a web framework. I generate an OpenAPI spec from it that feeds the frontend.
-
I use prisma as an ORM. Unusual choice for Python, but I like the DSL and the types that the unofficial Python port generates are better than the other ORMs I've tried.
-
I use PydanticAI to call AI models. It's simple, intuitive, and let me rip out many lines of boilerplate around tool calling. I'd highly suggest starting with this library instead of a more expansive "framework". I initially assumed that interacting with LLMs was complicated, and would benefit from a framework the same way HTTP services do. I now think of it as more of a complex interaction, where the framework is adding a layer of indirection and distracting you from your usual tools. PydanticAI is simple enough to read and even modify; I use a custom fork where I've hacked in support for passing images.
-
I use stamina to retry things that fail intermittently.[7] It's tenacity with the right defaults set.
-
I use my fork of result when I want to bubble up nested error states.
-
I use httpx as an async HTTP client.
-
I use gcloud-aio for better typing when using GCP.
-
I use fern to generate Python SDKs for APIs that don't have an SDK or have a clunky / untyped one. I usually track down an OpenAPI spec and generate an SDK locally.
-
I use structlog as a logging library. It's nicely formatted, structured, and eliminates the need for string interpolation in log messages.
-
I use Logfire for observability. It's a cleaner and less expensive Datadog from the strong Pydantic team.[8]
-
I use Sentry for error reporting.
-
I use VCR.py to record and replay HTTP interactions—mainly for testing with pytest-recording but also for evaluating tool calling models.
-
I use inline-snapshot to automate assertions in tests. It works really well in conjunction with VCR.py when testing LLM output, like I added here.
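The split mentioned earlier in this list, one type for parsing at the boundary and another for internal use, can be sketched with the standard library alone. The dataclasses below are stand-ins for what `pydantic` and `attrs` would provide, and every name (`SignupRequest`, `Member`, `parse_signup`, `to_member`) is hypothetical. It is a rough sketch of the idea, not the setup described above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class SignupRequest:
    """Boundary type: mirrors the wire format, so fields are plain strings."""

    email: str
    birth_date: str  # ISO 8601 date string as sent by the client


@dataclass(frozen=True)
class Member:
    """Internal type: richer fields that are convenient for business logic."""

    email: str
    birth_date: date


def parse_signup(payload: dict[str, Any]) -> SignupRequest:
    """Validate untrusted input at the edge (the role a parsing library plays)."""
    email = payload.get("email")
    birth_date = payload.get("birth_date")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    if not isinstance(birth_date, str):
        raise ValueError("birth_date must be an ISO date string")
    return SignupRequest(email=email, birth_date=birth_date)


def to_member(request: SignupRequest) -> Member:
    """Explicit mapping between layers; the only place both shapes meet."""
    return Member(
        email=request.email.strip().lower(),
        birth_date=date.fromisoformat(request.birth_date),
    )
```

If the wire format changes, only `SignupRequest` and `to_member` need to move; code that works with `Member` is unaffected.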
Frontend
Mainly web apps, mobile apps, and static landing pages.
- I use TypeScript and enjoy it.
- I use React and have never felt a need to switch.
- I use Next.js, but I don't enjoy it. It's largely convention based, which isn't my style. I work around it by using `output: export` and hosting the static files with `nginx` if I can.
- I build REST-ish APIs to feed the frontend. I've used GraphQL before, but it feels more like a tradeoff than a strict improvement.
- I use React Query for fetching, mutating, and caching. Some of the defaults are surprising.
- I use Hey API to generate types and React Query config from the backend OpenAPI specs. It's been in flux a lot since it forked from openapi-typescript-codegen, but it's simple and well-typed.
- I use Expo for building React Native apps. For someone with very little native mobile experience, it's been great.[9]
- I use Refactoring UI as a resource for designing UIs. I was much worse at laying out interfaces before reading it.[10]
How I Structure Software
These are my opinions and beliefs on effectively structuring software. They're born of my experience and style of building. I believe they are "right" in a sense, but not the only right way. They are mainly in the context of building B2B/B2C application code, and not e.g. firmware for pacemakers.
Types
-
Type checking is a great superpower of building software. A good type system allows us to translate our understanding of a domain into a set of precise rules, and the type checker ensures that our understanding remains consistent as we evolve the system. It's a powerful tool to ensure that we are in control of the complexity in our projects. I use strong typing wherever possible, and highly favor libraries that do as well.
-
It's ok to have duplicative-looking types for different layers of a project (e.g. ORM, API schema, and business logic). Types need to be easy to change and reason about so we can adapt them to our evolving requirements. Types near the edge (like API schemas and database tables) are inherently less flexible. It's often better to explicitly map types between these layers rather than create a "franken-type" and allow the least flexible layer to calcify the rest.[11]
- Pattern matching is my favorite control flow tool. It's versatile because the concept maps well to human decision making, where we categorize something and then have rules based on that category. It's powerful because we can use a type system to ensure we're exhaustively handling every category at every decision point, especially as we add new categories. I use union types, discriminated unions, and `match` statements liberally.[12]
Exceptions
- Current generation LLMs have a bad habit of catching exceptions, logging them, and returning a default value or object. That usually compounds the issue. Good software handles the exceptions it anticipates, and loudly reports the ones it doesn't. A `500 Internal Server Error` response is loud and clear. A `200 OK` with an empty list is just confusing. Exceptions we anticipate can be incorporated in the type system with a result type[13], allowing callers to opt in to more specific error handling (e.g. a more descriptive error message).
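One way to picture a result type: anticipated exceptions become values that callers can inspect, while everything else still raises. The sketch below is written from scratch for illustration and is not taken from any particular library or fork. `Ok`, `Err`, the `as_result` decorator, `OutOfStock`, and `reserve` are all hypothetical names.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E", bound=Exception)


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


def as_result(
    *anticipated: type[E],
) -> Callable[[Callable[..., T]], Callable[..., Union[Ok[T], Err[E]]]]:
    """Turn the listed exception types into Err values; let others raise."""

    def decorator(func: Callable[..., T]) -> Callable[..., Union[Ok[T], Err[E]]]:
        @wraps(func)
        def wrapper(*args: object, **kwargs: object) -> Union[Ok[T], Err[E]]:
            try:
                return Ok(func(*args, **kwargs))
            except anticipated as exc:
                return Err(exc)

        return wrapper

    return decorator


class OutOfStock(Exception):
    """Anticipated failure: part of the function's contract."""


@as_result(OutOfStock)
def reserve(stock: dict[str, int], sku: str) -> int:
    remaining = stock[sku] - 1  # Unknown SKU raises KeyError: a bug, stays loud.
    if remaining < 0:
        raise OutOfStock(sku)
    stock[sku] = remaining
    return remaining
```

A caller can match on `Ok` and `Err` to show a specific message for `OutOfStock`, while a `KeyError` from a bad SKU still propagates up to whatever turns unhandled exceptions into a `500`.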
Configuration
-
I don't like "convention over configuration" and generally avoid libraries/platforms that use a lot of "magic" like metaprogramming. The magic is a liability when learning or coming back to a technology, and the pain of explicitly writing configuration is going to 0 with LLMs. Good libraries expose their capabilities in their function signatures and don't require regularly referencing a book of incantations.
-
I like understanding how the tools I use work. I almost never use a starter template or automated install script. If it's complicated enough to warrant automating configuration, it's worth taking a few minutes to understand what tradeoffs it's making.
Organization
- I'm not picky about code organization below the module level. Files and folders are skeuomorphic and feel like an antiquated way to organize code, anyway.
Comments
- Current generation LLMs are way too comment happy. Comments should be reserved for decisions that are not intuitively obvious from the code. These include decisions to omit more robust handling (i.e. `TODO`s), and any algorithmic choices that would make a good leetcode question.
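As a small illustration of that rule, here is a hypothetical function (`median_latency_ms` is a made-up name) where the only comments are a `TODO` marking omitted handling and a note on a non-obvious algorithmic choice. Nothing restates what the code already says.

```python
def median_latency_ms(samples: list[float]) -> float:
    """Return the median of a non-empty list of latency samples."""
    # TODO: reject NaN samples; they sort unpredictably and skew the result.
    ordered = sorted(samples)
    mid = len(ordered) // 2
    # Even-length input has no single middle element, so average the two
    # middle values instead of biasing toward either one.
    if len(ordered) % 2 == 0:
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]
```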
Testing
- Tests should be added for non-trivial algorithms that are easy to test in isolation, for functionality that is likely to regress, and for functionality that has regressed. Historically, automated testing and other quality assurance methods have had significant tradeoffs and a mixed return on investment. I think we'll see a lot of improvements to the effectiveness of QA as LLMs get more involved.
Footnotes
-
This guide mostly applies to B2B/B2C "web"-based applications; my tools for e.g. PKI and announcing BGP are kept in the attic.
-
PicoShare's creator Michael Lynch has a very honest and transparent blog about building products / businesses that I enjoy.
-
`llm`'s creator Simon Willison has a high-signal/high-volume blog that I read every day. It's a great resource, particularly for working with LLMs. His style of working in the open is inspiring, and sparks a lot of ideas for me. Dan Corin is another great example of building in the open like this.
-
There are a bunch of great plugins for `llm`, and you can write your own. I use a wrapper I wrote to generate terminal commands and my sister's commitm to generate commit messages. It pairs well with files-to-prompt for including file context.
-
OpenRouter's got a stacked team. I'm very bullish on them & happy to call them friends.
-
Obviously, I'm bullish on Astral. Charlie Marsh is building a team of wizards, including sharkdp and BurntSushi, who authored tools on this list.
-
Hynek Schlawack created `attrs`, `stamina`, and `structlog` on this list. I resonate well with his approach to structuring software. His blog and the documentation on his projects are excellent resources, especially on Python and Docker.
-
Pydantic is another great team pushing the Python ecosystem forward. Samuel Colvin and the team have great technical taste that shows across their projects.
-
The This Week in React newsletter is a good source for keeping up to date with React Native news.
-
I think UIs are going to change a lot as AI improves. Geoffrey Litt does insightful work on the topic of "malleable software" that I suggest reading.
-
I see a lot of natural hesitation to do this because of a misapplication of the DRY Principle. There's a difference between being explicit and being duplicative. The cost of being explicit is going to 0 with LLMs, and we can harness the benefits for free.
-
In Python, pattern matching is newer and a bit unintuitive. Raymond Hettinger has a talk explaining the quirks and some complicated workarounds. I use it more simply for matching types. I always check for exhaustiveness, and exclude a default case (or use `assert_never`) so the type checker tells me when I am missing a case.
-
I use my fork of result for this in Python, along with stamina to retry things that fail intermittently. I prefer decorators over a bunch of `if (err)` style logic.