
Random unorganized thoughts on agent maxxing and psychosis since I crossed the Rubicon a while ago
When you first realize what the agents are capable of you get obsessed, because you have basically godlike powers, and you spend a bit of time in psychosis. I think most of us have gone through that and have kind of acclimated
These days I am building in a very disciplined way, on a single application with complicated but narrow features, and I still use 8-10 agents most of the day, each running a task that takes many hours including validation and verification and testing and all that
Generally if you have 2-3 agents each building a feature in your app, you are gonna get a lot of slop and redundancy, so you also need agents generating reports on duplicate types, consolidation and componentization, documentation, internationalization, testing, strict lint and type checks, and also just going through and removing all the slop and larp
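A duplicate type report can be as dumb as a script. Here is a minimal sketch, assuming a TypeScript codebase and a rough regex instead of a real parser; the function name `duplicate_type_report` and the file paths are made up for illustration, not something from my actual setup:

```python
import re
from collections import defaultdict

# Rough match for TypeScript-style declarations such as "interface User" or
# "type User =". This is a regex, not a parser, so it can miss or over-match.
TYPE_DECL = re.compile(
    r"^\s*(?:export\s+)?(?:interface|type)\s+([A-Za-z_]\w*)", re.MULTILINE
)


def duplicate_type_report(sources: dict[str, str]) -> dict[str, list[str]]:
    """Map each type name declared in more than one file to the files declaring it."""
    declared_in = defaultdict(set)
    for path, text in sources.items():
        for name in TYPE_DECL.findall(text):
            declared_in[name].add(path)
    return {
        name: sorted(paths)
        for name, paths in declared_in.items()
        if len(paths) > 1
    }


# Hypothetical file contents, inlined so the example runs without a real repo.
sources = {
    "packages/api/user.ts": "export interface User { id: string }\n",
    "packages/web/types.ts": "export type User = { id: string }\ntype Theme = 'dark'\n",
}
print(duplicate_type_report(sources))
# {'User': ['packages/api/user.ts', 'packages/web/types.ts']}
```

The output is just a list of names and files to hand to an agent (or yourself) as a consolidation task; it says nothing about whether the duplicates are actually the same shape.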
The kind of app you’re building and the approach matter a lot, but people tend to be either generative, adding features, or discriminative, finding and fixing things. LLM psychosis is what happens when you believe the outputs without verifying them, or rather when you generate without discriminating
I rely heavily on test driven development and specifically end to end testing with no mocks or code that isn’t run in my live stack. I can assume the UX will be horrible and I’ll need to hand fix a lot of the frontend, but if it’s able to get through the actions at all and I can see the result then it means I have something to work with, the APIs and all that are all kind of worked out and I can start sculpting it
I like monorepos and separate, small, testable packages. If I can prove each package is individually self consistent and well tested, I can rely on it to build more complex things on top of. Packages and plugins are great for complexity management on big projects, and a lot of my agent janitor work is building reports on refactoring needs and approaches, how to properly isolate dependencies and enforce good hierarchy to avoid circular dependencies.
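The circular dependency check is easy to script. A minimal sketch, assuming you can already dump each package's internal dependencies into a dict; the package names and the `find_cycle` helper are hypothetical:

```python
from typing import Optional


def find_cycle(deps: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one dependency cycle as a list of package names, or None."""
    VISITING, DONE = 1, 2
    state: dict[str, int] = {}
    stack: list[str] = []  # packages on the current depth-first path

    def visit(pkg: str) -> Optional[list[str]]:
        state[pkg] = VISITING
        stack.append(pkg)
        for dep in deps.get(pkg, []):
            if state.get(dep) == VISITING:
                # dep is already on the current path, so we looped back to it
                return stack[stack.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        state[pkg] = DONE
        return None

    for pkg in deps:
        if pkg not in state:
            cycle = visit(pkg)
            if cycle:
                return cycle
    return None


# Hypothetical package graph: ui -> core -> utils -> ui
print(find_cycle({"ui": ["core"], "core": ["utils"], "utils": ["ui"]}))
# ['ui', 'core', 'utils', 'ui']
```

It only reports the first cycle it finds, which is usually enough to fail a check and point an agent at the offending import.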
I can also parallelize if I’m working on separate packages or parts that don’t conflict. LLMs do better with smaller code, smaller files, concise packages with clear instructions. They are magical on small bits of code and awful at large constructions. So a big focus is packaging, re-use, clean abstraction
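One way to decide what can run in parallel, as a rough sketch: list which packages each task touches and greedily batch tasks that share nothing. The task names, package names, and `plan_batches` helper are all hypothetical, and a real setup would also have to account for shared config and lockfiles:

```python
def plan_batches(tasks: dict[str, set[str]]) -> list[list[str]]:
    """Greedily group tasks so no two tasks in a batch touch the same package."""
    batches: list[tuple[list[str], set[str]]] = []
    for name, packages in tasks.items():
        for names, touched in batches:
            if touched.isdisjoint(packages):
                # No overlap with anything in this batch, so it can run alongside
                names.append(name)
                touched.update(packages)
                break
        else:
            # Conflicts with every existing batch, so start a new one
            batches.append(([name], set(packages)))
    return [names for names, _ in batches]


# Hypothetical tasks mapped to the packages they would modify.
tasks = {
    "auth-flow": {"api", "web"},
    "i18n": {"web"},
    "lint-cleanup": {"utils"},
}
print(plan_batches(tasks))
# [['auth-flow', 'lint-cleanup'], ['i18n']]
```

Each inner list is a set of tasks that can go to separate agents at the same time; the next batch waits until the previous one is merged.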
The agents cannot handle unit tests yet. They hallucinate massive test harnesses with their own bugs that make code search worse. Everything has to be real and scripted, and even the scripted tests have to be run headfully now and then to verify the agents aren’t lying.
My job after making the initial idea is purely QA. I find a bug, write it down, dispatch to an agent, move on until my context is full of bugs, then go back through and check if they were fixed. Sometimes I end up back down to just 1 agent and I’m focused on the code to get more insight.
Probably a good pattern would be no more than 3 agents after 10pm
e2e tests are very very slow and grindy, but always high value, especially in things like realtime games, so you will get bored and start doing other things
80% of what my agents are doing is report generation, implementation planning and slop removal. Most of my prompting after the planning and execution is guidance, continuation or just “okay now finish the rest” because they love to break things up into phases.
I have some fancy prompts and bindings but this is just time saving for the boilerplate crap I always say and it’s necessary
The key is discipline. Work on one project at a time, spend a lot of time thinking about the functionality and describing the user journeys, make sure you have end to end visual tests of everything a user will use and do
I have seen a lot of true LLM psychosis, and it happens most when people venture out of a domain they understand. The LLM sounds smarter than it is. It is a junior trying to sound smart. This is especially true in math, physics, bio, etc. All the frontier models will claim they have proven things when they haven’t. You can keep grinding it toward the truth, but you probably started with your own ideas about math or the universe, and those will throw the system horribly off course as it tries to please you.
As a rule, if you don’t understand the meaning of the words or code, if you can’t go in and read it and say “oh god this is all wrong”, you will 100% of the time slip into psychosis, even if you are smart and an expert in other things.
If you ask your agent to plan a feature, you have to read the fucking plan. Ask it to ask you questions about anything it’s not certain of, and to identify any risks. If you don’t understand something, have it explain. The plans proposed are always 20% wrong
We are not all the same. Before LLMs I was seriously coding at least 8 hours a day 5 days a week for more than a decade, but often more than that. Some people will go nuts and make total bullshit. I build pretty complicated systems and things that do actually work. That is because even without LLMs I could do that, that was my job before.
So if you can’t plan and manage a project and architecture for 10 engineers then no, you can’t do that for a bunch of agents either
We get a lot of reward out of creation, and this can be very addicting. I am pretty addicted to building with AI myself and trying to work my way out of it. If our goal is to build a product, we have to find a way to get out of playing Claude, Codex or whatever and use the things we are making. The quality of your product is directly proportional to how much you yourself use it. Most of your time should not be spent using agents at all, but using your own product, and only going back to the agents when something isn’t working well for you.