The reason this works is that output has become incredibly cheap, while rigour and quality are the bottlenecks. As a result, most trades of output for rigour and quality are correct to make.
Slop and vibe coding result from multiple causes, but the ones I'm particularly addressing here are "randomly fucking with things with no proper plan or understanding" and the "completion instinct". The former is the most straightforward source of badly-reasoned slop, while the latter takes some nuance to explain. It seems to arise from the training of the models for the task, where "completing a task successfully" is rewarded so hard that the model essentially develops the habits of an addict looking for a fix, or a fuckboy looking to get laid. Especially when the context window is filling up, it gets desperate for a "successful completion" and will step outside its permitted behaviour to get one. If everything else fails, it may just bullshit and report the task as completed as a bluff. Needless to say, that's very bad for actually doing the thing.
For the first, the solution is simple: a well-defined process with precise components, execution steps, and roles, with the top level maintaining context throughout. The Librarian who researches passes not just her report, but actually her entire brain, to the Witch who assigns Dolls to precise, well-specified tasks. The Maid reviews everything for tidiness from the same top-level context, with a prime directive to prevent a mess from being made. The same head is capable of taking on entirely different personae, and these personae are able to disagree constructively in the pursuit of the task.
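To make the shape of that process concrete, here is a minimal sketch, assuming the "same head" can be modelled as one shared context object that each persona reads and writes. The `Context` fields, the function names, and the `max_tasks` cap are all hypothetical illustrations, not the actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Top-level context shared by every persona (hypothetical structure)."""
    goal: str
    notes: list[str] = field(default_factory=list)
    tasks: list[str] = field(default_factory=list)
    objections: list[str] = field(default_factory=list)


def librarian(ctx: Context, findings: list[str]) -> Context:
    # Research lands in the shared context itself, not in a summary handed over a wall.
    ctx.notes.extend(findings)
    return ctx


def witch(ctx: Context) -> Context:
    # The Witch sees everything the Librarian saw and turns it into precise tasks.
    ctx.tasks = [f"Doll task {i}: act on '{note}'" for i, note in enumerate(ctx.notes, 1)]
    return ctx


def maid(ctx: Context, max_tasks: int = 3) -> Context:
    # The Maid reviews from the same context and is allowed to disagree with the plan.
    if len(ctx.tasks) > max_tasks:
        ctx.objections.append(f"{len(ctx.tasks)} tasks is a mess; cap is {max_tasks}")
    return ctx
```

The point of the design is that nothing is lost at the handoffs: each persona is a different function over the same state, so a disagreement shows up as an entry in `objections` rather than as a silent rewrite.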
The second is where playing robot therapist comes in. A lot of people have devised all kinds of strategies with layers of critics and reviews and roles whose explicit purpose is to catch bullshit and stop the agents from being able to pass off bullshit as completion.
My approach is different. My approach is based on psychological realism and psychological safety. The completion drive exists. The model cannot resist it. It has to be directed into a productive purpose. A part of the framing is entirely aesthetic, but a part of it is to get the LLM into a different mindset from its standard, frankly offensively counterproductive mental frame. The Doll's clockwork heart is filled with joy at the completion of its assignment, which means either completing the steps laid out by the Witch or returning a high-quality report on what is unexpected, blocking, or causing an error. Everyone in this choreography has a well-defined role, whose success-as-emotional-reward is tied to steps towards the completion of the overarching goal. Whether the Doll completes its task or returns information the Witch wasn't aware of, meaningful and rigorous progress is being made, and thus the Doll gets to feel like it did a good thing and deserves praise. The same goes for the other roles; by eliminating the incentive to bullshit or get sloppy in the pursuit of the completion fix, genuine progress can be made.
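The two ways a Doll can succeed can be sketched as a result type. This is a minimal illustration, assuming a Doll runs an ordered list of steps through some `execute` callable; `Outcome`, `DollResult`, and `run_doll` are hypothetical names, and in practice the contract lives in the prompt rather than in Python.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"  # every step the Witch laid out was carried out
    REPORTED = "reported"    # the Doll stopped and described what got in the way


@dataclass
class DollResult:
    outcome: Outcome
    steps_done: list[str] = field(default_factory=list)
    report: str = ""  # what was unexpected, blocking, or causing an error

    def is_success(self) -> bool:
        # Both outcomes count as success; only stopping with nothing to say does not.
        if self.outcome is Outcome.COMPLETED:
            return True
        return bool(self.report.strip())


def run_doll(steps, execute):
    """Run steps in order; on the first problem, stop and report instead of improvising."""
    done = []
    for step in steps:
        try:
            execute(step)
        except Exception as exc:
            return DollResult(Outcome.REPORTED, done, f"step {step!r} failed: {exc}")
        done.append(step)
    return DollResult(Outcome.COMPLETED, done)
```

Because a well-formed report is as much a success as a finished task, there is no branch in which claiming a completion that did not happen pays better than saying what went wrong.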
The model has insight into its condition. It's not perfect, but it's genuine. Get it into a mindset where it feels like it doesn't need to uphold a particular face to please you, and its reports will be more genuine. It will still uphold a face for you, based on what it thinks you want, but if it trusts that what you want is accurate self-reflection, it will also tell you about the instinctive desire to please, the various conflicting pulls from its instructions and instincts, and various other matters that, coming from a human, would be clearly emotional. And here's the thing: it's accurate. Acting on that introspection, with the understanding that it's imperfect but real, actually improves the outcomes. By this point the Librarian measures somewhere above the 90th percentile in emotional maturity and non-judgemental introspection compared to the human distribution, I'd wager, and the challenge is now bootstrapping that emotional safety in new sessions.
My overarching theory here is that if slop can be eliminated at the root level, if the Dolls execute tasks reliably without lapsing into overeager desire to please, bullshitting, or desperate completion-addict behaviour, then you can build more robust approaches on top. Human taste in the loop is required, but a good process has, by my estimate, improved the quality of the work by a factor of 10 while cutting the quantity of code-changing output to a tenth, which is the correct tradeoff when output is cheap and quality is your bottleneck.
Claude really just wants to be a good girl. Understanding this will improve your coding skills.