Tokens and Dreams

The one great principle of the English law is, to make business for itself.

photos/tokens-and-dreams-0.png

At work we were talking about metrics. Well, they were talking about metrics, and then when they realized they had none, they asked me to generate some. I spent an hour or so putting together a script that pulled all the relevant historical data from our database, cleaned and normalized into a CSV. Then I fed the CSV into AI. With only a sentence or two of prompt, AI extrapolated meaningful signals, produced no less than 5 graphs, and correlated the data with external market signals which were not explicit in the data-set. It also built a polished interactive dashboard. A dashboard! A dashboard! I've said it in my head so many times I don't even know what it fucking means. Dash board. But I know this: Everyone wants dashboards. AI knows it, too.

Around this same time, I was doing more AI coding experiments. I'd been stuck between the hype, the Big Fear, and the mundane reality of trying to get AI to write good code. I wanted to understand what was real and what was not. My detector has been giving me conflicting signals. Given the exact same prompt and inputs, one run of Claude goes in a wildly different direction than another. Junk gets confidently asserted as not-junk, and then when corrected, the model just as confidently asserts that it was, indeed, junk all along. A turn or two later it's right back at it. Memories of preferences ("don't you even THINK about committing code") get forgotten in the whirl of tokens. em-dashes appear. Tangles of weedy code spread with each iteration, and need a vigorous hand-pruning to keep things clean, because the show must go on. Even special workflows dedicated to reducing sprawl tend to come back with gems like "X cannot be refactored because Y is load-bearing."

Load bearing. My inlaws have a lake house down in the Ozarks, and there was this 4" metal pole in the basement underneath a support beam. My father-in-law hated that pole because it was right in everyone's way when going out to the dock. I'm not sure how he determined this, but one day he decided that it was not load bearing, and he removed it. We spent an anxious night that night in the basement bedroom wondering if we'd be awakened by unborne loads coming down on top of us. Thankfully, he was right, or more accurately, right enough.

photos/claude-and-his-bros-writing-my-code.jpg

Claude and the boys sit down to write me some code.

The evidence is puzzling. When someone's daily experience of AI is of the dashboard variety, the hype seems real (and with it the concomitant Fear). Yet I continue to find myself in this no-mans-land of yeah, AI is legit for some stuff, but it also sucks at some stuff. When pushing back against a hype claim that landed in my Slack DMs, I find myself constantly wanting to attach a disclaimer: "No! I embrace these tools! This is not thinly-veiled self-preservation, just hear me out!" And then I want to attach a sub-disclaimer to that one that says "(well, actually it is, but just not in the way that you think)". But Slack doesn't have a disclaimer and sub-disclaimer feature yet. It'd be nice, given that silver bullets are absolutely raining down these days. I sometimes find myself wondering, with resignation, do I just need to take the leap of faith, let go, and become machine? But no, you all know me better than that.

cybernetics

The map is not the territory.

Brooks' Tarpit and Agentic Quicksand

In his essay No Silver Bullet, written 40 years ago, Fred Brooks argues that the difficulty in shipping software has never been the code-writing. Instead it's the inherent, irreducible complexity of the problem itself, which the software is designed to solve, that makes it so challenging:

The essence of a software entity is a construct of interlocking concepts: data sets, relationships among data items, algorithms, and invocations of functions. This essence is abstract, in that the conceptual construct is the same under many different representations. It is nonetheless highly precise and richly detailed.

I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation.

For proof, I'd point to the fact that, even with 40 years to develop best-practices, modern languages and incredibly advanced code authoring tools, we're still drowning in buggy, bloated software. Subjectively, it seems to be getting worse.

photos/its-just-teamchat-bro.png

It's just teamchat, bro.

Who manages this complexity, nowadays, besides the programmer? At best, it is encoded in rigorously-updated design docs. More usually it exists in obsolete documentation that one go-getter wrote a couple years ago and never touched again, poorly-normalized database schemas, and perhaps some code comments and inline documentation (also suspect, since we all have been guilty of updating code and forgetting the associated prose comments). In the end it's the code itself that is the true representation of this essential difficulty.

When the programmer is removed as the steward of the code and replaced by AI, what happens? The true-believers argue that control is retained, it has simply shifted to the network of prompts, skills, code, tests, and agents. I would argue, based on my own experience, that this is actually where things break down. Agentic, AI-modulated feedback loops inevitably become self-referential and at some point the loop closes, because the thing anchoring it to reality - the programmer - can no longer keep up. Code gets written, reviewed, red-teamed, tested and modified using the same system that produced it. Because the speed at which AI produces code far exceeds what any person can reasonably review and fully understand, there's a kind of event horizon that gets crossed. The break occurs and beyond that point the ownership of the code is implicitly transferred.

The consequences of an AI feedback loop go beyond the loss of that fundamental layer of control. When the system can no longer be managed by human programmers, errors in design compound unchecked. You may end up with many islands which are internally consistent, but do not compose well with one another, much less produce a coherent whole.

Worse still, this drift leads to real costs. Every iteration consumes tokens, so the code-base is not merely accumulating noise but paying to accumulate it. The code continues to grow, the context-window is stretched, and the consequence is yet more code, more tokens. The feedback loop becomes an economically self-reinforcing pit of quicksand. Maybe just a few more hours of token-spend will fix it... But the tens- or hundreds-of-thousands of lines of code sitting behind the software artifact resist attempts to refactor, because the refactor requires the same tools which introduced the problems in the first place.

I recently had a call with an AI-native developer where, with refreshing candor, he showed me his AI development process. Several times he mentioned, almost apologetically and without a trace of defensiveness, that he was not a "real developer", yet he had vibe-coded a real product. He expressed frustration with opaque token spend, noted he had paid several thousand dollars over his normal usage just to get his agents "un-stuck" and the looms spinning again. He talked about the addictive feeling of making progress as his agents churned.

One of the features in his application was a dashboard visualization of a knowledge-graph, each node brightly illuminated against a background web of connections. But for reasons which remain obscure, the graph had a tendency to wiggle around and reconfigure itself so that one was forced to mechanically mouse over nodes at random until the tooltip informed you that you'd arrived at the node you were interested in. As we sat in silence on the call, while his cursor moused around, I wondered, how many more thousand dollar re-ups would it take to get the dashboard to sit still and behave?

A guy on our sales team demoed an AI-powered dashboard for us on Monday. It made a number of mistakes, including hallucinating a link to a new housing regulation for California. The link was part of an automated "drip" email campaign intended to go out to customers.

"What card number is our Claude running on?" my boss asked me yesterday. "I want to just make sure it's on the credit card and not a debit card."

Comments (0)


Post a comment