Jamin Ball's latest Clouded Judgment piece argues that tokens are the core consumption primitive of the AI era — the same way compute was for cloud — and that the biggest infrastructure winners will be those who get into the token path. I largely agree that tokens are the core primitive. But I'm not sure they are the right unit to charge for.
What follows are my highlights from the article and my commentary on them, a brain dump of my thoughts from reading it.
Here's the general idea: the biggest infrastructure winners of the cloud era monetized the core consumption primitive of the platform. In the cloud era, that primitive was compute, storage, and network I/O. In the AI era, it increasingly looks like tokens.
Compute and memory are still the core primitives underneath. Tokens are the currency you spend that largely dictates how much compute and memory are consumed. That nuance matters.
"The companies that ultimately became the biggest infrastructure businesses of the cloud era found ways to align their revenue directly with compute activity (or they charged directly for compute). They owned the meter."
This is true, but that meter can't reliably be tokens in the case of LLMs.
A token count conveys no clear signal about how much compute or memory was actually consumed, nor about model performance. 50 input tokens corresponding to a user's query could be served with a tool call, or by exercising more advanced reasoning — and which of those a model uses is largely out of the user's control.
In the cloud era, when compute and cost spiked unexpectedly, users could reason about the cause and optimize it. That doesn't translate well to the AI landscape: first, because of the inherent unpredictability in model behavior, and second, because a token carries semantic meaning, not literal meaning, so it's harder to reason about how to optimize. This will likely mature into its own field of research and engineering.
With reasoning models and long-running agents, the tokens the model generates itself will far exceed the tokens visible to the user — their input and the output they see. Reasoning models generate large volumes of intermediate tokens that are opaque to users. Agents iterate on their own outputs. And even the final output tokens are not just a function of the model, but of the desired format — in the absence of clear instructions, the model may default to verbosity.
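To make the gap concrete, here's a small sketch of how billed cost can diverge wildly for the same visible query. All prices and token counts are illustrative assumptions, not real vendor numbers; the only structural fact it relies on is that reasoning tokens are typically billed at the output rate even though the user never sees them.

```python
# Sketch: why visible tokens are a poor proxy for billed cost.
# Prices and token counts are illustrative assumptions, not vendor figures.

def billed_cost(input_tokens: int, reasoning_tokens: int, output_tokens: int,
                price_in: float, price_out: float) -> float:
    """Reasoning tokens are commonly billed at the output rate,
    even though they are opaque to the user."""
    return (input_tokens * price_in
            + (reasoning_tokens + output_tokens) * price_out)

PRICE_IN, PRICE_OUT = 3e-6, 15e-6  # hypothetical $/token

# Same 50-token query, two execution paths the user doesn't control:
direct = billed_cost(50, reasoning_tokens=0, output_tokens=200,
                     price_in=PRICE_IN, price_out=PRICE_OUT)
deep_reasoning = billed_cost(50, reasoning_tokens=8_000, output_tokens=200,
                             price_in=PRICE_IN, price_out=PRICE_OUT)

print(f"direct answer:  ${direct:.4f}")
print(f"with reasoning: ${deep_reasoning:.4f}")  # ~39x the direct path
```

The user's visible input and output are identical in both runs; the bill differs by more than an order of magnitude based on a decision the model made.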
This is where layers like routing and observability become important. Routing — choosing between local, on-prem, or cloud models, or between tool-calling and reasoning — poses an interesting misalignment of incentives: while a model provider might want to lock users into their own model suite, third-party alternatives like LiteLLM may provide better user value. Observability matters not only for evals to improve accuracy, but also to steer the model toward that accuracy while keeping costs to a minimum. But to the author's point, both routing and observability exist in the token path.
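A routing layer can be sketched in a few lines. The model names and the heuristic below are hypothetical placeholders; real routers such as LiteLLM use richer signals (per-model pricing, latency, eval scores), but the shape of the decision is the same: classify the request, then pick the cheapest path likely to satisfy it.

```python
# Minimal provider-agnostic routing sketch. Model names and the routing
# heuristic are illustrative assumptions, not a real router's logic.

from dataclasses import dataclass

@dataclass
class Route:
    model: str
    reason: str

def route(prompt: str, needs_tools: bool = False) -> Route:
    # Prefer a tool call when one can answer the query outright.
    if needs_tools:
        return Route("tool-calling-model", "answerable via a tool call")
    # Short factual questions can go to a small local model.
    if len(prompt.split()) < 20 and "?" in prompt:
        return Route("small-local-model", "short factual question")
    # Everything else falls through to an expensive reasoning model.
    return Route("frontier-reasoning-model", "long or open-ended task")

print(route("What is the capital of France?"))
print(route("Refactor this service into smaller modules"))
```

The misalignment in the paragraph above shows up precisely here: the router's incentive is to minimize the caller's cost, while a single provider's incentive is to keep traffic on its own most expensive path.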
Tokens are also extremely precious in the Transformer architecture because attention is O(n²) relative to token length. But that may change as architectures evolve and the cost-per-token falls. So the author is right that there's value in being in the token path — but the token itself may become a commodity. The companies that win will be those that deliver the highest value per token.
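The O(n²) point is easy to verify with a rough FLOP estimate. The formula below is a standard approximation for one attention head (the score matrix QKᵀ and the weighted sum over V each cost about 2·n²·d multiply-adds); the head dimension is an assumed value.

```python
# Rough FLOP estimate for one attention head over n tokens.
# d (head dimension) is an assumed value; this is not a profiler.

def attention_flops(n: int, d: int = 128) -> int:
    """QK^T costs ~2*n*n*d FLOPs; the weighted sum over V costs the same."""
    return 2 * n * n * d + 2 * n * n * d

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_flops(n):.2e} FLOPs")
# 10x the context -> ~100x the attention compute
```

This quadratic curve is exactly why every extra token in the context is precious today, and why architectures that flatten it (linear attention, state-space models) could commoditize the token itself.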
Every AI workload ultimately boils down to tokens being generated, processed, and consumed by models. Prompts become tokens. Context becomes tokens. Responses become tokens. Agents running multi-step workflows can generate enormous volumes of tokens as they reason through tasks. Tokens are the atomic unit of work in modern AI systems. Model providers like OpenAI and Anthropic literally are the token primitive (just as the hyperscalers were the compute and storage primitive of the cloud buildout). They charge per token in, per token out.
Open-weight models deployed on-prem have the potential to convert this variable cost (usage-based pricing) into a largely fixed cost (fixed hardware rent) — analogous to the movement we're already seeing in cloud, where companies disgruntled by hyperscaler pricing are choosing to bring workloads in-house. With competitive pressure from open models, token pricing faces a race to the bottom.
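The variable-to-fixed conversion has a simple breakeven. The numbers below are illustrative assumptions (and ignore ops overhead, utilization, and model-quality gaps), but the structure of the calculation is what matters: above some monthly token volume, fixed hardware rent undercuts per-token pricing.

```python
# Back-of-the-envelope breakeven between per-token API pricing and fixed
# on-prem hardware rent. All numbers are illustrative assumptions.

API_PRICE_PER_M_TOKENS = 10.0    # hypothetical blended $/1M tokens
GPU_RENT_PER_MONTH = 20_000.0    # hypothetical fixed monthly hardware cost

breakeven_tokens = GPU_RENT_PER_MONTH / API_PRICE_PER_M_TOKENS * 1_000_000
print(f"breakeven: {breakeven_tokens / 1e9:.1f}B tokens/month")
# Above this volume, fixed rent wins; below it, usage-based pricing wins.
```

Open-weight models keep shifting this breakeven point downward, which is the supply-side pressure on token pricing described above.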
This is exactly what Meta's open model releases are designed to do: prevent token pricing from becoming a durable, extractable primitive, and shift the value layer on top of it. The value proposition for frontier model providers then becomes everything else — continuous improvement without migration, better token-to-outcome ratios, bug fixes, new tools and skills, and safety and alignment. An unhinged model might deliver the best raw output at the expense of legal exposure or privacy breaches, which are ultimately far more costly.
This points to a deeper problem with token-based pricing that goes beyond supply pressure from open models. In the cloud era, the user controlled the how. They wrote the code, designed the architecture, chose the infrastructure. Compute-based pricing was fair because the user's own decisions drove consumption. LLMs invert this: the user specifies the what, and the model decides the how — which path to take, how much to reason, how many tokens to spend. Charging for the how when the user doesn't control it creates a structural misalignment.
This makes outcome-based or task-based pricing not just an alternative but arguably an inevitability — especially as agents become more autonomous and the gap between user intent and model execution widens further. The challenge is that value is context-dependent, hard to measure in real time, and varies enormously across users and tasks. Whoever solves value quantification will have built something important.
The fastest-ramping AI companies today are the ones sitting directly in the token path. Coding agents are the standout. Cursor reportedly hit $2B ARR recently. Every keystroke, every code completion, every agent action triggers inference, and their business model has evolved from simply charging per seat (those seats now come with usage limits!).
Cursor's value ultimately comes from code accuracy and the user interface for coding — both layers built on top of the token primitive, not the tokens themselves. This is the same structural argument as routing and observability: the real value accrues to whoever makes the tokens more useful, not whoever manages them.
What does value measurement look like for tools like Cursor in the future? Not merely the number of code suggestions accepted or the fraction of PRs committed, but metrics downstream of those, like change failure rate (one of the standard DORA metrics of engineering productivity): the fraction of Cursor-committed code that broke production. Achieving something like this requires Cursor to move from augmenting development to having visibility into the entire lifecycle, including deployment and observability.

On the theme of value generation on top of tokens, the agentic user interface is one underexplored layer. The dominant narrative right now is that UI is dead and the terminal wins. That's understandable: just as early Internet adopters and the software engineers of the first two decades largely used Vim and Emacs, today's most prolific AI users are those closest to the frontier. But as AI-native tooling becomes cheap and non-technical users start to build, a whole new category of agentic IDEs will likely emerge. Augment Code is exploring this with Intent.
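A change-failure-rate metric for an agent is straightforward to compute once the lifecycle data exists. The deployment records and field names below are hypothetical; the hard part, per the argument above, is not the arithmetic but getting visibility into deploys and incidents in the first place.

```python
# Sketch of an outcome metric for a coding agent: change failure rate over
# agent-committed changes, in the DORA sense. Records and field names are
# hypothetical; real data would come from deploy and incident tooling.

deployments = [
    {"committed_by_agent": True,  "caused_incident": False},
    {"committed_by_agent": True,  "caused_incident": True},
    {"committed_by_agent": False, "caused_incident": False},
    {"committed_by_agent": True,  "caused_incident": False},
]

agent_deploys = [d for d in deployments if d["committed_by_agent"]]
failures = sum(d["caused_incident"] for d in agent_deploys)
change_failure_rate = failures / len(agent_deploys)
print(f"agent change failure rate: {change_failure_rate:.0%}")
```

A metric like this is what outcome-based pricing would ultimately have to be denominated in: not tokens spent, but changes that shipped and held.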
Now, being in the token path is necessary but not sufficient. This is an important nuance. The lesson for AI founders: get in the token path, but build something differentiated on top of it. Don't just be a pipe that tokens flow through. Be the layer that makes those tokens more valuable, whether that's through better developer experience (Cursor), specialized vertical models, security and compliance tooling, or proprietary data moats.
This is the right conclusion. Value has to sit on top of the primitive — not be the primitive. The companies that win won't just be in the token path; they'll make the token path better.
This echoes the argument made in the viral post Context-graphs are a trillion dollar opportunity: it's not enough to be in the read path, where today's data warehouses live. You need to be in the context execution path — not because context is inherently valuable, but because it can unlock value in the future. The parallel holds here too.
So overall, I agree with the recommendation: get in the token path. I'm not convinced the token is the right unit to charge for. The token is the how — and pricing the how only makes sense when the user controls it. In the AI era, they don't. The right pricing primitive will need to reflect the what: the outcome, the task, the value delivered.