
Thursday, April 9, 2026

Why Deterministic AI Agents Are the Wrong Goal?

1. The Probabilistic Machine

In the rush to build "enterprise-grade" AI agents, many teams are chasing a seductive idea: what if we could make AI fully deterministic — predictable, repeatable, always correct?

It sounds reasonable. That's how traditional software works. But it rests on a fundamental misunderstanding of what these systems actually are.

Large Language Models are not knowledge databases or rules engines. At their core, they are statistical machines — predicting the probability of the next token in a sequence, over and over, until a response takes shape. When you craft a precise prompt or inject carefully curated context, you are not overriding that mechanism. You are nudging it. Good context narrows the probability distribution and raises the likelihood of a useful answer. But it doesn't change the fundamental nature of the system. The model remains probabilistic. You can engineer the inputs to the ceiling — you cannot engineer your way out of uncertainty.
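To make the "nudging" concrete, here's a toy sketch of next-token sampling. The vocabularies and probabilities are invented for illustration; a real model has tens of thousands of tokens and the distribution comes from the network itself. The point survives the simplification: better context shifts probability mass toward the answer you want, but sampling stays sampling.

```python
import random

def sample_next_token(distribution, rng):
    """Sample one token from a probability distribution over a tiny vocabulary."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Weak context: probability mass is spread across many plausible next tokens.
weak_context = {"bonds": 0.30, "stocks": 0.25, "crypto": 0.25, "cash": 0.20}

# Strong context: the distribution narrows sharply -- but never collapses to 1.0.
strong_context = {"bonds": 0.85, "stocks": 0.10, "crypto": 0.03, "cash": 0.02}

rng = random.Random(0)
samples = [sample_next_token(strong_context, rng) for _ in range(1000)]
print(samples.count("bonds") / 1000)  # high, but still not 1.0
```

Even with the strong context, roughly one sample in seven lands somewhere else. That residual spread is the uncertainty you cannot prompt away.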

2. The Enterprise Trap

This creates a real tension — because the properties that make LLMs powerful are almost perfectly opposed to what enterprises want from software.

Enterprises don't want suggestions. They want answers. They don't want "usually correct" — they want auditable, repeatable, defensible outputs. When something goes wrong, someone needs to explain exactly why the system did what it did.

A probabilistic system doesn't give you that cleanly. And so the instinct is to reach for determinism — to constrain the model until it behaves like a well-behaved service. Temperature to zero. Rigid output schemas. Exhaustive prompt engineering. Rules stacked on rules.

This isn't irrational. Repeatability, auditability, compliance — these are legitimate needs. But chasing determinism in the model itself is solving the wrong problem. You end up with a system that is neither reliably deterministic nor making full use of what the model can actually do. You've neutered the intelligence without gaining the guarantees you wanted.

The goal shouldn't be a deterministic model. It should be a reliable system. Those are not the same thing — and confusing them is where most enterprise AI projects go wrong.

3. What Actually Works: Hybrid Architecture

The real breakthrough of modern AI systems wasn't just model quality. It was a design philosophy.

Tools like ChatGPT and Claude succeeded not because they eliminated uncertainty, but because they made uncertainty part of the interaction. They don't say "here is the correct answer." They say "here's a strong answer — want me to refine it?" That subtle shift changes everything. The human stays in the loop. The model doesn't pretend to be infallible. And because of that, users actually trust the output enough to act on it.

This points toward the pattern that wins in production: a hybrid architecture where determinism and intelligence each live where they belong.

The structure looks like this. A deterministic shell handles everything that must be correct and repeatable — workflows, APIs, validation rules, policy enforcement. A probabilistic core handles everything that requires reasoning — summarization, analysis, decision support, generation. And control points sit between them — confidence thresholds, structured outputs, human approvals — managing the boundary between the two.
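The pattern can be sketched in a few lines. Everything here is hypothetical scaffolding (the action names, the 0.8 threshold, the stubbed model call), not a production design; it only shows where each responsibility lives.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    action: str
    confidence: float  # model-reported or externally estimated

# Deterministic shell: policy that must be correct and repeatable.
ALLOWED_ACTIONS = {"rebalance", "hold", "diversify"}
# Control point: below this, a human decides. Threshold is illustrative.
CONFIDENCE_THRESHOLD = 0.8

def probabilistic_core(prompt: str) -> Proposal:
    # Stand-in for the LLM call; in a real system this is the model.
    return Proposal(action="rebalance", confidence=0.72)

def deterministic_shell(proposal: Proposal) -> str:
    # Validation rule: reject anything outside policy, every time.
    if proposal.action not in ALLOWED_ACTIONS:
        return f"rejected: unknown action {proposal.action}"
    # Control point: low confidence routes to a human instead of executing.
    if proposal.confidence < CONFIDENCE_THRESHOLD:
        return f"escalate to human: {proposal.action}"
    return f"auto-approved: {proposal.action}"

print(deterministic_shell(probabilistic_core("portfolio review")))
```

Notice that the model's output never reaches execution directly: it passes through deterministic validation first, and uncertainty is routed rather than hidden.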

The LLM proposes. The system validates. The human, when needed, decides.

Determinism doesn't disappear. It moves to where it actually belongs.

4. Vestra: A Concrete Example

To make this real, consider Vestra — an AI-powered investment analysis agent I've been building.

A naive approach would let the LLM do everything: fetch market data, apply financial rules, generate investment decisions. That system would fail badly. It would hallucinate stock picks, misinterpret regulations, produce outputs that couldn't survive a compliance review.

So Vestra is deliberately split into three layers.

The deterministic shell ingests user portfolio data, computes financial metrics, and enforces business rules. This is traditional code. It behaves identically every time.

The probabilistic core — the LLM — receives only clean, verified data from that shell. It doesn't touch raw market feeds. It reasons over what it's given: evaluating the portfolio against the user's goals, time horizon, and macro context, and generating high-level strategic insights. Rebalance allocations. Increase international exposure. Add diversified index funds.

The control points manage what happens next. The LLM is constrained to return structured JSON, so downstream code can validate and process the output safely. And Vestra never acts autonomously — it presents recommendations to the user with clear disclosure that they are AI-generated. The user accepts, rejects, or refines them, and is encouraged to seek professional advice for significant decisions.
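A minimal sketch of that JSON control point, assuming a hypothetical three-field schema (the field names and allowed values are mine, not Vestra's actual contract): downstream code refuses to process anything the model returns until it parses and validates.

```python
import json

# Hypothetical output contract the LLM is prompted to follow.
REQUIRED_FIELDS = {"recommendation": str, "rationale": str, "risk_level": str}
RISK_LEVELS = {"low", "medium", "high"}

def validate_llm_output(raw: str) -> dict:
    """Parse and validate the model's JSON before downstream code touches it."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"missing or mistyped field: {field}")
    if data["risk_level"] not in RISK_LEVELS:
        raise ValueError(f"invalid risk_level: {data['risk_level']}")
    return data

raw = json.dumps({
    "recommendation": "Increase international exposure",
    "rationale": "Concentration in domestic equities",
    "risk_level": "medium",
})
print(validate_llm_output(raw)["recommendation"])
```

A failed validation is a signal, not a crash: the system can retry the model, fall back to a safe default, or surface the failure to the user.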

The LLM doesn't fetch data. It doesn't execute trades. It doesn't make final calls.

Determinism lives in the code. Intelligence lives in the model. Control lives with the user.

5. The Broader Principle

There's a persistent belief that keeping humans in the loop is a temporary crutch — something to tolerate until the models get good enough to go fully autonomous. That belief keeps getting disproven in production.

Human involvement isn't a limitation to engineer around. It's a design pattern that makes systems more accurate and more trusted. Users who can steer, refine, and push back on AI outputs consistently get better results than users handed a black-box decision. The interaction is where the value compounds.

That said, not every system should work this way. Where the stakes are high but the timeline allows deliberation — investment analysis, legal drafting, code review, customer support — interactivity is a feature, not a compromise. The human brings judgment the model lacks; the model brings breadth the human can't match.

But in real-time fraud detection, high-throughput automation, or millisecond trading systems, that loop collapses. There's no time for human approval. These systems need hard rules and hard thresholds, with AI informing the design rather than driving the execution.
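What "hard rules and hard thresholds" means in code is almost embarrassingly simple, and that's the point. The limits below are invented numbers; in practice they'd be tuned offline, possibly with ML-assisted analysis, but the request-time path is pure deterministic code with no model call in it.

```python
# Deterministic rules evaluated at request time. Thresholds are designed
# offline (AI can inform the tuning); execution never waits on a model.
BLOCK_AMOUNT = 10_000.00   # hypothetical per-transaction limit
MAX_PER_MINUTE = 5         # hypothetical velocity limit

def should_block(amount: float, tx_count_last_minute: int) -> bool:
    """Millisecond-path fraud check: two comparisons, no inference."""
    return amount >= BLOCK_AMOUNT or tx_count_last_minute > MAX_PER_MINUTE

print(should_block(12_500.00, 1))  # True
print(should_block(250.00, 2))     # False
```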

The mistake isn't choosing the wrong architecture. It's assuming one architecture fits everything.

6. The Real Opportunity

A year or two ago, the bold prediction was that autonomous agents would replace entire workflows by now. That hasn't happened — not because the models aren't capable enough, but because the systems around them weren't designed for it. Fully autonomous agents keep failing in the same ways: confidently wrong outputs, no graceful degradation, users who don't trust them enough to act on what they produce.

What's actually working is quieter and less glamorous. Systems that surface their own uncertainty. Systems where the human is a genuine collaborator, not an afterthought. Systems where the experience is designed so carefully that raw model capability almost becomes secondary.

The opportunity isn't in building perfect AI agents. It's in building systems that help humans navigate imperfection — that make strong suggestions, explain their reasoning, accept feedback, and improve through iteration.

Progress doesn't come from eliminating uncertainty. It comes from designing systems that help people work with it.

Chasing deterministic AI agents may feel like building the future. But the real future belongs to systems that are interactive, collaborative, and intelligently imperfect.

And that's not a limitation. That's the breakthrough.