Code & Cluster: AI


Thursday, April 9, 2026

Why Deterministic AI Agents Are The Wrong Goal?

1. The Probabilistic Machine

In the rush to build "enterprise-grade" AI agents, many teams are chasing a seductive idea: what if we could make AI fully deterministic — predictable, repeatable, always correct?

It sounds reasonable. That's how traditional software works. But it starts with a fundamental misunderstanding of what these systems actually are.

Large Language Models are not knowledge databases or rules engines. At their core, they are statistical machines — predicting the probability of the next token in a sequence, over and over, until a response takes shape. When you craft a precise prompt or inject carefully curated context, you are not overriding that mechanism. You are nudging it. Good context narrows the probability distribution and raises the likelihood of a useful answer. But it doesn't change the fundamental nature of the system. The model remains probabilistic. You can engineer the inputs to the ceiling — you cannot engineer your way out of uncertainty.

2. The Enterprise Trap

This creates a real tension — because the properties that make LLMs powerful are almost perfectly opposed to what enterprises want from software.

Enterprises don't want suggestions. They want answers. They don't want "usually correct" — they want auditable, repeatable, defensible outputs. When something goes wrong, someone needs to explain exactly why the system did what it did.

A probabilistic system doesn't give you that cleanly. And so the instinct is to reach for determinism — to constrain the model until it behaves like a well-behaved service. Temperature to zero. Rigid output schemas. Exhaustive prompt engineering. Rules stacked on rules.

This isn't irrational. Repeatability, auditability, compliance — these are legitimate needs. But chasing determinism in the model itself is solving the wrong problem. You end up with a system that is neither reliably deterministic nor making full use of what the model can actually do. You've neutered the intelligence without gaining the guarantees you wanted.

The goal shouldn't be a deterministic model. It should be a reliable system. Those are not the same thing — and confusing them is where most enterprise AI projects go wrong.

3. What Actually Works: Hybrid Architecture

The real breakthrough of modern AI systems wasn't just model quality. It was a design philosophy.

Tools like ChatGPT and Claude succeeded not because they eliminated uncertainty, but because they made uncertainty part of the interaction. They don't say "here is the correct answer." They say "here's a strong answer — want me to refine it?" That subtle shift changes everything. The human stays in the loop. The model doesn't pretend to be infallible. And because of that, users actually trust the output enough to act on it.

This points toward the pattern that wins in production: a hybrid architecture where determinism and intelligence each live where they belong.

The structure looks like this. A deterministic shell handles everything that must be correct and repeatable — workflows, APIs, validation rules, policy enforcement. A probabilistic core handles everything that requires reasoning — summarization, analysis, decision support, generation. And control points sit between them — confidence thresholds, structured outputs, human approvals — managing the boundary between the two.

The LLM proposes. The system validates. The human, when needed, decides.

Determinism doesn't disappear. It moves to where it actually belongs.
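A minimal sketch of that flow in Python may help make it concrete. Everything named here (the `Proposal` type, the allowed-action list, the 0.8 threshold, and the `fake_llm` stand-in) is an assumption for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy values, chosen only for illustration.
ALLOWED_ACTIONS = {"rebalance", "hold", "increase_international"}
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Proposal:
    action: str
    confidence: float  # a score in [0, 1], however it is estimated


def validate(proposal: Proposal) -> bool:
    """Deterministic shell: the same proposal always gets the same verdict."""
    return (
        proposal.action in ALLOWED_ACTIONS
        and 0.0 <= proposal.confidence <= 1.0
    )


def route(proposal: Proposal) -> str:
    """Control point: decide what happens to a proposal."""
    if not validate(proposal):
        return "rejected"
    if proposal.confidence < CONFIDENCE_THRESHOLD:
        return "needs_human_approval"
    return "auto_accepted"


def run(propose: Callable[[str], Proposal], request: str) -> str:
    """The LLM proposes, the system validates, the human decides when needed."""
    return route(propose(request))


def fake_llm(request: str) -> Proposal:
    # Stand-in for a real model call so the sketch runs offline.
    return Proposal(action="rebalance", confidence=0.65)


print(run(fake_llm, "Review my portfolio"))
```

The only probabilistic piece is the function passed in as `propose`. The validation and routing rules are ordinary code that can be unit tested and audited.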

4. Vestra: A Concrete Example

To make this real, consider Vestra — an AI-powered investment analysis agent I've been building.

A naive approach would let the LLM do everything: fetch market data, apply financial rules, generate investment decisions. That system would fail badly. It would hallucinate stock picks, misinterpret regulations, produce outputs that couldn't survive a compliance review.

So Vestra is deliberately split into three layers.

The deterministic shell ingests user portfolio data, computes financial metrics, and enforces business rules. This is traditional code. It behaves identically every time.

The probabilistic core — the LLM — receives only clean, verified data from that shell. It doesn't touch raw market feeds. It reasons over what it's given: evaluating the portfolio against the user's goals, time horizon, and macro context, and generating high-level strategic insights. Rebalance allocations. Increase international exposure. Add diversified index funds.

The control points manage what happens next. The LLM is constrained to return structured JSON, so downstream code can validate and process the output safely. And Vestra never acts autonomously — it presents recommendations to the user with clear disclosure that they are AI-generated. The user accepts, rejects, or refines them, and is encouraged to seek professional advice for significant decisions.

The LLM doesn't fetch data. It doesn't execute trades. It doesn't make final calls.

Determinism lives in the code. Intelligence lives in the model. Control lives with the user.
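The structured-JSON control point could look something like the sketch below. The schema (a `recommendations` list whose items carry `type` and `rationale`) and the allowed types are hypothetical, invented for this example rather than taken from Vestra's actual format.

```python
import json

# Hypothetical recommendation types, for illustration only.
ALLOWED_TYPES = {"rebalance", "increase_exposure", "add_fund"}


def parse_recommendations(raw: str) -> list[dict]:
    """Parse model output and keep only well-formed recommendations.

    Raises ValueError if the payload is not the expected JSON shape.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc

    if not isinstance(payload, dict) or not isinstance(
        payload.get("recommendations"), list
    ):
        raise ValueError("expected an object with a 'recommendations' list")

    valid = []
    for item in payload["recommendations"]:
        if (
            isinstance(item, dict)
            and isinstance(item.get("type"), str)
            and item["type"] in ALLOWED_TYPES
            and isinstance(item.get("rationale"), str)
            and item["rationale"].strip()
        ):
            # Label every item as AI-generated before it reaches the user.
            valid.append({**item, "ai_generated": True})
    return valid
```

Anything the model returns outside the agreed shape is dropped or rejected by plain code before a user ever sees it.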

5. The Broader Principle

There's a persistent belief that keeping humans in the loop is a temporary crutch — something to tolerate until the models get good enough to go fully autonomous. That belief keeps getting disproven in production.

Human involvement isn't a limitation to engineer around. It's a design pattern that makes systems more accurate and more trusted. Users who can steer, refine, and push back on AI outputs consistently get better results than users handed a black-box decision. The interaction is where the value compounds.

That said, not every system should work this way. Where the stakes are high but the timeline allows deliberation — investment analysis, legal drafting, code review, customer support — interactivity is a feature, not a compromise. The human brings judgment the model lacks; the model brings breadth the human can't match.

But in real-time fraud detection, high-throughput automation, or millisecond trading systems, that loop collapses. There's no time for human approval. These systems need hard rules and hard thresholds, with AI informing the design rather than driving the execution.

The mistake isn't choosing the wrong architecture. It's assuming one architecture fits everything.

6. The Real Opportunity

A year or two ago, the bold prediction was that autonomous agents would replace entire workflows by now. That hasn't happened — not because the models aren't capable enough, but because the systems around them weren't designed for it. Fully autonomous agents keep failing in the same ways: confidently wrong outputs, no graceful degradation, users who don't trust them enough to act on what they produce.

What's actually working is quieter and less glamorous. Systems that surface their own uncertainty. Systems where the human is a genuine collaborator, not an afterthought. Systems where the experience is designed so carefully that raw model capability almost becomes secondary.

The opportunity isn't in building perfect AI agents. It's in building systems that help humans navigate imperfection — that make strong suggestions, explain their reasoning, accept feedback, and improve through iteration.

Progress doesn't come from eliminating uncertainty. It comes from designing systems that help people work with it.

Chasing deterministic AI agents may feel like building the future. But the real future belongs to systems that are interactive, collaborative, and intelligently imperfect.

And that's not a limitation. That's the breakthrough.

Monday, February 9, 2026

What is (Agentic) AI Memory?

I have seen a lot of posts on X and LinkedIn about the importance of agentic AI memory. What exactly is it? Is it just another name for RAG? How is it different from any other application memory? In this blog, I try to answer these questions.

What do people mean when they say "AI Memory"?

Most production LLM interactions rely on external memory systems. Almost everything called “memory” today is external.

At their core, LLMs are stateless functions. You make a request with a prompt and some context data, and the model returns a response.

In real systems, AI memory usually means:

Storing past interactions, user preferences, decisions, goals, or facts.
Retrieving relevant parts later
Feeding a compressed version back into the prompt

So yes — at its core:

Memory = save → retrieve → summarize → inject into context

Nothing magical. But is that all? Isn't that just a regular cache? Read on.
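The save → retrieve → summarize → inject loop can be sketched in a few lines of Python. This is a toy under stated assumptions: the `MemoryStore` class, the keyword-overlap ranking, and the truncating `summarize` are all made up for illustration. A real system would more likely use embeddings for retrieval and a model for summarization.

```python
from collections import defaultdict


class MemoryStore:
    """In-process store of short facts per user (hypothetical, for illustration)."""

    def __init__(self):
        self._facts = defaultdict(list)

    def save(self, user_id: str, fact: str) -> None:
        self._facts[user_id].append(fact)

    def retrieve(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Rank facts by how many words they share with the query."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(fact.lower().split())), fact)
            for fact in self._facts[user_id]
        ]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]


def summarize(facts: list[str], max_chars: int = 200) -> str:
    """Placeholder summary: join and truncate."""
    return "; ".join(facts)[:max_chars]


def build_prompt(store: MemoryStore, user_id: str, question: str) -> str:
    """Inject the retrieved, summarized memory into the prompt."""
    memory = summarize(store.retrieve(user_id, question))
    context = f"Known about this user: {memory}\n" if memory else ""
    return f"{context}Question: {question}"


store = MemoryStore()
store.save("123", "User prefers concise python code")
store.save("123", "User is vegetarian")
print(build_prompt(store, "123", "Write python code for parsing files"))
```

Only the fact that overlaps with the question makes it into the prompt. The dietary preference stays in storage until it becomes relevant.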

Is this just RAG (Retrieval Augmented Generation)?

They are related but not the same.

RAG (Retrieval Augmented Generation)

Purpose:

Bring external knowledge into the LLM
Docs, PDFs, financial data, code, policies

Typical traits:

Retrieval is stateless per query
Large text chunks
Query-driven retrieval
“What additional data can we provide to the LLM to help answer this question?”

Agent / User Memory

Purpose:

Maintain continuity
Personalization
Learning user intent and preferences over time

Typical traits:

Long-lived
Highly structured
Small, distilled facts
“What can I provide to the LLM so it remembers this user?”

Think of it this way: they can often use the same retrieval tools, but they serve different roles.

Where is the memory?

Option 1: Agent process memory

Any suitable data structure like a HashMap.
Suitable for cases where the Agent loop is short and no persistence is needed.

Option 2: Redis / Cache

Suitable for session info, recent conversation history, tool results cache, temporary state.

Option 3: PostgreSQL/RDBMS

Suitable when you need durability, auditability, explainability.

Option 4: Vector databases

Suitable for semantic search.

Option 5: AI memory tools

Such as LangGraph memory, LlamaIndex memory, and MemGPT. They try to make it easier for agents to store and retrieve memories.

Here is an example of data that might be stored in memory:

{
  "user_id": "123",
  "fact": "User prefers concise python code",
  "source": "conversation_turn_5",
  "timestamp": "2026-02-09"
}

The mental model for AI memory

Short term memory

This is about recent interactions. It is data relevant to the current topic being discussed. For example, the user prefers conservative answers.

Long term memory

This is stored externally, perhaps even to persistent storage. It is retrieved and inserted into context selectively. For example, the user is a vegetarian or the user's risk tolerance is low.

Memory and the LLM

The LLM takes only messages as input. The agent has to read the data from memory and insert it into the text message. This is what people refer to as context.

You do not want to add large amounts of arbitrary data as context because:

Text is converted to tokens, and token costs spiral
LLM attention degrades with noise
Latency increases
Reasoning quality declines
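One way to keep context small is to give memory a token budget and fill it with the most relevant facts first. The sketch below assumes each fact already carries a relevance score and uses a rough "about 4 characters per token" estimate. Both are simplifications for illustration, not how any particular tokenizer or framework works.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def select_for_context(facts: list[tuple[float, str]], budget_tokens: int) -> list[str]:
    """Pick facts in order of relevance until the token budget is used up.

    `facts` is a list of (relevance, text) pairs; higher relevance wins.
    """
    chosen, used = [], 0
    for _, text in sorted(facts, key=lambda fact: fact[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too big to fit; a smaller fact may still fit
        chosen.append(text)
        used += cost
    return chosen


facts = [
    (0.9, "User's risk tolerance is low"),
    (0.2, "x" * 400),  # a large, low-value blob of raw history
    (0.7, "User is retiring in 10 years"),
]
print(select_for_context(facts, budget_tokens=20))
```

With a budget of 20 estimated tokens, the two short, relevant facts are kept and the 400-character blob is left out.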

Real Agentic Memory

At the start of the blog, I asked "is this just a regular cache?".

To be useful in the agentic way, what is stored in memory needs to evolve. Older or irrelevant data needs to be "forgotten" or evicted based on intelligence (not standard algorithms like FIFO or LIFO). Updates and evictions need to happen based on recent interactions. If the historical information is too long but should not be evicted, it might need to be compressed.

Agentic systems require more dynamic memory evolution than typical CRUD applications. In the case of long running agents, the quality of data in the memory has to get better with interactions over time.

How exactly that can be implemented is beyond the scope of this blog and could be a topic for a future one.

Considerations

Memory != Raw History

Bad Use: Here are the last 47 conversations ...

Better Use: We were talking about my retirement goals with this income and number of years to retire.

Summarize and abstract to extract intelligence, as opposed to dumping a large quantity of data.

In conclusion

AI memory is structured state, sometimes summarized, that is retrieved when needed and included in the LLM input as "context".

Conceptually, it is similar to RAG but they apply to different use cases.

Better and smaller contexts beat large contexts and large memory.

Agentic AI memory adds value only when:

The system changes behavior (for the better) because of it
It produces better responses, explanations, and reasoning
It saves time

These ideas are not purely theoretical. While building Vestra — an AI agent focused on personal financial planning and modeling — I’ve had to think deeply about what should be remembered, what should be abstracted, and what should be discarded. In financial reasoning especially, raw history is far less useful than structured, evolving state.

But yes, Agentic memory will be different than what we know as memory in regular apps — in the ways it is updated, evicted, and retrieved.

Saturday, September 13, 2025

What Does Adding AI To Your Product Even Mean?

Introduction

I have been asked this question multiple times: "My management sent out a directive to all teams to add AI to the product, but I have no idea what that means."

In this blog I discuss what adding AI actually entails, moving beyond the hype to practical applications and some things you might try.

At its core, adding AI to a product means using an AI model, whether a large language model (LLM) or a traditional ML model, to either:

predict answers
generate new data (text, images, audio, etc.)

The effect is that it enables the product to:

do a better job of responding to queries
automate repetitive tasks
personalize responses
extract insights
reduce manual labor

It's about making your product smarter, more efficient, and more valuable by giving it capabilities it didn't have before.

In any domain with a huge body of published knowledge (programming, healthcare) or vast quantities of data (e-commerce, financial services, health, manufacturing, etc.), too large for the human brain to comprehend, AI has a place and will outperform what we currently do.

So how do you go about adding AI?

Thanks to social media, AI has developed the aura of being super-complicated. But in reality, if you use off-the-shelf models, it is not that hard. Training models is hard, but 97% of us will never have to do it. Below is a simple 5-step approach to adding AI to your system.

1. Requirements

It is really important that you nail down the requirements before proceeding any further. What task is being automated? What questions are you attempting to answer?

The AI solution will need to be evaluated against these requirements. Not once or twice but on a continuous basis.

2. Model

Pick a model.

The recent explosion of interest in AI is largely due to Large Language Models (LLMs) like ChatGPT. At its core, the LLM is a text prediction engine. Give it some text and it will give you the text that is likely to follow.

But beyond text generation, LLMs have been trained on a lot of published digital data and they retain associations between pieces of text. On top of that, they are trained with real-world examples of questions and answers. For example, the reason they do such a good job at generating "programming code" is that they are trained on real source code from GitHub repositories.

What model to use?

The choices are:

Commercial LLMs like ChatGPT, Claude, Gemini, etc.
Open source LLMs like Llama, Mistral, DeepSeek, etc.
Traditional ML models

Choosing the right model can make a difference to the results. There might be a model specially tuned for your problem domain.

Cost, latency and accuracy are some parameters that are used to evaluate models.

3. Agent

Develop one or more agents.

An agent is the modern evolution of a service. It is the glue that ties the AI model to the rest of your system.

The Agent is the orchestration layer that:

Accepts requests either from a UI or another service
Makes requests to the model on behalf of your system
Makes multiple API calls to systems to fetch data
May search the internet
May save state to a database at various times
In the end, returns a response or starts some process to finish a task
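The responsibilities listed above can be sketched as a small Python class. The `Agent` class, its tool dictionary, and the `fake_model` stand-in are all hypothetical names for this example. A real agent would call an actual model API and persist state to a real database.

```python
from typing import Callable


class Agent:
    """Toy orchestration layer: fetch data, call the model, save state, respond."""

    def __init__(
        self,
        model: Callable[[str], str],
        tools: dict[str, Callable[[str], str]],
    ):
        self.model = model
        self.tools = tools
        self.state: list[dict] = []  # stand-in for a database table

    def handle(self, request: str) -> str:
        # 1. Call other systems to fetch data relevant to the request.
        data = {name: tool(request) for name, tool in self.tools.items()}

        # 2. Make a request to the model on behalf of the system.
        context = "\n".join(f"{name}: {value}" for name, value in data.items())
        answer = self.model(f"Context:\n{context}\n\nRequest: {request}")

        # 3. Save state so the interaction can be audited later.
        self.state.append({"request": request, "answer": answer})

        # 4. Return the response to the caller (a UI or another service).
        return answer


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call: echoes the first line of context.
    return "ANSWER based on: " + prompt.splitlines()[1]


agent = Agent(model=fake_model, tools={"orders": lambda req: "2 open orders"})
print(agent.handle("Where is my order?"))
```

The model is just one dependency passed into the agent. The rest is the same kind of service code most teams already write.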

It is unlikely that you will develop a model. But it is very likely that you will develop one or more agents.

4. Data pipeline

Bring your data.

A generic AI model can only do so much. Even without additional training, just adding your data to the prompts can yield better results.

The data pipeline is what makes the data in your databases, logs, ticket systems, GitHub, Jira, etc. available to the models and agents. Its steps are:

get the data from source
clean it
format it
transform it
use it in either prompts or to further train the model
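For the prompt route, those steps can be as simple as the sketch below. The record fields and function names are made up for illustration. Real pipelines usually add deduplication, redaction of sensitive fields, and size limits.

```python
import re


def clean(record: dict) -> dict:
    """Collapse whitespace in every field and drop fields that end up empty."""
    cleaned = {}
    for key, value in record.items():
        text = re.sub(r"\s+", " ", str(value)).strip()
        if text:
            cleaned[key] = text
    return cleaned


def format_record(record: dict) -> str:
    """Render one record as a single 'key=value' line with stable key order."""
    return ", ".join(f"{key}={value}" for key, value in sorted(record.items()))


def build_prompt_from_records(question: str, records: list[dict], max_records: int = 5) -> str:
    """Transform source records into prompt context and append the question."""
    lines = [format_record(clean(record)) for record in records[:max_records]]
    bullets = "\n".join(f"- {line}" for line in lines if line)
    return f"Use the following records to answer.\n{bullets}\n\nQuestion: {question}"


# Hypothetical ticket pulled from a ticket system.
tickets = [{"id": 42, "title": "  Login   fails ", "status": ""}]
print(build_prompt_from_records("What is broken?", tickets))
```

The same cleaned and formatted records could instead be written out as training examples if you choose to further train a model.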

5. Monitoring

Monitor, tune, refine.

Lastly, you need to continuously monitor results to ensure quality. LLMs are known to hallucinate and even drift. When the results are not good, you will try tweaking the prompt, the data, and the model parameters, among other things.
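A very small version of such monitoring is a fixed set of questions with expected answers that you re-run on a schedule. The sketch below is an assumption-laden toy: the substring check, the 0.9 threshold, and the `fake_model` are invented for illustration, and real evaluations usually need richer scoring than string matching.

```python
from typing import Callable


def evaluate(model: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("need at least one evaluation case")
    passed = sum(
        1
        for question, expected in cases
        if expected.lower() in model(question).lower()
    )
    return passed / len(cases)


def check_quality(
    model: Callable[[str], str],
    cases: list[tuple[str, str]],
    min_pass_rate: float = 0.9,
) -> dict:
    """Compare the pass rate against a threshold so a scheduler can alert on it."""
    rate = evaluate(model, cases)
    return {"pass_rate": rate, "ok": rate >= min_pass_rate}


def fake_model(question: str) -> str:
    # Stand-in for a real model: right about geography, wrong about arithmetic.
    canned = {"Capital of France?": "Paris is the capital.", "2+2?": "5"}
    return canned.get(question, "")


cases = [("Capital of France?", "Paris"), ("2+2?", "4")]
print(check_quality(fake_model, cases))
```

Running the same cases after every prompt, data, or model change gives you a baseline to compare against.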

Now let us see how these concepts translate into some very simple real-world applications across different industries.

Examples

1. Healthcare: Enhancing Diagnostics and Patient Experience

Adding AI can mean:

Personalized Treatment Pathways: An AI Agent can analyze vast amounts of research papers, clinical trial data, and individual patient responses to suggest the most effective treatment plan tailored to a specific patient's profile.
- Example: For a person with high cholesterol, an AI agent can come up with a personalized diet and exercise plan.

2. Finance: Personalized Investing

Adding AI could mean:

Personalized Financial Advice: Here, an AI Agent can serve as an "advisor" to offer highly tailored investment portfolios and financial planning advice.
- Example: A banking app's AI agent uses an LLM to understand your financial goals and then uses its "tools" to connect to your accounts, pull real-time market data, and recommend trades on your behalf. It can then use its LLM to explain in simple terms why it made a specific trade or rebalanced your portfolio.

3. E-commerce: Customer Experience

Adding AI could mean:

Personalized shopping: AI models can find the right product at the right price with the right characteristics for the user's requirements.
- Example: Instead of me shopping and comparing for hours, AI does it for me and makes a recommendation on the final product to purchase.

In Conclusion

Adding AI to your product to make it better means using the proven power of AI models:

To better answer customer requests with insights
To automate repetitive, time-consuming tasks
To make predictions that were hard earlier
To gain insights into vast bodies of knowledge

The tools are there. But to get results you need discipline, patience and process.

Start small. Focus on one specific business problem you want to solve, and build from there.

Thursday, August 28, 2025

The Unsung Heroes Behind Your AI Coding Assistant

While everyone's talking about ChatGPT and tools like Cursor, Windsurf, and GitHub Copilot transforming how we code, let's shine a light on the specialized models that actually power these coding experiences.

Meet the Code Generation Champions:

StarCoder - Trained on 80+ programming languages from GitHub repos, this open-source model excels at code completion and generation

CodeT5 - Salesforce's encoder-decoder model that understands code structure and can translate between languages

InCoder - Meta's bidirectional model that can fill in code gaps, not just complete from left to right

CodeGen - Salesforce's autoregressive model trained on both natural language and code

Codex (OpenAI) - The foundation behind GitHub Copilot, though now evolved into GPT-4 variants

What makes these different from general LLMs?

Trained on massive code repositories (billions of lines)
Understand syntax, semantics, and programming patterns
Can maintain context across entire codebases
Specialized in code-specific tasks like debugging, refactoring, and documentation

The magic isn't just in having "AI that codes" - it's in having models that truly understand the intricacies of software development. They aren’t just regurgitating text—they’re tuned for the nuances of programming, which makes them invaluable for developers. These specialized architectures are why your AI assistant can suggest that perfect function name or catch that subtle bug you've been hunting for hours.

The real game-changer? Most of these models are open-source, democratizing access to powerful coding assistance beyond just the big tech companies.