Showing posts with label Artificial Intelligence. Show all posts

Saturday, March 7, 2026

AI Context Explained: The Real Engineering Behind Modern AI Systems

Most discussions about large language models focus on prompts — how to phrase instructions to get better responses. But in real AI systems, prompts are only a small part of the story.

What actually determines the quality of an AI system is context: the information available to the model when it generates a response. This includes prompts, conversation history, retrieved documents, tool outputs, and sometimes structured application state. Designing how this information is assembled and provided to the model is what many engineers now call context engineering.

Providing the right context to the LLM is the only reliable way to get accurate, production-grade answers. In this blog, I explore what context actually is, the hidden dangers of massive context windows, and how it should be used in Agentic AI.

Context

Context refers to all the information that is not in the user's immediate question, but is required to help the LLM generate a relevant, highly specific answer. It is the data that gives the LLM situational awareness.

Consider this basic prompt:

Prompt: What is a good stock mutual fund to invest in?

Response (abbreviated): 1. T. Rowe Price Global Technology Fund (PRGTX) 2. Wasatch Ultra Growth Fund (WAMCX).

For many investors, both of these are far too aggressive, high-risk, and expensive. Let's change the prompt slightly to inject some context:

Prompt: What is a good stock mutual fund to invest in? I am 56 years old, nearing retirement. I prefer low-risk, low-cost, highly diversified funds.

Response (abbreviated): 1. Vanguard Target Retirement 2035 Fund (VTTHX) 2. Fidelity ZERO Total Market Index Fund (FZROX) 3. Vanguard Total Bond Market Index Fund (VBTLX).

This response is drastically different and entirely appropriate for a conservative investor. The phrase "56 years old, nearing retirement. I prefer low-risk, low-cost, highly diversified funds" is the context. Without it, asking the LLM the same question multiple times will yield scattered, generic, or even dangerous financial advice.

How is Context Passed to the Model?

Whether you use native provider APIs (OpenAI, Google) or orchestration frameworks like LangChain, context is not a separate magical parameter. It is embedded directly into the input messages.

A raw API call looks like this:

# Python
client.responses.create(
    model="gpt-5.2",
    messages=[
        {"role": "system", "content": "You are a financial advisor."},  # system prompt
        {"role": "user", "content": "What is a good fund to invest in?"},  # user prompt or query
        {"role": "user", "content": "I am 56 and prefer low risk."},  # context
    ],
    temperature=0.0
)

Everything the LLM knows is stuffed into that messages array. In an agentic system, it is all about getting the right information into that array at the right time.

Context generally falls into three categories:

  • Static: Data that rarely changes (e.g., "User is a male, NY Yankees fan, foodie").

  • Dynamic: Data that evolves as the agent runs and interacts with tools (e.g., the results of a real-time stock price lookup).

  • Long-Lived: Data that spans across multiple sessions or days (e.g., "User already rejected the Vanguard recommendation yesterday").
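
To make this concrete, here is a minimal sketch of how the three kinds of context might be merged into the messages array. The function name, the profile strings, and the message layout are all illustrative, not a real framework API:

```python
# Sketch: assembling static, dynamic, and long-lived context into one
# messages array. All names and data here are invented for illustration.

def build_messages(question, static_profile, tool_results, session_facts):
    """Merge the three kinds of context into a single messages list."""
    context_parts = []
    if static_profile:                      # static: rarely changes
        context_parts.append(f"User profile: {static_profile}")
    for result in tool_results:             # dynamic: fresh tool outputs
        context_parts.append(f"Tool result: {result}")
    for fact in session_facts:              # long-lived: spans sessions
        context_parts.append(f"Known fact: {fact}")
    return [
        {"role": "system", "content": "You are a financial advisor."},
        {"role": "user", "content": "\n".join(context_parts)},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    "What is a good fund to invest in?",
    static_profile="56 years old, prefers low-risk funds",
    tool_results=["VTTHX expense ratio: 0.08%"],
    session_facts=["Rejected the Vanguard recommendation yesterday"],
)
```

The point of the structure is that the question stays short while everything the model needs to know travels alongside it in the same array.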



In practice, building AI systems is often less about “prompt engineering” and more about deciding what information should be included in the model’s context at the moment of inference.

The data comes from a variety of sources. A cache holds the most recent data. Databases hold the system of record. Logs capture events as they occur. Context engineering is about getting the right data from the right place at the right time.

Note that context can go stale. If you feed in stale context, you get less accurate answers. Keeping it current is part of the engineering.

The Illusion of the Infinite Context Window

The context window refers to how much of your context the LLM can take in for a conversation. The more it can remember, the better, right?

LLM providers aggressively advertise their context window sizes. Bigger appears better, but that is a dangerous trap for developers.

The context window simply represents the hard cap on how much text an LLM can "see" at once. Look at the landscape in early 2026:

  • Meta Llama 4 Scout: ~10 Million tokens

  • Gemini 3 Pro: ~1.0M - 2.0M tokens

  • OpenAI GPT-5.2: ~400,000 tokens

  • Claude 4.5 Sonnet: ~1.0M tokens

  • DeepSeek-R1: ~164,000 tokens

However, the context window is a model attribute, not an agent capability. Research in 2025 and 2026 has consistently shown that model performance degrades severely well before hitting the upper limit. This phenomenon is known as Context Rot.

Just because a model can accept 1 million tokens (about 8 full novels) doesn't mean it pays equal attention to all of them. Studies show that when a context window passes 50% capacity, models begin to heavily favor tokens at the very beginning or the very end of the prompt, completely ignoring critical constraints buried in the middle.

The industry is now focusing on the Maximum Effective Context Window (MECW). A model might advertise 1 million tokens, but its MECW—the point where accuracy actually drops off a cliff—might be only 130k tokens.
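
A simple defensive pattern is to budget against an assumed MECW rather than the advertised window. The sketch below uses a crude 4-characters-per-token estimate; a real system would use the provider's tokenizer, and the 130k figure is just the assumed MECW from above:

```python
# Sketch: guard against Context Rot by budgeting against an assumed MECW,
# not the advertised window. The numbers and the 4-chars-per-token
# heuristic are rough illustrations only.

ADVERTISED_WINDOW = 1_000_000   # what the vendor markets
EFFECTIVE_WINDOW = 130_000      # assumed MECW where accuracy still holds

def estimate_tokens(text):
    return max(1, len(text) // 4)   # crude heuristic: ~4 chars per token

def within_budget(messages, budget=EFFECTIVE_WINDOW):
    total = sum(estimate_tokens(m["content"]) for m in messages)
    return total <= budget

short = [{"role": "user", "content": "word " * 100}]
long_ctx = [{"role": "user", "content": "x" * 4_000_000}]
```

An agent that checks `within_budget` before each model call can trigger pruning or summarization instead of silently drifting past the point where the model stops paying attention.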

The Agent Loop

Because of Context Rot, you cannot just dump an entire database into the LLM and expect it to figure things out. This is why we build Agents.

An LLM is a stateless text predictor. An Agent is a software loop that uses the LLM as a reasoning engine to manage its own context. Agents operate in a continuous cycle: Observe → Think → Act.

Imagine building an AI-based investment analysis product. The agent doesn't just ask the LLM one massive question. It loops:

  1. Observe: The user asks, "Should I adjust my portfolio for the upcoming rate cuts?"

  2. Think (LLM): The model realizes it lacks context. It outputs a tool-call: get_user_portfolio() and get_risk_tolerance().

  3. Act (Code): The Python orchestration framework queries a PostgreSQL database to fetch the financial profile.

  4. Update Context: The framework appends only the relevant portfolio metrics into the messages array.

  5. Loop: The agent sends this newly enriched, highly specific context back to the LLM to generate the final advice.

In this loop, the context is actively mutating. The agent is continuously pruning the messages array, summarizing old turns, and injecting fresh tool outputs to keep the token count well within the Maximum Effective Context Window.
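
The loop above can be sketched in a few lines of Python. The `fake_llm` stub stands in for a real model call, and `get_user_portfolio` stands in for the database query; both are invented for illustration:

```python
# Sketch of the Observe -> Think -> Act loop with a stubbed "LLM".
# fake_llm and get_user_portfolio are invented stand-ins.

def fake_llm(messages):
    """Stands in for a real model call: asks for a tool, then answers."""
    text = " ".join(m["content"] for m in messages)
    if "Portfolio:" not in text:
        return {"tool_call": "get_user_portfolio"}
    return {"answer": "Shift 10% from growth funds into short-term bonds."}

def get_user_portfolio():
    return "Portfolio: 70% equities, 30% bonds"   # pretend DB query

def run_agent(question, max_turns=5):
    messages = [{"role": "user", "content": question}]        # Observe
    for _ in range(max_turns):
        reply = fake_llm(messages)                            # Think
        if "tool_call" in reply:
            result = get_user_portfolio()                     # Act
            messages.append({"role": "user", "content": result})  # Update context
            continue                                          # Loop
        return reply["answer"]

answer = run_agent("Should I adjust my portfolio for the upcoming rate cuts?")
```

Note that the loop, not the model, decides what enters the messages array on each turn; that decision is the context engineering.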

Prompt Engineering vs. Context Engineering

If prompt engineering is about how you ask the question, context engineering is about what the model knows before it attempts to answer.

To use an operating system analogy: the LLM is the CPU, the prompt is the executable command, the context window is the RAM, and the context is the data loaded into that RAM.

  • Prompt Engineering is writing a better command. It is user-facing, static, and brittle.

  • Context Engineering is managing the data in RAM. It is developer-facing, dynamic, and systemic.

As we move toward enterprise-grade AI, prompts are no longer enough. Context engineering involves building the infrastructure that feeds the model. It encompasses Retrieval-Augmented Generation (RAG) to find specific documents, Episodic Memory Graphs to track user decisions over time, and Context Pruning to prevent token overflow.

The Frontier: Context Graphs

While context engineering today is mostly about managing lists of messages, the future of enterprise AI lies in Context Graphs. Current LLM context is linear—a flat, chronological scroll of "User said X, Agent did Y." This works for chat, but it fails for complex enterprise workflows. Real-world business data isn't a timeline; it's a web of relationships.

Enter the Context Graph. Instead of dumping raw logs into the window, advanced agents now build and maintain a dynamic graph structure. Nodes represent entities (User, File, Decision, Error). Edges represent causality or relationships (e.g., User Upload caused Error 500, which triggered Retry Logic).

This structure transforms the context from a "temporary scratchpad" into an organizational brain. If a human auditor later asks, "Why did the agent reject this loan application?", a linear log forces the LLM to re-read thousands of lines of text to guess the reason. A Context Graph simply traverses the edge: Loan Application -> rejected_because -> Risk Score > 80.
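
A toy version of such a graph can be built from plain adjacency lists. The node names and edge labels below are illustrative:

```python
# Sketch: a minimal context graph as adjacency lists, traversed to answer
# "why" questions without re-reading a linear log. Structure is illustrative.

graph = {
    # node -> list of (edge_label, target_node)
    "Loan Application #42": [("rejected_because", "Risk Score > 80")],
    "Risk Score > 80":      [("computed_from", "Credit History")],
}

def explain(node, graph, depth=2):
    """Walk outgoing edges to build a short causal explanation."""
    steps = []
    for _ in range(depth):
        edges = graph.get(node, [])
        if not edges:
            break
        label, target = edges[0]
        steps.append(f"{node} -[{label}]-> {target}")
        node = target
    return steps

trace = explain("Loan Application #42", graph)
```

The traversal touches only the two relevant nodes, regardless of how many thousands of unrelated events the agent has logged.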

For enterprise applications, this is the missing link. It allows agents to reason across disconnected data points (e.g., linking a Slack message from Tuesday to a Code Commit on Friday) without needing a massive, expensive context window to hold all the noise in between.

Conclusion

A perfectly engineered prompt might get you a clever answer once. But a well-engineered context pipeline ensures your Agent gets the accurate answer securely, cost-effectively, and consistently, every single time.

Monday, February 9, 2026

What is (Agentic) AI Memory?

I have seen a lot of posts on X and LinkedIn about the importance of Agentic AI memory. What exactly is it? Is it just another name for RAG? Why is it different from any other application memory? In this blog, I try to answer these questions.

What do people mean when they say "AI Memory"?


Most production LLM interactions rely on external memory systems. Everything called "memory" today is mostly external to the model.

At their core, LLMs are stateless functions: you make a request with a prompt and some context data, and the model returns a response. Nothing is retained between calls.

In real systems, AI memory usually means:

  • Storing past interactions, user preferences, decisions, goals, or facts.
  • Retrieving relevant parts later
  • Feeding a compressed version back into the prompt

So yes — at its core:

Memory = save → retrieve → summarize → inject into context

Nothing magical. But is that all? Is it just a regular cache? Read on.
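
Here is a minimal sketch of that save → retrieve → summarize → inject cycle, using a plain list as the store and naive keyword overlap as "relevance". Everything here is illustrative, not a real memory library:

```python
# Sketch of the save -> retrieve -> summarize -> inject cycle using a
# plain list as the store. The keyword match is a toy relevance measure.

memory_store = []

def save(fact, source):
    memory_store.append({"fact": fact, "source": source})

def retrieve(query):
    # toy relevance: keep facts sharing a word with the query
    words = set(query.lower().split())
    return [m for m in memory_store if words & set(m["fact"].lower().split())]

def inject(query):
    relevant = retrieve(query)
    summary = "; ".join(m["fact"] for m in relevant)   # "summarize"
    return [
        {"role": "user", "content": f"Known facts: {summary}"},
        {"role": "user", "content": query},
    ]

save("User prefers concise python code", "conversation_turn_5")
save("User is a vegetarian", "conversation_turn_9")
messages = inject("Write python code that parses CSV files")
```

Only the coding preference reaches the context for a coding question; the unrelated dietary fact stays in the store. That selectivity is the whole point.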

Is this just RAG (Retrieval-Augmented Generation)?


They are related but not the same.
RAG (Retrieval Augmented Generation)

Purpose:

  • Bring external knowledge into the LLM
  • Docs, PDFs, financial data, code, policies

Typical traits:

  • Retrieval is stateless per query
  • Large text chunks
  • Query-driven retrieval
  • “What additional data can we provide to the LLM to help answer this question?”

Agent / User Memory


Purpose:
  • Maintain continuity
  • Personalization
  • Learning user intent and preferences over time

Typical traits:

  • Long-lived
  • Highly structured
  • Small, distilled facts
  • “What can I provide to the LLM so it remembers this user?”

Think of it this way: RAG gives the LLM knowledge about the world; memory gives it knowledge about the user.

They can use the same retrieval tools, but they serve different roles.

Where is the memory?


Option 1: Agent process memory

Any suitable data structure like a HashMap.
Suitable for cases where the Agent loop is short and no persistence is needed.

Option 2: Redis / cache

Suitable for session info, recent conversation history, tool-result caches, and temporary state.
Option 3: PostgreSQL/RDBMS

Suitable when you need durability, auditability, explainability.

Option 4: Vector databases

Suitable for semantic search.

Option 5: AI memory tools

Such as LangGraph memory, LlamaIndex memory, and MemGPT. They aim to make it easier for agents to store and retrieve memories.

Here is an example of the kind of record that might be stored in memory:

{
  "user_id": "123",
  "fact": "User prefers concise python code",
  "source": "conversation_turn_5",
  "timestamp": "2026-02-09"
}

The mental model for AI memory


Short term memory

This is about recent interactions. It is data relevant to the current topic being discussed. For example, the user prefers conservative answers.

Long term memory

This is stored externally, perhaps even to persistent storage. It is retrieved and inserted into context selectively. For example, the user is a vegetarian or the user's risk tolerance is low.

Memory and the LLM


The LLM takes only messages as input. The agent has to read data from memory and insert it into the message text. This is what we refer to as context.

You do not want to add large amounts of arbitrary data as context because:
  • text is converted to tokens, and token costs spiral
  • LLM attention degrades with noise
  • latency increases
  • reasoning quality declines
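
One common mitigation is to prune older turns and replace them with a summary. In the sketch below the summarizer is just a stub string; a real system would call a model to compress the dropped turns:

```python
# Sketch: pruning old turns to keep the context small. The summarizer is
# a stub; a real system would call a model to compress the dropped turns.

def prune(messages, keep_last=4):
    """Summarize everything except the most recent turns."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = f"[Summary of {len(old)} earlier turns]"   # stub summarizer
    return [{"role": "system", "content": summary}] + recent

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
pruned = prune(history)
```

Ten turns collapse into one summary line plus the four most recent turns, keeping token cost roughly constant as the conversation grows.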

Real Agentic Memory


At the start of the blog, I asked, "Is this just a regular cache?"

To be useful in an agentic way, what is stored in memory needs to evolve. Older or irrelevant data needs to be "forgotten" or evicted based on intelligence, not standard algorithms like FIFO, LIFO, etc. Updates and evictions need to happen based on recent interactions. If historical information is too long to keep in full but should not be evicted, it may need to be compressed.
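
As a toy illustration of intelligence-driven eviction (as opposed to FIFO), memories could be scored by recency and usage, with the lowest scorers dropped. The scoring formula below is invented for the example, not a standard:

```python
# Sketch: score-based eviction instead of FIFO/LIFO. The scoring formula
# (use count minus an age penalty) is an invented illustration.

import time

def evict(memories, capacity):
    """Keep the highest-scoring memories when over capacity."""
    def score(m):
        age_days = (time.time() - m["created"]) / 86400
        return m["times_used"] - 0.1 * age_days   # favour used, recent facts
    return sorted(memories, key=score, reverse=True)[:capacity]

now = time.time()
memories = [
    {"fact": "risk tolerance: low", "created": now - 5 * 86400, "times_used": 8},
    {"fact": "asked about NFTs once", "created": now - 90 * 86400, "times_used": 1},
    {"fact": "goal: retire at 60", "created": now - 1 * 86400, "times_used": 3},
]
kept = evict(memories, capacity=2)
```

The frequently used risk-tolerance fact and the fresh retirement goal survive; the stale one-off question is forgotten.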

Agentic systems require more dynamic memory evolution than typical CRUD applications. For long-running agents, the quality of the data in memory has to improve with interactions over time.

How exactly that can be implemented is beyond the scope of this blog and could be a topic for a future one.

Considerations


Memory != Raw History
Bad use: "Here are the last 47 conversations ......"
Better use: "We were talking about my retirement goals, given this income and this many years to retirement."

Summarize and abstract to extract intelligence, as opposed to dumping large quantities of data.

In conclusion

AI memory is structured state, sometimes summarized, that is retrieved when needed and included in the LLM input as "context".


Conceptually, it is similar to RAG but they apply to different use cases.

Better and smaller contexts beat large contexts and large memory.

Agentic AI memory adds value only when:

  • the system changes behavior (for the better) because of it
  • it produces better responses, explanations, and reasoning
  • it saves time

These ideas are not purely theoretical. While building Vestra — an AI agent focused on personal financial planning and modeling — I’ve had to think deeply about what should be remembered, what should be abstracted, and what should be discarded. In financial reasoning especially, raw history is far less useful than structured, evolving state.

But yes, agentic memory will be different from what we know as memory in regular apps in the ways it is updated, evicted, and retrieved.



Saturday, September 13, 2025

What Does Adding AI To Your Product Even Mean?

Introduction

I have been asked this question multiple times: "My management sent out a directive to all teams to add AI to the product. But I have no idea what that means."


In this blog, I discuss what adding AI actually entails, moving beyond the hype to practical applications and some things you might try.

At its core, adding AI to a product means using an AI model, either the more popular large language model (LLM) or a traditional ML model, to either

  • predict answers
  • generate new data: text, images, audio, etc.

The effect is that it enables the product to:

  • do a better job of responding to queries
  • automate repetitive tasks
  • personalize responses
  • extract insights
  • reduce manual labor

It's about making your product smarter, more efficient, and more valuable by giving it capabilities it didn't have before.

In any domain where there is a huge body of published knowledge (programming, healthcare) or vast quantities of data (e-commerce, financial services, health, manufacturing, etc.), too large for the human brain to comprehend, AI has a place and will outperform what we currently do.


So how do you go about adding AI?

Thanks to social media, AI has developed the aura of being super-complicated. But in reality, if you use off-the-shelf models, it is not that hard. Training models is hard, but 97% of us will never have to do it. Below is a simple five-step approach to adding AI to your system.

1. Requirements

It is really important that you nail down the requirements before proceeding any further. What task is being automated? What questions are you attempting to answer?

The AI solution will need to be evaluated against these requirements, not once or twice but on a continuous basis.

2. Model

Pick a model.

The recent explosion of interest in AI is largely due to Large Language Models (LLMs) like ChatGPT. At its core, an LLM is a text prediction engine: give it some text and it will give you the text that is likely to follow.

But beyond text generation, LLMs have been trained on a vast amount of published digital data, and they retain associations between pieces of text. On top of that, they are trained with real-world examples of questions and answers. For example, the reason they do such a good job at generating programming code is that they are trained on real source code from GitHub repositories.

Which model to use?

The choices are:

  • Commercial LLMs like ChatGPT, Claude, Gemini, etc.
  • Open source LLMs like Llama, Mistral, DeepSeek etc
  • Traditional ML models

Choosing the right model can make a difference to the results. There might be a model specially tuned for your problem domain.

Cost, latency, and accuracy are some of the parameters used to evaluate models.

3. Agent

Develop one or more agents.

An agent is the modern evolution of a service. The agent is the glue that ties the AI model to the rest of your system.

The agent is the orchestration layer that:

  • accepts requests, either from a UI or another service
  • makes requests to the model on behalf of your system
  • makes API calls to other systems to fetch data
  • may search the internet
  • may save state to a database at various times
  • in the end, returns a response or starts a process to finish a task

It is unlikely that you will develop a model, but it is very likely that you will develop one or more agents.
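
A minimal sketch of that glue role, with a stubbed model and a stubbed downstream API (all names invented):

```python
# Sketch: the agent as glue between a request, the model, and other
# systems. stub_model, fetch_orders, and handle_request are invented.

def stub_model(prompt):
    return f"Summary of: {prompt[:40]}"   # stands in for an LLM call

def fetch_orders(user_id):
    return ["order-1001", "order-1002"]   # pretend API call to another system

def handle_request(user_id, question):
    orders = fetch_orders(user_id)                        # gather data
    prompt = f"{question}\nOrders: {', '.join(orders)}"   # build context
    return stub_model(prompt)                             # ask the model

reply = handle_request("u42", "What did I buy last month?")
```

The agent itself contains no intelligence; it just routes data between the user, the company's systems, and the model.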

4. Data pipeline

Bring your data.

A generic AI model can only do so much. Even without additional training, just adding your data to the prompts can yield better results.

The data pipeline is what makes the data in your databases, logs, ticket systems, GitHub, Jira, etc. available to the models and agents.

  • get the data from source
  • clean it
  • format it
  • transform it
  • use it in either prompts or to further train the model
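
Those steps can be sketched as a few small functions. The ticket records and field names are made up for illustration:

```python
# Sketch of the pipeline steps (get, clean, format) turning raw records
# into a prompt-ready context block. The ticket data is invented.

raw_records = [
    {"id": "T-1", "title": "  Login fails  ", "body": "User cannot login\x00"},
    {"id": "T-2", "title": "Slow dashboard", "body": "Page takes 20s to load"},
]

def clean(rec):
    rec = dict(rec)
    rec["title"] = rec["title"].strip()          # trim stray whitespace
    rec["body"] = rec["body"].replace("\x00", "")  # strip encoding junk
    return rec

def format_for_prompt(rec):
    return f"Ticket {rec['id']}: {rec['title']} - {rec['body']}"

def pipeline(records):
    return "\n".join(format_for_prompt(clean(r)) for r in records)

context_block = pipeline(raw_records)
```

The resulting text block is what gets appended to the prompt, so the quality of these mundane cleaning steps directly shapes the quality of the model's answers.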

5. Monitoring

Monitor, tune, refine.

Lastly, you need to continuously monitor results to ensure quality. LLMs are known to hallucinate and even drift. When the results are not good, you will try tweaking the prompt data and model parameters, among other things.

Now let us see how these concepts translate into some very simple real-world applications across different industries.


Examples

1. Healthcare: Enhancing Diagnostics and Patient Experience

Adding AI can mean:

  • Personalized Treatment Pathways: An AI Agent can analyze vast amounts of research papers, clinical trial data, and individual patient responses to suggest the most effective treatment plan tailored to a specific patient's profile.

    • Example: For a person with high cholesterol, an AI agent can come up with a personalized diet and exercise plan.


2. Finance: Personalized Investing

Adding AI could mean:

  • Personalized Financial Advice: Here, an AI agent can serve as an "advisor," offering highly tailored investment portfolios and financial planning advice.

    • Example: A banking app's AI agent uses an LLM to understand your financial goals and then uses its "tools" to connect to your accounts, pull real-time market data, and recommend trades on your behalf. It can then use its LLM to explain in simple terms why it made a specific trade or rebalanced your portfolio.


3. E-commerce: Customer Experience

Adding AI could mean:

  • Personalized shopping: AI models can find the right product at the right price with the right characteristics for the user's requirements.

    • Example: Instead of me shopping and comparing for hours, AI does it for me and makes a recommendation on the final product to purchase.


In Conclusion

Adding AI to your product means using the proven power of AI models:

  • to better answer customer requests with insights
  • to automate repetitive, time-consuming tasks
  • to make predictions that were hard before
  • to gain insights into vast bodies of knowledge

The tools are there. But to get results you need discipline, patience, and process.

Start small. Focus on one specific business problem you want to solve, and build from there.