Thursday, January 15, 2026

Unique ID generation

Unique ID generation is a deceptively simple task in application development. While it can seem trivial in a monolithic system, getting it right in a distributed system presents a few challenges. In this blog, I explore commonly used ID generation techniques, where they fall short, and how to improve on them.

Why are unique IDs needed in applications?

They are needed to prevent duplicates. Say the system creates users, products, or some other entity. You could use the name as the key for a uniqueness constraint, but there could be two users or products with the same name. For users, email is an option, but that might not be available for other kinds of entities.

Sometimes you are consuming messages from other systems over platforms like Kafka. Processing the same message multiple times can lead to errors, and duplicate delivery can happen for a variety of reasons outside the consumer's control. Senders therefore include a unique ID with each message so that the consumer can ignore IDs it has already processed (idempotency).

They can be useful for ordering. Ordering determines which event happened before others, which is useful if you want to query for the most recent events or ignore older ones.

What should the size and form of an ID be?

Should it be a number or alphanumeric? Should it be 32 bit, 64 bit, or larger?

With 32 bits, the maximum numeric ID you can generate is ~4 billion. That is sufficient for most systems but not enough if your product is internet scale. With 64 bits, you can generate over 18 quintillion IDs (2^64 is about 1.8 × 10^19).

But size is not the only issue. IDs will not be generated in sequence in one place, one after the other. In any system of even moderate scale, unique IDs will need to be generated from multiple nodes.

On the topic of form, numeric IDs are generally preferred as they take up less storage and can be easily ordered and indexed.

In the rest of the blog, I go over some unique id generation techniques I have come across.

ID Generation Techniques

1. Auto increment feature of the database

If your service uses a database, this becomes an obvious choice, as every database supports auto increment.

With PostgreSQL, you would set up the table as

CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT NOT NULL);

With MySQL,

CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL);

Every insert into the table generates an id in the id column.


INSERT INTO users(name) VALUES ('JOHN');
INSERT INTO users(name) VALUES ('MARY');

 id | name
----+------
  1 | JOHN
  2 | MARY

While this is simple to set up and use, it is appropriate only for simple CRUD applications with low usage. The disadvantages are:

  • It requires an insert to the DB to get an ID
  • If an insert fails, you can have gaps in the sequence
  • For distributed applications, it is a single point of failure

The single point of failure issue can be solved by setting up multiple databases in a multi-master arrangement, with each master generating non-overlapping IDs (in MySQL, via auto_increment_increment and auto_increment_offset). To further reduce load on the database, IDs can be handed out in batches of, say, one hundred.
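The batching idea can be sketched as a small client-side allocator. This is an illustrative sketch, not a library API: fetchNextBatchStart is a placeholder for whatever call atomically advances a counter in the database by the batch size and returns its previous value.

```java
import java.util.function.LongSupplier;

// Sketch: hand out IDs locally from a batch of 100, so only one database
// round trip is needed per 100 IDs. fetchNextBatchStart is a hypothetical
// DB call that atomically advances a counter by the batch size and
// returns the old value.
public class BatchIdAllocator {
    private static final long BATCH_SIZE = 100;
    private final LongSupplier fetchNextBatchStart;
    private long next;   // next ID to hand out
    private long limit;  // first ID beyond the current batch

    public BatchIdAllocator(LongSupplier fetchNextBatchStart) {
        this.fetchNextBatchStart = fetchNextBatchStart;
        this.next = 0;
        this.limit = 0; // empty: the first call fetches a batch
    }

    public synchronized long nextId() {
        if (next == limit) { // batch exhausted: fetch the next range
            next = fetchNextBatchStart.getAsLong();
            limit = next + BATCH_SIZE;
        }
        return next++;
    }
}
```

Note the trade-off: IDs from different service instances are unique but not globally ordered, and a crashed instance wastes the rest of its batch.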



2. UUID

These are 128-bit values, usually written as 36-character hexadecimal strings. They are quite easy to generate and can be suitable for IDs because collisions are extremely unlikely. An example UUID is e3d6d0e9-b99f-4704-b672-b7f3cdaf5618

The advantages of UUID are:

  • easy to generate. Just add a UUID generator to the service.
  • low probability of collision
  • each replica can generate IDs independently

Disadvantages:

  • 128 bits per ID will eventually take up space
  • no ordering: IDs are random

UUID v1 has a time component, but its timestamp bytes are not laid out in sort order and it can leak the generating machine's MAC address. UUID v4 is completely random. UUID v7 has a 48-bit timestamp followed by 74 bits of random data, so it sorts roughly by creation time. If you are using UUIDs, you want v7.
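The JDK does not ship a v7 generator (java.util.UUID.randomUUID() produces v4), but the v7 layout is simple enough to build by hand. This is a minimal sketch of the bit layout, not a production generator (it does not guarantee ordering for IDs created within the same millisecond):

```java
import java.security.SecureRandom;
import java.util.UUID;

// Sketch of a UUID v7 generator: 48-bit Unix-ms timestamp, then the
// version nibble (7), then random bits, with the variant bits set to 10.
public class UuidV7 {
    private static final SecureRandom RANDOM = new SecureRandom();

    public static UUID next() {
        long ts = System.currentTimeMillis();          // 48-bit ms timestamp
        long randA = RANDOM.nextLong() & 0x0FFFL;      // 12 random bits
        long msb = (ts << 16) | (0x7L << 12) | randA;  // version nibble = 7
        long lsb = (RANDOM.nextLong() & 0x3FFFFFFFFFFFFFFFL)
                 | 0x8000000000000000L;                // variant bits = 10
        return new UUID(msb, lsb);
    }
}
```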

3. Flickr ticket server

The Flickr ticket server is an improvement on the auto increment feature. In the example in 1, the ID is tied to the users table: you get a new ID only when you insert into the users table. If you needed unique IDs for another entity, you would need to add an auto increment column to that table. But what if you needed unique, increasing IDs across tables or services, as would be the case in a distributed system?

We could create a generic table

CREATE TABLE idgenerator (id SERIAL PRIMARY KEY);

This can work, but it would keep accumulating rows of IDs that are also stored elsewhere.

What they did at Flickr was this:

CREATE TABLE ticketserver (id INT AUTO_INCREMENT PRIMARY KEY, stub CHAR(1) NOT NULL, UNIQUE KEY (stub));

When they need an ID, they do

REPLACE INTO ticketserver (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

This table always has only one row: the unique key on stub makes every REPLACE delete the previous row and insert a new one, bumping the auto increment counter. The above code is MySQL-specific.

For SQL that works on PostgreSQL, you can do it a little differently:

CREATE TABLE ticketserver (stub CHAR(1) PRIMARY KEY, id BIGINT NOT NULL DEFAULT 0);

INSERT INTO ticketserver (stub, id)
VALUES ('a', 0)
ON CONFLICT (stub) DO UPDATE
  SET id = ticketserver.id + 1
RETURNING id;

4. Twitter Snowflake approach

This is a Twitter approach that does not rely on the database or any third-party service. IDs can be generated in code just by following the specification.

It is a 64 bit id.

The leftmost bit is a sign bit, kept at 0 and reserved for future use.
The next 41 bits are the timestamp: the current time in ms since a custom epoch (Twitter used Nov 4, 2010 01:42:54 UTC, but you can use any epoch).
The next 5 bits are a datacenter ID, giving you 2^5 = 32 datacenters.
The next 5 bits are a machine ID: 32 machines per datacenter.
The last 12 bits are a sequence number, giving you 2^12 = 4096 IDs per ms (per machine per datacenter).
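The layout above can be sketched in a few lines of Java. This is a minimal single-generator sketch following the published bit layout (the epoch constant is Twitter's), not Twitter's actual implementation, and it simply throws if the clock moves backwards:

```java
// Sketch of a Snowflake-style 64-bit ID generator:
// 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence.
public class Snowflake {
    private static final long EPOCH = 1288834974657L; // Nov 4 2010 01:42:54 UTC

    private final long datacenterId; // 5 bits
    private final long machineId;    // 5 bits
    private long lastTimestamp = -1L;
    private long sequence = 0L;      // 12 bits

    public Snowflake(long datacenterId, long machineId) {
        this.datacenterId = datacenterId & 0x1F;
        this.machineId = machineId & 0x1F;
    }

    public synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;     // 4096 IDs per ms
            if (sequence == 0) {                   // exhausted: spin to next ms
                while (ts <= lastTimestamp) ts = System.currentTimeMillis();
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return ((ts - EPOCH) << 22)  // 41-bit timestamp
             | (datacenterId << 17)  // 5-bit datacenter
             | (machineId << 12)     // 5-bit machine
             | sequence;             // 12-bit sequence
    }
}
```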


The value of 2^41 - 1 is 2,199,023,255,551 ms. With 41 bits for the timestamp, you get a lot of IDs: the scheme will last almost 70 years from the chosen epoch.

You can change things to reduce the size from 64 bits if needed. You may not need a datacenter id or you can decide to use fewer bits for the timestamp.

The advantages of this approach are

  • Decentralized generation. No dependency on DB. Lower latency
  • Time based ordering
  • Higher throughput
  • Datacenter id and machine id can help in debugging
The disadvantages are
  • clocks need to be synchronized
  • datacenters/machines need unique IDs
  • the epoch needs to be chosen wisely

Other considerations

Clock drift: In distributed systems where timestamp is part of the generated ID, you need to be aware of clock drift and take steps to mitigate it.

Sequential vs Monotonic: Most of the time IDs only need to be monotonic, that is, always increasing. They need not be sequential (gap-free).

Batching: If strictly increasing IDs are not a requirement and you are using the database, you can reduce its load by having the database hand out IDs to services in batches (e.g., 100 at a time).

Summary


ID generation evolves with your system's scale. When you are starting out, it is normal to keep things simple and go with auto increment. But sooner or later you will need to scale (a good problem to have), and the Flickr and Twitter methods are solid. I personally like the Twitter approach, as it has no dependency on the database: it offers an excellent balance of decentralization, ordering, and efficiency, though it requires clock synchronization. Whatever approach you choose, ensure it aligns with your system's consistency requirements, scaling needs, and tolerance for operational complexity.

Reference

1. Flickr Ticket Server

2. Twitter Snowflake


Sunday, November 30, 2025

Review : Facebook Memcache Paper

Introduction

Facebook's (Meta's) application handles billions of requests per second. Responses are built from thousands of items that need to be retrieved at low latency to ensure a good user experience. Low latency is achieved by retrieving the items from a cache. Once the business grew, a single-node cache like Memcached obviously could not handle the load. This paper is about how they solved the problem by building a distributed cache on top of Memcached.

In 2025, some might say the paper is a little dated (2013). But I think it is still interesting for several reasons. It is one of the early papers from a time when cloud and distributed systems exploded; I see it in the same category as the Dynamo paper from Amazon. While better technology is available today, this paper teaches important concepts. More importantly, the paper shows how to take available technology and make more out of it. In this case, they took a single-node cache and built a distributed cache on top of it.

This is my summary of the paper Scaling Memcache at Facebook.

Requirements at Facebook

- Allow near real time communication
- aggregate content on the fly
- access and update popular shared content
- scale to millions of user requests per second

The starting point was single node Memcached servers.
The target was a general purpose memory distributed key value store - called Memcache, that would be used by a variety of application use cases.

In the paper and in this blog, Memcached refers to the popular open source in-memory key-value store, and Memcache is the distributed cache that Facebook built.

Observations:

Read volumes are several orders of magnitude higher than write volumes.
Data is fetched from multiple sources: HDFS, MySQL, etc.
Memcached supports simple primitives - set, get, delete

Details

The diagram below shows the Memcache architecture.


Memcache is a demand-filled, look-aside cache. There can be thousands of Memcached servers within a Memcache cluster.

When an application needs to read data, it tries to get it from memcache. If not found in Memcache, it gets the data from the original source and populates Memcache.

When the application needs to write data, it writes to the original source, and the key in Memcache is invalidated.
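The two paths above can be sketched in a few lines. This is an illustrative sketch of the look-aside pattern only, with plain in-memory maps standing in for Memcached and the storage tier, not Facebook's code:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a demand-filled look-aside cache. The maps stand in for
// memcached and the backing store.
public class LookAsideCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Map<String, String> database = new ConcurrentHashMap<>();

    // Read path: try the cache; on a miss, fetch from the source and fill.
    public Optional<String> read(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = database.get(key);           // cache miss: go to the source
            if (v != null) cache.put(key, v);
        }
        return Optional.ofNullable(v);
    }

    // Write path: update the source of truth, then invalidate (not update)
    // the cached copy, so the next read re-fills with fresh data.
    public void write(String key, String value) {
        database.put(key, value);
        cache.remove(key);
    }
}
```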

Wide fan out: When the front end servers scale, the backend cache needs to scale too.

Items are distributed across Memcached servers using consistent hashing.

To handle a request, a front-end server might need to get data from many Memcached servers.
Front-end servers use a Memcache client to talk to the Memcached servers; the client is either a library or a proxy called mcrouter. Memcached servers do not communicate with each other.

Invalidation of the Memcached key is done by code running on the storage. It is not done by the client.

Communication

UDP is used for get requests (surprising? it is clearly an optimization), sent directly from the client in the web server to Memcached. Dropped requests are treated as cache misses.
TCP via mcrouter is used for set and delete requests; using mcrouter helps manage connections to storage.
The client implements flow control to limit load on backend components.

Leases

Leases were implemented to address stale sets and thundering herds. Stale sets are caused by updating the cache with out-of-date values; requiring a lease lets the system check that an update is still valid.
A thundering herd happens when there is heavy read-write activity on the same key at the same time. By handing out leases only every so often (say, once every 10 seconds per key), the system slows things down.

Memcache pools

This is a general purpose caching layer used by different applications and workloads with different requirements. To avoid interference, the cluster's servers are partitioned into pools: for example, one pool for keys that are accessed frequently and cannot tolerate a cache miss, and another for infrequently accessed keys. Each pool can be scaled separately depending on requirements.

Failures

For small failures, requests are directed to a set of dedicated backup servers called gutters. When a large number of servers in a cluster are down, the entire cluster is considered offline and traffic is routed to another cluster.

Topology

A frontend cluster is a group of web servers with Memcache.
A region is one or more frontend clusters plus a storage cluster.
This keeps the failure domains small.
Storage is the final source of truth; an invalidation daemon (mcsqueal) tails the database commit log and uses mcrouter to invalidate the cache.

When going across data centers and geographic regions, the storage master in one region replicates to replicas in other regions. When Memcached needs to be invalidated by storage on an update, there is no problem if the update is in the master region. But if the update is in a replica region, a read after write might return stale data, as the replica might not have caught up. In replica regions, remote markers are used to redirect reads to the master until the replica is in sync.

Summary

In summary, this paper shows how Facebook took single-node Memcached and scaled it to meet its growing needs. No new fundamental theory is applied or discussed, but the paper demonstrates the engineering innovation and trade-offs required to grow a production product that must scale to meet user demand.

The key point is that separating cache and storage allowed each to be scaled separately. The team kept its focus on monitoring, debugging, and operational efficiency, and gradual rollout and rollback of features kept a running system running.

Some might say that Memcache is a hack. But some loss in architectural purity is worth it -- if your users and business stay happy.

Memcache and Facebook were developed together with application and system programmers working together to evolve and scale the system. This does not happen when teams work in isolation.

Sunday, October 5, 2025

Data Storage For Analytics And AI

For a small or medium-sized company, storing all the data in a relational database like PostgreSQL or MySQL is sufficient. If analytics is needed, they might also use a columnar store, more like a data warehouse.

What if your business grows to handle large volumes of unstructured data as well: logs from your e-commerce site, emails, support tickets, images, or customer audio? That data needs to be queried, aggregated, summarized, and reported on too, and storing everything in a single RDBMS becomes impossible. These new data types require specialized architectures designed for scale, flexibility, and advanced analytics (like machine learning and generative AI).

Here is a brief introduction to the key data storage options you will encounter:

1. Relational Database Management Systems (RDBMS)


This needs no introduction.

Primary Use Case: Online Transaction Processing (OLTP). Applications requiring fast, frequent reads and writes, and ACID compliance.

Data Structure: Data is modeled as normalized rows and columns. Explicit relationships are enforced using foreign keys. In most cases, storage is implemented as a B+ tree.

Schema Approach: The schema must be defined and enforced before data can be written.

Examples: PostgreSQL (Open Source), MySQL (Open Source/Commercial), Oracle, Microsoft SQL Server.

2. Data Warehouse (DW)


Primary Use Case: Online Analytical Processing (OLAP). Business Intelligence (BI), historical reporting, and generating complex aggregated reports across years of data.

Data Structure: Columnar data store. Data often denormalized into Star or Snowflake schemas to optimize large, analytical JOIN queries.

Schema Approach: Schema-on-Write: Data is cleaned, transformed, and structured via ETL/ELT pipelines before loading.

Examples: Snowflake, Google BigQuery, Amazon Redshift, Apache Pinot, Apache Druid, ClickHouse

3. Data Lake


Primary Use Case: Storing all data (raw and processed) at massive scale for Data Science, Machine Learning (ML), and exploratory analytics.

Data Structure: Stores data in its native, raw format—structured, semi-structured (JSON, XML), and unstructured (logs, images, audio).

Schema Approach: Schema-on-Read: Structure is applied dynamically by the query engine when the data is read. This offers maximum flexibility.

Examples: Amazon S3 (storage), Azure Data Lake Storage (ADLS), Apache Hadoop, Delta Lake, Apache Hudi

4. Data Lakehouse


Primary Use Case: Unifying the scale and flexibility of a Data Lake with the reliability and performance of a Data Warehouse.

Data Structure: Hybrid: Stores all raw data in the lake but adds a metadata and transaction layer (e.g., Delta Lake) to enforce quality and provide table-like features.

Schema Approach: Hybrid: Allows Schema-on-Read for raw ingestion while enforcing Schema Enforcement and ACID transactions for curated tables.

Example: Databricks, Apache Iceberg

5. NOSQL Database


Primary Use Case: High-volume, dynamic, operational use cases where schemas change frequently and extreme horizontal scaling is needed (e.g., user profiles, content management).

Data Structure: Varies (Document, Key-Value, Graph, Wide-Column). Data is often stored as flexible records or objects without strict relationships.

Schema Approach: Schema-less or Dynamic Schema: Structure can evolve on a per-document basis without downtime.

Example: MongoDB (Document), Redis (Key-Value/Cache), Apache Cassandra (Wide-Column), Neo4j (Graph).

6. Vector Database


Given the rise of LLMs and Generative AI, this is one more specialized option critical for working with unstructured data:

This is designed to store and index vector embeddings—numerical representations of unstructured data (text, images, audio) created by AI models. They allow for similarity search (finding "like" data) rather than exact keyword matches.

Primary Use Case: Retrieval-Augmented Generation (RAG), semantic search, recommendation engines, and high-dimensional ML applications.


Example: Pinecone (Commercial), Weaviate (Open Source/Commercial), Qdrant.

Summary


All of these options, from the structured RDBMS to the fluid Vector DB, combine to form a modern enterprise data architecture.

In essence, the modern enterprise no longer relies on a single data storage solution. The journey usually starts  with the RDBMS for transactional integrity, moves to the Data Warehouse for structured BI, and expands into the Data Lake to capture all raw, unstructured data necessary for Machine Learning and discovery.

The Data Lakehouse is the cutting-edge step, unifying these functions by bringing governance and performance directly to the lake. Vector Databases bridge the gap between unstructured data and the world of Generative AI.

Understanding the specialized role of each platform is the first and most critical step in designing a future-proof data strategy that extracts maximum value from every piece of information your business creates.

Note that there is some overlap between the categories. For example, PostgreSQL supports JSONB and (via the pgvector extension) vector storage, making it useful for some NoSQL and AI use cases. Some products that started off as data lakes added features to become lakehouses.



Saturday, September 13, 2025

What Does Adding AI To Your Product Even Mean?

Introduction

I have been asked this question multiple times: my management sent out a directive to all teams to add AI to the product, but I have no idea what that means.


In this blog I discuss what adding AI actually entails, moving beyond the hype to practical applications and things you might try.

At its core, adding AI to a product means using an AI model, either the now-popular large language model (LLM) or a traditional ML model, to

  • predict answers
  • generate new data: text, images, audio, etc.

The effect is that it enables the product to

  • do a better job of responding to queries
  • automate repetitive tasks
  • personalize responses
  • extract insights
  • Reduce manual labor

It's about making your product smarter, more efficient, and more valuable by giving it capabilities it didn't have before.

In any domain where there is a huge body of published knowledge (programming, healthcare) or vast quantities of data (e-commerce, financial services, health, manufacturing, etc.), too large for a human brain to comprehend, AI has a place and will outperform what we currently do.


So how do you go about adding AI ?

Thanks to social media, AI has developed the aura of being super-complicated. But in reality, if you use off-the-shelf models, it is not that hard. Training models is hard, but 97% of us will never have to do it. Below is a simple five-step approach to adding AI to your system.

1. Requirements

It is really important that you nail down the requirements before proceeding any further. What task is being automated? What questions are you attempting to answer?

The AI solution will need to be evaluated against these requirements, not once or twice but on a continuous basis.

2. Model

Pick a model.

The recent explosion of interest in AI is largely due to Large Language Models (LLMs) like ChatGPT. At its core, an LLM is a text prediction engine: give it some text and it will give you the text that is likely to follow.

But beyond text generation, LLMs have been trained on a lot of published digital data and retain associations between pieces of text. On top of that, they are trained with real-world examples of questions and answers. For example, the reason they do such a good job generating programming code is that they are trained on real source code from GitHub repositories.

Which model should you use?

The choices are:

  • Commercial LLMs like ChatGPT, Claude, Gemini, etc.
  • Open source LLMs like Llama, Mistral, DeepSeek etc
  • Traditional ML models
Choosing the right model can make a difference to the results. There might be a model specially tuned for your problem domain.

Cost, latency and accuracy are some parameters that are used to evaluate models.

3. Agent

Develop one or more agents.

An agent is the modern evolution of a service. It is the glue that ties the AI model to the rest of your system.

The Agent is the orchestration layer that:
  • Accepts requests either from a UI or another service
  • Makes requests to the model on behalf of your system
  • Makes multiple API calls to systems to fetch data
  • May search the internet
  • May save state to a database at various times
  • In the end, returns a response or starts some process to finish a task
It is unlikely that you will develop a model. But it is very likely that you will develop one or more agents.
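The orchestration idea above can be sketched in a few lines. This is a hypothetical sketch, not a real SDK: ModelClient and Tool are placeholder interfaces standing in for whatever model API and data sources (database lookups, other services, web search) your system actually uses.

```java
import java.util.List;
import java.util.function.Function;

// Minimal sketch of an agent as an orchestration layer: gather context
// from tools, build a prompt, ask the model, return its answer.
public class Agent {
    interface ModelClient { String complete(String prompt); } // hypothetical
    record Tool(String name, Function<String, String> run) {} // hypothetical

    private final ModelClient model;
    private final List<Tool> tools;

    Agent(ModelClient model, List<Tool> tools) {
        this.model = model;
        this.tools = tools;
    }

    String handle(String request) {
        StringBuilder prompt = new StringBuilder("Answer using this context:\n");
        for (Tool t : tools) {
            // Each tool fetches some context relevant to the request.
            prompt.append(t.name()).append(": ")
                  .append(t.run().apply(request)).append("\n");
        }
        prompt.append("Question: ").append(request);
        return model.complete(prompt.toString());
    }
}
```

A real agent would add state, retries, and possibly multiple model round trips, but the shape stays the same: the agent owns the workflow, the model only answers prompts.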

4. Data pipeline

Bring your data.

A generic AI model can only do so much. Even without additional training, just adding your data to the prompts can yield better results.

The data pipeline is what makes the data in your databases, logs, ticket systems, github, Jira etc available to the models and agents.

  • get the data from source
  • clean it
  • format it
  • transform it
  • use it in either prompts or to further train the model

5. Monitoring

Monitor, tune, refine.

Lastly, you need to continuously monitor results to ensure quality. LLMs are known to hallucinate and even drift. When the results are not good, you will try tweaking the prompt data and model parameters, among other things.

Now let us see how these concepts translate into some very simple real-world applications across different industries.


Examples

1. Healthcare: Enhancing Diagnostics and Patient Experience

Adding AI can mean:

  • Personalized Treatment Pathways: An AI Agent can analyze vast amounts of research papers, clinical trial data, and individual patient responses to suggest the most effective treatment plan tailored to a specific patient's profile.

    • Example: For a person with high cholesterol, an AI agent can come up with a personalized diet and exercise plan.


2. Finance: Personalized Investing

Adding AI could mean:

  • Personalized Financial Advice: Here, an AI Agent can serve as an "advisor" to offer highly tailored investment portfolios and financial planning advice.

    • Example: A banking app's AI agent uses an LLM to understand your financial goals and then uses its "tools" to connect to your accounts, pull real-time market data, and recommend trades on your behalf. It can then use its LLM to explain in simple terms why it made a specific trade or rebalanced your portfolio.


3. E-commerce: Customer Experience

Adding AI could mean:

  • Personalized shopping: AI models can find the right product at the right price with the right characteristics for the user's requirements

    • Example: Instead of me shopping and comparing for hours, AI does it for me and makes a recommendation on the final product to purchase.


In Conclusion

Adding AI to your product to make it better means using the proven power of AI models

  • To better answer customer requests with insights
  • To automate repetitive, time-consuming tasks
  • To make predictions that were hard earlier
  • To gain insights from vast bodies of knowledge
The tools are there. But to get results you need discipline, patience and process.

Start small. Focus on one specific business problem you want to solve, and build from there.


Saturday, September 6, 2025

CRDT Tutorial: Conflict-free Replicated Data Types

Have you ever wondered how Google docs, Figma, Notion provide real time collaborative editing?

The challenge is: what happens when two users edit the same part of the document at the same time?

  • User A at position 5: types X
  • User B at position 5: types Y

This is a concurrency problem. A traditional implementation would need to lock the document to handle this, but that would destroy real-time responsiveness. There is a need to automatically resolve conflicts so that everyone ends up with the same document state.

Collaborative editors use CRDTs (or related techniques such as operational transformation) to handle concurrent text edits, ensuring that if users insert text at the same position, the system resolves the order without conflicts.





What is a CRDT?

CRDT stands for Conflict-free Replicated Data Type.

A CRDT is a specially designed data structure for distributed systems that:

  • Can be replicated across multiple nodes or regions.

  • Allows each replica to be updated independently and concurrently (without locks or central coordination).

  • Guarantees that all replicas will converge to the same state eventually, without conflicts, even if updates are applied in different orders.

Why do we need CRDTs?

In collaborative editing (like Google Docs, Notion, Figma):

  • Many users may edit the same document concurrently.

  • Network latency or partitions mean updates may arrive in different orders at different servers.

  • We can’t just “last-write-wins” — that would lose user edits.

  • We want low-latency local edits (user sees their change immediately), with eventual consistency across the system.

  • These conditions are typical of distributed systems.

CRDTs give us a way to allow users to edit locally first and let the system reconcile changes without central locks.

Types of CRDTs

There are two broad families:

  1. State-based (Convergent CRDTs, CvRDTs)

    • Each replica occasionally sends its full state to others.

    • Merging = applying a mathematical "join" function (e.g., union, max).

  2. Operation-based (Commutative CRDTs, CmRDTs)

    • Each replica sends only the operations performed (e.g., "insert X at position 2").

    • These operations are designed so that applying them in any order yields the same final result.

Examples of CRDTs in Practice

  • G-Counter (Grow-only counter): Each replica increments a local counter, merge = element-wise max.

  • PN-Counter (Positive-Negative counter): Like G-counter, but supports increment & decrement.

  • G-Set (Grow-only set): Only supports adding elements.

  • OR-Set (Observed-Remove set): Supports add & remove without ambiguity.

  • RGA (Replicated Growable Array) or WOOT or LSEQ: For collaborative text editing, where inserts/deletes happen at positions in a string.

These are the basis for how real-time editors like Google Docs or Figma handle concurrent text/graphic editing.

Below is a simplistic Java implementation of a CRDT:

https://github.com/mdkhanga/blog-code/tree/master/general/src/main/java/com/mj/crdt

The code above provides a simple implementation of a G-Counter that supports increments and merges replicas by taking the maximum value for each node. It is a starting point for understanding how CRDTs ensure convergence in distributed systems.
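To make the idea concrete without leaving the page, here is a minimal G-Counter sketch (my own, not the linked repo's code): each replica increments only its own slot, and merge takes the per-replica maximum.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal state-based G-Counter: a map of replica-id -> local count.
public class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>();

    public GCounter(String replicaId) { this.replicaId = replicaId; }

    // Each replica only ever increments its own slot.
    public void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    // The counter's value is the sum of all replicas' slots.
    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    // Merge is an element-wise max: commutative, associative, idempotent,
    // so replicas converge regardless of the order merges happen in.
    public void merge(GCounter other) {
        other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
    }
}
```

Because merge is idempotent, replicas can exchange state as often (or as rarely) as they like and still converge.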

CRDT vs. Centralized Coordination

  • If concurrent editing is rare → a simple centralized lock or version check may be enough.

  • If concurrent editing is common (e.g., Figma boards with dozens of people) → you want CRDTs  to avoid merge conflicts.

In short:

A CRDT is a mathematically designed data structure that ensures all replicas in a distributed system converge to the same state without conflicts — perfect for real-time collaborative editing.

Note that CRDTs are needed only for collaborative editing at scale in distributed systems. For anything else, they could be overkill.

Saturday, August 30, 2025

Cache in front of a slow database ?

 

Should You Front a Slow Database with a Cache?

Most of us have been there: a slow database query is dragging down response times, dashboards are red, and someone says, “Let’s put Redis in front of it.”

I have done it myself for an advertising system that needed response times of less than 30 ms. It worked very well.

It’s a tried-and-true trick. Caching can take a query that costs hundreds of milliseconds and make it return in single-digit milliseconds. It reduces load on your database and makes your system feel “snappy.” But caching isn’t free — it introduces its own problems that engineers need to be very deliberate about.




Good Use Cases for Caching

  • Read-heavy workloads
    When the same data is read far more often than it’s written. For example, product catalogs, user profiles, or static metadata.

  • Expensive computations
    Search queries, aggregated analytics, or personalized recommendations where computing results on the fly is costly.

  • Burst traffic
    Handling sudden spikes (sales events, sports highlights, viral posts) where the database alone cannot keep up.

  • Low latency requirements
    Some systems have strict latency requirements: clients need a response in, say, less than 50 ms, or they abort.


The Catch: Cache Consistency

The hardest part of caching isn’t adding Redis or Memcached — it’s keeping the cache in sync with the database.

Here are the main consistency issues you’ll face:

  1. Stale Data
    If the cache isn’t updated when the database changes, users may see outdated results.
    Example: A user updates their shipping address, but the checkout flow still shows the old one because it’s cached.

  2. Cache Invalidation
    The classic hard problem: When do you expire cache entries? Too soon → database load spikes. Too late → users see stale values.

  3. Race Conditions
    Writes may hit the database while another process is still serving old cache data. Without careful ordering, you risk “losing” updates.


Common Strategies

  • Cache Aside (Lazy Loading)
    Application checks cache → if miss, fetch from DB → populate cache.
    ✅ Simple, common.
    ❌ Risk of stale data unless you also invalidate on updates.

  • Write-Through
    Writes always go through the cache → cache updates DB.
    ✅ Consistency is better.
    ❌ Higher write latency, more complexity.

  • Write-Behind
    Writes update the cache, and DB updates happen asynchronously.
    ✅ Fast writes.
    ❌ Risk of data loss if cache fails before DB is updated.

  • Time-to-Live (TTL)
    Expire cache entries after a set period.
    ✅ Easy safety net.
    ❌ Not precise; stale reads possible until expiry.
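Cache-aside and TTL are often combined: load lazily on a miss, and let expiry bound how stale a read can get. This is an illustrative sketch only, with an in-memory map standing in for Redis and a loader function standing in for the database query:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside with a TTL safety net: entries are loaded lazily on a miss
// and treated as expired after ttlMillis, so staleness is bounded even if
// an invalidation is missed.
public class TtlCache<K, V> {
    private record Entry<V>(V value, long expiresAt) {}

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for the DB query
    private final long ttlMillis;

    public TtlCache(Function<K, V> loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    public V get(K key) {
        Entry<V> e = map.get(key);
        long now = System.currentTimeMillis();
        if (e == null || now >= e.expiresAt()) {          // miss or expired
            V v = loader.apply(key);                      // hit the "database"
            map.put(key, new Entry<>(v, now + ttlMillis));
            return v;
        }
        return e.value();                                 // cache hit
    }

    // On writes, invalidate so the next read reloads fresh data.
    public void invalidate(K key) { map.remove(key); }
}
```

Note what this sketch deliberately leaves out: it does nothing about race conditions or thundering herds, which is exactly where the real engineering effort goes.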


So, Is It Worth It?

If your workload is read-heavy, latency-sensitive, and relatively tolerant of eventual consistency, caching is usually a big win.

But if your workload is write-heavy or requires strict consistency (think payments, inventory, or medical records), caching can create more problems than it solves.

The lesson: don’t add Redis or Memcached just because they’re shiny tools. Add them because you’ve carefully measured your system, know where the bottleneck is, and can live with the consistency trade-offs.


Takeaway:
Caching is like nitrous oxide for your system — it can make things blazing fast, but you need to handle it with care or you’ll blow the engine.

Thursday, August 28, 2025

The Unsung Heroes Behind Your AI Coding Assistant

While everyone's talking about ChatGPT and tools like Cursor, Windsurf, and GitHub Copilot transforming how we code, let's shine a light on the specialized models that actually power these coding experiences.


Meet the Code Generation Champions:


  • StarCoder - Trained on 80+ programming languages from GitHub repos, this open-source model excels at code completion and generation

  • CodeT5 - Salesforce's encoder-decoder model that understands code structure and can translate between languages

  • InCoder - Meta's bidirectional model that can fill in code gaps, not just complete from left to right

  • CodeGen - Salesforce's autoregressive model trained on both natural language and code

  • Codex (OpenAI) - The foundation behind GitHub Copilot, though now evolved into GPT-4 variants


What makes these different from general LLMs?

  • Trained on massive code repositories (billions of lines)
  • Understand syntax, semantics, and programming patterns
  • Can maintain context across entire codebases
  • Specialized in code-specific tasks like debugging, refactoring, and documentation


The magic isn't just in having "AI that codes" - it's in having models that truly understand the intricacies of software development. They aren’t just regurgitating text—they’re tuned for the nuances of programming, which makes them invaluable for developers. These specialized architectures are why your AI assistant can suggest that perfect function name or catch that subtle bug you've been hunting for hours.


The real game-changer? Most of these models are open-source, democratizing access to powerful coding assistance beyond just the big tech companies.

Sunday, August 24, 2025

JDK 21 Virtual threads: The end of regular threads? Not quite.

A question I get asked all the time: if JDK 21 supports virtual threads, do I ever need to use regular threads?

Java 21 brought us virtual threads, a game-changer for writing highly concurrent applications. Their lightweight nature and massive scalability are incredibly appealing. It's natural to wonder: do we even need regular platform (OS) threads anymore?

While virtual threads are fantastic for many I/O-bound workloads, there are still scenarios where platform threads remain relevant. Here's why:

1. CPU-Bound Tasks:

Virtual threads yield the carrier thread when they perform blocking I/O operations. However, for purely CPU-bound tasks, they don't offer a significant advantage over platform threads in terms of raw processing power. In fact, the context switching involved might introduce a tiny bit of overhead.

Consider a computationally intensive task like calculating factorials:

Virtual threads example:


// A CPU-intensive task. Note: the long overflows well before i reaches
// 10000; the value is irrelevant, we only want to keep the CPU busy.
Runnable cpuBoundTask = () -> {
    long result = 1;
    for (int i = 1; i <= 10000; i++) {
        result *= i;
    }
    System.out.println("Virtual thread task finished.");
};

// Start a virtual thread for the task and wait for it to finish.
// Virtual threads are daemon threads, so without join() the JVM
// may exit before the task prints.
Thread vt = Thread.startVirtualThread(cpuBoundTask);
vt.join();


Platform threads example:

Runnable cpuBoundTask = () -> {
    long result = 1;
    for (int i = 1; i <= 10000; i++) {
        result *= i;
    }
    System.out.println("Platform thread task finished.");
};

// Start a regular platform thread. Platform threads are non-daemon
// by default, so the JVM waits for them to finish.
new Thread(cpuBoundTask).start();

For sustained CPU-bound work, managing a smaller pool of platform threads might still be a more efficient approach to leverage the underlying hardware.

2. Integration with Native Code and External Libraries:

Some native libraries or older Java APIs might have specific requirements or behaviors when used with threads. Virtual threads, being a newer abstraction, might not be fully compatible or optimally performant with all such integrations. Platform threads, being closer to the operating system's threading model, often provide better compatibility in these scenarios.

3. Thread-Local Variables with Care:

While virtual threads support thread-local variables, their potentially large number can lead to increased memory consumption if thread-locals are heavily used and store significant data. With platform threads, you typically have a smaller, more controlled number of threads, making it easier to reason about thread-local usage. However, it's crucial to manage thread-locals carefully in both models to avoid memory leaks.

4. Profiling and Debugging:

The tooling around thread analysis and debugging is more mature for platform threads. While support for virtual threads is rapidly improving, there might be cases where existing profiling tools offer more in-depth insights for platform threads.

5. Backward compatibility

If you want your library or server to be available to users who are on JDKs earlier than JDK 21, then you have no choice but to use regular threads. Virtual threads are not just a new library; they are a fundamental change to the Java Virtual Machine's threading model (part of Project Loom). The underlying code that manages and schedules virtual threads on top of carrier threads is not present in older JVMs. This can be one of the most important reasons for using platform threads.

In Conclusion:

Virtual threads are a powerful addition to the Java concurrency landscape and will undoubtedly become the default choice for many concurrent applications, especially those with high I/O. However, platform threads still have their place, particularly for CPU-bound tasks, legacy integrations, and situations requiring fine-grained control over thread management.

Understanding the nuances of both models will allow you to make informed decisions and build more efficient and robust Java applications.

Sunday, May 4, 2025

Understanding Isolation levels vs Consistency levels

In databases, the terms isolation level and consistency level/model are sometimes used interchangeably. "Repeatable Read" and "Serializable" are well known isolation levels, while "Strict Serializable" and "Linearizable" are consistency terms.

If you have used MySQL or PostgreSQL, you probably know what isolation levels like "Repeatable Read" or "Serializable" mean. But when you work on a distributed database, you hear about consistency levels much more.

The first time I heard about consistency levels was when I worked with Apache Cassandra, which claimed to support only "eventual consistency". A few years ago, when my company was evaluating distributed databases, we had a few architects who insisted that we needed a database that supported "strict serializability". CockroachDB was a database that supported this consistency level.

If you are confused, read on. I wrote this blog in an attempt to clear up my own confusion.

So far, the best explanations on this topic that I have found are by Daniel J Abadi [2] [3]. Kyle Kingsbury at Jepsen [1] has good descriptions of the topic as well.

But first, a clarification on what consistency means.

What is consistency ?

Consistency is an overloaded term and its meaning has changed in recent times.

ACID consistency

The database must preserve its internal correctness rules after every transaction.

Consider a banking database with a constraint account_balance > 0. 

If the starting account_balance is 50 and a transaction tried to deduct 100, that is a violation of that constraint and should fail.

This is the C in ACID. Databases support constraints to ensure this, but it is mainly the responsibility of the application programmer. It is well understood and rarely discussed these days.

Distributed systems consistency

The system must ensure that all nodes (or clients) agree on the same view of data.

Make the distributed system feel like a single threaded, single node system. A read of a value anywhere in the system produces the same result [2]: the most recently written value, no matter where it was written.

Consider a system with multiple nodes where X was 1. The value X=2 is written to one node and replicated to the others. If clients read from the replicas, do they all see X=2 immediately? With the strict serializable consistency level, the answer is yes. With weaker models, it is possible they read an older value.

Most of us first heard of this description from the CAP theorem.

Why the difference ?

Both describe behavior under concurrency. 

Isolation levels describe problems that occur in single node databases when transactions execute concurrently. At the highest isolation level, transactions appear to execute in some serial order, each as if it were running alone.

In distributed systems there is network latency, replication and partitioning, all of which add timing issues on top of the usual concurrency issues. Consistency models take this time and latency into account. At the highest consistency level, transactions execute in the order in which they complete (commit) in real time.

Serializable is the strictest isolation level. Strict serializability is the strictest consistency model. In a single node system, there is very little difference between the two because the timing issues are small.

Isolation Levels vs. Consistency Models

To summarize the key differences.

Isolation Levels

  • Prevent reads and writes of uncommitted data.
  • Prevent anomalies like dirty reads, non-repeatable reads and phantom reads.
  • Focus on managing concurrent access to data while balancing performance and correctness.
  • Common isolation levels (from weakest to strongest):
    • Read Uncommitted
    • Read Committed
    • Repeatable Read
    • Serializable — the strictest level defined by the ANSI SQL standard.

Consistency Levels

  • Typically relevant to distributed databases.
  • Time is a factor.
  • They describe guarantees about the visibility and ordering of updates in a distributed, replicated data system.
  • They focus on the behavior perceived by clients across multiple nodes or replicas.
  • Examples include:
    • Strict serializability
    • Linearizability
    • Causal consistency

Example to Illustrate the Difference:

Scenario:

  • Two accounts A and B initially have a balance of 100 each.
  • Two concurrent transactions:
    • Tx1: Transfer 50 from A to B.
    • Tx2: Reads balances of A and B and sums them

Isolation level Serializable:

  • Tx1 and Tx2 are serialized, and the sum read by Tx2 is 200.
  • (Tx1, Tx2) and (Tx2, Tx1) are valid orders irrespective of when each actually committed first.

Consistency level Strict Serializable

  • If Tx1 commits before Tx2 starts, Tx2 must see all effects of Tx1. The only valid order is (Tx1, Tx2).
  • However, if the two overlap, say Tx1 commits after Tx2 starts, then both orders (Tx1, Tx2) and (Tx2, Tx1) are valid: real time imposes no ordering between overlapping transactions, so Tx2 may or may not see the data committed by Tx1.

A few descriptions


Let us briefly touch on some levels you will encounter often. For more detailed descriptions, I will refer you to https://jepsen.io/consistency [1]

Serializability

Transactions occur in some total order. Even though they may actually execute concurrently, it appears as if they execute one after another. While serializable prevents non-repeatable reads and phantom reads, it allows "time travel" anomalies as shown in the example above: it can appear that Tx2 happened before Tx1, even though in reality it was the other way around.

Strict Serializability

Transactions occur in a strict order that is consistent with the real time (clock time) order in which they occur. It applies to the entire system, encompassing multiple objects. A is before B in the order if A commits before B begins, so the only valid order is (A, B). However, if A commits after B begins, then both orders (A, B) and (B, A) are valid.

Linearizable

Transactions occur in a strict order that is consistent with the real time (clock time) order in which they occur, but this applies to a single object, not to the entire dataset. The definition of a single object varies; it could be a key or a table. [1]

Most of the time, concurrency issues matter when multiple threads touch the same data, which is why this model is nearly as important in practice as strict serializability.

Causal Consistency

Transactions that are causally related are seen by all nodes in the same order, while concurrent (unrelated) operations may be seen in different orders. In a social media application, a user making a post and another user liking the post are causally related: the like must be seen only after the post is seen. However, it is ok for an unrelated post that happened after the first post to be seen before it.

Conclusion

It is all about how systems behave under concurrency. 

Isolation levels deal with how transactions behave when they run at the same time, while consistency models talk about how different nodes in a distributed system agree on data. And "consistency" itself has changed over time, from enforcing business rules in ACID databases to ensuring replicas don't drift apart in distributed ones. 

Database vendors advertise the consistency level they support as a key feature. That is why it is important to understand what it means and to pick the database that fits our needs.

References 

1. https://jepsen.io/consistency

2. Introduction to consistency levels , Daniel J Abadi

Tuesday, April 1, 2025

A non trivial concurrency example in Go language

Overview

In this blog I describe a non trivial concurrency example in the Go programming language. The code is part of my Dynago project https://github.com/mdkhanga/dynago.

This is not a tutorial on Go or on writing concurrent programs. But if you know a little bit of Go and/or a little bit of concurrent programming, then I am hoping this can be a useful example.

In Dynago, I have so far built a leaderless cluster of peer servers. The command for starting a server only needs to point to one other server. The only exception is the first server, which has nothing to point to. The servers exchange ping and gossip-like messages to share details about the members of the cluster. After a few messages, every server has the cluster membership list. When servers join or leave the cluster, the membership list is updated.

Code

If you would like to skip reading the text and jump to the code, the relevant files are :

https://github.com/mdkhanga/dynago/blob/main/cluster/peer.go
https://github.com/mdkhanga/dynago/blob/main/cluster/ClusterService.go

The relevant tests are 
https://github.com/mdkhanga/dynago/blob/main/e2e-tests/test_membership.py
https://github.com/mdkhanga/dynago/blob/main/e2e-tests/test_membership_chain.py

I show some screenshots of code snippets in this blog for the casual reader. But if you are interested in the code, I recommend directly looking at the code in github.

Go concurrency recap

In Go, you write code that executes concurrently by writing Goroutines.

A goroutine is like a lightweight thread. It is written as a regular function.

You run a goroutine using the go keyword.

Go provides channels for communicating between goroutines. Channels are a safe way to send and receive data, safer than using shared memory as is done in Java or C++. You do not have to use mutexes to avoid memory visibility and race conditions.

The select statement lets a goroutine wait on multiple channel operations at once.

Though not recommended, the shared memory approach to data exchange is also supported. You will need to use primitives from the https://pkg.go.dev/sync package to synchronize access to data.

The Examples

There were two features in Dynago where concurrency is relevant.

1. Each server receives messages on a gRPC stream. It has to process each message and sometimes send a response back. This is the classic producer-consumer problem. Some goroutines produce; other goroutines consume and do work. I use channels for sharing the messages between producers and consumers.

2. A map stores the list of cluster members. 

  • The map is updated as servers join or leave the cluster.
  • Periodically, we need to iterate over the map and send our copy of the membership list to others.
    • The receiving server merges the received list with its own list.
    • If the receiving server's list is more up to date, it sends a response back and the original sender has to merge.
    • Timestamps are used to determine which entry is more recent.

This is the case of multiple goroutines reading and writing a data structure concurrently.

Example 1: Channels 

The peer struct shown in the code below defines a channel InMessagesChan for incoming messages and a channel OutMessagesChan for outgoing messages.



The receiveMessages goroutine method has a for loop with the following code. It reads a message from the gRPC stream and sends it to InMessagesChan.



The processMessageLoop goroutine method in peer.go has a loop that takes messages from InMessagesChan and processes them. When a response is needed, it writes the response to OutMessagesChan. The image below is a shortened version of the real code.



Lastly, the sendLoop goroutine method has a for loop that takes messages from OutMessagesChan and writes them to the outbound gRPC stream.



As you can see, this is very simple: 3 goroutines working concurrently by exchanging data over channels. No locking, no synchronization and fewer problems.

In my opinion, channels are one of the best features of Go.

Example 2 : Shared memory


I have this struct cluster which holds the list of peers in the cluster.



We need to add / update / remove entries from the map in a thread safe manner. The code below uses a mutex to synchronize access to the map. These methods are called from both peer.go and cluster.go.



The code below shows a loop from the ClusterInfoGossip method that uses the same mutex to synchronize access to the map while iterating over it. Typically the calls to add, remove and gossip happen from different goroutines. If you do not synchronize, you will have memory visibility problems. Note that since the peer is updated, we need to synchronize access to the peer as well.
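
As a self-contained sketch of this pattern (the field names, member type, and timestamp merge rule here are simplifications, not the exact code from ClusterService.go):

```go
package main

import (
	"fmt"
	"sync"
)

// member is a simplified cluster entry; Timestamp decides which
// copy of an entry is more recent during gossip merges.
type member struct {
	Host      string
	Timestamp int64
}

// cluster guards its membership map with a mutex because
// add/remove and the gossip loop run on different goroutines.
type cluster struct {
	mu    sync.Mutex
	peers map[string]*member
}

// addOrUpdate keeps an entry only if it is newer than what we have.
func (c *cluster) addOrUpdate(m *member) {
	c.mu.Lock()
	defer c.mu.Unlock()
	existing, ok := c.peers[m.Host]
	if !ok || m.Timestamp > existing.Timestamp {
		c.peers[m.Host] = m
	}
}

func (c *cluster) remove(host string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.peers, host)
}

// snapshot copies the list under the lock, the way the gossip loop
// does before sending the membership list to other peers.
func (c *cluster) snapshot() []member {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]member, 0, len(c.peers))
	for _, p := range c.peers {
		out = append(out, *p)
	}
	return out
}

func main() {
	c := &cluster{peers: map[string]*member{}}
	c.addOrUpdate(&member{Host: "node1", Timestamp: 1})
	c.addOrUpdate(&member{Host: "node2", Timestamp: 1})
	c.addOrUpdate(&member{Host: "node1", Timestamp: 5}) // newer entry wins
	fmt.Println(len(c.snapshot()))                      // 2
}
```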





You might be wondering: why not just do this using channels, when they are safer and cleaner? What you would do is write a function that acts like an event loop. The function accepts commands on a channel and reads, updates or iterates over the map based on the command. The code would look something like below.
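
Here is a sketch of that event-loop alternative, again with hypothetical command and member types: a single goroutine owns the map and serializes all access by reading commands off a channel, so no mutex is needed.

```go
package main

import "fmt"

// command is sent to the goroutine that owns the map; the reply
// channel carries results back to the caller.
type command struct {
	op    string // "add", "remove", or "list"
	host  string
	reply chan []string
}

// run is the event loop: the only goroutine that ever touches
// members, so reads and writes cannot race.
func run(cmds chan command) {
	members := map[string]bool{}
	for cmd := range cmds {
		switch cmd.op {
		case "add":
			members[cmd.host] = true
		case "remove":
			delete(members, cmd.host)
		case "list":
			out := []string{}
			for h := range members {
				out = append(out, h)
			}
			cmd.reply <- out
		}
	}
}

func main() {
	cmds := make(chan command)
	go run(cmds)

	cmds <- command{op: "add", host: "node1"}
	cmds <- command{op: "add", host: "node2"}
	cmds <- command{op: "remove", host: "node2"}

	reply := make(chan []string)
	cmds <- command{op: "list", reply: reply}
	fmt.Println(<-reply) // [node1]
}
```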



Which approach you take is sometimes a personal choice. For the message processing, I preferred channels. But for CRUD on a map, I preferred shared memory.

Conclusion

In conclusion, what I have shown here is a non trivial concurrency example in Go. Go makes it quite easy to write concurrent programs. The recommended way to share data between goroutines is via channels, which help you avoid race conditions and memory visibility issues. But the traditional approach of shared memory with synchronization is also supported.

As I build out Dynago, I plan to blog about the interesting pieces.  If you like my blogs, please follow me on LinkedIn and/or X/twitter.