Saturday, September 6, 2025

CRDT Tutorial: Conflict Free Replication Data Types

Have you ever wondered how Google docs, Figma, Notion provide real time collaborative editing?

The challenge is : What happens when 2 users edit the same part of the document at the same time. 

  • User A at position 5: types X
  • User B at position 5: types Y

This is a concurrency problem. A traditional implementation would need to lock the document to handle this. But that would destroy real-time responsiveness. There is a need to automatically resolve conflicts so that every one ends up with same document state.

In Google docs, CRDTs  are used to handle concurrent text edits, ensuring that if users insert text at the same position, the system is able to resolve the order without conflicts.





What is a CRDT?

CRDT stands for conflict free replication data type.

A CRDT is a specially designed data structure for distributed systems that:

  • Can be replicated across multiple nodes or regions.

  • Allows each replica to be updated independently and concurrently (without locks or central coordination).

  • Guarantees that all replicas will converge to the same state eventually, without conflicts, even if updates are applied in different orders.

Why do we need CRDTs?

In collaborative editing (like Google Docs, Notion, Figma):

  • Many users may edit the same document concurrently.

  • Network latency or partitions mean updates may arrive in different orders at different servers.

  • We can’t just “last-write-wins” — that would lose user edits.

  • We want low-latency local edits (user sees their change immediately), with eventual consistency across the system.

  • Typical in distributed systems

CRDTs give us a way to allow users to edit locally first and let the system reconcile changes without central locks.

Types of CRDTs

There are two broad families:

  1. State-based (Convergent CRDTs, CvRDTs)

    • Each replica occasionally sends its full state to others.

    • Merging = applying a mathematical "join" function (e.g., union, max).

  2. Operation-based (Commutative CRDTs, CmRDTs)

    • Each replica sends only the operations performed (e.g., "insert X at position 2").

    • These operations are designed so that applying them in any order yields the same final result.

Examples of CRDTs in Practice

  • G-Counter (Grow-only counter): Each replica increments a local counter, merge = element-wise max.

  • PN-Counter (Positive-Negative counter): Like G-counter, but supports increment & decrement.

  • G-Set (Grow-only set): Only supports adding elements.

  • OR-Set (Observed-Remove set): Supports add & remove without ambiguity.

  • RGA (Replicated Growable Array) or WOOT or LSEQ: For collaborative text editing, where inserts/deletes happen at positions in a string.

These are the basis for how real-time editors like Google Docs or Figma handle concurrent text/graphic editing.

Below is a simplistic Java implementation of a CRDT:

https://github.com/mdkhanga/blog-code/tree/master/general/src/main/java/com/mj/crdt

The code above provides a simple implementation of a G-counter that supports insert, update, delete and merges replicas by taking the maximum value for each node. It is a starting point to understand how CRDTs ensure convergence in distributed systems.

CRDT vs. Centralized Coordination

  • If concurrent editing is rare → a simple centralized lock/version check may be enough (like your first idea).

  • If concurrent editing is common (e.g., Figma boards with dozens of people) → you want CRDTs  to avoid merge conflicts.

In short:

A CRDT is a mathematically designed data structure that ensures all replicas in a distributed system converge to the same state without conflicts — perfect for real-time collaborative editing.

Note that this would be needed only for collaborative editing at scale in distributed systems. For anything else, it could be an overkill.

No comments:

Post a Comment