Friday, January 9, 2015

Apache Kafka JAVA tutorial #3: Once and only once delivery

In Apache Kafka introduction, I provided an architectural overview on the internet scale messaging broker. In JAVA tutorial 1, we learnt how to send and receive messages using the high level consumer API. In JAVA tutorial 2, We examined partition leaders and metadata using the lower level Simple consumer API.

A key requirement of many real world messaging applications is that a message should be delivered once and only once to a consumer. If you have used the traditional JMS based message brokers, this is generally supported out of the box, with no additional work from the application programmer. But Kafka has distributed architecture where the messages to a topic are partitioned for scalability and replicated for fault tolerance and hence the application programmer  has to do a little more to ensure once and only once delivery.

Some key features of the Simple Consumer API are:
  • To fetch a message, you need to know the partition and partition leader.
  • You can read messages in the partition several times.
  • You can read from the first message in the partition or from a known offset.
  • With each read, you are returned an offset where the next read can happen.
  • You can implement once and only once read, by storing the offsets with the message that was just read, thereby making the read transactional. In the event of a crash, you can recover because you know what message was last read and where the next one should be read.
  • Not covered in this tutorial, but the API lets you determine how many partitions there are for a topic and who the leader for each partition is. While fetching message, you connect to the leader. Should a leader go down, you need to fail over by determining who the new leader is, connect to it and continue consuming messages
For this tutorial you will need

(1) Apache Kafka 0.8.1
(2) Apache Zookeeper
(3) JDK 7 or higher. An IDE of your choice is optional
(4) Apache Maven
(5) Source code for this sample from https://github.com/mdkhanga/my-blog-code if you want to look at working code

In this tutorial, we will
(1) start a Kafka broker
(2) create a topic with 1 partition
(3) Send a messages to the topic
(4) Write a consumer using Simple API to fetch messages.
(5) Crash the consumer and restart it ( several times). Each time you will see that it reads the next message after the last one that was read.

Since we are focusing of reading messages from a particular offset in a partition, we will keep other things simple by limiting ourselves to 1 broker and 1 partition.

Step 1: Start the broker


bin/kafka-server-start.sh config/server1.properties

For the purposes of this tutorial, one broker is sufficient as we are reading from just one partition.

Step 2: Create the topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --partitions 1 --topic atopic

Again for the purposes of this tutorial we just need 1 partition.

Step 3: Send messages to the topic

Run the producer we wrote in tutorial 1 to send say 1000 messages to this topic.

Step 4: Write a consumer using SimpleConsumer API

The complete code is in the file KafkaOnceAndOnlyOnceRead.java.

Create a file to store the next read offset. 

static {
    try {
      readoffset = new RandomAccessFile("readoffset", "rw");
    } catch (Exception e) {
      System.out.println(e);
    }

}

Create a SimpleConsumer.

SimpleConsumer consumer = new SimpleConsumer("localhost", 9092, 100000, 64 * 1024, clientname);

If there is a offset stored in the file, we will read from the offset. Otherwise, we read from the beginining of the partition -- EarliestTime.

 long offset_in_partition = 0 ;
    try {
      offset_in_partition = readoffset.readLong();
    } catch(EOFException ef) {
      offset_in_partition =     getOffset(consumer,topic,partition,kafka.api.OffsetRequest.EarliestTime(),clientname) ;
    }


The rest of the code is in a

while (true) {

}

loop. We will keep reading messages or sleep if there are none.

Within the loop, we create a request and fetch messages from the offset.

FetchRequest req = new FetchRequestBuilder()
          .clientId(clientname)
          .addFetch(topic, partition, offset_in_partition, 100000).build();
FetchResponse fetchResponse = consumer.fetch(req);


Read messages from the response.

for (MessageAndOffset messageAndOffset : fetchResponse.messageSet(topic, partition)) {
        long currentOffset = messageAndOffset.offset();
        if (currentOffset < offset_in_partition) {
          continue;
        }
        offset_in_partition = messageAndOffset.nextOffset();
        ByteBuffer payload = messageAndOffset.message().payload();

        byte[] bytes = new byte[payload.limit()];
        payload.get(bytes);
        System.out.println(String.valueOf(messageAndOffset.offset()) + ": " + new String(bytes, "UTF-8"));
        readoffset.seek(0);
        readoffset.writeLong(offset_in_partition);
        numRead++;
        messages++ ;

        if (messages == 10) {
          System.out.println("Pretend a crash happened") ;
          System.exit(0);
        }
  }


For each message that we read, we check that the offset is not less than the one we want to read from. If it is, we ignore the message. For efficiency, Kafka batches messages. So you can get messages already read. For each valid message, we print it and write the next read offset to the file. If the consumer were to crash, when restarted, it would start reading from the last saved offset.

For demo purposes, the code exits after 10 messages. If you run this program several times, you will see that it starts reading exactly from where it last stopped. You can change that value and experiment.

Step 5: Run the consumer several times.

mvn exec:java -Dexec.mainClass="com.mj.KafkaOnceAndOnlyOnceRead"

210: 04092014 This is message 211
211: 04092014 This is message 212
212: 04092014 This is message 213
213: 04092014 This is message 214
214: 04092014 This is message 215
215: 04092014 This is message 216
216: 04092014 This is message 217
217: 04092014 This is message 218
218: 04092014 This is message 219
219: 04092014 This is message 220


run it again 

mvn exec:java -Dexec.mainClass="com.mj.KafkaOnceAndOnlyOnceRead"

220: 04092014 This is message 221
221: 04092014 This is message 222
222: 04092014 This is message 223
223: 04092014 This is message 224
224: 04092014 This is message 225
225: 04092014 This is message 226
226: 04092014 This is message 227
227: 04092014 This is message 228
228: 04092014 This is message 229
229: 04092014 This is message 230


In Summary, it is possible to implement one and only once delivery of messages in Kafka by storing the read offset.

Related Blogs:

Apache Kafka Introduction
Apache Kafka JAVA tutorial #1
Apache Kafka JAVA tutorial #2 
Apache Kafka 0.8.2 New Producer API