July 21st, 2016
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: firstname.lastname@example.org
Not mentioned below is that RabbitMQ works hard to guarantee delivery of a message, so it is slow, but that is because it is in some ways doing more than Kafka.
I created 4 queues, wrote a ruby client and started inserting messages. I got a publishing rate of about 20k/s using multiple threads but I got a few stalls caused by the vm_memory_high_watermark, from my understanding during those stalls it writing to disk. Not exactly awesome given my requirements. Also, some part is always kept in memory even if a queue is durable so, even though I had plenty of disk space, the memory usage grew and eventually hit the vm_memory_high_watermark setting. The cpu load was pretty high during the load, between 40% and 50% on an 8 cores VM.
Even though my requirements were not met, I setup a replicated queue on 2 nodes and inserted a few millions objects. I killed one of the two nodes and insert were even faster but then… I did a mistake. I restarted the node and asked for a resync. Either I didn’t set it correctly or the resync is poorly implemented but it took forever to resync and it was slowing down as it progressed. At 58% done, it has been running for 17h, one thread at 100%. My patience was exhausted.
Sometimes you look at a technology and you just say: wow, this is really done the way it should be. At least I could say that for the purpose I had. What is so special about Kafka is the architecture, it stores the messages in flat files and consumers ask messages based on an offset.
…Feature wise Kafka, isn’t that great. There’s no web frontend builtin although a few are available in the ecosystem. Routing and rules are inexistent and stats are just with JMX. But, the performance… I reached a publishing speed of 165k messages/s over a single thread, I didn’t bother tuning for more. Consuming was essentially disk bound on the server, 3M messages/s… amazing. That was without Zookeeker coordination. Memory and cpu usage were modest.
Kestrel is very simple, queues are defined in a configuration file but you can specify, per queue, storage limits, expiration and behavior when limits are reached. With a setting like “discardOldWhenFull = true”, my requirement of never blocking the publishers is easily met.
In term of clustering Kestrel is a bit limited but each can publish its availability to Zookeeper so that publishers and consumers can be informed of a missing server and adjust. Of course, if you have many Kestrel servers with the same queue defined, the consumers will need to query all of the broker to get the message back and strict ordering can be a bit hard.
In term of performance, a few simple bash scripts using nc to publish messages easily reached 10k messages/s which is very good. The rate is static over time and likely limited by the reconnection for each message. The presence of consumers slightly reduces the publishing rate but nothing drastic. The only issue I had was when a large number of messages expired, the server froze for some time but that was because I forgot to set maxExpireSweep to something like 100 and all the messages were removed in one pass.
So, fairly good impression on Kestrel, simple but works well.