July 20th, 2016
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: firstname.lastname@example.org
RabbitMQ is written in Erlang, so I was inclined to think well of it, though I had heard criticism of it. Then the New York Times used it, a very surprising vote of confidence in RabbitMQ:
This architecture – Fabrik – has dozens of RabbitMQ instances spread across 6 AWS zones in Oregon and Dublin. The instances are organized into “wholesale” and “retail” layers. Connection to clients is via websockets/sockjs.
Upon launch today, the system autoscaled to ~500,000 users. Connection times remained flat at ~200ms.
Fabrik provides subscription services for breaking news, video feeds, etc., and will add more event-based services. It also supports individual messaging related to subscription status for registered users.
This system would not have been possible without RabbitMQ. It was the one component, used everywhere, that never faltered or failed.
We are using a single Amazon Linux AMI, RabbitMQ, Cassandra 2, and Python 2.
We use pika with Tornado and libev for the nyt⨍aбrik wholesale and retail pieces; our internal clients use Java and PHP.
We use RabbitMQ as our message passing system. Right now, the messages we handle are things like Breaking News Alerts and Live Video alerts. Our internal clients send these messages to the fabrik over AMQP. We then route them through our stack and ensure they are delivered.
We have RabbitMQ in all layers of our stack, with shovels connecting them. Our own internal code helps route the messages based on their service level. Some messages, like Breaking News, must go out as quickly as possible, so we spread these across our clusters AND shovel them to clusters in other regions for processing. From there, the messages get sent to the front end for delivery.
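The NYT does not publish its routing code, so the following is only a minimal Python sketch of service-level routing under stated assumptions. `Message`, `Topology`, `URGENT_KINDS` and `route` are hypothetical names, and the policy itself (urgent kinds fan out to every local cluster plus a shovel to each remote region, everything else goes to a single local cluster) is an assumption drawn from the description above, not the Times' actual rules.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    kind: str  # hypothetical kinds, e.g. "breaking_news", "live_video", "personal"
    body: str


@dataclass
class Topology:
    local_clusters: List[str]  # clusters in this region
    remote_regions: List[str]  # regions reachable via a shovel


# Assumed policy: which kinds are urgent enough to be spread everywhere.
URGENT_KINDS = {"breaking_news"}


def route(msg: Message, topo: Topology) -> List[str]:
    """Return destinations for a message based on its (assumed) service level."""
    if msg.kind in URGENT_KINDS:
        # Urgent: every local cluster, plus a shovel to each remote region.
        return list(topo.local_clusters) + [f"shovel:{r}" for r in topo.remote_regions]
    # Normal: one local cluster is enough.
    return [topo.local_clusters[0]]


topo = Topology(local_clusters=["core-a", "core-b", "core-c"], remote_regions=["eu-west-1"])
print(route(Message("breaking_news", "..."), topo))
# ['core-a', 'core-b', 'core-c', 'shovel:eu-west-1']
print(route(Message("personal", "card expiring"), topo))
# ['core-a']
```

The cluster names are made up; "eu-west-1" is simply the AWS name of the Dublin region mentioned earlier.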
We also use RabbitMQ for individual messages. If you are a registered NYTimes user, we can send you a personal message about things like an expiring credit card.
In production we have a three-node RabbitMQ client cluster and a three-node core cluster in each region, running on c1.xlarge instances. A proxy cluster of c1.medium instances in Virginia connects clients to the client clusters. All services are parallelized, so we can add more cores and clients.
The retail layer autoscales and uses c1.medium instances, each with a single RabbitMQ node shovel-connected to one of the core RabbitMQ nodes. Each Python websocket/sockjs gateway supports up to 100K clients.
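A bit of back-of-the-envelope arithmetic ties this to the launch numbers: if each gateway tops out at about 100K clients, the ~500,000 users at launch need at least ceil(500,000 / 100,000) = 5 gateways. The helper below is hypothetical and computes only that lower bound; real autoscaling would add headroom, which the source does not quantify.

```python
def min_gateways(users: int, per_gateway: int = 100_000) -> int:
    """Lower bound on gateway count: ceil(users / per_gateway), in integer math."""
    if users < 0 or per_gateway <= 0:
        raise ValueError("users must be >= 0 and per_gateway > 0")
    # Integer ceiling division avoids float rounding for large counts.
    return (users + per_gateway - 1) // per_gateway


print(min_gateways(500_000))  # 5
print(min_gateways(500_001))  # 6
```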
We autodeploy into subnets within Virtual Private Clouds in AWS. Clients are routed by lowest latency to the fastest healthy region.
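AWS offers latency-based routing with health checks through Route 53, but the source does not say which mechanism the NYT used. As a minimal sketch of the selection rule itself, assuming a hypothetical `pick_region` function that is handed measured latencies and health flags per region:

```python
from typing import Dict, Optional


def pick_region(latency_ms: Dict[str, float], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if none are healthy."""
    # Regions missing from `healthy` are treated as unhealthy.
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Oregon is closer, but it is failing its health check, so the client goes to Dublin.
print(pick_region({"us-west-2": 40.0, "eu-west-1": 140.0},
                  {"us-west-2": False, "eu-west-1": True}))  # eu-west-1
```

The latency figures are illustrative only. Treating unknown regions as unhealthy is a design choice: it errs toward never sending clients somewhere that has not been checked.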
Of the technical components, the gateway is the most complex. We will be open-sourcing it in pieces, and the first piece is likely to be the Python websocket/sockjs libraries, which, frankly, beat the crap out of most other stuff out there and fully conform to the relevant standards. It can loosely be thought of as a C co-process managed by Python and, as such, may be reusable in other languages and environments.
We have a 12-node Cassandra cluster across the 2 regions / 6 zones. It is used for message persistence and as a cache. We do not use persistence in RabbitMQ. Our services are idempotent, and important messages may be replicated multiple times, creating intentional race conditions in which the fastest copy wins.
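The "fastest wins" approach only works because consumers are idempotent: when several copies of the same alert arrive, only the first should reach the user. Below is a minimal sketch, assuming each message carries a unique ID. `IdempotentDeliverer` is hypothetical, and its in-memory set stands in for a shared store; the source says Cassandra handles persistence and caching, but not how (or whether) deduplication is done there.

```python
import threading
from typing import Callable, Set


class IdempotentDeliverer:
    """Deliver each message ID at most once per process, however many copies arrive."""

    def __init__(self, deliver: Callable[[str, str], None]) -> None:
        self._deliver = deliver
        self._seen: Set[str] = set()  # stand-in for a shared store (assumption)
        self._lock = threading.Lock()

    def handle(self, msg_id: str, body: str) -> bool:
        """Return True if this copy won the race and was delivered."""
        with self._lock:
            if msg_id in self._seen:
                return False  # a faster copy already got through
            self._seen.add(msg_id)
        self._deliver(msg_id, body)
        return True


delivered = []
d = IdempotentDeliverer(lambda msg_id, body: delivered.append(msg_id))
print(d.handle("alert-1", "Breaking: ..."))  # True  (first copy, local cluster)
print(d.handle("alert-1", "Breaking: ..."))  # False (slower copy, shovelled from another region)
print(delivered)  # ['alert-1']
```

Marking the ID as seen before delivering makes this at-most-once: if delivery fails after the ID is recorded, later copies are dropped too. Sending duplicates on purpose is one way to offset that risk, which fits the replication strategy described above.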