# One write point, one read point, one log

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

My last several clients have needed high write throughput systems. Most of these clients were doing something with Natural Language Processing (NLP) and they were often sucking in gigabytes of data, which had to be put somewhere until it could be processed. I found that entrepreneurs were surprised at how much time and money had to go into the support systems. So finally I created this all-in-one OWPORPOL graphic, which tries to show how all the different parts of the system fit together:

One great thing about an async architecture like this is it helps you build a frontend that can work offline. In some industries, this is crucial. As an example, Handshake CRM entered a crowded market and was able to gain market share from Salesforce because Handshake was the first Web native app that also worked offline. OWPORPOL is the backend you need to create such a frontend.

Such an architecture is also very useful for integrating multiple legacy systems. Over the last 20 years enterprise strategy has evolved through 3 phases:

1.) ESB

2.) SOA/Microservices

3.) The unified log, with a history of all events

“One write point, one read point, one log” is a simple take on the notion of the unified log. It’s not just for unifying legacy systems, it is also the right strategy to start with, for certain kinds of clients.

A bit off-topic, but if you need to be convinced that ESB is dying:

> There seems to be a de facto consensus that iPaaS (Integration Platform as a Service) is the next big thing. So much so that MuleSoft, one of the leading iPaaS vendors, achieved unicorn status last year after a successful round of venture capital financing and, then, this past March, went on to have a very successful IPO.
>
> And while there’s little doubt for CIOs that legacy middleware integration technologies such as ESB are on the verge of obsolescence, there should be plenty of doubt as to whether iPaaS (and the vendors such as MuleSoft, Jitterbit, SnapLogic, etc. that offer iPaaS) is the logical successor to fill the white space being left behind. Why? Because iPaaS is stubbornly more of the same. It’s a modernization, but not the much-needed re-imagination, of the ESB model that is currently on life support at organizations around the world.

### One write point, one read point, one log

Most of this essay is a long list of questions that I’ve received from my clients. And mostly I’m dealing with inexperienced teams because if they had someone like me in-house, then they would not need to hire me. So don’t be surprised if the questions seem like they are from someone inexperienced.

You can see that I’ve put together a giant graphic that tries to explain the system I’m talking about. You’ll see I also include a lot of devops details that you might think of as superfluous to OWPORPOL. That’s because a few months back I showed a classic CQRS diagram to a client and I had this conversation:

CLIENT: So, this is all we need to build?

ME: Yes, that is the whole system for importing the text and running the NLP processes on them.

(A week passes)

CLIENT: Okay, the team has made some progress reading and writing to Kafka, and running the NLP, but it seems like we have to manually copy our code to the servers to make it run for real?

ME: No worries, we will set up a Jenkins server, which can run tests, build the uberjar, and then deploy it to your servers.

CLIENT: What? We need another server?

Me: For the build system, yes.

CLIENT: Last week you said you showed us everything we need to build! Are you trying to rip us off?

ME: What?

CLIENT: You told me that graphic you gave us was everything we needed to build!

ME: Sure, for importing text and running NLP. But you need a build system, a deployment system, a health check system, a lot of other things.

CLIENT: God damn, are you kidding me! How much is all this going to cost?

I didn’t like the idea that they thought I was trying to manipulate them into giving me more money. I’d stupidly made the assumption that they already knew that there would have to be other systems. I was wrong to assume that. So I created the all-in-one graphic, because I’ve found it useful to have a graphic that explains how all the systems combine and work together. Especially if I’m talking to a non-technical entrepreneur, this lets us have an honest conversation about what the total costs will be.

Why use OWPORPOL? Martin Fowler says:

> CQRS allows you to separate the load from reads and writes allowing you to scale each independently. If your application sees a big disparity between reads and writes this is very handy. Even without that, you can apply different optimization strategies to the two sides… If your domain isn’t suited to CQRS, but you have demanding queries that add complexity or performance problems, remember that you can still use a ReportingDatabase…
>
> Despite these benefits, you should be very cautious about using CQRS. Many information systems fit well with the notion of an information base that is updated in the same way that it’s read, adding CQRS to such a system can add significant complexity. I’ve certainly seen cases where it’s made a significant drag on productivity, adding an unwarranted amount of risk to the project, even in the hands of a capable team.

Fowler is right to warn about the complexity, although I’ve seen teams get into even worse situations when they try to implement piecemeal ad-hoc solutions to deal with high write throughput. I believe what happens at such teams goes like this (based on a real example):

Santiago: Huh, our Ruby On Rails app has always been able to take input from the Web and write it to our MySQL database, but now that we are letting our users upload Excel spreadsheets, the app is going too slowly, what should we do?

Imani: No worries. Let’s set up some separate apps that just handle the Excel spreadsheets! One app can convert the XLSX to CSV, then put them on a RabbitMQ queue, then another app can parse the CSV and write that to MySQL.

Santiago: Excellent idea! That solves all of our problems!

(One year later)

Santiago: Our new Web scraping Python scripts are pulling in 50 gigabytes of HTML each day. We can’t write all of this to MySQL, what should we do?

Imani: No worries. We can store all the raw HTML in S3 buckets, then we can have other scripts that pull the HTML files and run them through our NLP filters, then write the parsed Statistically Significant Phrases (SSPs) and linked sentences back to other S3 buckets.

Santiago: Excellent idea! That solves all of our problems!

(One year later)

Santiago: The sales team wants us to allow our customers to set up notifications, so when a particular name is discovered by our NLP scripts, we can send an alert to our customers, but our Notifications table is in MySQL and our NLP output is in an S3 bucket. What do we do?

Imani: No worries. We can pull from MySQL and the S3 buckets and combine them in MongoDB, then create an app that looks for new data appearing in MongoDB, and then we can send the appropriate notifications!

Santiago: Excellent idea! That solves all of our problems!

(One year later)

Santiago: Our business intelligence team wants to find out if our customers are getting any real benefit from our Notifications, but their analysis tools need data from S3 and MySQL and MongoDB. What should we do?

Imani: No worries. We’ll create an app that pulls from S3 and MySQL and MongoDB and combines all the data in Redshift. Then the business intelligence team can run their queries against Redshift.

Santiago: Excellent idea! That solves all of our problems!

(More years pass, more additions are made, and so on, and so on.) [1 About the names]

No, this does not solve all their problems. What has happened, over time, is that a series of ad-hoc solutions has built up data silos that each now hold a bit of the canonical truth of the system, which is to say, the system no longer has a canonical source of truth. This is such a powerful anti-pattern that I need to emphasize it:

The system no longer has a canonical source of truth

What’s more, the intuitions that allow this system to work exist only in Imani’s head, and if you tried to document this system, you would have great difficulty writing it down in a manner that would allow a new member of the tech team to understand what is actually happening.

At this point we need to reconsider Martin Fowler’s warning:

> I’ve certainly seen cases where it’s made a significant drag on productivity, adding an unwarranted amount of risk to the project

because now OWPORPOL would be simpler and less risky than what Santiago and Imani have done.

But this is what you want:

Of the ideas in this essay, the unified log is more important than “one write point, one read point” because there are some companies that have tried to implement CQRS without having a unified log and they end up worse off:

> To be honest our system doesn’t do that much yet and all these services might seem like overkill. They do to us at times as well, that’s for sure. It can be annoying that adding a property to a case will pretty much always involve touching more than one service. It can be frustrating to set up a new service just for “this one little thing”. But given the premise that we know that we must build for change I think we are heading in the right direction. Our services are loosely coupled, our query services can easily be thrown away or replaced, the services have one and only one reason to live and are easy to reason about. We can add new query services (I can see lots of them coming up, e.g. popular-cases and cases-recommended-to-you) without touching any of the command services.

You’ll notice that the graphic in that article has an “events database” but it is not the center of the whole architecture. That article has no concept of the unified log, and I think that is why they got into trouble. I find it fascinating how their ideas overlap with 90% of mine but the remaining 10% seems to be very important. They do understand this much:

> You get a transaction log for free, and you can get the current state by replaying all the events for a given order. Add to that, now you can also get the order state at any given point in time (just stop the replay at this time).

But they don’t seem to understand the idea that the log needs to be the center of all communication in their system. Indeed, they call it a “transaction log” rather than a “unified log.” In this case, it’s possible the nomenclature is significant. They also say this:

> Worry not, there is a better way, and it is called event collaboration. With event collaboration, the Order Management Service would publish events as they occur. You typically use a message bus (e.g. Kafka, RabbitMQ, Google Cloud PubSub, AWS SNS) and publish the event to a topic.

If you combine their “transaction log” and their “message bus” then you get the concept of a Unified Log. Maybe they understand this, or maybe they don’t; what I can see is that the graphic they use in that article has “publish” and “subscribe” scattered around the system in a disorganized way.

### An all-in-one OWPORPOL graphic

A few notes about the big workflow graphic:

1.) I previously complained about graphics that are “a mish-mash of logical and physical items.” In this graphic, you can assume each item is a physical item, if you believe the system is built using a microservices style. However, the graphic is really agnostic on that issue. I used to promote the microservices style, but I’ve seen some teams do well with it, and other teams do badly with it. The more I consult with different teams, the more I’ve come to appreciate that creating software is more of an art than a science, and therefore architectural ideas should show deference to the artists who are creating the software. If your lead engineer loves microservices, use them. If your lead engineer loves a monolith, then use that monolith. In the graphic, you can believe that each item is its own app, or you can believe they are separate modules in Ruby On Rails or Sinatra or Django or Flask or Symfony or Zend or Axon or whatever you want to use. If you have an inexperienced team that mostly just works with Django, then using Django and PostGres on the backend can give them a pleasant feeling that at least part of this architecture is familiar to them.

2.) Many of the graphics used in other tech articles are vague and abstract. Understandably, many developers try to be “technology agnostic” when they write about architecture, because they assume other developers will want to implement the same architecture in a different language. However, my non-technical (or technical but inexperienced) audience is allergic to this much abstraction, so here I wanted to be specific, using the names of actual technologies. But depending on your needs, any of these technologies could be replaced by something else, and reasonable software developers might prefer different technologies simply because they have experience with those technologies. So for instance, I use ElasticSearch in this graphic, but I worked with one company that converted to GraphQL, as part of a re-write during which they committed to the React eco-system (at least for now, I would recommend against GraphQL unless you are committing to the React system). In that case, we started with Code Foundries Universal Boilerplate for Relay, but we tore out the code for Cassandra and used MongoDB — we did this simply because we all had experience with MongoDB, but we didn’t have experience with Cassandra. This is not a vote against Cassandra, as I have friends at Parsely, where they use Cassandra to good effect. Likewise, elsewhere in the graphic I use PostGres, but of course you could use MySQL or Oracle or MS SQL Server, or for that matter, since it is only a cache, you could use any cache technology, such as MongoDB or even Redis. But most of the time I recommend PostGres, because RDBMSs are familiar to most teams, and PostGres is the best of such Open Source systems. Also, I mention Kafka as the Unified Log, but AWS has a similar product called Kinesis.

### I am grateful to my clients for educating me

I am fascinated with the things my clients don’t understand. Their questions are a major part of my own education. I am deeply grateful for the chance to answer their questions. What follows are the most common questions I’ve gotten regarding OWPORPOL.

### How does the frontend talk to this backend?

I had this conversation with a frontend developer who was completely new to the OWPORPOL architecture.

Them:

How do I know I have created something on the backend? Suppose a new user wants to create their profile. If we were using a normal website, something like Django, then my Javascript code does a POST to the server, and I’m going to either get back a 200 or a 400 or a 404 or maybe a 500. Either it works, or it doesn’t work, and I get back an HTTP response code that tells me what happened. And then the frontend code can show an appropriate message to the user. But in this architecture that you are talking about, I send a JSON document via POST and then… what? Maybe the API Gateway gives me a 200, but that doesn’t tell me if the document made it all the way to the database. That just means I had the correct API key and so I was allowed to make a POST. But I’ve got no idea if anything succeeded, and I’ve got no way to reference this document. I’ve got no ID, no way of knowing how to refer to this in the future.

Me:

It might seem like a complicated architecture, but in my experience, when things are working correctly, a new document can get through the system in less than 2 seconds. It should never take more than 2 or 3 seconds for a document to go through these steps:

1.) from the users web browser to the API Gateway

2.) to an app that reformats the data for Kafka

3.) to Kafka

4.) to the backend robot that handles User profiles

5.) to PostGres

6.) to the Translator app

7.) to ElasticSearch

If, for some reason, your system slows down to the point where it is taking 20 or 30 seconds for a document to get through the system then, first of all, you need to look at the bottleneck and fix it, but, second of all, I have seen many startups cheat a bit, and for high priority information, the backend robots will both write to PostGres and also write directly to ElasticSearch. That leads to tight coupling between the backend robots and the denormalized data in ElasticSearch, so I don’t recommend it, but I have often seen it done.

Assume that 99.9% of the time a new document can get through your system and arrive in ElasticSearch in less than 2 seconds. Then your code can POST to the write point, wait 2 seconds, and then make a read from the read point — your new document should be there.

Them:

But how would I know? How would I be able to tell if the new document is in ElasticSearch? I don’t know its ID in the database, so I have no way to identify it.

Me:

Before you send it, you’ll want to create a UUID and add that in as the document id. And you’ll want to send it with the session id. So 2 seconds later your code can ask, is there a document with this session id and this document id? In other words, for a new user profile, you send something like this:

    {
        "first_name" : "Allie",
        "last_name" : "Veblen",
        "phone" : "01-987-234-9812",
        "company_id" : "9871fVGTED120dkfir93F",
        "document_id" : "f02343dlGTRF9090909dLc",
        "session_id" : "888F8F2cvnd4jf09LKJiv2"
    }

And then 2 seconds later you can query for:

    {
        "document_id" : "f02343dlGTRF9090909dLc",
        "session_id" : "888F8F2cvnd4jf09LKJiv2"
    }

If the document comes back, then you know it was successfully created. If nothing comes back, wait another 5 seconds and try again. Then another 5 seconds and try again. You can set some cutoff, let’s say 15 seconds. If the document is not there after 15 seconds, you can assume something failed. You can either do the POST again, or tell the user that there is a problem.
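As a sketch (not the author’s code), the poll-and-confirm loop just described could look like this in Python, with `query` standing in for whatever call hits your read point and returns the document or `None`:

```python
import time


def confirm_write(query, document_id, session_id,
                  waits=(2, 5, 5), sleep=time.sleep):
    """Poll the read point until the document appears or we give up.

    `query` is whatever function hits your read point (for example an
    ElasticSearch lookup); it should return the document or None. The
    default waits follow the schedule in the text: 2 seconds, then two
    more tries 5 seconds apart, before declaring failure.
    """
    for wait in waits:
        sleep(wait)
        doc = query(document_id=document_id, session_id=session_id)
        if doc is not None:
            return doc   # the write made it all the way through
    return None          # assume failure: re-POST, or warn the user
```

In production `query` would be an HTTP call to the read point; in a test you can substitute an in-memory lookup and a no-op `sleep`.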

In general, you really should not rely on the auto-generated ID of a database table to be your document id. That style was popularized by PHP and MySQL, and then by Ruby On Rails and MySQL, but it needs to die. Arguably it is a security risk, since outsiders can easily guess the id of any document in your system, which they cannot do if you use UUIDs.

Them:

But even with UUIDs, isn’t there still a big security risk? Can’t they potentially guess a UUID?

Me:

Sure, hackers can try to guess UUIDs, that is a well known attack. But there are a few things to keep in mind. First of all, with UUIDs, they would have to guess many trillions and trillions of UUIDs before they would even have a 50% chance of correctly guessing a UUID that you are using. Second of all, you should have some code that checks to see if hackers are attacking your site with random guesses. If you use the AWS API Gateway, then you don’t have to write any code yourself, because Amazon has already done that for you. The only requests that get through will be requests that have your API key.
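To put rough numbers on “trillions and trillions,” here is the back-of-envelope arithmetic (assuming version-4 UUIDs and a billion live ids in your system; both figures are illustrative):

```python
# A version-4 UUID carries 122 random bits, so the space of possible
# ids is 2**122, about 5.3e36. Suppose your system holds a billion
# live ids; the chance that one random guess hits any of them:
live_ids = 10 ** 9
space = 2 ** 122
p_per_guess = live_ids / space          # roughly 2e-28 per guess

# Linear approximation: guesses needed before an attacker has even a
# 50% chance of a single hit -- on the order of 10**27 attempts.
guesses_for_even_odds = 0.5 / p_per_guess
```

Long before an attacker gets anywhere near that many requests, the rate-limiting in front of your write point should have noticed.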

Them:

A UUID seems complicated. I don’t know how to create them. Maybe I can just use a timestamp instead?

Me:

No, you don’t want to use a timestamp. That would only work for the simplest cases. First of all, as time goes by, your frontend will become more complex, and eventually you will find you are attaching multiple events to a single button click. These would all have the same timestamp, so the timestamp would fail to tell you which event/document is which. Second of all, you don’t need to create UUIDs yourself; there are many good libraries that do this for you, and which are easy to use.
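For example, Python’s standard library can generate these ids (your frontend would use a JavaScript equivalent, but the idea is identical):

```python
import uuid

# Generate random (version 4) ids client-side, before the document is
# POSTed to the write point. No coordination with the backend is needed.
document_id = str(uuid.uuid4())
session_id = str(uuid.uuid4())

payload = {
    "first_name": "Allie",
    "last_name": "Veblen",
    "document_id": document_id,
    "session_id": session_id,
}
```

Two seconds later, the same pair of ids is enough to ask the read point whether the document arrived.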

### How does the user login?

Especially nowadays, inexperienced developers have gotten so used to working with systems that automatically handle sessions that many such developers don’t seem to understand what a session is, or how to handle it manually. So I am often asked a variant of this, by whoever is working on the frontend:

How can I handle login in a system like this? If I build a website using WordPress or Ruby On Rails or Django then all of those systems have code to handle logins. Normally I get the user’s username and password, from an HTML form, and then I POST that to the server, and if the username and password match an existing username and password, then the user is logged in. But in this OWPORPOL system, I don’t have a direct connection to the backend, so I don’t think I can maintain a session?

To answer, let’s first talk about a totally normal website, like the type run by WordPress. In that case:

1.) the user fills in their username and password in an HTML form

2.) the web browser POSTs the form to the server, which checks the username and password against its records

3.) (assume success) the server sends back a session_id, to be set as a cookie

4.) the web browser records the cookie

5.) thereafter, the web browser sends the session_id with every request, in the HTTP cookie headers

One could just send the username and password with every single request, but that would get tedious and would also possibly be a security risk.

How to handle authentication with an OWPORPOL system? There are a few ways, including the traditional “send username and password from HTML form.” But nowadays 3rd party auth has become very common, so let’s talk about that too.

In the (very traditional) first case:

1.) you have a website where the user fills in the username and password in an HTML form.

2.) You also have a small auth app that is available at a public endpoint. This has access to a cache of all usernames and passwords (I’m just going to assume encryption and not talk about it). This auth app is an exception to the overall architecture, since it amounts to a second write point, separate from the main one. I think this is a minor exception that doesn’t hurt the overall system.

3.) The HTML form POSTs the username and password to the auth app

4.) let’s assume the auth app finds a match for that username and password

5.) the auth app generates a UUID to be used as the session_id. This gets sent back to the web browser

6.) the auth app also writes the session_id and username to Kafka (this is a direct write, and therefore another exception to the notion of “one write point”), so that the User robot on the backend can pick it up and see what the current session_id is. If your backend is written in a standard Django or Ruby On Rails style, you might have a Sessions table in PostGres where you store all current session_ids

7.) thereafter, any document sent to the write point should include the session_id

In other words, the auth app writes something like this to Kafka:

    {
        "session_id" : "9230940329420394",
        "status" : "success"
    }

or this for a failed login:

    {
        "status" : "failed"
    }

(Too many failed login attempts are a problem. You should have some backend robot that tracks this.)
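A toy sketch of the auth app described in steps 1 through 6, with in-memory stand-ins: `credentials` for the username/password cache, `publish` for the Kafka producer (hashing elided, as above; all names are illustrative):

```python
import uuid


def handle_login(username, password, credentials, publish):
    """Check credentials and mint a session, per the steps above.

    `credentials` is the auth app's cache of username -> password
    (hashing elided, as in the text). `publish` stands in for a Kafka
    producer's send call. Returns the event that was written to the
    unified log, which is also what gets sent back to the browser.
    """
    if credentials.get(username) == password:
        event = {
            "session_id": str(uuid.uuid4()),  # UUID as session id
            "username": username,
            "status": "success",
        }
    else:
        event = {"status": "failed"}
    publish("sessions", event)  # the User robot picks this up downstream
    return event
```

The User robot consuming the "sessions" topic would then insert successful session_ids into a Sessions table in PostGres, and a watchdog robot could count the failures.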

Later, someone tries to send in a New Business action, which might look like this:

    {
        "name" : "Joel’s Country Vegetables",
        "document_id" : "frt43209KJH3111nvBt2",
        "session_id" : "9230940329420394"
    }

This gets picked up by the Business robot, which needs to know if this is a valid user, so it checks the session_id in PostGres. This is exactly what the major frameworks do: Flask, Ruby On Rails, Django, Symfony, etc. In fact, you could use any of those frameworks for the backend.

The Business robot sees the session_id is valid, so it creates a new row in the Business table in PostGres. And it should write a message “Your new business has been created!” into the Messages table, and that should be passed along to ElasticSearch, and then to the user (all of this usually happens in less than 2 seconds).

PostGres is a cache that holds the current snapshot of what data is valid. However, it is not the canonical source of truth. Note that you could delete PostGres, and delete all of your PostGres backups, and you could still correctly recreate all of the information in PostGres, by having your robots run over the data in Kafka. So the history of events in Kafka is the canonical source of truth, in the sense that all the knowledge of your app can be recreated by re-reading the data from Kafka. However, in terms of what is the current data, you can keep that in PostGres, and in that sense the app will work a lot like a typical Flask app, or a typical Ruby On Rails app.
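That replay property can be made concrete. A minimal sketch, with a list standing in for the Kafka history and a dict standing in for the PostGres cache (the event shapes are illustrative, not the author’s schema):

```python
def rebuild_cache(events):
    """Replay the unified log to reconstruct the current snapshot.

    `events` stands in for the full Kafka history; the returned dict
    stands in for the PostGres cache. Because the cache is derived
    entirely from the log, it can be deleted and rebuilt at any time.
    """
    cache = {}
    for event in events:
        if event["action"] == "create":
            cache[event["document_id"]] = dict(event["data"])
        elif event["action"] == "update":
            cache[event["document_id"]].update(event["data"])
        elif event["action"] == "delete":
            cache.pop(event["document_id"], None)
    return cache
```

Stopping the replay partway through gives you the snapshot as of any earlier point in time, which is the property the quoted article was praising.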

The failure mode is also simple, if the Business robot can not find a matching session_id in the Sessions table, then it should log this as a failed attempt. Failures should also be stored in the Messages table, to be passed along to the user. “We could not create a business, could not find a valid session.”
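Putting the success and failure paths together, the Business robot’s logic might be sketched like this (dicts and lists stand in for the PostGres tables; names are illustrative):

```python
def handle_new_business(event, sessions, businesses, messages):
    """Sketch of the Business robot handling a New Business action.

    `sessions`, `businesses`, and `messages` stand in for the
    corresponding PostGres tables. Success and failure both produce a
    row in Messages, to be passed along to ElasticSearch and the user.
    """
    if event["session_id"] in sessions:
        businesses[event["document_id"]] = {"name": event["name"]}
        messages.append({
            "document_id": event["document_id"],
            "text": "Your new business has been created!",
        })
        return True
    messages.append({
        "document_id": event["document_id"],
        "text": "We could not create a business, could not find a valid session.",
    })
    return False
```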

How long does login last? Every website has different rules about this. Most sites will allow you stay logged in forever — the session never expires. A user remains logged in unless they specifically initiate a logout action. However, some sites, such as my bank, are wary about security, so they cancel sessions after just 5 minutes. Ending a session just means deleting the session_id from the Sessions database table. Or maybe the user clicks the “logout” button on the website, at which point a “logout” event is sent to the public write point, then to Kafka, then it gets picked up by the User robot (or maybe you have a Sessions robot) and that deletes the session_id from the Sessions database table.

Now let’s consider the other case, when you use 3rd party auth.

Firebase: let’s assume you use something like Firebase for all authentication.

Your frontend code can make a call directly to Firebase. You’ll get back a JWT, which contains a session key (for some reason, Firebase uses an enormous session key, something like 900 characters. I’ve no idea why this is so huge. But it works fine as a session key). Pass that session key along with every document that you send to the public write point. Each of the robots on the backend will then have to verify that the session token is valid. That is, they get the session key, then they make a call to Firebase to see if the key is valid and whom it belongs to. For this, it can be helpful if you are using a monolith on the backend, such as Django or Ruby On Rails, as it makes it easier to use the “verify with Firebase” code just once for each document. Otherwise, potentially, if one document needs to be processed by 3 different backend robots, each robot will have to verify the session key with Firebase. But if you have developed your backend as a series of microservices, it is not much work to write a simple wrapper for your Firebase code and then include that everywhere as a shared library. A very minor pain point.
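The shared wrapper mentioned above might be sketched like this; `verify` stands in for whatever call your Firebase SDK actually exposes (this is not its real API), and a short-lived cache saves repeated round trips when several robots see the same document:

```python
import time


class SessionVerifier:
    """Thin wrapper around a third-party token check, shared by all robots.

    `verify` stands in for the real Firebase SDK call (illustrative,
    not the actual API); results are cached briefly so that one
    document fanned out to several robots does not trigger several
    network round trips.
    """

    def __init__(self, verify, ttl=60, clock=time.time):
        self.verify = verify
        self.ttl = ttl          # seconds a cached answer stays valid
        self.clock = clock
        self.cache = {}         # token -> (user, time cached)

    def user_for(self, token):
        entry = self.cache.get(token)
        if entry and self.clock() - entry[1] < self.ttl:
            return entry[0]                    # cache hit, no round trip
        user = self.verify(token)              # network call in real life
        self.cache[token] = (user, self.clock())
        return user
```

Each microservice imports this one class from the shared library, which is the “very minor pain point” the text refers to.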

In either approach, authentication and maintaining sessions are not radically different compared to traditional websites, especially if your website was already mostly Javascript (so, “traditional” in the post-2005 sense, not in the 1993 sense).

### Too complicated! I don’t wanna!

After I explain these ideas to a team, at some point I get this pushback:

This is too complicated. It’s also unnecessary. I understand why our web scraping script has to dump raw HTML somewhere — it’s pulling in 50 gigabytes a day. So, yes, the web scraping script can dump its raw HTML on S3 or Kafka, or wherever. But I’m just writing the website code. This part of our code can be a normal Ruby On Rails app. There is no reason why our frontend code needs to interact with a system as complicated as what you are talking about.

My response is usually something like this:

Be wary! You might well be in the early stages of creating a disaster like the one created by Imani and Santiago. If you build a system with multiple write points, at the very least be sure you have a unified log through which all messages go. Otherwise you are on the road to a system that has multiple data silos and no canonical source of truth. The unified log is essential.

Less essential, but also helpful, is the idea of having a single write point. That means that even your web scraping app writes to the write point. Yes, you could instead have the web scraping app write directly to Kafka. You could have lots of apps write directly to Kafka. If you do that, three possible annoyances often arise:

1.) logic for writing to Kafka is all over the place

2.) logic for enforcing schemas is all over the place

3.) lots of material appears in Kafka and you may not know where it is coming from

That said, I see a lot of companies that cheat regarding “one write point” and they don’t seem to suffer much. But you absolutely need the unified log. That much is essential.
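A sketch of what a single write point buys you: one place for producer logic, one place for schema checks, and one place to stamp each event with its origin (the required keys and the `publish` signature are illustrative):

```python
REQUIRED = {"document_id", "session_id"}  # illustrative minimal schema


def write_point(event, source, publish):
    """The one place in the system that writes to the unified log.

    Enforces a minimal schema, stamps the event with its source so
    you always know where material in Kafka came from, and hands it
    to `publish`, which stands in for your Kafka producer.
    """
    missing = REQUIRED - event.keys()
    if missing:
        raise ValueError(f"rejected event, missing keys: {sorted(missing)}")
    stamped = dict(event, source=source)   # provenance stamp
    publish("events", stamped)
    return stamped
```

The three annoyances above disappear because this function is the only Kafka producer in the codebase: the scraper, the API Gateway app, and the auth app all call it.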

### Caches are cheap, build a lot of them for the frontend

There was a stretch, maybe from 2005 to 2015, which will be remembered as the era of simple Ajax uses. The era started when Sam Stephenson released the Prototype library, but it really got going once jQuery was released. It was an era of Javascript fetching data via Ajax, and relying on the backend code to decide what data is sent in response to each request.

That era is evolving into something different. Frontends are now complicated enough that the frontend coders need to have control over what their queries fetch. GraphQL was invented to satisfy this need.

Maybe GraphQL is the big wave of the future. I don’t know. For now, I favor a simpler approach.

I wrote a small translator app. It is less than 1,000 lines of Clojure. I hope to eventually clean it up and release it as Open Source. You can recreate it easily on your own. The idea is to pull data from PostGres and denormalize it to ElasticSearch. The crucial idea is that the SQL, and the reformatting of the data, should be controlled by a simple configuration file, which the frontenders can change at will. Therefore, whenever they need a new data structure in ElasticSearch, they just add a few more lines to the configuration file, and the Translator app picks up the changes in the configuration file and immediately starts creating a new data structure in ElasticSearch. And the Translator app is forever polling PostGres, so whenever new data appears in PostGres, the Translator app is quick to pick up the change and move it to ElasticSearch.

In the early days of Ajax, it was common to offer RESTful interfaces that exactly matched the database tables in a SQL database. So if your MySQL or PostGres had tables for Users, Businesses, Messages, then the RESTful interface offered endpoints for Users, Businesses, and Messages. And the Ajax interfaces were very busy, making a lot of calls. I can recall early Angular websites where the pages were making 20 or 30 or even 40 Ajax calls to fill in the page.

You should denormalize your SQL data into a form that is convenient for the frontend. Your frontend should only have to make 1 or 2 or maybe 3 Ajax calls to fill in a web page.

Because the Translator app is easy to use, the frontenders can create different kinds of documents for every page on the website. They can have some documents for the Profile page, different documents for the Rising Trends page, different documents for the Most Commented page. You can have a cache for the Rising Trends page logged out and a cache for the Rising Trends page logged in. If a given page shows data from 8 different SQL database tables, then you need to fetch those 8 tables and combine the data into one JSON document, so the frontend can get everything it needs with 1 Ajax call. You can and should have as many cache collections as possible. The only limit is the speed of writes to ElasticSearch. On a B2B site you could potentially have a cache for every user. On a B2C site you would probably have too many users to do so. But in the interest of convenience for the frontend, you should certainly push the limit of how many caches you can have before the write/updates begin to slow down to unacceptable levels.

When I first created my Translator app, it was at a company that had a PHP website which made straight calls to MySQL. They were trying to develop an API, so they could have some separation between the backend code and the frontend code. I built the Translator app to make it easy for frontenders to pull data from MySQL and get it to ElasticSearch. Here is an example of the config the frontenders could use to create 2 document collections:

    {
        "denormalization" : [

            {
                "sql" : "SELECT * FROM recent_investments",
                "key_to_denormalize_upon" : "profile_id",
                "collection_name" : "recent_investments_page"
            },

            {
                "sql" : "SELECT * FROM company_profiles",
                "key_to_denormalize_upon" : "profile_id",
                "collection_name" : "company_details_page"
            },

            {
                "sql" : "SELECT * FROM company_websites",
                "key_to_denormalize_upon" : "profile_id",
                "collection_name" : "company_details_page"
            },

            {
                "sql" : "SELECT * FROM company_board_members",
                "key_to_denormalize_upon" : "profile_id",
                "collection_name" : "company_details_page"
            },

            {
                "sql" : "SELECT * FROM company_investors",
                "key_to_denormalize_upon" : "profile_id",
                "collection_name" : "company_details_page"
            }

        ]
    }


This creates 2 document collections:

recent_investments_page

company_details_page

(I used the word collection because I was originally going to use MongoDB, not ElasticSearch.)

This allows the frontenders to write whatever SQL they need, have the data combined around some key, and have it live in a denormalized state in ElasticSearch. I believe this gives frontenders all the flexibility they seek from GraphQL, and setting this up is easier than setting up a GraphQL system.
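To make the mechanics concrete, here is a toy Python sketch of the denormalization step, with an in-memory stand-in for MySQL. The function name, the fake rows, and the way rows are grouped under the table name are my invention for illustration, not the actual Translator code:

```python
def denormalize(config_entries, fetch_rows):
    """config_entries: a list of dicts shaped like the Translator config above.
    fetch_rows: function(sql) -> list of row dicts (stands in for MySQL)."""
    collections = {}
    for entry in config_entries:
        key = entry["key_to_denormalize_upon"]
        coll = collections.setdefault(entry["collection_name"], {})
        for row in fetch_rows(entry["sql"]):
            # one document per key value (e.g. per profile_id)
            doc = coll.setdefault(row[key], {key: row[key]})
            # group each table's rows under a field named after the table
            table = entry["sql"].split("FROM")[1].strip()
            doc.setdefault(table, []).append(row)
    return collections

# Fake rows standing in for two MySQL tables:
tables = {
    "SELECT * FROM company_websites": [
        {"profile_id": 7, "url": "https://example.com"}],
    "SELECT * FROM company_investors": [
        {"profile_id": 7, "investor": "Acme Capital"}],
}
config = [
    {"sql": s, "key_to_denormalize_upon": "profile_id",
     "collection_name": "company_details_page"} for s in tables
]
docs = denormalize(config, lambda sql: tables[sql])
```

Each resulting document holds everything the frontend needs for one profile, so the page can render with a single Ajax call.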

### What should be stored permanently in Kafka?

You can store a lot in Kafka. At LinkedIn, they currently keep 900 terabytes of data in Kafka. But your system will almost certainly generate some useless debris that does not need to be stored forever in Kafka. You should use Kafka as your main communication channel, so everything should go through Kafka, but some types of messages can be deleted after a few minutes.
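In Kafka itself, this kind of deletion is configured per topic, via the `retention.ms` topic config. Here is a small sketch of what such a retention policy might look like; the topic names are invented for illustration:

```python
# Hypothetical topic-retention policy. The topic names are invented,
# but retention.ms is the real Kafka topic config that controls deletion:
# -1 means "keep forever"; a positive value is a time-to-live in ms.

RETENTION_FOREVER_MS = -1
FIVE_MINUTES_MS = 5 * 60 * 1000

TOPIC_RETENTION = {
    "billing-events":   RETENTION_FOREVER_MS,  # receipts: part of the permanent log
    "billing-commands": FIVE_MINUTES_MS,       # "please process" triggers: short-lived
    "debug-telemetry":  FIVE_MINUTES_MS,       # useless debris
}

def is_permanent(topic):
    """True if messages on this topic belong to the permanent log."""
    return TOPIC_RETENTION.get(topic, FIVE_MINUTES_MS) == RETENTION_FOREVER_MS
```

The values would then be applied with Kafka's stock tooling, for example `kafka-configs.sh --alter --entity-type topics --entity-name billing-commands --add-config retention.ms=300000` (the exact flags depend on your Kafka version).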

I’m going to share an actual chat I had with a client, because I think this shows the learning process that a team goes through:

Them:

When you used ElasticSearch, how did you resolve race conditions pertaining to two documents updating at the same time? Some merge logic? Did you retry the write?

Me:

I was simply pulling from MySQL and writing to ElasticSearch, so generally it was fine to go with “last write wins.” You need to think carefully about how much harm will be caused if you sometimes have a document in ElasticSearch that’s out of date. Obviously, if the issue is payment, that could be very painful (if the user has paid, but ElasticSearch still thinks the user has not paid). The need for “real time” handling can get very complicated. When we say “real time” we mean “high priority.” Some companies use Apache Storm in this situation; its multiple redundant checks and restarts of failed functions would allow a team to avoid the possibility that a failed app or a failed write leaves the final cache (ElasticSearch) in an out-of-date state. Both Jay Kreps and Nathan Marz have agreed that you’ll want to run Kafka no matter what, but that running both Kafka and Storm leads to an incredible amount of complexity. Still, some companies do this. Parsely does this. But in general, you want to keep things simple. Just stick with the basic architecture, and try to make the Translator app as fast as possible, so that if it gets something wrong now, hopefully less than a minute will pass before it retries and gets things correct. Will your customer be angry if the data is wrong for one minute? Can you calm them down? Because if you can manage their expectations, then your architecture can be a lot less complex.

Them:

Hmmm, I think perhaps I have a different kind of race condition than what you just talked about. I’m dealing with the async nature of NodeJS. I’m writing the Translator app in NodeJS. And there are multiple Kafka messages coming in.

Me:

I never wrote directly from Kafka to ElasticSearch, so I’m not sure how to answer. What I typically do is I have the backend robots do some processing and then store the data in normalized form in PostGres.

Them:

So, say a user makes a payment. To facilitate this, a message is posted to Kafka saying, “This user, with these credentials, would like to make a payment.” This message is picked up by the “Billing” service, which uses the incoming information to determine if the billing succeeded.

At this point, there are two ways the front-end could be notified of this change.

Option A: The Billing service could cache the successful billing receipt in PostGres. The Translator app could listen to this log and update accordingly. The User service could also listen to this log and update some info in the User profile.

Option B: The Billing service does not cache anything in PostGres, but instead posts a “Transaction Complete” message to Kafka. Anything listening can do whatever it wants with that information, including updating some information in ElasticSearch or info in a User profile.

In both of these situations, the event is stored somewhere.

We will have some forms of orchestration, and I don’t think listening to PostGres in that regard is the best idea. I don’t think it does what we want. We have Kafka specifically for events.

Me:

Sending a message to ElasticSearch that says “Your payment was successful” is the right idea. And you can mark a “Payment successful” in Kafka. I’d say those are two separate messages that the Billing robot will send. I think it is okay to cheat a little bit, if you need the speed, and have the Billing robot write directly to ElasticSearch, rather than write to PostGres. Again, if you need the speed. Balance this against the fact that you are making your life a little bit more complicated every time you make an exception to the overall architecture. And when you think you need that extra bit of speed, so you want to cheat a little bit, you are taking a step down the road that led the industry to the whole debate about “Should we use Kafka and Storm together?” See this:

The problem with the Lambda Architecture is that maintaining code that needs to produce the same result in two complex distributed systems is exactly as painful as it seems like it would be. I don’t think this problem is fixable.

As a general rule, you only write to Kafka what you would need to recreate the whole history of your app, and therefore its current state. That does not include all human-readable messages, but it does include things like receipts, since you would not be able to recreate those later on.

As to how many events you write to Kafka, be practical about that. Don’t write two messages when one will describe the event. Don’t write permanent messages if your only goal is to trigger some other backend robot that you know is waiting for a particular signal.

Them:

Hmmm. I see what you’re saying. Would you agree there is a difference between a “Please Process A Payment” message, versus a “A Payment Was Successful” type of message?

Me:

You can distinguish between intentions and facts. The future tense versus the past tense. In standard CQRS lingo, these are called “commands” and “events.”

“I would like to make a payment with these credentials” is a command.

“Your payment was successful” is an event.

I don’t often use that lingo because I don’t know if it clarifies much, but maybe it is useful?

Them:

Oh it does!

It slipped my mind that while the incoming messages are commands, they’re not important in themselves; all we need are the resulting events.

Me:

That is a reasonable architectural decision. There are different ways to handle commands. The crucial thing is that the long-term permanent records in Kafka hold all info that you would need to recreate the history of your service, and therefore the current state.

Them:

So “command” messages can delete themselves after an hour, or something like that. 5 minutes. Or a day. Whatever. They don’t need to be permanent.

Me:

That’s correct, they don’t need to be permanent. Some of that depends on what sort of experience you want to guarantee for your customers.

You don’t need to permanently record the commands; you could just permanently record the results (the events). But if an app dies while it’s in the middle of handling a command, it’s useful to have recorded the command, so the app can try again when it restarts. I assume the app will need less than 1 second to restart. For instance, if you get the command “I would like to make a payment with these credentials” and your Billing app dies when it tries to handle the command, then when the Billing app restarts it would see an unhandled command and could re-run it. Or you can simply assume that your customer will try again later. This latter option is less complicated, so it is preferable if you are allowed to treat your customers that way. Just be sure it’s really okay with your business model.

Them:

Yes.

So, to bring this back to the Translator app. It could also respond to these events, and be somewhat agnostic regarding the use of PostGres. We don’t need PostGres, do we? If ElasticSearch is our cache, why do we also need PostGres?

Me:

Be wary. This conversation runs the risk of confusing two technologies with two different kinds of cache. You can use ElasticSearch for everything, but you will still have two conceptually different caches: one that holds normalized data, and one that holds denormalized data.

You don’t need to use PostGres. You can use any kind of data store: Cassandra or MongoDB or MySQL or Redis. Since you are already using ElasticSearch, you can use ElasticSearch for everything.

So it doesn’t matter if you use PostGres. You could use ElasticSearch for everything. Just remember that, regardless of which technologies you use, you still will have 2 conceptually different caches — one for normalized data and one for denormalized data. You can put both caches in ElasticSearch, but remember how different they are.

And also remember, even if you use PostGres, you are using it as a cache. The permanent items in Kafka are the only canonical source of truth for your system. You could delete your PostGres database, and delete all the backups for the PostGres database, and that wouldn’t matter at all, because you could regenerate that data by having your backend robots re-read the history of all events in Kafka. Your log is your source of truth. Everything else is a cache.
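To make “everything else is a cache” concrete, here is a toy sketch of rebuilding current state by replaying a permanent event log. The event shapes are invented for illustration:

```python
def rebuild_state(event_log):
    """Replay Kafka-style events, in order, into a fresh 'cache' (a plain dict).
    Delete the cache and you can always regenerate it from the log."""
    users = {}
    for event in event_log:
        user = users.setdefault(event["user_id"], {"paid": False})
        if event["type"] == "user_registered":
            user["email"] = event["email"]
        elif event["type"] == "payment_succeeded":
            user["paid"] = True
    return users

# A tiny stand-in for the permanent messages in Kafka:
log = [
    {"type": "user_registered", "user_id": 1, "email": "a@example.com"},
    {"type": "payment_succeeded", "user_id": 1},
]
state = rebuild_state(log)
```

If the ElasticSearch or PostGres cache were wiped out, re-running this replay over the full Kafka history would produce the same state again.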

### What sort of machines should be used to run Kafka?

This info comes from Chris Clarke, who runs devops at Parsely:

For any production system, I’d recommend running at least 3 Kafka brokers on EC2 instances with local SSD storage and replication set to at least 2; 5 nodes with replication of 3 will allow for faster recovery. I highly recommend using a separate Zookeeper cluster dedicated to this Kafka cluster. These can be tiny machines. We use c5d.large, but this is overkill. They use almost no CPU, and around 2G of RAM, which is mostly page cache. These things are just high-availability key-value stores.

To give a little background, we run an old version of Kafka (0.8.2) on EC2 instances and connect to it using pykafka, which is an open source Python module that we maintain. This version of Kafka is buggy, and there are some problems with pykafka, particularly around handling disconnected/down brokers. These issues have led to a few terrible disasters, so we’re testing the newest version of Kafka and a number of clients. As of right now, the confluent client[1] appears to be the clear winner, at least with regard to reliability.
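Chris’s advice corresponds to a handful of broker settings. A hypothetical server.properties fragment, with illustrative values rather than Parsely’s actual config:

```properties
# Hypothetical broker config reflecting the advice above:
# at least 3 brokers, replication of at least 2 (3 for faster recovery).
broker.id=1
log.dirs=/mnt/local-ssd/kafka-logs            # local SSD storage
default.replication.factor=3
min.insync.replicas=2                         # a write survives the loss of one broker
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181  # separate, dedicated Zookeeper cluster
```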

.

.

.

Epilogue:

Do you use Docker? Then keep reading. But if you don’t use Docker, then you can stop reading now.

### How to use Docker to handle software deploys

I recommend against using Docker, but I sometimes have clients who insist on Docker, so I did recently build a system that used the AWS Docker ecosystem. The parts of the system:

1.) a public URL that pointed to a load balancer

2.) behind the load balancers, some EC2 instances optimized for the Elastic Container Service (these are a particular AMI that Amazon makes available)

3.) Docker images stored in the Elastic Container Registry

4.) A Jenkins buildfile that could pull the Python code from Bitbucket, build the Docker images, and then push them into the Elastic Container Registry.

5.) A call to “aws ecs update-service” to force AWS to push the new Docker image out to the ECS instances.

The Jenkins build script did things like run the unit tests and fail the build if the tests did not pass. For the sake of stability, the client decided that we would only deploy code when someone logged into Jenkins and hit the “deploy” button. They would have to add a tag such as “production” or “staging” to determine which system we were deploying to. We captured what the user typed by using the “input” step:

    stage('Add tags') {
        try {
            timeout(time: 180, unit: 'SECONDS') {
                userInput = input(
                    id: 'userInput',
                    message: "Add tags, separated by whitespace. 'develop' or 'staging' or 'production' will deploy to our various environments.",
                    parameters: [
                        [$class: 'TextParameterDefinition', defaultValue: "build-${env.BUILD_NUMBER}", description: 'Whatever you type becomes a Docker tag.', name: 'Add tags']
                    ])

                listOfDockerTags = userInput.tokenize()
            }
        } catch (e) { // timeout reached or input false
            println "An exception occurred during the 'Add tags' stage:"
            println e.getMessage()
            throw e
        }

        println "End of the 'Add tags' stage"
    }


Obviously, this is all written in Groovy, which is the DSL for Jenkins pipeline scripts.

The Jenkins script first had to authorize itself with AWS and capture a login token that could be used in later AWS commands. For the sake of high availability, we were multi-regional, and since AWS logins are per region, I had to log in to the 2 regions we were operating in:

    stage('AWS ECR get-login') {
        println "Start of stage: AWS ECR get-login"

        // 2018-03-26 -- possibly I should test for the presence of 'docker login' to be sure this line works
        def awsEcrLoginEast1 = sh(script: "aws ecr get-login --region 'us-east-1' --no-include-email", returnStdout: true)
        sh "${awsEcrLoginEast1}"
        def awsEcrLoginEast2 = sh(script: "aws ecr get-login --region 'us-east-2' --no-include-email", returnStdout: true)
        sh "${awsEcrLoginEast2}"

        println "End of stage: AWS ECR get-login"
    }


Finally we call “aws ecs update-service” to force AWS to push the new Docker images from the ECR to the ECS servers:

    stage('ECS update-service') {
        println "At the start of stage ECS update-service"
        println "As of 2018-03-27 we call 'aws ecs update-service --service aui-stage-fcr-service --cluster aui-stage-ecs-cluster --force-new-deployment' after every build, since we only do manual builds. We assume the person doing the build is confident that the code can be deployed, assuming we have a service/task set up in the Elastic Container Service."

        for (String oneDockerTag : listOfDockerTags) {
            if (oneDockerTag == 'production') {
                sh "aws ecs update-service --service aui-prod-fcr-service --cluster aui-prod-ecs-cluster --region 'us-east-1' --force-new-deployment"
            }
            if (oneDockerTag == 'staging') {
                sh "aws ecs update-service --service aui-stage-fcr-service --cluster aui-stage-ecs-cluster --region 'us-east-1' --force-new-deployment"
            }
            if (oneDockerTag == 'develop') {
                sh "aws ecs update-service --service aui-dev-fcr-service --cluster aui-dev-ecs-cluster --region 'us-east-1' --force-new-deployment"
            }
        }

        println "At the end of stage ECS update-service"
    }


The best thing about this is that AWS handles the graceful rollout of the new app. This is awesome because I’ve wasted many long hours of my life writing that kind of code, trying to ensure minimal downtime when we roll out a new version of our code, trying to spin up a new set of servers while the old servers spin down. But if you use this system, then AWS does all that for you.

If I call “aws ecs update-service” and then ssh (via the bastion box) to the ECS box where the Docker containers are running, I can see 3 old containers that have been running for weeks, and 3 that have only been running for a few seconds.

And if I check again in 15 minutes, the old ones are gone, and the only containers left running are the 3 I just started.

### Why isn’t Docker a part of this graphic?

If you’ve read this blog before you know that I don’t recommend Docker. I’ve previously made the case that if you want security, isolation, flexibility, orchestration, and an easy way to scale up, then the smart way to go is to use Terraform/Packer. Please read my article “Docker is the dangerous gamble which we will regret.” The article sparked very long conversations at places such as Reddit and HackerNews, which are linked in the comment section of the blog post.

Most of that article was devoted to criticizing Docker, and very little of it to explaining Terraform and Packer. I’ll try to say a little more about Packer here.

Sean Hull wrote this to me in a private email:

Terraform can do everything up until the machine is running. So things post boot, *could* be done with Ansible. That said another route is to BAKE your AMIs the way you want them. There’s a tool called “packer” also from hashicorp. It builds AMIs the way docker builds images. In that model, you can simply use terraform to deploy a fully baked AMI. If your configuration changes, you re-bake and redeploy that new AMI. Barring that, it’s terraform for everything up until boot, and then Ansible for all post-boot configuration.

Create your ideal Linux setup, then save it as an image. Use Packer, which will build the image for you.
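For example, a minimal Packer template of that era might look like the following JSON. The AMI ID, region, instance type, and installed packages here are placeholders for illustration, not a recommendation:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-0123456789abcdef0",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "baked-app-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "sudo yum -y update",
      "sudo yum -y install nginx"
    ]
  }]
}
```

Running `packer build template.json` bakes a new AMI, which Terraform can then deploy; if your configuration changes, you re-bake and redeploy.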
