December 12th, 2017
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: firstname.lastname@example.org
I’m posting here some quotes I like from this book:
Time for a cautionary tale. Back in 2006, I was working on building a pricing system for a bank. We would look at market events, and work out which items in a portfolio needed to be repriced. Once we determined the list of things to work through, we put these all onto a message queue. We were making use of a grid to create a pool of pricing workers, allowing us to scale up and down the pricing farm on request. These workers used the competing consumer pattern, each one gobbling messages as fast as possible until there was nothing left to process.
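To make the competing consumer pattern concrete, here is a minimal sketch. It assumes an in-process `queue.Queue` and plain threads standing in for the real message broker and compute grid. The names `reprice`, `pricing_worker` and `run_farm` are hypothetical, not from the book. Every worker pulls from the same queue, so whichever worker is free takes the next message. Scaling the farm up or down just means changing `num_workers`.

```python
import queue
import threading


def reprice(item):
    # Hypothetical stand-in for the real pricing calculation.
    return len(item) * 1.5


def pricing_worker(worker_id, jobs, results, lock):
    # Competing consumer: grab the next message as fast as possible,
    # and stop once nothing is left to process.
    while True:
        try:
            item = jobs.get_nowait()
        except queue.Empty:
            return
        price = reprice(item)
        with lock:
            results[item] = (worker_id, price)


def run_farm(items, num_workers=4):
    # All work is enqueued up front, mirroring "put these all onto a message queue".
    jobs = queue.Queue()
    for item in items:
        jobs.put(item)
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=pricing_worker, args=(i, jobs, results, lock))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

For example, `run_farm(["bond-1", "swap-22", "fx-3"], num_workers=2)` prices all three items. Which worker handled each one depends on scheduling.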
The system was up and running, and we were feeling rather smug. One day, though, just after we pushed a release out, we hit a nasty problem. Our workers kept dying. And dying. And dying.
Eventually, we tracked down the problem. A bug had crept in whereby a certain type of pricing request would cause a worker to crash. We were using a transacted queue: as the worker died, its lock on the request timed out, and the pricing request was put back on the queue — only for another worker to pick it up and die. This was a classic example of what Martin Fowler calls a catastrophic failover.
Aside from the bug itself, we’d failed to specify a maximum retry limit for the job on the queue. We fixed the bug itself, and also configured a maximum retry. But we also realized we needed a way to view, and potentially replay, these bad messages. We ended up having to implement a message hospital (or dead letter queue), where messages got sent if they failed. We also created a UI to view those messages and retry them if needed. These sorts of problems aren’t immediately obvious if you are only familiar with synchronous point-to-point communication.
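Here is a minimal sketch of the two fixes Newman describes: a maximum retry limit, plus a dead letter queue whose messages can be replayed later. It is a toy in-memory model, not any real broker's API. `RetryingQueue`, `max_attempts` and `replay_dead_letters` are hypothetical names. A raised exception stands in for the worker crashing, and re-appending the message imitates the lock timing out on a transacted queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    attempts: int = 0


class RetryingQueue:
    """Toy queue: failed messages are redelivered until max_attempts,
    then parked in a dead letter queue (the 'message hospital')."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.pending = deque()
        self.dead_letters = []

    def publish(self, body):
        self.pending.append(Message(body))

    def consume_all(self, handler):
        processed = []
        while self.pending:
            msg = self.pending.popleft()
            msg.attempts += 1
            try:
                handler(msg.body)  # an exception here models the worker dying
                processed.append(msg.body)
            except Exception:
                if msg.attempts >= self.max_attempts:
                    # Break the redelivery loop: park the poison message.
                    self.dead_letters.append(msg)
                else:
                    # Like a timed-out lock: the message goes back on the queue.
                    self.pending.append(msg)
        return processed

    def replay_dead_letters(self):
        # Operator-triggered retry, e.g. once the bug has been fixed.
        for msg in self.dead_letters:
            msg.attempts = 0
            self.pending.append(msg)
        self.dead_letters.clear()
```

Without the `max_attempts` check, a message that always fails would cycle through `pending` forever. That is the repeated worker deaths from the story in miniature.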
The associated complexity with event-driven architectures and asynchronous programming in general leads me to believe that you should be cautious in how eagerly you start adopting these ideas. Ensure you have good monitoring in place, and strongly consider the use of correlation IDs, which allow you to trace requests across process boundaries.
Page 57, Building Microservices, Sam Newman
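The correlation IDs mentioned in the quote above can be sketched like this: reuse an incoming ID or mint one at the edge of the system, log it, and pass the same ID along with every downstream call or message. The header name `X-Correlation-ID` is a common convention, not something the book or a standard mandates. The functions `ensure_correlation_id`, `handle_market_event` and `price_portfolio` are hypothetical.

```python
import logging
import uuid

# Hypothetical header name; a widespread convention, not a standard.
CORRELATION_HEADER = "X-Correlation-ID"


def ensure_correlation_id(headers):
    """Reuse the incoming ID, or mint a new one at the system's edge."""
    cid = headers.get(CORRELATION_HEADER)
    return cid if cid else str(uuid.uuid4())


def price_portfolio(headers, logger):
    # A downstream step: it logs under the ID it was handed.
    cid = ensure_correlation_id(headers)
    logger.info("[%s] repricing portfolio", cid)
    return cid


def handle_market_event(headers, logger):
    cid = ensure_correlation_id(headers)
    logger.info("[%s] market event received", cid)
    # Every outgoing call or published message carries the same ID,
    # so log lines from different processes can be stitched together.
    return price_portfolio({CORRELATION_HEADER: cid}, logger)
```

Searching the aggregated logs for one ID then shows the whole path a request took across process boundaries.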
[We want loose coupling and high cohesion:]
Whether you choose to become a REST ninja, or stick with an RPC-based mechanism such as SOAP, the core concept of the service as state machine is powerful. We’ve spoken before about our services being fashioned around bounded contexts. [As an example,] our Customer microservice owns all the logic associated with behavior in this context.
When a consumer wants to change a customer, it sends an appropriate request to the customer service. The customer service, based on its logic, gets to decide if it accepts the request or not. Our customer service controls all lifecycle events associated with the customer itself. We want to avoid dumb, anemic services that are little more than CRUD wrappers. If decisions about what changes are allowed to be made to a customer leak out of the Customer service itself, we are losing cohesion.
Having the lifecycle of key domain concepts explicitly modeled like this is pretty powerful. Not only do we have one place to deal with collisions of state (e.g., someone trying to update a customer that has already been removed), but we also have a place to attach behavior based on those state changes.
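Here is a minimal sketch of "the service as state machine", assuming a Customer service that keeps its lifecycle rules in one place. The states `ACTIVE`, `SUSPENDED` and `REMOVED` and the class and method names are hypothetical. The book only names the "already been removed" collision. The service rejects disallowed transitions and updates to removed customers itself, and listeners registered with `on_transition` are the "place to attach behavior" on state changes.

```python
from enum import Enum


class CustomerState(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    REMOVED = "removed"


# The Customer service alone decides which lifecycle transitions are legal.
ALLOWED = {
    CustomerState.ACTIVE: {CustomerState.SUSPENDED, CustomerState.REMOVED},
    CustomerState.SUSPENDED: {CustomerState.ACTIVE, CustomerState.REMOVED},
    CustomerState.REMOVED: set(),  # terminal state: no way back
}


class RequestRejected(Exception):
    pass


class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name
        self.state = CustomerState.ACTIVE
        self.listeners = []  # behavior attached to state changes

    def on_transition(self, callback):
        self.listeners.append(callback)

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise RequestRejected(f"{self.state.value} -> {new_state.value} not allowed")
        old = self.state
        self.state = new_state
        for callback in self.listeners:
            callback(self.customer_id, old, new_state)

    def update_name(self, name):
        # The collision from the text: an update to an already-removed customer.
        if self.state is CustomerState.REMOVED:
            raise RequestRejected("customer has been removed")
        self.name = name
```

Because consumers can only send requests, and never write the customer's state directly, the decision about what is allowed stays inside the service rather than leaking out of it.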
Page 58, Building Microservices, Sam Newman
Building Microservices, Sam Newman
[The Finance Service cannot reach into the parts of the database that should be controlled by the Catalog Service.]
At this point it becomes clear that we may well end up having to make two database calls to generate the report. This is correct. And the same thing will happen if these are two separate services. Typically concerns around performance are now raised. I have a fairly easy answer to those: how fast does your system need to be? And how fast is it now? If you can test its current performance and know what good performance looks like, then you should feel confident in making a change. Sometimes making one thing slower in exchange for other things is the right thing to do, especially if slower is still perfectly acceptable.
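Here is a minimal sketch of what "two database calls" looks like once the join moves out of SQL. The Finance side aggregates its own sales data, then makes one batched request to the Catalog side for item names and joins the results in application code. `CatalogClient`, `get_items`, `Sale` and `sales_report` are hypothetical names, and an in-memory dict stands in for the Catalog service.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    catalog_id: str
    quantity: int


class CatalogClient:
    """Hypothetical stand-in for a request to the Catalog service."""

    def __init__(self, items):
        self._items = items  # catalog_id -> item name

    def get_items(self, catalog_ids):
        # One batched call instead of one call per report row.
        return {cid: self._items[cid] for cid in catalog_ids if cid in self._items}


def sales_report(sales, catalog):
    # Call 1: the Finance service's own data (passed in here as `sales`).
    totals = {}
    for sale in sales:
        totals[sale.catalog_id] = totals.get(sale.catalog_id, 0) + sale.quantity
    # Call 2: ask Catalog for names instead of joining against its tables.
    names = catalog.get_items(sorted(totals))
    rows = [(names.get(cid, "<unknown item>"), qty) for cid, qty in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Batching keeps the extra cost to one round trip. Whether that is acceptable is the "how fast does your system need to be?" question, answered by measuring. The `<unknown item>` fallback is where the lost foreign key, discussed next, starts to show.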
But what about the foreign key relationship? Well, we lose this altogether. This becomes a constraint we now need to manage in our resulting services, rather than at the database level. This may mean that we need to implement our own consistency check across services, or else trigger actions to clean up related data. Whether or not this is needed is often not a technologist’s choice to make. For example, if our order contains a list of IDs for catalog items, what happens if a catalog item is removed