July 8th, 2017
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: firstname.lastname@example.org
Everything I’ve read suggests to me that companies are drifting towards Cassandra as the best of the NoSQL databases. But some of my clients give me directives such as “Please restrict yourself to AWS technologies”. So I was considering using DynamoDB. And then I read this.
When a customer takes an action, it is reasonable to think that this kicks off a number of actions in your system. One action might spawn a number of events to be sent into the backend. Or perhaps your frontend is intelligent and batches events together. While a customer is actively engaging with your application, there is a good chance the key for that customer ID is going to be more active: perhaps 1–5 requests in a given second, but certainly not a sustained load at that rate.
This is not a hot key by any definition. But DynamoDB will fail.
Because DynamoDB divides a table’s provisioned throughput evenly across its partitions, each node is allocated only a small fraction of the total, and even a modest burst on one key will cause a throughput exception in DynamoDB. You will have to overprovision by many multiples to avoid the problem, and this causes costs to go through the roof.
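The arithmetic behind that overprovisioning claim is worth making concrete. A rough sketch (the numbers here are illustrative assumptions, not AWS-published figures): if throughput is split evenly across partitions, then to give a single partition enough capacity for one key’s burst, you must provision that burst times the partition count across the whole table.

```python
# Illustrative sketch of DynamoDB's per-partition throughput split.
# All numbers are made up for the example, not AWS-published figures.

partitions = 10            # partitions the table has split into
burst_on_one_key = 50      # writes/sec hitting a single customer's key

# Provisioned throughput is divided evenly, so for ONE partition to
# absorb 50 writes/sec you must provision 50 * 10 table-wide:
required_table_wcu = burst_on_one_key * partitions

# Meanwhile the table-wide average load might be far lower:
average_actual_load = 60   # writes/sec across the whole table

overprovision_factor = required_table_wcu / average_actual_load
print(required_table_wcu)                  # 500
print(round(overprovision_factor, 1))      # 8.3
```

In this toy scenario you pay for roughly 8x your actual average load just to keep one moderately active key from throttling, which is the cost explosion the quoted passage describes.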
Where DynamoDB does work
There are many use cases where DynamoDB is likely a good solution. If you know your dataset will stay relatively small, or your queries are spread evenly enough across keys that the above scenario doesn’t apply to you, then DynamoDB could help you solve problems. I have seen it used for deduping, where data is not stored long term, so the dataset never grows. It is a fast and highly available tool if you can get around the issue I mentioned above. This article is pretty damning of DynamoDB, but it should really be read as an indictment of our particular expectations of the database. If you set your expectations differently and work within the restrictions, you might find a good use for the product.
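The deduping use mentioned above usually maps onto DynamoDB’s conditional writes (a `put_item` with `ConditionExpression="attribute_not_exists(pk)"`) plus a TTL attribute so old entries expire and the table stays small. A minimal in-memory sketch of that logic, with hypothetical names standing in for the real table calls:

```python
import time

class DedupStore:
    """In-memory stand-in for a DynamoDB dedup table with TTL.
    seen_before() mimics a conditional put that fails if the key
    already exists and has not yet expired."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.items = {}  # key -> expiry timestamp

    def seen_before(self, key, now=None):
        now = time.time() if now is None else now
        expiry = self.items.get(key)
        if expiry is not None and expiry > now:
            return True   # conditional put would fail: duplicate event
        # "Write" succeeds; record when this entry should expire.
        self.items[key] = now + self.ttl
        return False

store = DedupStore(ttl_seconds=3600)
print(store.seen_before("event-123", now=0))     # False: first sighting
print(store.seen_before("event-123", now=10))    # True: duplicate
print(store.seen_before("event-123", now=4000))  # False: TTL expired
```

Because each key is written once and then aged out, the dataset size stays bounded and the hot-partition problem above never has a chance to develop.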