Don’t waste $1 million on devops infrastructure that you’ll never need

(Written by Lawrence Krubner; indented passages are often quotes.)

Samy Dindane asked me a question:

How would you deploy a simple React/NodeJS app?

I responded on Twitter:

Deployment is a big question but I’ll try to summarize. First, ask yourself, do you need High Availability? When I ask CEOs of early stage startups, they always say “Yes” yet I think only 10% really need High Availability.

I think of High Availability as implying multiple servers, multiple load balancers, multiple failover databases, multiple data centers, multiple regions. Lots of redundancy, probably configured for failover.

Beware of perfectionism and maximalism. A recent client of mine had raised $5 million then hired Robots And Pencils and spent $1 million to build their MVP, and the CEO insisted on High Availability. He felt certain that his product was going to be the next Pokemon Go:

Beyond being a global phenomenon, Pokémon GO is one of the most exciting examples of container-based development in the wild. The application logic for the game runs on Google Container Engine (GKE) powered by the open source Kubernetes project. Niantic chose GKE for its ability to orchestrate their container cluster at planetary-scale, freeing its team to focus on deploying live changes for their players. In this way, Niantic used Google Cloud to turn Pokémon GO into a service for millions of players, continuously adapting and improving.

First of all, that’s rare, and second of all, does it actually matter if you lose a few customers because of a few dropped connections? Most of my clients seem to believe they are in markets where they must support uptime of 99.9999999%, even though most of them could have 98% uptime during a spike, and it would be fine in the long run.
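To put those percentages in perspective, here is a small sketch that converts an uptime percentage into allowed downtime. Annualizing the figure is my assumption for illustration (the 98% above refers to a spike, not a whole year), and the function name is made up for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year that a given uptime percentage allows."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 98% is the "good enough" figure from the text, 99.9% is a common
    # hosting target, and 99.9999999% is what clients say they need.
    for uptime in (98.0, 99.9, 99.9999999):
        hours = downtime_hours_per_year(uptime)
        print(f"{uptime}% uptime allows {hours:.6f} hours of downtime per year")
```

Sustained for a full year, 98% uptime would allow roughly 175 hours of downtime, while nine nines allows about three hundredths of a second. The gap between those two numbers is what the extra infrastructure pays for.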

In the case of my client, the public didn’t like the MVP. Which is a common experience. And the “M” was a lie. If you go for High Availability, then “M” is a lie, unless it’s redefined as “Maximum”. They built a “maximal viable product” instead of a “minimal viable product.” High Availability is excessive; there is nothing “minimum” about it.

For a new startup, fast iteration is important. As Steve Blank says, you’re looking for product/market fit. Every variation of your product is a new experiment. The whole point of a Minimal Viable Product is that the public might not like your first idea. That’s why you keep it minimal: don’t waste much money on any one idea until you have reason to believe that the public likes it.

You want fast iteration of your business idea, not fast iteration of a devops setup. You want agile product development, not agile devops resource use. Once you have product/market fit, it is nice if you have fine-grained control of devops resources, because you can save a few dollars that way, but it’s not the kind of money you should worry about when you’re still trying to find an idea that the public likes. Small optimizations should wait till you are a stable and mature organization.

Fast iteration of extremely minimal products allows you to test many ideas — you thus decrease your risk of failure. High Availability is a symptom of perfectionism, and as such it is the absolute opposite of an MVP.

Which startups need to be Highly Available, right from the beginning? Startups doing finance, actually handling money, or startups doing security. That’s it. Unless you’re handling money, or protecting money, you don’t need to start with High Availability. Get there slowly. It’s a nice ideal: when you are big and successful then you should eventually implement it. But don’t start with it.

I’ve previously been critical of certain devops strategies, especially Docker/Kubernetes. People have pushed back against my criticism; they ask, “If it’s good enough for Google then surely it is good enough for us?” But Google is a stable and mature company, so it is appropriate for Google to optimize its devops setup, to increase efficiency and lower costs at scale. Are you running a large, mature, stable company? If yes, this essay is not written for you. I’m writing for small startups launching an MVP.

That means you don’t need massive redundancy in some cloud strategy. You can consider bare metal servers, which can be very, very cheap. Have you priced a co-lo center? It’s surprising how cheap they can be.

You have to balance the risks, based on what your company really needs. I’ve a good friend at a startup who found a co-lo data center in a small college town in Virginia, where they were able to build out a massive MongoDB database, sharded across 4 massive servers, each server holding 256 gigabytes of RAM, 1 terabyte of RAM in total. They also had a Cassandra database and several servers for their front-end. They were paying $7,000 a month for this. The co-lo center was conveniently close to where their devops guy lived. Then the devops guy quit. Because the co-lo center was in this small town, they had to hire another devops person who was also in this small town, which was a limited market to hire in. The company decided that they could lower their risks by moving to AWS, because then they would be able to hire a devops person who lived anywhere. So they re-created their setup on AWS. Suddenly they were paying $35,000 a month. Yes, their bill went up fivefold (a 400% increase) for the same setup. Co-lo is often much cheaper than the cloud, but you do face the risk of being location dependent. This can be a good risk for some companies, and a bad risk for other companies. You need to think carefully about which risks you really need to mitigate, and remember that it is impossible to mitigate all risks.
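The arithmetic behind that story can be checked in a few lines. This is only a sketch using the two monthly figures quoted above; the variable and function names are mine.

```python
def percent_increase(old: float, new: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100


colo_monthly = 7_000   # co-lo bill from the story, in dollars per month
aws_monthly = 35_000   # AWS bill for the same setup, in dollars per month

ratio = aws_monthly / colo_monthly                      # how many times larger
increase = percent_increase(colo_monthly, aws_monthly)  # increase in percent
extra_per_year = (aws_monthly - colo_monthly) * 12      # added yearly cost

if __name__ == "__main__":
    print(f"AWS costs {ratio:.0f}x the co-lo price, a {increase:.0f}% increase")
    print(f"Extra cost per year: ${extra_per_year:,}")
```

So the price of being able to hire a devops person anywhere was five times the co-lo bill, or an extra $336,000 a year. Whether that is a good trade depends on how hard the local hire would have been.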

If you avoid High Availability, your setup can be very simple. Bare metal servers, and then deploy an uberjar, or a fat Go binary, or some other easy system. If you have multiple microservices with internal dependencies, you can write a deployment script in Jenkins, and this will still be simpler than a Docker/Kubernetes setup.
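To give a sense of how small such a deployment script can be, here is a minimal sketch, not a recommendation of a specific tool. It assumes a single self-contained binary, SSH access to the server, and a systemd service already defined there. The host name, paths, and service name are all hypothetical.

```python
import subprocess
from typing import List


def deploy_commands(artifact: str, host: str, remote_path: str, service: str) -> List[List[str]]:
    """Build the commands that copy one binary to a server and restart its service."""
    staged = remote_path + ".new"
    return [
        # Copy the new build next to the live one so the swap is a single rename.
        ["scp", artifact, f"{host}:{staged}"],
        # Make it executable, swap it in, and restart the (assumed) systemd unit.
        ["ssh", host,
         f"chmod +x {staged} && mv {staged} {remote_path} && sudo systemctl restart {service}"],
    ]


def deploy(artifact: str, host: str, remote_path: str, service: str, dry_run: bool = True) -> None:
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in deploy_commands(artifact, host, remote_path, service):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # Hypothetical values; dry_run defaults to True so nothing is executed.
    deploy("build/myapp", "deploy@app1.example.com", "/opt/myapp/myapp", "myapp")
```

A Jenkins job could call a script like this once per service, in dependency order. That is the whole pipeline: copy, swap, restart.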

Not that you have to go with bare metal; I only mention it because it’s faster and cheaper than many of my clients seem to realize. If you want to be in the cloud, you can. And Terraform/Packer gives you an easy way to set up a standard virtual machine (VM) that your developers can use.

Terraform/Packer is an easier way to get what developers typically use Docker for. If you want an easy way to scale, isolation, easy deployment, security, a standard development environment, plus an AMI that can be spun up many times in AWS, you can use Terraform/Packer. Your resource use will end up being coarse-grained, compared to the fine-grained control you get with Docker/Kubernetes, but most startups do not need the fine-grained resource control that Docker/Kubernetes makes possible.

With Terraform/Packer you’re still working with a standard Virtual Machine, which you can run locally in VirtualBox (for example through Vagrant), as well as on AWS as an AMI. You can work with something standard, that you’ve probably been using for the last 10 years. You can set up a standard Virtual Machine that runs Linux or Windows, and that means you can stay with the environment that you have many years of experience with, using tools that you’ve used for years. By contrast, if you instead use Docker/Kubernetes, you are launching into an adventure that involves a still new ecosystem which is still somewhat immature and is still undergoing rapid evolution. If you really want to be loyal to the spirit of MVP, building something really minimal, then avoid Docker and Kubernetes. They are an advanced optimization, and therefore they are not MVP.

Remember the client who paid Robots And Pencils $1 million to build their MVP? The CEO at this client said, “The public is going to go wild over our product, once they see it, so we need to be ready to scale quickly. I expect we’ll have 10 million users within the first 6 months.” But in fact, after a year, they had 40,000 users, and that was after spending another $1 million on marketing. But hey, their app was Dockerized and set up in AWS Fargate, so they were ready for 10 million users, whenever those 10 million users decided to show up. None of this followed the real spirit of an MVP. There was nothing minimal here; instead, the project was contaminated with maximalism and perfectionism, the two biggest enemies of an early stage startup.
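For scale, here is the same story as arithmetic. This is a rough sketch using only the figures quoted above, and it assumes the build and marketing budgets were the only spend, which the story does not actually say.

```python
build_cost = 1_000_000        # paid to build the MVP, in dollars
marketing_cost = 1_000_000    # spent on marketing, in dollars
projected_users = 10_000_000  # the CEO's six-month forecast
actual_users = 40_000         # users after one year

# Spend per user actually acquired, and how much of the forecast materialized.
cost_per_user = (build_cost + marketing_cost) / actual_users
share_of_projection = actual_users / projected_users * 100

if __name__ == "__main__":
    print(f"Spend per user: ${cost_per_user:.2f}")
    print(f"Share of projected users reached: {share_of_projection:.1f}%")
```

That comes to $50 per user, for less than half a percent of the audience the infrastructure was built to handle.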

I wish I could say that this problem only happened at one of my clients, but in fact, nearly every client I’ve had these last few years has been swept up in a kind of perfectionist mania, chasing after expensive (but flawless!) implementations of ideas that had not yet been tested against real customers. This is not the correct way to build a company.

Beware hosted solutions that promise to remove all the devops pain from your life. The CEO felt certain that if Robots And Pencils could Dockerize the app, then it would be easy to achieve infinite scalability via AWS’s Fargate service. While that service is very good, and scales very well, it also turns out that you still need to write auto-scaling rules! Which they never got around to, because by the time they got to that point, they realized their app was not popular with the public. So they spent considerable money trying to achieve “easy scaling” and they never quite got there. Even more sad, or futile, or wasted, is the fact that their server app is a completely self-contained Go binary, with all dependencies included. So they could have easily achieved infinite scaling without Docker.
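To show what "writing auto-scaling rules" involves at its simplest, here is a sketch of a generic proportional rule: pick the instance count that would bring average CPU back to a target, within fixed bounds. This is not Fargate's API or its actual algorithm. The function, the thresholds, and the bounds are all assumptions for illustration.

```python
import math


def desired_instances(current: int, observed_cpu: float, target_cpu: float,
                      min_instances: int, max_instances: int) -> int:
    """Proportional rule: scale the fleet so average CPU lands near the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_cpu <= 0:
        raise ValueError("target_cpu must be positive")
    # If 4 instances run at 90% and the target is 60%, we need 4 * 90 / 60 = 6.
    raw = math.ceil(current * observed_cpu / target_cpu)
    # Clamp so a bad metric can neither scale to zero nor run up the bill.
    return max(min_instances, min(max_instances, raw))


if __name__ == "__main__":
    # Hypothetical fleet: 4 instances at 90% average CPU, aiming for 60%.
    print(desired_instances(4, 90, 60, min_instances=2, max_instances=20))
```

Even this toy version forces decisions somebody has to make: which metric, what target, what floor and ceiling. A hosted service runs the rule for you, but it cannot pick those numbers.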

My basic point is, Docker/Kubernetes brings in a whole new ecosystem, with a lot of new worries, but you can get all the benefits via older Virtual Machine tech, and then script those older Virtual Machines with some mix of Ansible and Terraform (assuming you want to be in the cloud).

Whether you go bare metal or the cloud, MVP should be about a simple approach to devops.

High Availability is a beautiful ideal and every company should work toward that. Eventually, it is something you should achieve. Once you have proven product-market fit, then you can focus on reliability. But should a small startup begin with that, before you even know if the public likes your product? I say, leave High Availability till later. Don’t start with that. High Availability is not MVP, because MVP is about fast iteration of the main idea, not fast iteration of some devops strategy.
