September 30th, 2019

If you enjoy this article, see the other most popular articles

My final post regarding the flaws of Docker/Kubernetes and their ecosystem

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow me on Twitter.

Summary: As a preface, you might want to read “High Availability is not compatible with an MVP, because an MVP is about fast iteration”. Perfectionism is dangerous for a business. If you are head-of-tech, you need to balance a whole menu of concerns that you are responsible for. Over the last 20 years the tech world has produced increasingly sophisticated devops tools. In an ideal world, we want easy development, deployment, consistency, networking, security, isolation and infinite scalability. But each of these costs money and time, so each needs to be balanced against the money and time that your company has to achieve its goals. Though Docker, containers, and Kubernetes have received an intense amount of investment, intellectual brilliance, and discussion, they constitute one of the most complex ways of meeting the devops concerns of your business. Complexity tends to breed more complexity, and with it, more costs. Very few businesses actually need the fine-grained control of resources that containers with Kubernetes make possible, and simpler approaches are only slightly more coarse-grained. Over the last 20 years there have been four main approaches to devops; all of them are still valid, and all of them should be evaluated to see if they fit the circumstances of your company.

At places like Hacker News and Lobste.rs and Reddit there were some good conversations about my previous essays regarding “Docker”, a word I mostly used as a synonym for all containers, a practice which made sense in 2017, but which has not aged well. Non-Docker container technologies are much more of a thing now than they were two years ago. However, for this essay, one more time, I’m going to be using “Docker” as a synonym for all containers.

If you were to read all 3 previous essays, plus all the comments on Hacker News and Lobste.rs and Reddit (I list these at the end), you’d be reading more than 30,000 words, which is halfway to a book. I’m not saying that’s a bad thing; the feedback helped me clarify my thoughts. To everyone who commented, you have my sincere thanks.

Fourteen years ago, Puppet seemed like a brilliant breakthrough. That was followed by interesting ideas in Chef and Ansible, and more recently Terraform and Packer. There are a great many tools that helped with specific problems, such as monitoring (collectd), starts/restarts (Supervisord), visualizations (Grafana, Kibana), aggregation (ELK stack, Riemann), service discovery (etcd, Consul, ZooKeeper), and endless third parties jumping in with paid services (New Relic, Datadog, ReliaQuest).

A surprising number of Docker advocates argue for Docker/Kubernetes in a vacuum, as if the question were “Do you want devops, or not?” when the correct question is “Of these dozens of devops technologies, which subset is the right one for my company?” I’m often subjected to a three-part argument that’s missing its middle part:

1.) Do you want easy development, deployment, consistency, networking, security, isolation and infinite scalability?

2.) ???

3.) In conclusion, Docker!!!!!!

Some of the arguments remind me of the fight regarding Object Oriented Programming, circa 1992, when OOP was still the “believable promise” of the future, rather than a disappointing present reality.

Advocate: Soon all of our code will be objects that combine algebraic data types with behavior thus allowing us to translate reality directly into code.

Critic: But how do you persist the data?

Advocate: In a relational database.

Critic: But won’t you run into the object-relational impedance mismatch?

Advocate: Ah, no worries! We already thought of that. Soon we will get rid of SQL and stupid relational databases and replace them with pure object databases. And then there will be no impedance between the Object Oriented code and the object database!

This last bit, the dream of a seamless flow between the OOP code and the OOP database, is what gave us Zope, in the world of Python, and the original disaster of Struts/EJBs, in the world of Java, plus several object databases that were loudly hyped during the 1990s but which eventually vanished, because they never solved the problem they promised to solve. Yet billions were invested in this dream. The mood of the times could best be summarized as:

Whenever we discover problems with objects, the answer must be to go deeper into the world of objects!

The technology world exhausted itself in this pursuit. To his credit, David Heinemeier Hansson played a role in ending that era. When he was 25 years old, he released Ruby on Rails. The attitude of the framework was something like “Let’s come up with something practical, based around ideas that are known to work.” Even though Ruby is an Object Oriented language, the Rails framework pragmatically assumed it was natural to talk to a SQL database, and the simplicity of that assumption helped end the era of “Objects For Everything”. Bruce Eckel caught the changing spirit of the times in his essay “The departure of the hyper-enthusiasts” and there is a spirit here that applies to any conversation about Docker/Kubernetes:

Rails brings up a deeper issue. Apparently, something “relatively simple that you only do once,” such as setting up the database by writing SQL, really does benefit from automation, possibly because you actually end up doing these things more than once. Or possibly because simplicity of both expression and of understanding really is important. There is a faction among us that seems to feel that if you can do anything at all, it doesn’t matter how many hoops you must jump through to accomplish that thing. These are the folks that assert that Java’s verbosity is “just finger typing that Eclipse/IntelliJ will do for me,” and it doesn’t matter if the resulting code has 20 times the visual bulk of a simpler approach. One of the basic tenets of the Python language has been that code should be simple and clear to express and to read… But for someone who has invested Herculean effort to use EJBs just to baby-sit a database, Rails must seem like the essence of simplicity. The understandable reaction for such a person is that everything they did in Java was a waste of time, and that Ruby is the one true path.

When it comes to containers, the hyper-enthusiasts have definitely not departed, though I suspect some day they will. For now, the spirit of the times can be summarized as:

Whenever we discover problems with containers, the answer must be to go deeper into the world of containers!

And yet the tech world made an important breakthrough when it gave up on “Objects For Everything” and I suspect it will make an important breakthrough when it gives up on “Containers For Everything”. I’ll come back to this at the end of this essay.

Some of the conversations I’ve recently had about containers remind me of the older conversations about objects, especially the circular reasoning, blind spots, and conclusions which must be motivated by something that’s been left unsaid. This is what it often sounds like:

Advocate: Docker really helps with development.

Critic: More than a standard Virtual Machine (VM), the kind of thing I could run on my Mac in VirtualBox?

Advocate: Are you joking? Docker is very lightweight compared to a standard VM. You can run a hundred Docker containers on your laptop. Can you do that with a standard VM?

Critic: No, of course not.

Advocate: See? Docker allows you to run a hundred apps in a lightweight manner!

Critic: But it forces me to run a lot of apps, doesn’t it? I don’t have a choice? I mean, if every app is in a separate container?

Advocate: That is a ridiculous myth! You don’t have to put each app in a separate container! You could put a hundred apps in a single container!

Critic: Oh, I see. Do you put a hundred apps in a single container?

Advocate: No, I put each app in a separate container.

Critic: Oh…

Advocate: But it’s easy to orchestrate all of the different apps! We have better and better tools all the time to make it easy to orchestrate a hundred apps in a hundred containers!

Critic: But if I actually needed to run a hundred apps, I could write a simple bash configuration script to orchestrate them. Isn’t Docker just giving me the same thing, but making it more complicated?

Advocate: What if an app fails to start? Are you going to add restart logic to your bash script? After a while, that becomes a very complicated script.
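For comparison, here is a minimal sketch of that restart logic. It is written in Python rather than bash, the app names and commands are hypothetical placeholders, and it is not meant to replace a real process supervisor such as Supervisord: it starts every app, polls them, and restarts any app that exits with a non-zero code, up to a fixed limit.

```python
import subprocess
import sys
import time
from typing import Dict, List


def supervise(commands: Dict[str, List[str]], max_restarts: int = 3,
              poll_interval: float = 0.05) -> Dict[str, int]:
    """Start every app, restart any that exits non-zero, and return the restart counts.

    Returns once every app has either exited cleanly or used up its restarts.
    A long-running service never exits, so for real apps this loop runs forever.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    restarts = {name: 0 for name in commands}
    while procs:
        for name, proc in list(procs.items()):
            code = proc.poll()
            if code is None:
                continue  # still running
            if code != 0 and restarts[name] < max_restarts:
                restarts[name] += 1
                procs[name] = subprocess.Popen(commands[name])  # restart the failed app
            else:
                del procs[name]  # exited cleanly, or we have given up on it
        time.sleep(poll_interval)
    return restarts


# Hypothetical apps: one that succeeds and one that always crashes.
apps = {
    "healthy-app": [sys.executable, "-c", "pass"],
    "crashing-app": [sys.executable, "-c", "import sys; sys.exit(1)"],
}
counts = supervise(apps, max_restarts=2)
print(counts)
```

The crashing app is started once and restarted twice before the loop gives up on it, which is the whole of the “very complicated” logic in about twenty lines.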

Critic: Well, I’d only have to write the configuration script once.

Advocate: Do you realize how stupid you sound? The words “I’d only have to write the configuration once” are among the most notorious words that a software developer can say. Because the reality is you end up changing your configuration endlessly, as your project grows and changes. You really end up writing it a million times. Anyway, no one writes bash anymore. This isn’t 1995. Using bash is an incredibly stupid idea.

Critic: Sure, that’s true, but you could use Ansible or Chef or even something older, like Puppet.

Advocate: None of those are real orchestration technologies. They don’t make it easy to get a hundred apps to talk to each other.

Critic: Okay, how does Docker help you get a hundred apps to talk to each other?

Advocate: The very premise of your question is ridiculous. Docker isn’t about orchestration! We use Kubernetes for that! And setting up a Kubernetes cluster is easy!

Critic: Oh. So what is Docker for?

Advocate: It’s actually a shipping container!

Critic: Oh? It allows you to deploy code?

Advocate: Exactly! It’s the easiest way to deploy the app once you’ve built it!

Critic: But wouldn’t it be easier for me to create a binary with all dependencies included, and deploy that? Like an uberjar on the JVM, or a Go binary, or something like that?

Advocate: Ha! You know nothing about the modern tech world! How can you standardize your deployments when you’re using different build systems?

Critic: So Docker frees me from building my code?

Advocate: No, you still need to build your code. But do you have a standardized way to build different apps with different technologies? How is your app going to get its environment variables?

Critic: In the old days I would just set the environment variables on my machines, and that is still a valid answer in simple setups. But assuming we’re in the cloud, can’t I just inject the environment variables at build time? Jenkins has several plugins that make this painless.

Advocate: Ha! You know nothing about the modern tech world! Nobody uses Jenkins anymore!

Critic: Why is that?

Advocate: Because we’re all using containers!

Critic: Why is that?

Advocate: Without containers, how is your app going to get its environment variables?

Critic: Um, wait, really? We just…

Advocate: And what about databases? What about the failure modes of your persistence layer?

Critic: Well, at the application level, I’d use a library that has automatic retries, and at the system level, I’d presumably have some health checks in place, that can force a restart when necessary.
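The application-level half of the Critic’s answer is small. Below is a minimal sketch of retry with exponential backoff; `flaky_connect` is a hypothetical stand-in for a database driver’s connect call, and no particular retry library is implied.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 5, base_delay: float = 0.1,
          max_delay: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)) -> T:
    """Call `operation`, retrying on transient errors with exponentially growing waits.

    The final attempt is made outside the loop, so its exception reaches the caller,
    where a system-level health check can take over.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for _ in range(attempts - 1):
        try:
            return operation()
        except retry_on:
            time.sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a ceiling
    return operation()


# Hypothetical stand-in for a driver's connect call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_connect() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database not ready")
    return "connected"


result = retry(flaky_connect, base_delay=0.01)
print(result, "after", calls["count"], "attempts")
```

Only errors listed in `retry_on` are retried; anything else (a bad password, say) fails immediately, which is usually what you want.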

Advocate: So complicated! Think about how many technologies you just mentioned! That’s totally insane! You need to simplify your system!

Critic: Oh, okay, well, how would you do it?

Advocate: Easy! First I Dockerize my app and create the Dockerfile that details what sort of operating system I’m expecting in the container, and what software, if any, I expect to see in that container, then, for security, I create a private image repository and verify the integrity of everything in it, or I use a 3rd party repo, after I verify the integrity of its images.

Critic: Oh, I see, that’s for the software libraries, right? Like if I’m working in Java, I might have a private Maven repo?

Advocate: No, if you’re working in Java and need private packages, you still need to set up a local Maven repo. Setting up a private Docker image repository would be in addition to that.

Critic: Oh!

Advocate: Stop interrupting! The beautiful thing about creating your app container with Docker is that every command that you run becomes its own unique layer in the image, and you can run the command “docker history” to see exactly how your image was created. If you run a command such as “RUN mkdir -p /usr/local/bin/” that becomes its own layer! And then if you run “RUN yum install emacs” that becomes its own layer! And you can run the “docker diff” command to see the difference between each image, so if you save each layer as its own image, you can see the difference between every layer! This gives you fine-grained control over the way each layer makes up your app image!

Critic: That sounds fascinating! And how have people used this power?

Advocate: Oh, God only knows. I make a change and then recreate the whole image. I guess it helps with some cache stuff, in some situations? But listen, someone, somewhere, is probably doing very cool and amazing and mind-blowing things with this power!

Critic: Um, could you give an examp…

Advocate: VERY COOL THINGS!!!!!

Critic: Um…

Advocate: Stop interrupting! Locally, I can now run the “docker build” command to create my image.

Critic: Image?

Advocate: Yes, as the documentation says, “a read-only template from which Docker containers are launched.”

Critic: How long do the builds take?

Advocate: For small apps, just a few seconds.

Critic: And large, complex apps?

Advocate: A bit longer.

Critic: And I rebuild after each change?

Advocate: That’s how most developers have done it in the past.

Critic: It sounds like the old cycle of write, compile, run, change, compile, run, change, compile, run, change, and so on. Like Java programming, 15 years ago, before we had ways of doing hot reloading.

Advocate: Don’t worry, layer caching is getting so good, soon it’ll be just like hot reloading. For some platforms and some languages you can already do it! Almost.

Critic: Oh, good. How soon will that be reliable and consistent?

Advocate: You’re getting sidetracked by trivialities.

Critic: My productivity as a developer is a triviality?

Advocate: Would you like a system that can be scaled to infinity?

Critic: Well, sure, but I have to balance a menu of concerns, so…

Advocate: If you want to live in the future, some sacrifices will be necessary.

Critic: Are you saying that my productivity as a developer should be sacrificed so…

Advocate: DO YOU WANT A SYSTEM THAT CAN SCALE TO INFINITY?

Critic: Well, I’m not opposed…

Advocate: Good! So stop interrupting. Now, as I was saying, I can use the “docker run” command to spin up the image, then I can use “docker ps” to see the IDs of the running containers, then I use “docker exec” to open a shell in the container and see what is going on.

Critic: Wouldn’t it be easier just to run the app in my terminal?

Advocate: You’re embarrassing yourself! Who cares if it works on your machine? You call yourself a professional? All that matters is whether it runs on other machines, and that’s what Docker is giving you, true consistency across platforms, true certainty that what is running on your machine will also run in the cloud!

Critic: But I could use a terminal inside of a standard VM, right? And then deploy that to any cloud. An AMI can run on my Mac, in VirtualBox, and the same AMI could also run on AWS.

Advocate: Are you sure you’re intelligent? Because sometimes you seem pretty slow. We already covered why you can’t use a standard VM!

Critic: And why was that again?

Advocate: Because it isn’t Docker!

Critic: Oh…

Advocate: Try to keep up. Now, I configure my CI to build the container image. This is pulling from my secure private Docker image hub, or some 3rd party service where I’ve verified the integrity of the images. These can be pulled in and built and stored in my image repository.

Critic: How would I handle a situation where my app has heavy-weight and very specific dependencies that need to be locally cached for fast building?

Advocate: Oh, I don’t know, maybe write a bash script to force a local cache refresh?

Critic: I thought you said nobody was using bash anymore? That using bash was “an incredibly stupid idea”?

Advocate: Everything is smart once you use Docker! Bad ideas become good again!

Critic: Um…

Advocate: And that’s it! Basically, on the app side, I’ve covered everything. It really is that simple. Set your CI to deploy from your repo. You spin up your containers and the app runs. Of course, you also need to set up the database.

Critic: But don’t you actually have to map every port in the container to a port in the outside world, and then those external ports have to be dynamically remapped to other ports depending on what apps are actually running? Isn’t auto-discovery a lot to configure?

Advocate: No, for God’s sake, you are so stuck back in 2016. Maybe once upon a time you had to be careful to enable autodiscovery on your Kubernetes daemonset and also be careful to attach the proper annotations to your pods, but that was a long, long time ago. I haven’t done it in weeks. Nowadays we just use Helm and the correct Helm Chart. Helm is the package management system for Kubernetes.

Critic: I can see using a default Chart for a default setup, but isn’t it true that most developers will have to tweak the setup to their actual needs? For instance, most companies will have some specific logging needs? Or if I’m using the Postgres database, what if I want to add in pgBadger to generate reports?

Advocate: Yes, that can get complicated, but it’s fine because, really, you only have to write the configuration once.

Critic: But you just…

Advocate: So long as you use the right configuration file, you’ll be fine. Be sure you don’t touch stuff like postgresqlConfiguration because you might end up wiping out some of the values you need. But, really, it’s very easy. Do you understand what a nightmare it can be to try to achieve high availability of databases without Docker and Kubernetes?

Critic: I typically set up a system that relies on etcd, with some kind of “I’m alive” heartbeat check against etcd.
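A minimal sketch of the heartbeat idea follows. `LeaseStore` is a hypothetical in-memory stand-in, not the etcd client API; it models only what matters here, a key that expires unless it is refreshed, which in etcd itself is done by attaching a key to a lease with a TTL and keeping the lease alive.

```python
from typing import Callable, Dict, List, Optional


class LeaseStore:
    """Hypothetical in-memory stand-in for etcd: a key stays alive only while refreshed.

    This is not the etcd client API; it only models the expire-unless-refreshed idea.
    """

    def __init__(self, ttl: float, clock: Callable[[], float]) -> None:
        self.ttl = ttl
        self.clock = clock  # injected so the example is deterministic
        self._expires: Dict[str, float] = {}

    def heartbeat(self, key: str) -> None:
        # Each "I'm alive" call pushes the key's expiry forward by one TTL.
        self._expires[key] = self.clock() + self.ttl

    def is_alive(self, key: str) -> bool:
        expiry = self._expires.get(key)
        return expiry is not None and self.clock() < expiry


def first_alive(store: LeaseStore, candidates: List[str]) -> Optional[str]:
    """Return the first candidate whose heartbeat is still fresh (e.g. to pick a primary)."""
    for name in candidates:
        if store.is_alive(name):
            return name
    return None


# Simulated clock: the primary stops heartbeating, the replica keeps going.
now = [0.0]
store = LeaseStore(ttl=5.0, clock=lambda: now[0])
store.heartbeat("db-primary")
store.heartbeat("db-replica")
now[0] = 3.0
store.heartbeat("db-replica")
now[0] = 6.0
leader = first_alive(store, ["db-primary", "db-replica"])
print(leader)
```

The hard parts of a real failover (fencing the old primary, promoting the replica safely) are deliberately left out; the sketch only shows how liveness is detected.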

Advocate: That’s a complete nightmare! Think about how much work it is to build a system like that!

Critic: But Kubernetes is built around etcd, isn’t it? All the data about the cloud, everything that Kubernetes is supposed to set up or maintain, all of the state of the current system, that has to live in etcd, doesn’t it?

Advocate: Yes, but we no longer have to set up everything manually! We have tools to automate the work!

Critic: Even if Helm and Helm Charts make it easy to install a set of apps into a Kubernetes cluster, surely some actions are more complicated than a simple install? What about upgrading a Postgres database?

Advocate: Way ahead of you! We worked out that problem a long time ago! You just use a Helm Operator!

Critic: Really? And this solves all the problems of upgrading Postgres in a stable and reliable way?

Advocate: Uh, well, it’s supposed to.

Critic: Supposed to?

Advocate: Sure, so long as you have a working Go environment, you just use the Operator SDK to generate the scaffold for your Operator.

Critic: The scaffold?

Advocate: Sure, the scaffold sets up the basics, figures out the permissions, the dependencies, everything needed to install the Helm Chart. Then you can build the Operator container, and install it in your Kubernetes cluster.

Critic: This sounds complicated.

Advocate: Way ahead of you! The good folks at Red Hat knew people like you were going to whine about stuff, since you obviously like to whine about stuff, so they created the Operator Lifecycle Manager to make all of this a lot easier.

Critic: Doesn’t it seem like we keep piling new technology on top of new technology, to manage the excessive complications of the previous layer of technologies?

Advocate: Hey, think of the alternatives. You don’t want to go back to the bad old days of the past, do you?

Critic: You mean, the bad old days when stuff mostly worked and I didn’t have to learn 3 new alpha technologies each day?

Advocate: That’s a ridiculous exaggeration! Some of these technologies are beta.

Critic: And you seriously regard these piles of code, heaped upon piles of code, as an improvement on the old situation?

Advocate: Are you kidding? It’s like night and day. I’d rather drink arsenic than get dragged back to the bad old days when I had to write Ansible scripts. We live in the future now. We’ve escaped the old world where every attempt at devops became a painful, confusing, unmaintainable disaster after 2 years.

Critic: How long have you been using Docker/Kubernetes in production?

Advocate: 18 months.

Critic: Did Docker/Kubernetes solve all of the problems you were having with Ansible?

Advocate: If you want to dive deep into the weeds, then you’re going to have to carefully define your technical terms.

Critic: Which technical terms?

Advocate: Well, uh, define the word “all”.

Critic: I’ll re-phrase. Has Docker/Kubernetes solved your top 3 Ansible problems?

Advocate: Yes!

Critic: And it did this without introducing any new problems?

Advocate: Define “new”.

Critic: Did you actually get rid of Ansible, or are you still using it?

Advocate: Define “still”.

Critic: So you’re just adding more technologies on top of the old technologies?

Advocate: Define “on top”.

Critic: How long has your Docker/Kubernetes setup been stable?

Advocate: Define “stable”.

Critic: Doesn’t the complexity get insanely out of control?

Advocate: Way ahead of you! Have I told you about Rancher? It’s a complete platform for managing Kubernetes because, let’s face it, sometimes Kubernetes is a nightmare.

Critic: But you said that setting up Kubernetes clusters was easy!

Advocate: Define “easy”.

Critic: You’re the one who used “easy” so maybe you should define it!

Advocate: Uh, “People on Reddit seem to like it.”

Critic: And once I’ve worked with the Helm Operator SDK and generated the scaffold, don’t I still need to write out some operational knowledge of how to upgrade my specific instance of Postgres? After all, every version of Postgres has some unique concerns when doing an upgrade.

Advocate: Sure, we can’t automate everything. In the end, you have to write a few details down.

Critic: But how is this an improvement over the bad old days when I wrote a bash script to upgrade my instance of Postgres?

Advocate: Infinite scaling! Your old bash script was ad-hoc and error prone and could only be used once! We have real automation now! Look, I get that Docker and Kubernetes and Helm and Helm Operators might seem like a little bit of setup work…

Critic: A little!

Advocate: …but once you’ve got it all set up, then you’ve got a system that scales to infinity! Don’t you want infinity?

Critic: Infinity is nice, but I’ve actually got a menu of concerns I am responsible for, and infinite scaling is just one of them.

Advocate: Why are you being so conservative with your technology choices?

Critic: What?

Advocate: You’re stuck in the past! You refuse to learn new things! You’re an example of an “Expert Beginner”! You think you know things, but all of your knowledge is out of date! You haven’t kept up with the times!

Critic: Well, I just learned Terraform and Packer.

Advocate: Never heard of them. Do they help with Docker?

Critic: They could, but actually, they sort of make Docker unnecessary. It’s fascinating because with a tiny Terraform script you can…

Advocate: You’re doing it again!

Critic: What?

Advocate: Refusing to learn new things!

Critic: But I was just telling you about Terraform. It’s really interesting because you can use it to…

Advocate: [ putting hands over ears and shouting loudly ] BLAH BLAH BLAH I CAN NOT HEAR YOU BLAH BLAH BLAH YOUR WORDS CAN NOT HURT ME NOW!

There are many problems that will probably never be solved inside the paradigm established by containers and Kubernetes, but it is absolutely impressive how much money and intellectual brilliance is being poured into the effort. And yet, is there really a way to automate something as complex as upgrading a database, or reattaching a database’s persistent volumes to a database master that is running in a stateless pod? Consider the effort that is being made to try to solve these problems:

The CoreOS team (now part of Red Hat) developed the concept of Kubernetes Operators. An Operator implements common operational tasks in code. These are run either manually when an API is invoked, or automatically when required or on a schedule. Such tasks could be “back up database” or “create a new read replica”. As such, Operators can reduce the administrative burden even for complex systems.

However, as we all know, automating the relatively easy tasks is easy. It is much harder when the tasks are more difficult. Adding a read replica may be easy, but fixing a database’s broken write-ahead-log file that was corrupted by a failing file system is not. Therefore, the engineering effort that goes into Operators is considerable. The etcd Operator is one of the most mature ones, and it currently has about 9,000 lines of code. And counting.

Sadly, it is unlikely that any Kubernetes Operator can cover all operational aspects of even a single complex stateful data store. They definitely make certain tasks easier. But if they could cover all the error cases and recover automatically, why would that functionality not already be in the code of the stateful data store to begin with?
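To make the Operator idea concrete, here is a minimal sketch of the reconcile pattern that Kubernetes controllers are built around: compare desired state with observed state and plan the actions that close the gap. It uses plain Python dictionaries instead of the Kubernetes API, and the database names and action strings are hypothetical. Note that it covers only the easy case the quote mentions, adding or removing read replicas.

```python
from typing import Dict, List


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[str]:
    """Compare desired and observed read-replica counts; return the actions to close the gap.

    A real Operator would run this kind of comparison repeatedly against the Kubernetes
    API and then carry out the actions. This sketch only plans; it never executes.
    """
    actions: List[str] = []
    for name in sorted(desired.keys() | observed.keys()):
        want, have = desired.get(name, 0), observed.get(name, 0)
        if have < want:
            actions += [f"create read replica for {name}"] * (want - have)
        elif have > want:
            actions += [f"delete read replica for {name}"] * (have - want)
    return actions


# Hypothetical cluster: orders-db wants 3 replicas but has 1; old-db should have none.
plan = reconcile({"orders-db": 3}, {"orders-db": 1, "old-db": 1})
print(plan)
```

Nothing in a loop like this knows how to repair a corrupted write-ahead log; that knowledge has to be written by hand, case by case, which is exactly the quote’s point.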

Also, Jessie Frazelle asks us to consider that this ticket has been open for a year:

Jessie Frazelle has an excellent post with more details:

Kubernetes is not to be used for stateful data. There has been a lot of work done in this area but it is still not sufficient. For the more technical members of our audience I direct you to exhibit A. The linked issue goes over problems when a “StatefulSet” gets into an error during deploying or upgrading. This can lead to data loss or corruption since Kubernetes will need manual intervention to fix the state of the deployment. This could even lead to the point where the only recommended fix is you delete the state. What does this mean for your business? Well, if you lose or corrupt your data it could mean a lot of different things depending on what the data was. If the data was your customer database of new account signups, well you might have just lost the data for your new customers. If you are an ecommerce site, it might have been your latest sale. If you are in banking or investments, it might have been data accounting for the movement of capital.

Is this the end of the era of the inexpensive-to-launch software startup?

The following advice (from the Elastisys article) is correct for any one specific business, but for the overall world of startups this situation leaves me feeling sad about where the tech industry has got itself:

You should ask yourself this. Is what makes your business unique your ability to manage databases (or other stateful data stores)? No? Then get a hosted database service from your cloud provider. Spend your time and effort on what makes your business unique instead. And on the off chance that you answered “Yes!”, then you should go out and find everybody out there who answered “No”. Because there are many out there!

I’ve been hearing this more and more: use hosted services, because the new devops situation is too complex for mere mortals; only a handful of experts really understand it.

My concern is this: we just enjoyed a roughly 25-year stretch, maybe 1990 to 2015, when the economics of starting a business favored software startups: cheaper computers, cheaper network bandwidth, open source software; it all combined to create a world where a small handful of people could come together, start a company, and do amazing things. And the magic ingredient was “It is really cheap to get started.” But nowadays, more and more, I’m hearing, “All this devops stuff is so damn crazy complicated, you probably can’t figure it out, so you should probably just use a hosted solution.” And that raises costs. I worry that we are slowly going back to the world that existed before 1990, when “software” meant “expensive”. If we are not careful, this beautiful era of software startups will be suffocated by the complexity we are needlessly inflicting on ourselves.

When I say this, some people respond, “These cloud services are ridiculously cheap and they actually help lower costs.” I’ll respond to that in a moment.

(Please note, I’m not arguing against all managed services here. I’m only arguing against making standard devops tools so complicated that we poor mortals have no choice but to use managed services.)

You are not Google

I said “the complexity we are needlessly inflicting on ourselves”. I mean “needless” in the sense that Oz Nova meant when he wrote “You Are Not Google”:

Software engineers go crazy for the most ridiculous things. We like to think that we’re hyper-rational, but when we have to choose a technology, we end up in a kind of frenzy — bouncing from one person’s Hacker News comment to another’s blog post until, in a stupor, we float helplessly toward the brightest light and lay prone in front of it, oblivious to what we were looking for in the first place. This is not how rational people make decisions, but it is how software engineers decide to use MapReduce.

As Joe Hellerstein sideranted to his undergrad databases class (54 min in):
The thing is there’s like 5 companies in the world that run jobs that big. For everybody else… you’re doing all this I/O for fault tolerance that you didn’t really need. People got kinda Google mania in the 2000s: “we’ll do everything the way Google does because we also run the world’s largest internet data service” [tilts head sideways and waits for laughter]

In response to one of my previous essays, alter3d criticized my ideas with this comment:

This guy needs to tell Google that 100% of their infrastructure is wrong.

That’s a valid criticism, if you are overseeing infrastructure at Google. If you run devops at Google, please ignore my essays; I have not written anything that is relevant to you.

But are you running devops at Google?

Let’s talk about the word “agile”. It has meant different things in different eras. I’ve been writing software for 20 years, and I’ve been building software startups for 17 years. Almost everything we did in 2002 would be considered wildly unprofessional by today’s standards, and some of it was considered unprofessional by the standards of 2002. But it did allow us to iterate fast.

We did not start using any version control until 2005, and then it was Subversion. (I didn’t start using Git until 2012.) In 2002, we were working on a blog engine written in PHP. (Weblogs were still a new idea then, and around that time Typepad raised $23 million to build out their weblog service.) We had two big web servers that we rented from Hostway, one for serving our frontend, and one for the database. Each server was $100 a month. There was no failover for the database. The backups for the database were saved to a folder which I had to remember to download to my computer every day or two, or three. Deployment meant we edited a PHP file, or an HTML file, and uploaded it with FTP to our frontend server. We deployed 50 times a day, sometimes 100 times in a day. We definitely tested in production, but maybe not in a cool, sophisticated way. We had a ridiculous amount of fun, brainstorming ideas and pushing them out at a fast pace. We were extremely agile, under the only definition of agile which should matter to a small startup, which is fast iteration of the basic idea. When weblogs didn’t work out for us, we pivoted to ecommerce software, and there we had our main success. Being able to do fast pivots is the life-or-death question for small startups. When I think of that era, I am embarrassed about a great deal, yet our system had some attributes that I would still be willing to copy for a small startup:

1. Our hosting costs were ridiculously cheap. During the first year we did not spend more than $200 a month on servers.

2. Our devops setup was so simple that the non-technical staff understood it perfectly. Our graphic designer could completely redo our user interface, and deploy the new version, without needing any help from a computer programmer.

3. We were extremely agile. If a customer sent us a suggestion for a feature, we could design it, code it and deploy it in a day. We could experiment with different versions of code at different vhosts in Apache.

The extreme simplicity of the devops situation meant that we could focus on the other aspects of product/market fit. (My understanding of startup development is that Steve Blank will be angry with you if you focus on operational optimizations before you have found product/market fit.)

There are other definitions of “Agile” that are valid, but be aware of what you might lose if you adopt them. Consider this definition, being pushed by Microsoft’s Azure service:

Achieve agility at scale with Kubernetes and DevOps

As containers, environments, and the teams that work with them multiply, release frequency can increase—along with developmental and operational complexity. By employing DevOps in Kubernetes environments, you can move quickly at scale with enhanced security.

Under this definition, you have to spend a lot of money to get back to the release frequency that we innocently enjoyed in 2002. If you actually need the reliability offered by these services, then of course you should investigate them to see if they answer your company’s needs. The most important phrase in the Microsoft text is “at scale”. Be sure your need for scale is real before you start spending money on this option. As I mentioned in “High Availability is not compatible with an MVP, because an MVP is about fast iteration”, every one of the CEOs I’ve worked with recently has insisted they need High Availability right from the start. This is perfectionism, and perfectionism is dangerous for a business.

By the way, everywhere in this essay that I use the word “Kubernetes” I’m sure someone will suggest a hosted service that is supposed to remove all the pain of dealing with Kubernetes. Jessie Frazelle wants to remind us that even when you do use a hosted service, you are not avoiding all of the pain:

Now you are probably thinking, “my cloud provider said they’d take away all the pain you just described by selling me their managed Kubernetes.” That is indeed the dream. However, it is not reality. Having worked for some cloud providers, I have seen the pain customers still go through trying to learn the patterns Kubernetes implements and applying those patterns to their existing applications. This means your teams will still have to handle the steep learning curve. Just because it’s managed does not mean that your application’s uptime and availability are covered. That is still on your team.

(Slightly off topic, but I previously wrote about how surprised I was, when I decided to use the managed ElasticSearch service that AWS offers, that it did not auto-scale memory, so I started getting OutOfMemory errors. Seriously? I have to do that manually? What is the advantage of using a hosted service?)

Cloud services help you save money!

Not exactly. Azure and Google Cloud and AWS can save you some money in certain situations, but typically you have to re-invent your architecture to take advantage of what cloud services offer. You get nothing but pain if you take your co-location data center setup and move it directly, unmodified, to the cloud. In “High Availability is not compatible with an MVP, because an MVP is about fast iteration”, I mention my friends who were paying $7,000 a month at their co-location center, moved without changes to AWS, and ended up paying $35,000 a month. If you want to take advantage of cloud services, you will have to change your architecture, and re-inventing your architecture is itself a cost that should be included in any cost comparison.
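To make that last point concrete, here is a minimal Python sketch of a cost comparison that counts the re-architecture work as a one-time cost. The $7,000 and $35,000 monthly figures come from the anecdote above; the re-architected monthly bill ($5,000), the one-time engineering cost ($150,000), and the 24-month horizon are made-up placeholders, not data.

```python
# Hypothetical cost comparison over a fixed planning horizon.
# The $7,000 and $35,000 monthly figures come from the anecdote in the text;
# the re-architected monthly bill and the one-time engineering cost are
# placeholder assumptions for illustration only.

def total_cost(monthly: float, months: int, one_time: float = 0.0) -> float:
    """Total spend over `months`, including any one-time migration work."""
    return one_time + monthly * months


def compare(options: dict[str, tuple[float, float]], months: int) -> dict[str, float]:
    """Map each option name to its total cost; values are (monthly, one_time)."""
    return {name: total_cost(m, months, once) for name, (m, once) in options.items()}


if __name__ == "__main__":
    horizon = 24  # months; an assumption, pick your own planning horizon
    options = {
        "stay in co-location": (7_000, 0),
        "lift-and-shift to AWS": (35_000, 0),
        # Hypothetical: cheaper per month, but needs up-front engineering work.
        "re-architect for AWS": (5_000, 150_000),
    }
    # Print cheapest option first.
    for name, cost in sorted(compare(options, horizon).items(), key=lambda kv: kv[1]):
        print(f"{name:>24}: ${cost:,.0f}")
```

With these placeholder numbers, 24 months in co-location costs $168,000, the lift-and-shift costs $840,000, and re-architecting costs $270,000. Saving $2,000 a month against co-location, the re-architecture would need 75 months to pay back its $150,000, which is why that one-time cost belongs in the comparison.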

If you offer a service that is easily decomposed into discrete compute units, and if usage is very uneven, with big spikes and long lulls, then something like AWS Lambda can save you money. But keep in mind it will be slower than what you can achieve on your own, and if your usage pattern is steady, instead of lumpy, then Lambda might in fact be more expensive than simply running some servers 24/7.
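The steady-versus-lumpy argument comes down to a break-even volume, which this rough Python model makes explicit. The two price constants are assumptions: they approximate AWS Lambda’s published x86 list prices in us-east-1 at one point in time and should be checked against current pricing. The model ignores the free tier, API Gateway, and data transfer. The workload (200 ms per request at 512 MB) and the $100/month server are hypothetical.

```python
# Rough monthly cost model: pay-per-invocation functions vs. an always-on server.
# PRICE_* constants are assumptions (approximate AWS Lambda x86 list prices in
# us-east-1); check current pricing before relying on them. Free tier,
# API Gateway, and data-transfer charges are deliberately ignored.

PRICE_PER_GB_SECOND = 0.0000166667  # USD, assumption
PRICE_PER_MILLION_REQUESTS = 0.20   # USD, assumption


def lambda_monthly_cost(requests: int, avg_ms: float, memory_mb: int) -> float:
    """Compute charge (GB-seconds) plus per-request charge for one month."""
    gb_seconds = requests * (avg_ms / 1000.0) * (memory_mb / 1024.0)
    return (gb_seconds * PRICE_PER_GB_SECOND
            + requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS)


def breakeven_requests(server_monthly: float, avg_ms: float, memory_mb: int) -> float:
    """Monthly request volume at which the function bill equals the server bill."""
    per_request = lambda_monthly_cost(1_000_000, avg_ms, memory_mb) / 1_000_000
    return server_monthly / per_request


if __name__ == "__main__":
    # Hypothetical workload: 200 ms per request at 512 MB, vs. a $100/month server.
    n = breakeven_requests(100.0, 200, 512)
    print(f"break-even at about {n:,.0f} requests per month")
```

With these placeholder inputs, the break-even is roughly 54 million requests a month, or about 20 requests per second sustained. A spiky service that averages well below that favors pay-per-invocation. A steady one that sits above it is cheaper on a server that runs 24/7.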

There are many issues to consider. This exchange from Hacker News brings up the kinds of arguments that developers are still having regarding AWS Lambda:

abiro wrote:

PSA: porting an existing application one-to-one to serverless almost never goes as expected. Couple of points that stand out from the article:

1. Don’t use .NET, it has terrible startup time. Lambda is all about zero-cost horizontal scaling, but that doesn’t work if your runtime takes 100 ms+ to initialize. The only valid options for performance sensitive functions are JS, Python and Go.

2. Use managed services whenever possible. You should never handle a login event in Lambda, there is Cognito for that.

3. Think in events instead of REST actions. Think about which events have to hit your API, what can be directly processed by managed services or handled by you at the edge. E.g., never upload an image through a Lambda function; instead upload it directly to S3 via a signed URL and then have S3 emit a change event to trigger downstream processing.

4. Use GraphQL to pool API requests from the front end.

.

foxtr0t wrote:

So, to summarize, you should:

1. not use the programming language that works best for your problem, but the programming language that works best with your orchestration system

2. lock yourself into managed services wherever possible

3. choose your api design style based on your orchestration system instead of your application.

4. Use a specific frontend rpc library because why not.
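Setting aside the lock-in argument, point 3 in abiro’s list describes a common pattern: the backend never touches the upload; it only hands the client a short-lived signed URL. The Python sketch below shows the general idea with an HMAC over the path and an expiry time. It is not Amazon S3’s actual signing scheme (S3 uses Signature Version 4, normally generated through an AWS SDK), and the secret, host, and query-parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical key shared only by the storage service and your backend.
SECRET = b"hypothetical-shared-secret"


def _sign(path: str, expires: int) -> str:
    """HMAC-SHA256 over the method, path, and expiry time."""
    msg = f"PUT\n{path}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_upload_url(path: str, ttl_seconds: int = 300,
                    now: Optional[float] = None) -> str:
    """Backend side: return a URL that permits a PUT to `path` until it expires."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "signature": _sign(path, expires)})
    return f"https://uploads.example.com{path}?{query}"


def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """Storage side: accept the upload only if the URL is unexpired and untampered."""
    parts = urlparse(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(signature, _sign(parts.path, expires))
```

Once the upload succeeds, the storage service (not your function) would emit the change event that triggers downstream processing.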

Four different approaches to devops

Docker/Kubernetes offers you unrivaled fine-grained control of resources. Do you need that? There are older approaches that might work for you. Here I’ll list the 4 main styles I’ve seen over the last 20 years:

1.) Slap some code on the server.

2.) Bare metal servers, with enough redundancy to easily support failover frontends and failover databases.

3.) Virtual Machines, Vagrant, Heroku.

4.) The true cloud optimized technologies: Terraform/Packer and Docker/Kubernetes.

Of these 4 approaches, which is best? The whole point of this essay is that there is no best. All 4 of these approaches are still in use, and all 4 approaches are valid.

(And obviously I’ve simplified things. I’m leaving out the many variations of CI/CD setups, version control workflows, and monitoring/observability tools. I cannot cover every variation without writing a book.)

Each of these approaches has both strengths and weaknesses, which I’ll detail next.

Slap some code on a server

This is what we did in 2002. It really worked fine for us. Our customers understood our website occasionally had glitches. If you follow this strategy, you might get a reputation for being a bit unprofessional, but your devops costs will be rock bottom, and you can pass those cost savings along to your customers, or put them in your own pocket.

You might think this approach is hopelessly obsolete, but I believe many advertising networks still work this way: they often seem extremely glitchy, and website visitors nowadays use so many ad-blockers that it would almost be stupid for an ad network to offer an SLA contract to advertisers. What is the point of offering 99.999999% reliability when 50% of ads are blocked by some other filter, somewhere else on the network, or in the browser?
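For a sense of scale, the allowed downtime for an availability target is simply (1 − availability) multiplied by the length of a year. A small Python helper makes the arithmetic explicit. The 365.25-day year is an assumption of this sketch.

```python
# Downtime budget implied by an availability target.
# Assumes a 365.25-day year; SLA contracts may define the period differently.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet the target."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999999):
        print(f"{target}% -> {downtime_budget_seconds(target):,.2f} seconds/year")
```

At 99%, the budget is about 3.65 days a year. At 99.999999%, it is about a third of a second, a guarantee that means little when half the ads never render anyway.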

Also, I know of many graphic designers who knock up cheap WordPress sites for clients, and this is exactly their devops setup: they FTP code to the production server (which is the only server). If a company is paying $500 for a complete website, this is exactly the level of setup they should expect. Some companies compete on the basis of rapid iteration of marketing ideas; all they need is fast micro-sites for their next campaign. If cost really is more important than quality, for whatever market the company is in, then they are making a rational decision.

Bare metal servers, with redundancy

In 2011, when I worked at Shermans Travel, they were renting 24 machines on a long-term lease. They had two frontend web servers with another 2 for failover, 3 small test frontend machines, a big database machine with a failover, 3 small machines for test databases for development, plus a few other machines for other things, such as email.

We wrote code on our own machines, but we didn’t run databases on our machines; instead we used the various remote test database servers. The tech team consisted of 6 programmers, 1 devops person, and 1 project manager. The devops person ensured that we always had some remote test database we could develop against, and that it had a copy of the current data, so we could see real-world effects, such as any slowness that might appear if we wrote an 11-table JOIN against tables with tens of millions of rows.

Our project manager was also our QA team, and she knew how to deploy the code from Subversion to various test servers, where she would then kick it around and tell us if she found any bugs. Deployment was handled with Capistrano scripts.

The system was simple, everyone understood it, everyone could work with it. It was a perfectly good system that allowed us to push out changes fast.

There are two problems with a system like this, which you need to think about before you imitate this style at your own company.

First of all, each developer had to set up the software on their own machine, and it often took a few days of effort to get all the software installed and running locally. So that is a tax that every new developer pays. How often should your company pay that tax? If you have one or two developers, and they stay with you for many years, then the tax is not worth worrying about. But if you have 300 software developers, that tax will be horrendous, so you need to adopt a different style of development.

Second of all, when the company was at its peak, with 4 million weekly users, the long-term leased machines were cheaper than AWS. However, the company eventually lost most of its audience, and when the audience fell to just 1 million weekly subscribers, the leased machines represented excess capacity and were far too expensive. When the lease ended, the company moved to AWS, because when your audience is fading, it’s good to be able to cut costs quickly. I previously mentioned that co-location data centers can be surprisingly cheap, but a long-term lease can be a problem if you’ve committed to more resources than you actually need.

Virtual Machines, Vagrant, Heroku

I worked with some startups where the software was developed in Ruby On Rails, in a VM, run in Vagrant, and then deployed to Heroku. For as long as a startup can use Heroku, it probably should, because Heroku keeps the devops situation as simple as possible.

One place where I saw the transition was at TimeOut.com, in 2012. They had built their main CMS with PHP, using the Symfony framework. They were also building a new API using Scala. And they’d bought a company that had a ticket selling system in Ruby On Rails. And when I got there, every developer was setting up all 3 software systems on their own machines. The Scala and the Rails apps were easy enough, but the PHP system had dozens of dependencies that were not managed by Composer and it took me 2 weeks to get it running.

While I was there, they came up with a VM running CentOS (the same as our production machines) and they put the PHP CMS on that. After I left, I believe they added the other apps to the VM, so when a new developer was hired, all the developer had to do was download the VM and spin it up in something like VirtualBox, and voilà, they had all 3 apps running. Certainly, it’s an option to consider for your company. This makes it very easy for a new developer to become productive, and they work in an environment that is identical to the production machines. You can set up a standard Virtual Machine that runs Linux or Windows, which means you can stay with the environment you have many years of experience with, using tools you’re familiar with.

There are two problems with this style of development.

One is that resource use is coarse-grained compared to Docker/Kubernetes. If you want to increase your capacity, you are spinning up a new VM (or in AWS, a new EC2 server).

Two, if you are a developer, running a VM on your machine can be a real pain. From the point of view of 2016, the interest in Docker was obvious, and justifiable, because everyone was sick of working with standard VMs via software like VirtualBox on the Mac. Consider these comments on Hacker News:

rhinoceraptor on July 29, 2016

I think by ‘production’, they mean ‘ready for general use on developer laptops’. No one in their right mind is deploying actual production software on Docker, on OS X/Windows.
I’ve been using it on my laptop daily for a month or two now, and it’s been great. Certainly much better than the old Virtualbox setup.

.

mherrmann on July 29, 2016

I’m still using VirtualBox. Could you elaborate why Docker is better?

.

numbsafari on July 29, 2016

Leaving containers vs VMs aside, Docker for Mac leverages a custom hypervisor rather than VirtualBox. My overall experience with it is that it is more performant (generally), plays better with the system clock and power management, and is otherwise less cumbersome than VirtualBox. They are just getting started, but getting rid of VirtualBox is the big winner for me.

The phrase “slippery slope” is used too easily and too often, but in this case it really applies: “A VM is sluggish on my machine, Docker is lightweight, let’s switch to that. Oh, but wait, does that lead to complications in production? Hmm, okay, so we’ll just use Docker in development, we won’t use it in production. But wait, aren’t we missing all of the real benefits, if we don’t use it in production? It’s too weird to develop a working container and then not use it. So yes, let’s use it in production, but wait, we need to orchestrate this, so let’s use Kubernetes, but wait, Kubernetes can be a pain, so let’s also use Rancher.”

I’ve seen too many startups take one tiny step down that road and a minute later they are asking “How do we re-attach a failover persistent volume that might contain some corrupted data?”

Also, keep in mind, perfectionism can hurt you. I’ve run into a lot of CTOs who seem to want a devops setup that is truly painless. Is that realistic? I am willing to believe that God has a flawless system of failovers for the databases where God tracks our sins and our virtues, but I don’t believe any such devops setup will ever be created by the hand of mortal flesh born of this fallen and hollow world. There will always be some pain; if you are CTO, you must decide what tradeoffs are best for your company. What Charity Majors said about debugging in development also applies to devops:

Could we have ironed out all the bugs before running it in prod? No. You can never, ever guarantee that you have ironed out all the bugs. We certainly could have spent a lot more time trying to increase our confidence that we had ironed out all possible bugs, but you quickly reach a point of fast-diminishing returns.

We’re a startup. Startups don’t tend to fail because they moved too fast. They tend to fail because they obsess over trivialities that don’t actually provide business value. It was important that we reach a reasonable level of confidence, handle errors, and have multiple levels of fail-safes (i.e., backups).

There are many things to consider when thinking about the devops setup for your company but “It should be painless” is probably not the right lens to use for this question.

The true cloud optimized technologies

When the cloud first emerged circa 2009, most companies treated the cloud servers as if they were standard servers in a standard data center. Though new projects like Docker were launched, cloud native technologies did not grab the attention of the mainstream of the tech industry till somewhere around 2014 or 2015.

You can use Packer to create a VM that you can use on your own machine and also run in the cloud, and you can use Terraform to script how many servers you want to spin up, what security groups you want, what permissions they will have, what databases you want to run, and what backups you want to have. Just about everything in your devops setup can be scripted with Terraform. Terraform/Packer allow you to stick with the pattern of development I mentioned in #3; they just add a layer of automation that makes everything easier, and allows you to take full advantage of the cloud.

When you want to scale up, using a full VM is more coarse-grained than using Docker/Kubernetes, so in some cases you might waste resources, and in some cases you might waste money. With Docker/Kubernetes, if you have 100 microservices running, and your business expands, you can increase one microservice by 3%, another by 8%, another by 2%, another by 342%, another by 17%, and so on. Nothing can match the fine-grained control that Docker/Kubernetes gives you, which is probably why Google likes it so much. It’s a good choice for some companies, just please be sure you’ve thought about it carefully before jumping on that bandwagon.
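The difference between coarse and fine granularity can be sketched as a rounding problem. In the Python model below, every service gets its own whole VMs (rounded up), while containers from all services share nodes. The shared-node count assumes perfect bin packing, a lower bound that real schedulers only approximate. The CPU figures and the 4-core machine size are hypothetical. The growth percentages mirror the example above.

```python
import math


def vms_needed(cpu_needs: list[float], vm_cores: int) -> int:
    """Coarse-grained: each service gets its own whole VMs, rounded up."""
    return sum(math.ceil(need / vm_cores) for need in cpu_needs)


def nodes_needed_ideal(cpu_needs: list[float], node_cores: int) -> int:
    """Fine-grained: containers from all services share nodes.

    Assumes perfect bin packing, a lower bound that real schedulers
    only approximate.
    """
    return math.ceil(sum(cpu_needs) / node_cores)


def scale(cpu_needs: list[float], growth_percent: list[float]) -> list[float]:
    """Grow each service independently, e.g. +3%, +8%, +342%."""
    return [need * (1 + g / 100) for need, g in zip(cpu_needs, growth_percent)]


if __name__ == "__main__":
    # Hypothetical CPU needs (in cores) for five services, on 4-core machines.
    needs = [1.5, 0.5, 2.2, 0.3, 1.0]
    grown = scale(needs, [3, 8, 2, 342, 17])
    print("VMs:", vms_needed(grown, 4), "shared nodes:", nodes_needed_ideal(grown, 4))
```

With these numbers, the services need about 6.8 cores in total after growth. That fits on 2 shared nodes in the ideal case, but takes 5 VMs when each service rounds up to its own machine. Whether closing that gap is worth running an orchestrator is the judgment call.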

Sometimes professionalism is bad, sometimes sloppiness is good, sometimes less is more

It’s becoming common that when I say “We don’t need to go with Docker/Kubernetes yet” someone else says “All professional shops are committed to it, if we don’t do this, then we are unprofessional.”

This continues the frustrating pattern, which has recurred many times in the tech industry, where a given technology is seen as the “believable promise” that is going to solve all the problems of the tech industry, and so it is then elevated to a status where it becomes untouchable, unimpeachable, unquestionable. I would really like to see the tech industry avoid making this mistake again.

“All professional shops are committed to it, if we don’t do this, then we are unprofessional” was applied to different technologies during different years:

1.) Compiled languages (as opposed to script languages) for all of the 90s

2.) Microsoft’s stack (as opposed to open source software) for all of the 90s

3.) Java EJBs/Struts, circa 2001

4.) XML (It’s a configuration language! It’s an RDF serialization! Strict XHTML is the future of HTML!) circa 2004

In every case, the tech industry was over-committed to a technology that eventually was seen to have some limits, and where alternatives existed that were eventually discovered to have unique benefits that could not be imitated by the technology that everyone had committed to.

Arguing “We must do this because this is professional” is a variation of “argument from authority”. You’re not explaining the inherent goodness of the technology; rather, you’re basically saying we should use it because everyone else is using it.

But in truth, sometimes a bit of sloppiness has cost savings that will be appropriate for your business. In the same way that sometimes a money manager might go long on soybeans and see that daring bet pay off, sometimes a company can run up a large tech debt, carry it for a long time, and have that daring bet pay off. Indeed, this is exactly what happens in real life, when tech debt is not an accident, but part of a deliberate plan. And, as any money manager will tell you, it is impossible to mitigate all risks. In a theoretically pure market there is no profit because perfect competition drives out all the profit. In the real world, the winner, with the biggest profit, is the company that finds the best way to arbitrage the risk.

Charity Majors said it well:

Organizations will differ in their appetite for risk. And even within an organization, there may be a wide range of tolerances for risk. Tolerance tends to be lowest and paranoia highest the closer you get to laying bits down on disk, especially user data or billing data. Tolerance tends to be higher toward the developer tools side or with offline or stateless services, where mistakes are less user-visible or permanent. Many engineers, if you ask them, will declare their absolute rejection of all risk. They passionately believe that any error is one too many. Yet these engineers somehow manage to leave the house each morning and sometimes even drive cars. (The horror!) Risk pervades everything that we do.

I might regret suggesting sloppiness can be good, since I realize this part of the essay is easily misread. The point is subtle. On the one hand, it is good to be realistic about the reasons why smart managers sometimes decide to allow tech debt. On the other hand, most tech debt arises accidentally, and it has a terrible effect on the morale of the software development team. Please don’t think this essay is making an argument for tech debt. For the most part, I’m arguing for the right level of simplicity, which can be understood as minimizing the number of moving parts that your team needs to worry about.

As CTO, you have an obligation to manage the risks of tech debt

Tech debt, if it is allowed to accumulate, will have a negative effect on the happiness of your software developers. In response to my previous essays, people made the point that some companies have a chaotic mix of different software packages and versions, and that Docker is the best way to manage the chaos. jstoja made this point on Lobste.rs:

jstoja
How do you manage easily 3 different versions of PHP with 3 different version of MariaDB? I mean, this is something that Docker solves VERY easily.

.

friendlysock
Maybe if your team requires 3 versions of a database and language runtime they’ve goofed…

.

jstoja
It’s always amusing to have answers pointing to the legacy and saying “it shouldn’t exist”. I mean, yes it’s weird, annoying but it exists now and will exists later.

.

friendlysock
It doesn’t have to exist at all–like, literally, the cycles spent wrapping the mudballs in containers could be spent just…you know…cleaning up the mudballs.

.

jstoja
I see it more like, the application runs fine, the team that was working on it doesn’t exist anymore, instead of spending time to upgrade it (because I’m no java 6 developer), and I still want to benefit from bin packing, re-scheduling, …

That’s absolutely true, and from the point of view of the individual software developers, Docker seems like a miracle that solves a lot of problems. If you are CTO and you are overseeing a situation where “3 different versions of PHP with 3 different version of MariaDB” is normal, and you have no plan for reducing the tech debt, your individual software developers will come up with a plan of their own, and you might not like it.

I once suggested that Docker will eventually be the kind of tech debt that we now jokingly associate with legacy Java apps. Someone responded that once everything is Dockerized, a company doesn’t have to worry about tech debt any more. That statement is as valid as “Once we do the complete re-write, we won’t have to worry about tech debt anymore,” which has become an industry joke. There is no reason to believe that Docker will solve the problems of tech debt, but rather, Docker moves everything to a higher level. Docker easily solves the problem of “3 different versions of PHP with 3 different version of MariaDB” but only by introducing a whole host of new devops technologies.

If you are a money manager, a big part of your job is managing the risk from the leverage (debt) you’ve taken on. If you are a CTO, a big part of your job is managing the risk from the tech debt you’ve taken on. What friendlysock wrote is altogether correct: “The cycles spent wrapping the mudballs in containers could be spent just… you know… cleaning up the mudballs.” jstoja’s point about sunk costs would be valid if the costs were actually in the past, but if you have to invest new money to keep old software running (by Dockerizing them), then that old software is not a sunk cost, and so re-inventing the software might be justified, since you have already decided to spend new money on it. Only you can make that decision, just be sure you take into account the full costs of a container strategy, when you are at those crossroads.

When the tech industry gave up on Objects For Everything an important step forward was made

In my previous essay I wrote:

The tech industry considers itself open minded, but in fact it is full of movements which gather momentum, then shut down all competing conversations, for a few years, then recede, and then it becomes acceptable for all of us to poke fun at how excessive some of the arguments were. In 2000 the excesses were XML and Object Oriented Programming (OOP).

It was OOP then, it’s containers now, but let’s consider an interesting possibility for the future.

During the 1990s, as the mania built around Objects For Everything, there was a major focus on solving the problem of object-relational impedance mismatch. Getting rows of data from a SQL database, and then transforming those rows into objects, seemed like a flaw in the system. SQL was a dust mote in the eye of God, an unholy ugliness that needed to be abolished. SQL was not object oriented, therefore it needed to be destroyed and replaced by… uh, what exactly?
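Here is that “mismatch” in miniature, as a Python sketch using the standard library’s sqlite3 module. The table and the User class are hypothetical. SQL hands back flat rows (tuples), and the application code must translate each row into an object by hand. ORMs and pure object databases were attempts to make this translation step disappear.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
    city: str


def load_users(conn: sqlite3.Connection) -> list[User]:
    # The database returns plain tuples; mapping them onto objects is our job.
    rows = conn.execute("SELECT id, name, city FROM users ORDER BY id").fetchall()
    return [User(*row) for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO users (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "New York")],
    )
    for user in load_users(conn):
        print(user)
    conn.close()
```

Every language that talks to the same table has to write its own version of `load_users`. That repetition was tolerable. The object-database alternative, which required every language to understand every other language’s serialized objects, was not.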

In a big company, you might have many teams, some using Java, and others using C++. Since SQL is a universal database language, both teams could read and write to the database, so long as their code understood SQL. But that means, when getting rid of SQL and replacing it with pure object databases, some new problems had to be confronted. If you want to write a Java object to the database, you need to serialize it first. That’s easy, but if you want the same universal quality as SQL, you need to serialize the Java object in a way that the C++ code can easily read and write to it. And vice versa, the Java code needs to be able to read and write to the serialized C++ objects. And if you’re selling data to 3rd parties, you need a system of serialization that will support all object oriented languages, and can work in a self-describing manner so that code that knows nothing about your code can still figure out how to search and read your serialized objects.

This led to the Web Service Specification, one of the great mistakes in the history of the tech industry. After the fever finally broke, and people gave up on the dream of Objects For Everything, a reaction set in. As James Lewis and Martin Fowler said:

…a reaction away from central standards that have reached a complexity that is, frankly, breathtaking. (Any time you need an ontology to manage your ontologies you know you are in deep trouble.)

Yes, the Web Service Specification drove Martin Fowler to complain of “a complexity that is, frankly, breathtaking”, and if it befuddles someone as great as Fowler, then how are we mere mortals supposed to understand it?

But the interesting thing is what happened next. People gave up on Objects For Everything and re-thought the problem. Object databases failed because every language had to come up with an object serialization that could be understood by every other language. But what if a language had a reasonable “plain text literal” format that could be used as a kind of intermediate universal serialization language? What if we could describe a User object like this:

{
 "_id" : "f9323nvhg829384",
 "name" : "Lawrence Krubner",
 "street" : "254 W 98th St",
 "apartment" : "6A",
 "city" : "New York",
 "state" : "NY",
 "phone" : "434 825 7694"
}

In other words, what if we all used JSON for serialization, and therefore, when in doubt, we could fall back to Javascript rules regarding reading and writing and querying? (Huge credit to del.icio.us for coming up with the first JSON API, back in 2004.) Then each language only needed to serialize its data to and from JSON, and other languages could decide how they were going to consume that JSON.
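Here is a minimal Python sketch of that division of labor, using the standard json module. The User dataclass mirrors the fields of the record above but is an illustrative assumption, not code from any real system. Python maps its own type to and from JSON text, and any other language with a JSON parser can read the result without knowing anything about Python classes.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    # Field names follow the JSON record in the text.
    _id: str
    name: str
    street: str
    apartment: str
    city: str
    state: str
    phone: str


def to_json(user: User) -> str:
    """Serialize to plain JSON text; no shared object model is required."""
    return json.dumps(asdict(user))


def from_json(text: str) -> User:
    """Rebuild the Python object from JSON written by any language."""
    return User(**json.loads(text))
```

A Java or C++ service would write its own equivalent of these two functions. The only thing the teams must agree on is the shape of the JSON, not each other’s class hierarchies.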

This was a big change for the industry. Dave Winer, who had a history of being wrong about things, acted like JSON was going to destroy the tech industry. He shouted “IT’S NOT EVEN XML!”. It’s worth reading his full reaction to get a sense of how certain people were shocked by JSON’s simplicity.

This idea was one of the starting points for NoSQL databases. And some of these databases, especially the document stores such as MongoDB and CouchDB, have answered some of the goals that people had initially hoped would be solved by pure object databases. Java and C++ can write to MongoDB. And yes, we gave up on some of the goals, such as a self-describing serialization format, because that seemed hopelessly complex. Instead we standardized around an API style that is sometimes called RESTful (though others insist it is still RPC).

We might enjoy a similar conceptual simplification once the mania for Containers For Everything is dead. There are some interesting ideas in this movement, though they are currently being dealt with through more and more layers of complexity, in a replay of the mistakes that led to the Web Service Specification. What is needed, instead, is a re-thinking of the problem at the level of basic concepts. One of the more interesting ideas now associated with containers is separating compute from all other aspects of computer activity, such that functions can exist as pure entities floating in the cloud. AWS Lambda is currently the nearest thing we have to seeing this ideal come to life. But there might be other approaches that work with fewer moving parts. Consider the argument that invoking a process on another machine should be exactly the same as invoking a process on one’s local machine. This idea has been under discussion since the 1980s, and perhaps should be re-examined as the right way forward for the tech industry. As my essay is already too long, I won’t waste any more words on the idea, but for those of you who are interested, start by reading the Wikipedia page about RINA. (To be clear, RINA is fairly new, but it grew from a critique of the Internet that’s been percolating for decades.)
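The “remote call looks like a local call” idea is old enough that Python ships a version of it in its standard library. The sketch below uses XML-RPC only because it needs no third-party packages. It illustrates location transparency in general, not RINA, and the `add` function is a made-up example.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a: int, b: int) -> int:
    return a + b


def start_server() -> SimpleXMLRPCServer:
    # Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    # Serve requests in a background thread so the caller can keep going.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server()
    host, port = server.server_address
    remote = ServerProxy(f"http://{host}:{port}/")
    # The local call and the remote call are written the same way.
    print(add(2, 3), remote.add(2, 3))
    server.shutdown()
    server.server_close()
```

The proxy hides, but does not remove, network latency and partial failure. Those are the problems any re-thinking at the level of basic concepts would have to address.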

.

Thank you for reading.

.

For anyone interested in the previous conversations, here are the 3 essays and a partial list of places they were discussed:

Why would anyone choose Docker over uber binaries? (2017)

Reddit

Lobste.rs

Hacker News

.

Docker protects a programming paradigm that we should get rid of (2018)

Lobste.rs

Hacker News

.

Docker is the dangerous gamble which we will regret (2018)

Reddit

Hacker News

.

[ Style note: inspiration for some of the humor in the dialogue owes a debt to Pete Lacey's old classic about SOAP. ]

.

Off-topic: I host a once-a-month party that is mostly a tech event. Do you live in New York City? Would you like to be invited? Contact me via LinkedIn.

.

Source












11 COMMENTS

October 1, 2019
12:04 am

By John

I think you found an idea for your next startup – DevOps for mere mortals.

GitLab posted this article in May about their hosted cloud service. It was only then that they started using a containerized deployment for GitLab. They were concerned about the risks and the complexity: https://about.gitlab.com/2019/05/02/gitlab-journey-from-azure-to-gcp/

Helm charts had been available for GitLab for several years by the time they switched. It seems to validate your argument when a cloud service that widely used is slow to go all-in on containers.

October 1, 2019
12:59 pm

By Tasos

Recently published:

Wow, it’s almost as if writing your own filesystem twice, and spending so much time trying to get Docker to work on Windows only to throw all that out when Microsoft announced Linux kernel support (which really means Docker would never work on Windows), costs a lot of money. And let’s not forget all the debugging required.

October 2, 2019
6:52 am

By Gorgi Kosev

The thing that bugs me the most about posts like these is that they set up a nice little straw man argument between an Advocate and a Critic.

You can be as vague as you like, making the Advocate sound as naive as you want and the Critic as rational and reasonable as you want. A bit of a mix of https://en.wikipedia.org/wiki/Straw_man and https://en.wikipedia.org/wiki/Don_Quixote#Tilting_at_windmills and honestly, for me it severely detracts from the strength of the argument.

It would be a lot more useful if full, actual arguments were presented or at least referenced, rather than vague imaginary ones.

October 2, 2019
5:08 pm

By lawrence

Gorgi Kosev, your comment is vague and lacks specifics. I hope you appreciate the irony. I wrote 11,000 words detailing the problems with Docker/Kubernetes, and you dismiss it without pointing to anything specific that you disagree with. For people like you, who dislike the humor of the first half of the essay, I dive deep into details in the second half of the essay.

October 3, 2019
2:12 pm

By Gorgi Kosev

Well, let’s list what you wrote:

* an imaginary strawman argument between an advocate and a critic
* an actual argument about Kubernetes being immature for managing stateful services
* a semi-actual argument against complexity
* a short analysis of all the other choices except containers with no quantifiable categories or comparisons
* a short hand-wavy recommendation about when to use Kubernetes
* self-help for CTOs
* a rant about objects

What the article promised:
* flaws of Docker / Kubernetes and their eco-system

Am I the only one seeing a mismatch here? I was honestly hoping for some actual substance.

October 4, 2019
12:40 am

By lawrence

Gorgi Kosev, do you disagree with anything that I wrote? All of your criticism so far is focused on my writing style. I understand that you don’t like my attempt at humor. That’s an individual thing, some people on Twitter liked it, some did not. But did I write anything that was inaccurate? You have now written two comments here, yet you have not named a single specific assertion that you feel is inaccurate.

October 4, 2019
1:39 am

By lawrence

Gorgi Kosev, also, I don’t think you understand what a “straw man argument” is. This is a definition from Wikipedia:

The typical straw man argument creates the illusion of having completely refuted or defeated an opponent’s proposition through the covert replacement of it with a different proposition (i.e., “stand up a straw man”) and the subsequent refutation of that false argument (“knock down a straw man”) instead of the opponent’s proposition.[2][3] Straw man arguments have been used throughout history in polemical debate, particularly regarding highly charged emotional subjects.

I believe I’ve listed everything that Docker and Kubernetes are good for, all of their strengths. If I missed anything, please add it. But please try to say something specific about the technology (your last two comments were about my writing style; I get that you don’t like it, but that is a matter of personal preference).

October 4, 2019
7:31 am

By Gorgi Kosev

I won’t list anything concrete that you missed, because that will just give you ammunition to build the next artificially constructed imaginary straw man where you remove the concrete things.

For Docker, I simply think you continue to miss the mark. I would love it if you could point out which VM-based system makes it simpler and faster to build and reset a deployable entity that has a guaranteed “clean slate” state, which you can instantly recreate (possibly multiple times), compared to a Dockerfile.

Suggesting Go or bundling everything in a JAR misses the mark: it means you’re restricting the technology you use to your operational needs rather than your business needs (what if you need to do data science and R or Python happens to be the best choice, or you’re doing document manipulation on the side and need to spin up a LibreOffice instance, etc.?)

I was hoping you would get at the core of what Kubernetes is, i.e. https://www.youtube.com/watch?v=ZuIQurh_kDk, and describe exactly how that doesn’t work and why. There is none of that in this article.

My thoughts are that if we can make this declarative self-healing virtual cloud interface thing work, it will be simple and glorious. If that’s the case, it might make sense to take some risk and invest in going with Kubernetes earlier than strictly necessary since it has a lot of momentum.

So while I actually agree with you on your thought process and writing style, I don’t think you’ve done any serious analysis of Kubernetes, or at least you haven’t shared any on this blog.

October 4, 2019
12:40 pm

By lawrence

Gorgi Kosev, about this:

I would love it if you could point out which VM-based system makes it simpler and faster to build and reset a deployable entity that has a guaranteed “clean slate” state

Packer, sometimes with some Ansible. The combination of Packer and Terraform typically gives me what I need. Editing a Packer manifest is no more difficult than editing a Dockerfile. I typically use the Groovy DSL in Jenkins to write a build script that can rebuild a clean VM, quickly and easily.

About this:

My thoughts are that if we can make this declarative self-healing virtual cloud interface thing work, it will be simple and glorious.

I agree with that, but it is a very large “if”. My suggestion to most CEOs/CTOs is that they wait 5 years and let the dust settle before they jump in.

This could only be good advice for those businesses that are in fact in the devops business and offering devops as a service:

If that’s the case, it might make sense to take some risk and invest in going with Kubernetes earlier than strictly necessary since it has a lot of momentum.

About this:

So while I actually agree with you on your thought process and writing style, I don’t think you’ve done any serious analysis of Kubernetes, or at least you haven’t shared any on this blog.

This simply wasn’t that kind of essay. Jessie Frazelle’s essay was much more that kind of thing, and I linked to her essay. Check out what she wrote:

https://blog.jessfraz.com/post/the-business-executives-guide-to-kubernetes/

To be clear about my intent with this essay, I advise startups. That is my focus. I talk to CTOs and other types of engineering leads. This essay was meant to offer some general warnings about what they should consider when making a decision.

What is the ideal devops setup for a given company? It would be difficult to offer a rough rule of thumb, given that there are so many things to consider:

What is the size of the company?

What technologies are already in use in the company?

How many different tech teams are in the company, and how much do their skill sets overlap?

Also, how many different tech teams are in the company, and do they speak the same human language?

Does the company culture emphasize written communication or spoken communication?

How old is the oldest technology in the company? 2015? 2010? 1995? 1985?

Is software core to the business, or does it mostly help with auxiliary tasks such as preparing for a tax audit?

Do the gross margins of the business resemble those of a software company, or those of a hardware company?

Does the CTO personally know every software developer, or is the company too big for that?

How much political power does the current decision maker have? Are they changing a small team in a big company, or are they in charge of the whole company?

Is there sufficient budget to support dramatic technical change?

Would the relationship between the software developers and the non-technical staff be damaged by any changes to current workflows?

What are the current CI/CD tools?

What is the current devops setup?

How much tech debt is the company carrying?

In short, there are so many variables that it is impossible to come up with any easy heuristics. The decision about devops style will always demand subtle judgement and sober expertise. It is a question that hinges on hundreds of nuances. It would take a large book to look at most of the possible variations. As such, in this essay I was simply trying to offer some warnings to decision makers, things they should be aware of when they make the decision about devops style.

October 4, 2019
5:14 pm

By Gorgi Kosev

> Packer, sometimes with some Ansible. The combination of Packer and Terraform typically gives me what I need. Editing a Packer manifest is no more difficult than editing a Dockerfile. I typically use the Groovy DSL in Jenkins to write a build script that can rebuild a clean VM, quickly and easily.

OK, prove it. Show me an example using Packer and Ansible to prepare a Node.js image that is at least as simple and at least as fast to rebuild cleanly as this Dockerfile example, which creates a Docker image for a Node application:

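```dockerfile
# Start from the official Node.js 10 image
FROM node:10
# All following commands run inside /usr/src/app in the image
WORKDIR /usr/src/app
# Copy the npm manifests first so the install layer stays cached until they change
COPY package*.json ./
RUN npm install
# Copy the rest of the application source
COPY . .
# Document the port the app is expected to listen on
EXPOSE 8080
CMD [ "node", "server.js" ]
```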

and that works on both developer machines and in the cloud.

October 4, 2019
8:44 pm

By lawrence

Gorgi Kosev, I am working to clean up some of my Packer/Terraform code so I can release it on Github, and then I plan to write about that. I’ll let you know when I make it public.

But also, regarding your example: for many people it is easier to just run a few lines of bash to install Node.js, which is a point that I made repeatedly in the essay above. Lots of software developers build software that they run on the bare metal of their own machines, and then also on the bare metal of the production servers.
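For comparison with the Dockerfile above, here is a hedged sketch of what "a few lines of bash" on a bare machine might look like. It is not the author's Packer/Terraform code. It assumes a Debian or Ubuntu host with sudo access and the app (with `package.json` and `server.js`) already copied into `/usr/src/app`; it installs Node.js and npm from the distribution's own repositories, so the Node.js version will not necessarily match `node:10`. The function name `install_and_run_node_app` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: a bare-metal counterpart to the Dockerfile quoted above.
# Assumptions (not from the original thread): Debian/Ubuntu host, sudo access,
# and the app (package.json, server.js) already copied to the target directory.
# install_and_run_node_app is a hypothetical name chosen for this example.

# The body runs in a subshell "( ... )" so that set -euo pipefail and cd
# do not leak into the shell that calls the function.
install_and_run_node_app() (
  set -euo pipefail
  app_dir="${1:-/usr/src/app}"   # same path the Dockerfile uses as WORKDIR

  # Install Node.js and npm from the distribution's repositories.
  # The packaged Node.js version varies by release; pin it if that matters.
  sudo apt-get update
  sudo apt-get install -y nodejs npm

  # Install dependencies, like the Dockerfile's "RUN npm install".
  cd "$app_dir"
  npm install

  # Start the server in the foreground, like the Dockerfile's CMD.
  # (EXPOSE only documents a port; server.js decides where it listens.)
  exec node server.js
)
```

Source the file, then run `install_and_run_node_app /usr/src/app`. Unlike a freshly rebuilt image, a machine set up this way keeps its installed packages and `node_modules` between runs, so it does not by itself give the guaranteed clean slate that Gorgi Kosev asks about.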

Which style of development is appropriate for your company? That depends on the details of your company. As I wrote above:

Each developer had to set up the software on their own machine, and it often took a few days of effort to get all the software installed and running locally. So that is a tax that every new developer pays. How often should your company pay that tax? If you have one or two developers, and they stay with you for many years, then the tax is not worth worrying about. But if you have 300 software developers, that tax will be horrendous, so you need to adopt a different style of development.
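To make the size of that tax concrete, here is a rough worked example. The numbers are assumptions for illustration only: $d = 3$ developer-days of setup per person (the passage says only "a few days") and, for the larger company, 20% annual turnover. $N$ is the number of developers who must set up their machines, and $C$ is the total setup cost.

$$
\begin{aligned}
C &= N \times d \\
C_{2} &= 2 \times 3 = 6 \text{ developer-days, paid once if both stay for years} \\
C_{300} &= 300 \times 3 = 900 \text{ developer-days} \\
C_{\text{turnover}} &= (0.20 \times 300) \times 3 = 60 \times 3 = 180 \text{ developer-days per year}
\end{aligned}
$$

At the larger scale the tax is not a one-off: replacing departing developers alone costs months of engineering time every year under these assumptions.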
