May 13th, 2018
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: email@example.com
I can see the advantage for dev boxes where a developer might want to setup a load of containers on their machine to emulate a staging or production environment. But I don’t really understand why you’d want to base your entire production infrastructure on it.
The most upvoted response demonstrates a sad confusion:
It’s the UNIX philosophy applied to applications, everything must be as small as possible, do only one task and do it properly and I think that’s part of docker’s popularity.
There might be great reasons to use Docker, but this isn’t one of them. Using Docker, building images of our apps, with an internal illusion of user paths and environment variables, living in isolation from one another, baked into an immutable image — all of that carries us very far away from the good things we can all admire about the original Unix spirit. There is no sense of “small pieces loosely joined,” but rather, Docker takes us into the world of “complex configuration to manage your earlier configuration and standardize it so that even more complex orchestration tools will know how to work with it.”
I’ll respond to this in particular:
“- An application will not mess with the configuration of another app (that’s solving the problem of virtualenv, rvm and apt incompatibilities).”
Right, so that is why I wrote “Why would anyone choose Docker over fat binaries?” Rather than use Ruby or Python, and rely on the underlying operating system, why not use an uber binary that has no outside dependencies?
Let’s consider the counter-argument first. Ryan Tomayko wrote “I Love Unicorn Because It’s Unix” in 2009:
Eric Wong’s mostly pure-Ruby HTTP backend, Unicorn, is an inspiration. I’ve studied this file for a couple of days now and it’s undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across…
We’re going to get into how Unicorn uses the OS kernel to balance connections between backend processes using a shared socket, fork(2), and accept(2) – the basic Unix prefork model in 100% pure Ruby.
We should be doing more of this. A lot more of this. I’m talking about fork(2), execve(2), pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so forth. These are our friends. They want so badly just to help us.
Ruby, Python, and Perl all have fairly complete interfaces to common Unix system calls as part of their standard libraries. In most cases, the method names and signatures match the POSIX definitions exactly.
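For anyone who has not seen the prefork model that Tomayko describes, here is a rough sketch of it in Python rather than Ruby. This is not Unicorn's code. The function names (`worker_loop`, `run_prefork_demo`) and the reply format are made up for illustration, and it assumes a Unix-like system where `os.fork` is available. The parent opens one listening socket, forks a few workers that all inherit it, and the kernel decides which worker's `accept()` gets each connection.

```python
import os
import signal
import socket


def worker_loop(listener):
    """Child process: accept connections on the inherited socket forever."""
    while True:
        conn, _ = listener.accept()  # the kernel hands each connection to one worker
        with conn:
            conn.sendall(f"handled by pid {os.getpid()}\n".encode())


def run_prefork_demo(num_workers=3, num_requests=6):
    """Fork workers that share one socket, send requests, return the replies."""
    # The parent opens the listening socket once, before forking.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: restore default SIGTERM handling, then serve until killed.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)

    replies = []
    try:
        # The parent acts as the client here, purely to keep the demo in one file.
        for _ in range(num_requests):
            with socket.create_connection(("127.0.0.1", port)) as client:
                with client.makefile() as reader:
                    replies.append(reader.readline().strip())
    finally:
        # Stop the workers and reap them so no zombies are left behind.
        for pid in pids:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return replies


if __name__ == "__main__":
    for line in run_prefork_demo():
        print(line)
```

Notice how much of this leans on the machine itself: a process table, signals, a local socket. All of the load balancing happens inside one kernel on one host.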
This was the attitude that drove Ruby and Python and Perl during the last 30 years: that the operating system was powerful and a lightweight scripting language should exist partly to offer a convenient wrapper over OS calls. Larry Wall said that Perl was appropriate for any app that was too big for Bash but didn’t need to be in C. That is a huge space. And Ruby and Python and Perl all were built with the idea that the developer should rely on the OS as much as possible. This was brilliant in the 90s, but it means that the apps written in these languages tend to be dependent on file paths and environment variables and user paths and OS permissions. The app is heavily dependent on the overall context of the machine, because the app is supposed to be a lightweight wrapper around all the functionality already provided by the OS.
This paradigm used to be brilliant but it becomes pathological when it tries to transition to cloud computing.
There is a devops joke that says that when you’ve got a handful of servers you name them like pets: bob, alice, li, lo. But when you’ve got hundreds of servers, you simply number them, like widgets coming off the assembly line at the widget factory.
The paradigm that Ryan Tomayko praises is well suited to a world where the servers are named like pets. But calling fork() and join() from your Ruby app, when you’ve got a thousand instances of your Ruby app running on a thousand servers, is a very bad idea. At that point you need higher level frameworks for dealing with concurrency.
I loved Ryan Tomayko’s essay at the time and I sent it to all of my friends. I wanted everyone to understand and appreciate it. It influenced how I wrote Ruby. But as I worked on bigger and bigger systems, I realized that this paradigm needs to die. fork() and join() cannot control the concurrency of your system when your system is spread across a large number of servers. To handle these bigger systems, many new frameworks have emerged. Ruby programmers can now use Celluloid, an actor framework “which lets you build multithreaded programs out of concurrent objects just as easily as you build sequential programs out of regular objects.” Developers who write Scala are in love with Akka, which appears to be very good. In the world of Clojure, Michael Drogalis has led the way with the Onyx framework. And the Go language has very good primitives for certain kinds of concurrency, and it has the Circuit framework for large scale distributed computing.
Some people have said to me, “With Docker, I can run two instances of my Python app, or 5, or even 20, on the same host, and I can automate how many instances are running, so as to scale up the number of instances based on how much traffic/demand I need to deal with.” Okay, awesome. So Docker helps manage concurrency? And this is important with Python because Python has historically had a difficult time handling concurrency (the GIL), and even now, Python programmers tend to spin up new processes, rather than using something they consider ambiguous, such as threads. But if that is your need, why not use a language/eco-system that has first class support for concurrency? There are older, mature options, such as Java and C# and Erlang, and there are many newer options, such as Go or Elixir or Clojure.
What is Docker for? You can take your old Python and Ruby and Perl apps and wrap them up in Docker, and thus those old apps can make the transition to the modern world of cloud computing. In that sense, Docker allows you to take apps developed with a paradigm from the 1990s, and deploy them in 2018. The folks working with Python and Ruby and Perl (and PHP) are jealous of the way a Java programmer can create an uberjar, and they are jealous of the way a Golang programmer can create a binary that has no outside dependencies. And so the Python programmer, and the Ruby and Perl and PHP programmer, turn to Docker, which allows them to create the equivalent of an uberjar. But if that is what they want, maybe they should simply use a language that supports that natively, without the need for an additional technology?
Many people regard this as one of the greatest things about Docker, but I regard the entire effort as an example of what is wrong with the tech industry. We suffer an unwillingness to confront the reality of the emerging situation, and commit to new paradigms that are well adapted to the new situation. Instead we commit to very complex technologies that allow us to wallow in the past. This is “conservative” in the negative sense: rigid, nostalgic, reactionary.
I know a great many developers are going to dismiss this blog post, but as a thought experiment, you might want to consider two different companies. One spends the next 5 years committing to those languages and eco-systems that have been built for the era. The other spends the next 5 years using Docker so they can keep using scripting languages from the 1990s. Now it is the year 2023, and a crisis happens at both companies. Which of those two companies do you think will be more ready to adapt to the crisis?
[ [ UPDATE 2018-06-14 ] ]
A friend and I just spent an hour trying to get a short Python script running on an EC2 instance. We got stuck dealing with this error:
ModuleNotFoundError: No module named 'MySQLdb'
The EC2 instance was running Python 2.7 by default. Thinking we needed to use pip3 for this install, we upgraded to Python 3 and pip3. But we still got the same error. We tried a few other things.
Eventually, my friend said, “Hey, let me take this home and write a real install script, and we can try to run this in a few days. Maybe I can build this in Docker.”
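In hindsight, an error like this often means the package was installed for a different interpreter than the one running the script. Here is a small diagnostic sketch of the kind of check that might have saved us some time. The helper names (`describe_module`, `pip_command`) are hypothetical. I am also assuming that `mysqlclient` is the right PyPI package here: it is the one that provides the `MySQLdb` module on Python 3, but I can't say for certain it would have fixed that particular EC2 instance.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by the interpreter running this script."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: NOT importable by {sys.executable}"
    return f"{name}: found at {spec.origin}"


def pip_command(package):
    """Build a pip invocation tied to this exact interpreter.

    Running pip as `python -m pip` installs into the interpreter named in
    sys.executable, which sidesteps any pip / pip3 mismatch on the PATH.
    """
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    print(sys.version)
    print(describe_module("MySQLdb"))
    # Assumption: mysqlclient is the package to install for MySQLdb on Python 3.
    print("suggested:", " ".join(pip_command("mysqlclient")))
```

Even then, `mysqlclient` is typically built against the MySQL client libraries installed on the host, so the install still depends on the underlying machine.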
Of all the forces that currently push Docker forward, I suspect that the Python community is the strongest. And that is because the dependency management in the Python community is so badly broken.
Compare the Python community with the Java community. At no point in the last 10 years have I had a Java project where I ran into the kinds of dependency management problems that I run into, routinely, with Python.
And again, many of the problems that Python faces go back to the idea that Python should rely on the underlying machine and the underlying OS, a set of ideas which the whole tech industry is now trying to get away from.
I absolutely understand why you want to use Docker, if you are working with Python. Because Python is broken. But you owe it to yourself, and your company, to consider that the time you invest in Docker might be better spent moving away from Python.
[ [ UPDATE 2018-07-09 ] ]
The following happened today. This is exactly the kind of thing that Docker is supposed to protect us from, and it can’t even get this right. Really sad.
This is me and a co-worker, trying to reconcile our different parts of a Python app: