May 13th, 2018
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: firstname.lastname@example.org
Do you know about giraffes? In particular, their recurrent laryngeal nerve? Here is the deal: the nerve used to go straight from the brain to the larynx, and in fish this is a short, direct connection, with the nerve passing behind the gills, but as fish evolved into creatures that lived on land, the gills got pulled into the body of the creature, and became what we call lungs. As that evolution happened, the nerve got pulled further and further into the body, because it was stuck looping around the lungs. The extreme case is the giraffe. Remember, this nerve is supposed to go from the brain to the larynx in the throat, yet now it has to go all the way down the neck, loop around the lungs, then travel all the way back up the neck to reach the larynx. It’s a huge waste. As Wikipedia says:
The route of the recurrent laryngeal nerve is such that it travels from the brain to the larynx by looping around the aortic arch. This same configuration holds true for many animals; in the case of the giraffe, this results in about twenty feet of extra nerve.
Some biologists refer to this as Incompetent Design. The problem is that nobody ever sat down to fully redesign creatures so that they could live on land. Rather, each living creature simply wanted to survive, and pass on its genes, and its children were just a little bit different compared to the parent. If God, or you, were to sit down today, knowing all that we now know about life on land, you could design a creature that didn’t have these kinds of obvious mistakes. But instead what happened was something much uglier: the design of the moment, the ancestors of the giraffe, saw small, incremental changes, which over time morphed into a new shape, full of obvious design errors.
Python grew up in the world of the 1990s, when a developer might work on the same server for many years. Servers were permanent. In that world, it didn’t seem like a problem if a library was installed globally. After all, the developer had years to get to know the various paths on that server, and years to set the environment variables to whatever their project needed. But that paradigm broke down in the new world of cloud computing. Servers became impermanent. And then Docker came along to help fix some of the problems that eco-systems such as Python faced in this new world of fast-changing servers. Certainly, Docker helped a lot with paths and managing environment variables. For this reason, the use of Python tends to lead to the use of Docker, and the use of Docker encourages the use of Kubernetes. And in the end you have the recurrent laryngeal nerve of the giraffe. You end up with something enormously complex, that arose incrementally, by trying to keep alive some pre-existing system. But if you were to sit down and design something entirely new, knowing all that you know now, you could build something much cleaner and simpler than what Python/Docker/Kubernetes gives you.
“- An application will not mess with the configuration of another app (that’s solving the problem of virtualenv, rvm and apt incompatibilities).”
Right, so that is why I wrote “Why would anyone choose Docker over fat binaries?” Rather than use Ruby or Python, and rely on the path and variables of the underlying server, or operating system, why not use an uber binary that has no outside dependencies? Why not keep it clean and simple and isolated?
Let’s consider the counter-argument first. Ryan Tomayko wrote “I Love Unicorn Because It’s Unix” in 2009:
Eric Wong’s mostly pure-Ruby HTTP backend, Unicorn, is an inspiration. I’ve studied this file for a couple of days now and it’s undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across…
We’re going to get into how Unicorn uses the OS kernel to balance connections between backend processes using a shared socket, fork(2), and accept(2) – the basic Unix prefork model in 100% pure Ruby.
We should be doing more of this. A lot more of this. I’m talking about fork(2), execve(2), pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so forth. These are our friends. They want so badly just to help us.
Ruby, Python, and Perl all have fairly complete interfaces to common Unix system calls as part of their standard libraries. In most cases, the method names and signatures match the POSIX definitions exactly.
This was the attitude that drove Ruby and Python and Perl during the last 30 years: that the operating system was powerful and a lightweight scripting language should exist partly to offer a convenient wrapper over OS calls. Larry Wall said that Perl was appropriate for any app that was too big for Bash but didn’t need to be in C. That is a huge space. And Ruby and Python and Perl all were built with the idea that the developer should rely on the OS as much as possible. This was brilliant in the 90s, but it means that the apps written in these languages tend to be dependent on file paths and environment variables and user paths and OS permissions. The app is heavily dependent on the overall context of the machine, because the app is supposed to be a lightweight wrapper around all the functionality already provided by the OS.
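To make the prefork model concrete, here is a minimal sketch in Python rather than Ruby. It is not Unicorn's code: the function name `prefork_echo`, the toy echo protocol, and the use of the loopback address are all assumptions made for illustration. The parent opens one listening socket, forks a few workers that inherit it, and each worker blocks in accept(), so the kernel decides which worker gets each connection. It needs a Unix-like OS, since os.fork() is not available on Windows.

```python
import os
import signal
import socket


def prefork_echo(num_workers=2, num_requests=4):
    """Fork workers that share one listening socket; return the replies a client sees.

    Hypothetical helper for illustration only (not taken from Unicorn).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: it inherited the listening socket across fork().
            # Restore the default SIGTERM action so the parent can stop us.
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            try:
                while True:
                    # Every worker blocks here; the kernel hands each
                    # incoming connection to one of them.
                    conn, _addr = listener.accept()
                    with conn:
                        data = conn.recv(1024)
                        conn.sendall(b"worker %d: " % os.getpid() + data)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        worker_pids.append(pid)

    # Parent: act as the client so the sketch is self-contained.
    replies = []
    try:
        for _ in range(num_requests):
            with socket.create_connection(("127.0.0.1", port)) as client:
                client.sendall(b"ping")
                chunks = []
                while True:
                    chunk = client.recv(1024)
                    if not chunk:  # worker closed the connection
                        break
                    chunks.append(chunk)
                replies.append(b"".join(chunks))
    finally:
        # Stop and reap every worker so no zombie processes are left behind.
        for pid in worker_pids:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return replies


if __name__ == "__main__":
    for reply in prefork_echo():
        print(reply.decode())
```

Each reply is prefixed with the PID of whichever worker happened to accept that connection. Notice how much of the work is done by the host: the process table, the signal delivery, and the socket all belong to one particular machine.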
This paradigm used to be brilliant but it becomes pathological when it tries to transition to cloud computing.
There is a devops joke that says that when you’ve got a handful of servers you name them like pets: bob, alice, li, lo. But when you’ve got hundreds of servers, you simply number them, like widgets coming off the assembly line at the widget factory.
The paradigm that Ryan Tomayko praises is well suited to a world where the servers are named like pets. But calling fork() and join() from your Ruby app, when you’ve got a thousand instances of your Ruby app running on a thousand servers, is a very bad idea. At that point you need higher level frameworks for dealing with concurrency.
I loved Ryan Tomayko’s essay at the time and I sent it to all of my friends. I wanted everyone to understand and appreciate it. It influenced how I wrote Ruby. But as I worked on bigger and bigger systems, I realized, that paradigm needs to die. fork() and join() cannot control the concurrency of your system when your system is spread across a large number of servers. To handle these bigger systems, many new frameworks have emerged. Ruby programmers can now use Celluloid, an actor framework “which lets you build multithreaded programs out of concurrent objects just as easily as you build sequential programs out of regular objects.” Developers who write Scala are in love with Akka, which appears to be very good. In the world of Clojure, Michael Drogalis has led the way with the Onyx framework. And the Go language has very good primitives for certain kinds of concurrency, and it has the Circuit framework for large scale distributed computing.
Some people have said to me, “With Docker, I can run two instances of my Python app, or 5, or even 20, on the same host, and I can automate how many instances are running, so as to scale up the number of instances based on how much traffic/demand I need to deal with.” Okay, awesome. So Docker helps manage concurrency? And this is important with Python because Python has historically had a difficult time handling concurrency (the GIL), and even now, Python programmers tend to spin up new processes, rather than using something they consider ambiguous, such as threads. But if that is your need, why not use a language/eco-system that has first class support for concurrency? There are older, mature options, such as Java and C# and Erlang, and there are many newer options, such as Go or Elixir or Clojure.
What is Docker for? You can take your old Python and Ruby and Perl apps and wrap them up in Docker, and thus those old apps can make the transition to the modern world of cloud computing. In that sense, Docker allows you to take apps developed with a paradigm from the 1990s, and deploy them in 2018. The folks working with Python and Ruby and Perl (and PHP) are jealous of the way a Java programmer can create an uberjar, and they are jealous of the way a Golang programmer can create a binary that has no outside dependencies, and so the Python programmer, and the Ruby and Perl and PHP programmer, they turn to Docker, which allows them to create the equivalent of an uberjar. But if that is what they want, maybe they should simply use a language that supports that natively, without the need for an additional technology?
Many people regard this as one of the greatest things about Docker, but I regard the entire effort as an example of what is wrong with the tech industry. We suffer an unwillingness to confront the reality of the emerging situation, and commit to new paradigms that are well adapted to the new situation. Instead we commit to very complex technologies that allow us to wallow in the past. This is “conservative” in the negative sense: rigid, nostalgic, reactionary.
I know a great many developers are going to dismiss this blog post, but as a thought experiment, you might want to consider two different companies. One spends the next 5 years committing to those languages and eco-systems that have been built for the current era. The other spends the next 5 years using Docker so they can keep using scripting languages from the 1990s. Now it is the year 2023, and a crisis happens at both companies. Which of those two companies do you think will be more ready to adapt to the crisis?
[ [ UPDATE 2018-06-14 ] ]
A friend and I just spent an hour trying to get a short Python script running on an EC2 instance. We got stuck dealing with this error:
ModuleNotFoundError: No module named 'MySQLdb'
The EC2 instance was running Python 2.7 by default. Thinking we needed to use pip3 for this install, we upgraded to Python 3 and pip3. But we still got the same error. We tried a few other things.
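A check along these lines is one way to see which interpreter is actually running and whether that interpreter can find a given module. It is a minimal sketch, not what we ran that day: `describe_import` is a hypothetical helper name, and it only reports what it finds; it does not install anything.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report which interpreter is running and whether it can find module_name.

    Hypothetical diagnostic helper; intended for top-level module names.
    """
    # find_spec() looks the module up on this interpreter's sys.path
    # without importing it, and returns None when nothing is found.
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    # Pass the module name on the command line; defaults to MySQLdb.
    name = sys.argv[1] if len(sys.argv) > 1 else "MySQLdb"
    for key, value in describe_import(name).items():
        print("%s: %s" % (key, value))
```

If "found" comes back False, the package was probably installed for a different interpreter than the one printed. Running pip as a module of that same interpreter (for example `python3 -m pip install ...`) is one way to keep the two in sync.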
Eventually, my friend said, “Hey, let me take this home and write a real install script, and we can try to run this in a few days. Maybe I can build this in Docker.”
Of all the forces that currently push Docker forward, I suspect that the Python community is the strongest. And that is because the dependency management in the Python community is so badly broken.
Compare the Python community with the Java community. At no point in the last 10 years have I had a Java project where I ran into the kinds of dependency management problems that I run into, routinely, with Python.
And again, many of the problems that Python faces go back to that idea that Python should rely on the underlying machine, and the underlying OS, a set of ideas which the whole tech industry is now trying to get away from.
I absolutely understand why you want to use Docker, if you are working with Python. Because Python is broken. But you owe it to yourself, and your company, to consider that the time you invest in Docker might be better spent moving away from Python.
[ [ UPDATE 2018-07-09 ] ]
The following happened today. This is exactly the kind of thing that Docker is supposed to protect us from, and it can’t even get this right. Really sad.
This is me and a co-worker, trying to reconcile our different parts of a Python app: Source