May 13th, 2018

Docker protects a programming paradigm that we should get rid of

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

Do you know about giraffes? In particular, their recurrent laryngeal nerve? Here is the deal: in fish, the ancestral version of this nerve is a short, direct connection, running from the brain past the gills. But as fish evolved into creatures that lived on land, the blood vessels that once served the gills were pulled down into the chest, where they became the great arteries near the heart. As that evolution happened, the nerve got pulled further and further into the body, because it was stuck looping around one of those arteries. The extreme case is the giraffe. Remember, this nerve is supposed to go from the brain to the larynx in the throat, yet now it has to go all the way down the neck, loop around the aorta, then travel all the way back up the neck to reach the larynx. It’s a huge waste. As Wikipedia says:

The route of the recurrent laryngeal nerve is such that it travels from the brain to the larynx by looping around the aortic arch. This same configuration holds true for many animals; in the case of the giraffe, this results in about twenty feet of extra nerve.

Some biologists refer to this as Incompetent Design. The problem is that nobody ever sat down to fully redesign creatures so that they could live on land. Rather, each living creature simply wanted to survive, and pass on its genes, and its children were just a little bit different compared to the parent. If God, or you, were to sit down today, knowing all that we now know about life on land, you could design a creature that didn’t have these kinds of obvious mistakes. But instead what happened was something much uglier: the design of the moment, the ancestors of the giraffe, saw small, incremental changes, which over time morphed into a new shape, full of obvious design errors.

Python grew up in the world of the 1990s, when a developer might work on the same server for many years. Servers were permanent. In that world, it didn’t seem like a problem if a library was installed globally. After all, the developer had years to get to know the various paths on that server, and years to set the environment variables to whatever their project needed. But that paradigm broke down in the new world of cloud computing. Servers became impermanent. And then Docker came along to help fix some of the problems that eco-systems such as Python faced in this new world of fast-changing servers. Certainly, Docker helped a lot with paths and managing environment variables. For this reason, the use of Python tends to lead to the use of Docker, and the use of Docker encourages the use of Kubernetes. And in the end you have the recurrent laryngeal nerve of the giraffe. You end up with something enormously complex, that arose incrementally, by trying to keep alive some pre-existing system. But if you were to sit down and design something entirely new, knowing all that you know now, you could build something much cleaner and simpler than what Python/Docker/Kubernetes gives you.

Consider these comments in favor of Docker:

- An application will not mess with the configuration of another app (that’s solving the problem of virtualenv, rvm and apt incompatibilities).

Right, so that is why I wrote “Why would anyone choose Docker over fat binaries?”. Rather than use Ruby or Python, and rely on the paths and variables of the underlying server, or operating system, why not use an uber binary that has no outside dependencies? Why not keep it clean and simple and isolated?

Let’s consider the counter-argument first. Ryan Tomayko wrote “I Love Unicorn Because It’s Unix” in 2009:

Eric Wong’s mostly pure-Ruby HTTP backend, Unicorn, is an inspiration. I’ve studied this file for a couple of days now and it’s undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across…

We’re going to get into how Unicorn uses the OS kernel to balance connections between backend processes using a shared socket, fork(2), and accept(2) – the basic Unix prefork model in 100% pure Ruby.

We should be doing more of this. A lot more of this. I’m talking about fork(2), execve(2), pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so forth. These are our friends. They want so badly just to help us.

Ruby, Python, and Perl all have fairly complete interfaces to common Unix system calls as part of their standard libraries. In most cases, the method names and signatures match the POSIX definitions exactly.
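To make the prefork model from that quote concrete, here is a minimal sketch in Python rather than Ruby. It is not Unicorn's code, only an illustration of the same idea: the parent binds one listening socket, calls fork(), and each child blocks in accept() on the shared socket, so the kernel decides which worker gets each connection. The names `prefork_echo_server` and `ask` are made up for this example, and it assumes a Unix-like host where `os.fork` is available.

```python
import os
import socket


def prefork_echo_server(num_workers=2, requests_per_worker=1):
    """Bind one listening socket, then fork workers that all accept() on it.

    Returns (port, worker_pids). Each worker answers a fixed number of
    requests and then exits, which keeps the sketch easy to clean up.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: inherits the listening socket and competes for connections.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        data = conn.recv(1024)
                        conn.sendall(b"worker %d: " % os.getpid() + data)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)

    listener.close()  # the children hold their own copies of the socket
    return port, pids


def ask(port, payload):
    """Hypothetical client helper: send payload, read the reply until EOF."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(payload)
        chunks = []
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    port, pids = prefork_echo_server(num_workers=2, requests_per_worker=1)
    for _ in pids:
        print(ask(port, b"ping").decode())
    for pid in pids:
        os.waitpid(pid, 0)  # reap the workers, as wait(2) would in C
```

Everything here lives on one machine: the shared socket, the process table, the pids that the parent waits on. That is exactly the dependence on the local OS that the rest of this post is about.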

This was the attitude that drove Ruby and Python and Perl during the last 30 years: that the operating system was powerful and a lightweight scripting language should exist partly to offer a convenient wrapper over OS calls. Larry Wall said that Perl was appropriate for any app that was too big for Bash but didn’t need to be in C. That is a huge space. And Ruby and Python and Perl all were built with the idea that the developer should rely on the OS as much as possible. This was brilliant in the 90s, but it means that the apps written in these languages tend to be dependent on file paths and environment variables and user paths and OS permissions. The app is heavily dependent on the overall context of the machine, because the app is supposed to be a lightweight wrapper around all the functionality already provided by the OS.

This paradigm used to be brilliant but it becomes pathological when it tries to transition to cloud computing.

There is a devops joke that says that when you’ve got a handful of servers you name them like pets: bob, alice, li, lo. But when you’ve got hundreds of servers, you simply number them, like widgets coming off the assembly line at the widget factory.

The paradigm that Ryan Tomayko praises is well suited to a world where the servers are named like pets. But calling fork() and join() from your Ruby app, when you’ve got a thousand instances of your Ruby app running on a thousand servers, is a very bad idea. At that point you need higher level frameworks for dealing with concurrency.

I loved Ryan Tomayko’s essay at the time and I sent it to all of my friends. I wanted everyone to understand and appreciate it. It influenced how I wrote Ruby. But as I worked on bigger and bigger systems, I realized that this paradigm needs to die. fork() and join() cannot control the concurrency of your system when your system is spread across a large number of servers. To handle these bigger systems, many new frameworks have emerged. Ruby programmers can now use Celluloid, an actor framework “which lets you build multithreaded programs out of concurrent objects just as easily as you build sequential programs out of regular objects.” Developers who write Scala are in love with Akka, which appears to be very good. In the world of Clojure, Michael Drogalis has led the way with the Onyx framework. And the Go language has very good primitives for certain kinds of concurrency, and it has the Circuit framework for large scale distributed computing.

Some people have said to me, “With Docker, I can run two instances of my Python app, or 5, or even 20, on the same host, and I can automate how many instances are running, so as to scale up the number of instances based on how much traffic/demand I need to deal with.” Okay, awesome. So Docker helps manage concurrency? And this is important with Python because Python has historically had a difficult time handling concurrency (the GIL, the Global Interpreter Lock, which lets only one thread execute Python bytecode at a time), and even now, Python programmers tend to spin up new processes, rather than using something they consider ambiguous, such as threads. But if that is your need, why not use a language/eco-system that has first-class support for concurrency? There are older, mature options, such as Java and C# and Erlang, and there are many newer options, such as Go or Elixir or Clojure.
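For readers who have not seen the process-per-worker habit, here is a rough sketch of what it looks like inside a single Python program. CPU-bound work is split across child processes, each with its own interpreter and its own GIL, and the results come back over pipes. The function names are invented for this illustration, and the "fork" start method is an assumption that only holds on Unix-like hosts.

```python
import multiprocessing as mp


def count_primes(lo, hi):
    """CPU-bound work: count primes in [lo, hi) by trial division."""
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def _worker(lo, hi, conn):
    """Runs in a child process: compute one slice and send the result back."""
    conn.send(count_primes(lo, hi))
    conn.close()


def count_primes_parallel(limit, workers=4):
    """Split [0, limit) into slices and count primes in separate processes."""
    ctx = mp.get_context("fork")  # assumption: Unix-like host
    step = -(-limit // workers)   # ceiling division
    jobs = []
    for i in range(workers):
        recv_end, send_end = ctx.Pipe(duplex=False)
        lo, hi = i * step, min((i + 1) * step, limit)
        proc = ctx.Process(target=_worker, args=(lo, hi, send_end))
        proc.start()
        send_end.close()  # the parent only reads from this pipe
        jobs.append((proc, recv_end))

    total = 0
    for proc, recv_end in jobs:
        total += recv_end.recv()
        proc.join()
    return total


if __name__ == "__main__":
    print(count_primes_parallel(100_000, workers=4))
```

Running several containers of the same app is the same move one level up: more OS processes, coordinated from outside the language.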

What is Docker for? You can take your old Python and Ruby and Perl apps and wrap them up in Docker, and thus those old apps can make the transition to the modern world of cloud computing. In that sense, Docker allows you to take apps developed with a paradigm from the 1990s, and deploy them in 2018. The folks working with Python and Ruby and Perl (and PHP) are jealous of the way a Java programmer can create an uberjar, and they are jealous of the way a Golang programmer can create a binary that has no outside dependencies. And so the Python programmer, and the Ruby and Perl and PHP programmer, they turn to Docker, which allows them to create the equivalent of an uberjar. But if that is what they want, maybe they should simply use a language that supports that natively, without the need for an additional technology?

Many people regard this as one of the greatest things about Docker, but I regard the entire effort as an example of what is wrong with the tech industry. We suffer an unwillingness to confront the reality of the emerging situation, and commit to new paradigms that are well adapted to the new situation. Instead we commit to very complex technologies that allow us to wallow in the past. This is “conservative” in the negative sense: rigid, nostalgic, reactionary.

I know a great many developers are going to dismiss this blog post, but as a thought experiment, you might want to consider two different companies. One spends the next 5 years committing to those languages and eco-systems that have been built for the era of cloud computing. The other spends the next 5 years using Docker so they can keep using scripting languages from the 1990s. Now it is the year 2023, and a crisis happens at both companies. Which of those two companies do you think will be more ready to adapt to the crisis?

[ [ UPDATE 2018-06-14 ] ]

A friend and I just spent an hour trying to get a short Python script running on an EC2 instance. We got stuck dealing with this error:

ModuleNotFoundError: No module named 'MySQLdb'

The EC2 instance was running Python 2.7 by default. Thinking we needed to use pip3 for this install, we upgraded to Python 3 and pip3. But we still got the same error. We tried a few other things.
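I can't say for certain what was wrong on that box, but one common cause of this error is that pip installs a package into one interpreter while the script runs under another. A second wrinkle is that the `MySQLdb` module is shipped by the `mysqlclient` distribution on PyPI, so the import name and the install name differ, and building it typically needs MySQL client libraries and headers from the OS. The sketch below is one way to check which interpreter is actually running and whether it can see the module. The helper name `diagnose_import` is made up for this example.

```python
import importlib.util
import sys


def diagnose_import(module_name, pip_package=None):
    """Report whether the *current* interpreter can import a top-level module.

    pip_package is the name to install when it differs from the import name
    (for example, the MySQLdb module comes from the mysqlclient distribution).
    """
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "found": spec is not None,
        # Running pip through this interpreter avoids the pip/pip3 guesswork.
        "install_command": [
            sys.executable, "-m", "pip", "install", pip_package or module_name,
        ],
    }


if __name__ == "__main__":
    report = diagnose_import("MySQLdb", pip_package="mysqlclient")
    for key, value in report.items():
        print(key, "=", value)
```

The point of printing `sys.executable` is that "python", "python3", "pip", and "pip3" can each resolve to a different installation depending on the PATH of the machine, which is the kind of dependence on the host that this post is complaining about.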

Eventually, my friend said, “Hey, let me take this home and write a real install script, and we can try to run this in a few days. Maybe I can build this in Docker.”

Of all the forces that currently push Docker forward, I suspect that the Python community is the strongest. And that is because the dependency management in the Python community is so badly broken.

Compare the Python community with the Java community. At no point in the last 10 years have I had a Java project where I ran into the kinds of dependency management problems that I run into, routinely, with Python.

And again, many of the problems that Python faces go back to that idea that Python should rely on the underlying machine, and the underlying OS, a set of ideas which the whole tech industry is now trying to get away from.

I absolutely understand why you want to use Docker, if you are working with Python. Because Python is broken. But you owe it to yourself, and your company, to consider that the time you invest in Docker might be better spent moving away from Python.

[ [ UPDATE 2018-07-09 ] ]

The following happened today. This is exactly the kind of thing that Docker is supposed to protect us from, and it can’t even get this right. Really sad.

This is me and a co-worker, trying to reconcile our different parts of a Python app:






2 COMMENTS

May 15, 2018
12:11 am

By Agam Brahma

Great article, so true @ “unwillingness to confront the reality of the emerging solution”.

Minor typo: s/Onxy/Onyx/

May 15, 2018
10:32 am

By lawrence

Agam Brahma, thank you. I have now fixed the typo.
