June 6th, 2019
(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at: firstname.lastname@example.org
Here is a comment that Nils Meyer left on LinkedIn, in response to something I said. I can agree that I would see some use for Docker/Kubernetes in a non-virtual world; the irony is that I’ve only seen Docker/Kubernetes used in virtual setups.
I would agree that using Packer and treating the VM like a container (or rather a pod, in Kubernetes parlance) is the easier approach. It’s also somewhat bizarre that a tarball of multiple tarballs and shell scripts concatenated with “&&” are now considered the pinnacle of software packaging. EDIT: There is a lot to dislike about Docker as a container runtime: the image format, the standard way to build images, the use of outdated lowest-common-denominator technology (iptables, anyone? And why oh why must we use IPv4 and NAT?), feature creep, etc. Containers are a neat idea, but Docker leaves a lot to be desired.
There are advantages to using containers, especially with an additional orchestration layer, and especially when NOT running on virtualization (most container deployments probably are to VMs). It is very difficult to get there, though, and it requires a lot more work on infrastructure, so that’s not a step to be taken lightly. It’s also fairly easy to get wrong, with disastrous consequences for security. Very often I see people pull in containers and code from all manner of sources, and many of the “official” container images have known vulnerabilities.
For context, see my earlier essay, Docker Is The Dangerous Gamble Which We Will Regret.
[ [ UPDATE 2019-06-10 ] ]
Nils Meyer added a comment to this blog post, which I’m incorporating into the blog post itself:
To elaborate a bit on my remarks: Many organizations use containers and container orchestration without having a defined and valid use case, and without the necessary skillset and manpower in-house to manage a complex setup. It’s extremely easy to get up and running with containers, and setting up Kubernetes is also extremely easy when you’re just using kops. There are of course hosted solutions as well. This can be very deceptive, since you skip ahead on the learning curve.
A risk therein is that people don’t actually understand what they have built, especially when it’s mostly developers with little Linux background setting it up. There is a lot of technology involved that you should understand when running a complex setup: networking, routing, network overlays, the Linux distributions you’re using, overlay filesystems as well as the underlying filesystems, and proxying. If you’re using a proprietary cloud, you should have an understanding of how its components work as well.
The risk is that you’ll get a lot of rope to hang yourself with, since re-use of components is very easy. For example, you can pull in a lot of stuff from Docker Hub and other container registries, but there is no quality assurance or curation there (like you would get with Python core modules, for example). There was some recent research that found a lot of security issues in “official” Docker images that had already been fixed upstream, simply because the underlying OS layer wasn’t updated.
So to do this properly you would end up building your own container images, which means a lot of duplicated effort. Since you often end up running different Linux distributions you’ll need to know how to manage those as well. You will need a CI/CD system for that. You’ll want your own container registry. You’ll want to run vulnerability scans on your containers and have alerting when an image is vulnerable.
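As a rough illustration of the "scan and alert" step, here is a minimal Python sketch. It assumes you already have, for each image, a mapping of installed package names to versions, plus a set of advisories listing vulnerable versions. The function names, image tags, and sample versions are all hypothetical, and real scanners match version ranges against vulnerability databases rather than exact strings.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical types: an image manifest maps package name -> installed version,
# and advisories map package name -> set of versions known to be vulnerable.
Manifest = Dict[str, str]
Advisories = Dict[str, Set[str]]


def scan_image(packages: Manifest, advisories: Advisories) -> List[Tuple[str, str]]:
    """Return (package, version) pairs in one image that match an advisory."""
    return sorted(
        (name, version)
        for name, version in packages.items()
        if version in advisories.get(name, set())
    )


def images_needing_alert(
    images: Dict[str, Manifest], advisories: Advisories
) -> Dict[str, List[Tuple[str, str]]]:
    """Scan every image and keep only those with at least one finding."""
    findings = {}
    for image, packages in images.items():
        hits = scan_image(packages, advisories)
        if hits:
            findings[image] = hits
    return findings
```

In practice something like this would run on a schedule against your own registry, so that an image that was clean when it was built still triggers an alert once a new advisory is published.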
Once you have a large orchestration layer, it becomes more and more difficult to get things to run similarly on developers’ machines. You’ve already lost when developers run an OS that doesn’t natively support containers, and most of your developers probably don’t run Linux.
You need to be able to debug a container build. This can be especially annoying with Dockerfiles due to the layered approach: if the list of commands you chained together with && \ fails, you’ll have some trouble trying to fix it. This is of course true of other packaging systems to a certain degree as well. Wouldn’t it be great if, instead of having every command create a new layer, you could just create a layer explicitly (just like committing a database transaction)?
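To make the trade-off concrete, here is a toy Python model of per-instruction layer caching. It is not Docker's implementation, only a sketch under the assumption that a layer is cached by its parent layer plus the instruction text, and that a failed instruction commits nothing. The `build` function and the command strings are hypothetical.

```python
from typing import Callable, List, Set, Tuple


def build(
    steps: List[List[str]],
    run: Callable[[str], bool],
    cache: Set[tuple],
) -> Tuple[int, bool]:
    """Toy build: each inner list is one instruction whose commands are
    chained with &&. Returns (commands executed, whether the build succeeded)."""
    executed = 0
    parent: tuple = ()
    for step in steps:
        key = parent + (tuple(step),)
        if key in cache:
            # Same parent and same instruction text: reuse the cached layer.
            parent = key
            continue
        for cmd in step:
            executed += 1
            if not run(cmd):
                # The chain failed, so no layer is committed for this step.
                return executed, False
        cache.add(key)
        parent = key
    return executed, True


def run(cmd: str) -> bool:
    """Pretend every command succeeds except a deliberately broken one."""
    return cmd != "make-broken"


# One chained instruction: after fixing the last command, all three re-run.
chained_cache: Set[tuple] = set()
build([["apt-get update", "apt-get install -y gcc", "make-broken"]], run, chained_cache)
chained_retry = build([["apt-get update", "apt-get install -y gcc", "make"]], run, chained_cache)

# Three separate instructions: the first two layers are cached on retry.
split_cache: Set[tuple] = set()
build([["apt-get update"], ["apt-get install -y gcc"], ["make-broken"]], run, split_cache)
split_retry = build([["apt-get update"], ["apt-get install -y gcc"], ["make"]], run, split_cache)
```

In this model the chained retry executes all three commands again, while the split retry executes only the fixed one. The price of splitting is more layers, which is why an explicit "commit a layer here" marker would be attractive.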
If you’re in a strongly regulated business you’ll want to be able to audit what software in what version and under which license you’re running at any given time. That also means you need to keep old container images around but be able to prevent their use.
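One way to picture that requirement is an inventory that records, per image, which packages it contains in which version and under which license, and that lets you retire an image without deleting its record. The following Python sketch assumes such an inventory exists; the class names, fields, and sample digest are hypothetical and not tied to any particular registry.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    license: str


@dataclass
class ImageRecord:
    digest: str                 # content hash identifying the image
    built_on: date
    packages: List[Package] = field(default_factory=list)
    retired: bool = False       # kept for audit, but must not be deployed


class ImageInventory:
    """Hypothetical audit store: records are never deleted, only retired."""

    def __init__(self) -> None:
        self._records: Dict[str, ImageRecord] = {}

    def register(self, record: ImageRecord) -> None:
        self._records[record.digest] = record

    def retire(self, digest: str) -> None:
        self._records[digest].retired = True

    def may_deploy(self, digest: str) -> bool:
        # Unknown images are rejected along with retired ones.
        record = self._records.get(digest)
        return record is not None and not record.retired

    def audit(self, digest: str) -> List[Tuple[str, str, str]]:
        """List (name, version, license) for every package in the image."""
        return [(p.name, p.version, p.license) for p in self._records[digest].packages]
```

A deployment pipeline would call `may_deploy` before rolling out an image, while auditors could still call `audit` on a retired digest.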
Once you have all this, you can do some pretty cool things. For example, you can run optimized builds of your software, and you can use newer versions of libraries only where you need them instead of running a completely bleeding-edge distro. You can achieve far better utilization with containers on given hardware, and containers scale very fast. Virtual machines also suffer from some unique CPU bugs that aren’t as high-impact with containers. Some of these benefits you’ll realize a lot more easily by running on bare metal, but few do this.
All of this doesn’t even take into account the most difficult thing: storing data. At a certain point you have to store data, and usually, due to limitations in networking and storage systems, this can’t be very elastic, and it gets very difficult if you have certain requirements for durability.