If Unix is good for Unicorn, why can’t Unicorn handle slow connections?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

I wrote about this recently, but I want to add to what I said.

In what I now think of as a famous essay, Ryan Tomayko said “I like Unicorn because it’s Unix”. There must be something to this because the essay has been widely quoted, and I remember it, and I have re-read it 3 times in the last 4 years. It had an impact.

And yet, nothing in it convinced me to adopt that model. I rejected it and went in the other direction: toward Clojure, the JVM and threads. And lately I’ve been wondering why, exactly, my admiration for this essay did not translate into a desire to pursue the advocated style of programming.

It is possible that the weakness of the example gave me some trepidation. I mean, if Unicorn is good because it implements the Unix style of concurrency, then why is Unicorn so limited?

A “slow client” can be any client outside of your datacenter. Network traffic within a local network is always faster than traffic that crosses outside of it. The laws of physics do not allow otherwise.

Persistent connections were introduced in HTTP/1.1 to reduce latency from connection establishment and TCP slow start. They also waste server resources when clients are idle.

Persistent connections mean one of the unicorn worker processes (depending on your application, it can be very memory hungry) would spend a significant amount of its time idle keeping the connection alive and not doing anything else. Being single-threaded and using blocking I/O, a worker cannot serve other clients while keeping a connection alive. Thus unicorn does not implement persistent connections.

If your application responses are larger than the socket buffer or if you’re handling large requests (uploads), worker processes will also be bottlenecked by the speed of the client connection. You should not allow unicorn to serve clients outside of your local network.
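To make the slow-client problem concrete, here is a minimal sketch of a single blocking worker, not Unicorn's actual code. The helper names (run_blocking_worker, demo_blocking_worker), the line-based echo protocol, and the 0.3 second delay are all assumptions made for illustration. The worker runs in a thread only so the demo fits in one script; it stands in for a single-threaded worker process.

```ruby
require 'socket'

# Hypothetical helper: a single-threaded worker using blocking I/O.
# It serves `count` connections strictly one at a time.
def run_blocking_worker(server, count)
  count.times do
    client = server.accept          # blocks until a connection is queued
    line = client.gets              # blocks until THIS client sends a line
    client.write("echo: #{line}")
    client.close
  end
end

# Hypothetical demo: a slow client connects first, a fast client second.
# Returns both replies and how long the fast client waited for its reply.
def demo_blocking_worker(slow_delay = 0.3)
  server = TCPServer.new('127.0.0.1', 0)   # port 0: the kernel picks a free port
  port = server.addr[1]
  worker = Thread.new { run_blocking_worker(server, 2) }

  slow = TCPSocket.new('127.0.0.1', port)  # accepted first, sends nothing yet
  fast = TCPSocket.new('127.0.0.1', port)  # queued behind the slow client
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  fast.write("fast\n")                     # the fast request is ready at once

  # Measure the fast client's wait independently of the slow client.
  fast_timer = Thread.new do
    reply = fast.gets
    [reply, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
  end

  sleep slow_delay                         # the slow client dawdles
  slow.write("slow\n")
  slow_reply = slow.gets
  fast_reply, fast_wait = fast_timer.value

  worker.join
  [slow, fast, server].each(&:close)
  { slow_reply: slow_reply, fast_reply: fast_reply, fast_wait: fast_wait }
end
```

Calling demo_blocking_worker should show the fast client waiting roughly as long as the slow client took to speak, even though its own request was ready immediately. That is the behavior the passage above warns about.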

As it says later on that page, Unicorn can only be made to work if you put Nginx between it and the person who is trying to use Unicorn. But Nginx does not rely on the Unix model of concurrency. So if the Unix model of concurrency is so great, and Unicorn uses it, why is Unicorn so helpless that it has to depend on a piece of software that does not use the Unix model of concurrency?

This is how they describe their need for Nginx:

By acting as a buffer to shield unicorn from slow I/O, a reverse proxy will inevitably incur overhead in the form of extra data copies. However, as I/O within a local network is fast (and faster still with local sockets), this overhead is negligible for the vast majority of HTTP requests and responses.

The ideal reverse proxy complements the weaknesses of unicorn. A reverse proxy for unicorn should meet the following requirements:

It should fully buffer all HTTP requests (and large responses). Each request should be “corked” in the reverse proxy and sent as fast as possible to the backend unicorn processes. This is the most important feature to look for when choosing a reverse proxy for unicorn.

It should spend minimal time in userspace. Network (and disk) I/O are system-level tasks and usually managed by the kernel. This may change if userspace TCP stacks become more popular in the future; but the reverse proxy should not waste time with application-level logic. These concerns should be separated.

It should avoid context switches and CPU scheduling overhead. In many (most?) cases, network devices and their interrupts are only handled by one CPU at a time. It should avoid contention within the system by serializing all network I/O into one (or few) userspace processes. Network I/O is not a CPU-intensive task and it is not helpful to use multiple CPU cores (at least not for GigE).

It should efficiently manage persistent connections (and pipelining) to slow clients. If you care to serve slow clients outside your network, then these features of HTTP/1.1 will help.

It should (optionally) serve static files. If you have static files on your site (especially large ones), they are far more efficiently served with as few data copies as possible (e.g. with sendfile() to completely avoid copying the data to userspace).

nginx is the only (Free) solution we know of that meets the above requirements.

Indeed, the folks behind unicorn have deployed nginx as a reverse-proxy not only for Ruby applications, but also for production applications running Apache/mod_perl, Apache/mod_php and Apache Tomcat. In every single case, performance improved because application servers were able to use backend resources more efficiently and spend less time waiting on slow I/O.
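The requirement to serialize all network I/O into one (or few) userspace processes describes an event loop. The following is a rough sketch of that style in Ruby using IO.select and read_nonblock. It is not how nginx is implemented; the helper names (run_select_loop, demo_select_loop), the line-based echo protocol, and the 4096-byte read size are assumptions made for illustration. The loop runs in a thread only so the demo fits in one script.

```ruby
require 'socket'

# Hypothetical helper: one loop, many connections. Serves `count` clients
# by reacting to whichever socket is ready instead of blocking on one.
def run_select_loop(server, count)
  buffers = {}                              # socket => bytes received so far
  served = 0
  while served < count
    readable, = IO.select([server] + buffers.keys)
    readable.each do |io|
      if io.equal?(server)
        buffers[server.accept] = String.new # new client, nothing read yet
        next
      end
      begin
        buffers[io] << io.read_nonblock(4096)
      rescue IO::WaitReadable
        next                                # not actually ready; retry later
      rescue EOFError
        buffers.delete(io)                  # client left without a full line
        io.close
        served += 1
        next
      end
      next unless buffers[io].include?("\n") # request still incomplete
      io.write("echo: #{buffers.delete(io)}")
      io.close
      served += 1
    end
  end
end

# Hypothetical demo: a slow client connects first, a fast client second.
# Returns both replies and how long the fast client waited for its reply.
def demo_select_loop(slow_delay = 0.3)
  server = TCPServer.new('127.0.0.1', 0)    # port 0: the kernel picks a free port
  port = server.addr[1]
  loop_thread = Thread.new { run_select_loop(server, 2) }

  slow = TCPSocket.new('127.0.0.1', port)   # connects first, sends nothing yet
  fast = TCPSocket.new('127.0.0.1', port)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  fast.write("fast\n")
  fast_reply = fast.gets                    # answered while slow is still idle
  fast_wait = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  sleep slow_delay
  slow.write("slow\n")
  slow_reply = slow.gets

  loop_thread.join
  [slow, fast, server].each(&:close)
  { slow_reply: slow_reply, fast_reply: fast_reply, fast_wait: fast_wait }
end
```

In this sketch the idle slow client costs the loop a small buffer and a slot in the select set, not a whole worker, so the fast client should get its reply well before the slow client has sent anything.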

Going back to “I like Unicorn because it is Unix”, this seems like part of the denial that pervades the Ruby community regarding concurrency:

There’s another problem with Unix programming in Ruby that I’ll just touch on briefly: Java people and Windows people. They’re going to tell you that fork(2) is bad because they don’t have it on their platform, or it sucks on their platform, or whatever, but it’s cool, you know, because they have native threads, and threads are like, way better anyways.

Fuck that.

Don’t ever let anyone tell you that fork(2) is bad. Thirty years from now, there will still be a fork(2) and a pipe(2) and an exec(2) and smart people will still be using them to solve hard problems reliably and predictably, just like they were thirty years ago.

MRI Ruby people need to accept, like Python (you have seen multiprocessing, yes?), that Unix processes are one of two techniques for achieving reliable concurrency and parallelism in server applications. Threads are out. You can use processes, or async/events, or both processes and async/events, but definitely not threads. Threads are out.

I have the impression that Rubyists celebrate Unix processes because MRI Ruby lacks the tools they need to manage concurrency, so they fall back on the underlying operating system. But please, let us all consider the irony of a Rubyist relying on the operating system: people turn to Ruby because they want a high-level language that masks much of the complexity of the underlying system, yet Ruby lacks abstractions for concurrency, so these high-level programmers end up dealing with the low-level grunt work of operating system processes. An honest person would admit that this is a contradiction. Please read this with an open mind:

Unicorn, and preforking servers in general, create a listening socket in a parent process and then fork off one or more child processes, each of which calls accept(2) on the same shared listening socket. The kernel manages the task of distributing connections between accepting processes.

Let’s start with a simplified example. A simple echo server that balances connections between three child processes:

# simple preforking echo server in Ruby
require 'socket'

# Create a socket, bind it to localhost:4242, and start listening.
# Runs once in the parent; all forked children inherit the socket's
# file descriptor.
acceptor = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
address = Socket.pack_sockaddr_in(4242, 'localhost')
acceptor.bind(address)
acceptor.listen(10)

# Close the socket when we exit the parent or any child process. This
# only closes the file descriptor in the calling process, it does not
# take the socket out of the listening state (until the last fd is
# closed).
# The trap is guaranteed to happen, and guaranteed to happen only
# once, right before the process exits for any reason (unless
# it's terminated with a SIGKILL).
trap('EXIT') { acceptor.close }

# Fork you some child processes. In the parent, the call to fork
# returns immediately with the pid of the child process; fork never
# returns in the child because we exit at the end of the block.
3.times do
  fork do
    # now we're in the child process; trap (Ctrl-C) interrupts and
    # exit immediately instead of dumping stack to stderr.
    trap('INT') { exit }

    puts "child #$$ accepting on shared socket (localhost:4242)"
    loop {
      # This is where the magic happens. accept(2) blocks until a
      # new connection is ready to be dequeued.
      socket, addr = acceptor.accept
      socket.write "child #$$ echo> "
      message = socket.gets
      socket.write message
      # Close the connection so the client sees end-of-file.
      socket.close
      puts "child #$$ echo'd: '#{message.strip}'"
    }
    exit
  end
end

# Trap (Ctrl-C) interrupts, write a note, and exit immediately
# in parent. This trap is not inherited by the forks because it
# runs after forking has commenced.
trap('INT') { puts "\nbailing" ; exit }

# Sit back and wait for all child processes to exit.
Process.waitall
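For trying the echo server out, a small client can be sketched as follows. The helper name echo_once is an assumption made for illustration; the host and port defaults come from the listing above. Because the server writes its prompt without a trailing newline, the prompt and the echoed message arrive at the client as a single line.

```ruby
require 'socket'

# Hypothetical helper: send one line to the echo server and return the
# single line it sends back (the prompt followed by the echoed message).
def echo_once(message, host = 'localhost', port = 4242)
  TCPSocket.open(host, port) do |sock|
    sock.write("#{message}\n")
    sock.gets                     # e.g. "child <pid> echo> <message>\n"
  end
end
```

With the server running, repeated calls to echo_once should come back tagged with different child pids as the kernel hands connections to whichever child is waiting in accept. The client side is short; the signal traps and process bookkeeping all live in the server listing.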

That’s a lot of work, isn’t it? If you really wanted to deal with low-level details such as Unix signals, then why would you work in a high-level language like Ruby? Why not work in a language that allows you to get closer to the metal? Why not write in C or C++, which might give you something like a 50x speed boost? Normally a Rubyist would say “I don’t want to work in C because I don’t want to deal with the low-level details of the platform” but that is exactly what we are doing here.

I want to be a happy and productive programmer, therefore I want to work at a higher level of abstraction.

For an extremely different take on how to build a server, consider what Zach Tellman says about Aleph. And check the code on GitHub.