Can Gunicorn die if bash quits before forking is done?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

A very interesting bug:

Sly010 does some epic digging to find out why Gunicorn sometimes dies on startup:

Did a bit more digging:

# relevant section from daemonize() in util.py

if os.fork():
    os._exit(0)
os.setsid()
...

# (added 3 prints)
# 3 out of 5 times this prints “1” then silence. No gunicorn.
# There is no error message and nothing in the logs.

if os.fork():
    print("1")
    os._exit(0)
print("2")
os.setsid()
print("3")
...

# (added an extra sleep)
# This prints 1,2,3 (but I would also accept 2,3,1 or even 2,1,3)
# and everything works as expected.

if os.fork():
    print("1")
    time.sleep(0.1)
    os._exit(0)
print("2")
os.setsid()
print("3")
...

I am not saying this is a gunicorn bug; this might just be how Linux works. But it seems to me that if the process that started gunicorn (in my case bash) exits before the fork can call os.setsid(), the whole process group can get killed by the OS.

Again, I only have superficial knowledge of how processes daemonize, so I will let you decide
if this needs fixing or not. I am completely happy with just adding a “sleep 1” to my shell script.
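
A fixed sleep only guesses at the timing. As an illustration of a more deterministic approach (my own sketch, not something proposed in the thread and not how gunicorn does it), the parent can block on a pipe until the child reports that setsid() has completed. The function name fork_and_wait_for_setsid is hypothetical, and so that the sketch can be run on its own, the parent returns and the child exits, where a real daemonizer would do the opposite.

```python
import os


def fork_and_wait_for_setsid():
    """Fork a child and return only after it reports that setsid() is done.

    Hypothetical helper for illustration. Returns (child_pid, child_sid).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: start a new session, then tell the parent it is safe to go.
        status = 1
        try:
            os.close(read_fd)
            os.setsid()
            os.write(write_fd, str(os.getsid(0)).encode())
            os.close(write_fd)
            status = 0
        finally:
            # Demo only: a real daemon would keep running here.
            os._exit(status)

    # Parent: block until the child has written its new session id.
    os.close(write_fd)
    data = os.read(read_fd, 64)  # empty if the child died before writing
    os.close(read_fd)
    os.waitpid(pid, 0)           # reap the demo child
    # A real daemonizer would call os._exit(0) at this point.
    return pid, int(data)
```

After setsid() the child leads its own session, so the reported session id equals the child's pid. With this handshake the launcher cannot exit while the child is still in the old session.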

This is some epic debugging:

Reading about setsid(). Quote:
“””
When a user logs out from a session, all processes associated with that session are killed.
For […] daemons you do not want this to happen. The solution is to call setsid.
“””

I think part of the magic trick is that I am running things with fabric on a small ec2 instance.
Perhaps fabric “logs out from the session” right after the parent exits, but before the fork has a chance to call setsid().
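
If the child really is killed by the hangup that follows the session ending, then ignoring SIGHUP before forking would be another way to close the window. That is an assumption on my part: the thread never confirms which signal is involved. Ignored signals stay ignored across fork(). A rough sketch, with a hypothetical function name, in which the child sends itself SIGHUP to show that it survives:

```python
import os
import signal


def child_survives_hangup():
    """Ignore SIGHUP, fork, and check that the child survives a SIGHUP.

    Hypothetical helper for illustration. Assumes the kill described in
    the thread arrives as SIGHUP, which the thread does not confirm.
    """
    # The ignored disposition is inherited by the forked child.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)

    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            # Simulate the hangup; this kills the child if not ignored.
            os.kill(os.getpid(), signal.SIGHUP)
            status = 0
        finally:
            os._exit(status)

    _, wait_status = os.waitpid(pid, 0)
    # True only if the child exited normally with status 0.
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

Note that the calling process also keeps ignoring SIGHUP afterwards, which is usually what you want for something about to become a daemon.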

I just managed to reproduce it with this small script by calling it with fabric on ec2.
It prints 0,1 but never gets to print 2.

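import os
import time

def daemonize():

    print("0")

    # First fork: the parent exits so the launcher gets control back.
    if os.fork():
        print("1")
        os._exit(0)

    print("2")
    # Detach from the launcher's session and controlling terminal.
    os.setsid()

    print("3")

    # Second fork: the session leader exits and only the grandchild remains.
    if os.fork():
        print("4")
        os._exit(0)

    print("5")

if __name__ == "__main__":
    daemonize()
    while True:
        print("i am a daemon")
        time.sleep(1)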

So on one hand this may affect other Python daemons too; on the other hand it’s a very, very subtle race condition that only happens if you log out right after starting gunicorn, and it probably requires the collaboration of the Linux scheduler itself.

Either that or I am totally missing something.

In the end, sly010 traces the problems to Fabric, rather than Gunicorn:

So far I couldn’t reproduce this with plain openssh, but if I use fabric it fails to start very simple daemons regardless of programming language.
I might keep looking out of curiosity, but this is definitely not a gunicorn bug so I think you are safe to close this.

Source: https://github.com/benoitc/gunicorn/issues/949