The sad, slow way a system of cron scripts becomes ugly

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

Do you have a chore that needs to run in the background, maybe once a day, or once an hour? Cron scripts will save you! They are the most amazingly amazing thing God has invented since Adam and Eve! For sheer wonderfulality they have no equal among the products of mortal fallen flesh!

At least at first.

The simple cron script is wonderfully direct and efficient. But the first can lead to a second. The second can lead to a third. The third can lead to hell.

Let’s take a recent example from my own life.

I agreed to work for a company for 6 months, to help them rescue their tech. They want to get rid of their PHP and Ruby and move to NodeJS. Okay, awesome. Sounds like a fun project.

However, the old system needs to keep working for several more months. And the old system had a PHP script that ran once a day to import some data. Okay, fine. Simple and straightforward.

But then there were changes. The Business Intelligence team signed a contract with an outside firm, to send us more data about more industries.

Suddenly, the PHP script wasn’t good enough. We had to import millions of records a day. It was taking 15 or even 20 hours to import the data!

How do we speed things up? PHP is single-threaded. We need concurrency. We want to run queries in parallel, to max out the pipe going into MySQL.

Of course, the previous developer was in a hurry. Was there a quick tweak that would deliver more speed? Sure, we could use more scripts, and possibly more cron scripts. But remember: a single script might be simple and straightforward, yet multiple scripts, each taking as input the output of some other script, can fast become a mess. The road to hell is paved primarily with cron scripts.

How about this:

1.) we re-write the first PHP script. Instead of inserting data to MySQL, it simply puts the data on a RabbitMQ queue

2.) we write a second PHP script which can take data from RabbitMQ and put it into MySQL

3.) we use Supervisord to launch 10 instances of the second PHP script. So now writes to MySQL can happen up to 10 times faster, possibly hitting the bandwidth limit of the connection to MySQL
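
The three steps above can be sketched in a few lines. This is only an in-process illustration in Python, not the PHP scripts described here: the standard-library queue.Queue stands in for RabbitMQ, threads stand in for the 10 worker processes that Supervisord would launch, and insert_record is a hypothetical placeholder for the MySQL write.

```python
import queue
import threading

NUM_WORKERS = 10   # mirrors the 10 worker instances in step 3
STOP = object()    # sentinel telling a worker to shut down


def insert_record(record, store, lock):
    """Hypothetical stand-in for a MySQL INSERT."""
    with lock:
        store.append(record)


def worker(jobs, store, lock):
    """Step 2: take records off the queue and write them out."""
    while True:
        record = jobs.get()
        if record is STOP:
            break
        insert_record(record, store, lock)


def run_import(records, num_workers=NUM_WORKERS):
    jobs = queue.Queue()
    store, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, store, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Step 1: the importer only enqueues; it never touches the database.
    for record in records:
        jobs.put(record)
    # One sentinel per worker, so every worker exits.
    for _ in threads:
        jobs.put(STOP)
    for t in threads:
        t.join()
    return store
```

The point of the shape is that the importer and the writers no longer know about each other. They only share the queue, which is also why there are now more moving parts to keep track of.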

And that gives us a speed boost?

Yes!

So now everything is like a better version of Utopia?

Not exactly.

The 10 worker scripts do an awesome job the first day they run.

The second day they fail to work. Why? Well, of course, the workers sat idle between imports and the connection to MySQL hit its timeout (the server's wait_timeout setting, 8 hours by default).

Damn.

So how do we stay connected to MySQL forever?

We have a few options. Two are obvious. One is terrible and one is okay:

1.) reconnect to MySQL for every record — this is grossly inefficient and will exhaust the number of connections to MySQL

2.) use a function like mysql_ping to renew the connection to the database.

Option #2 is great and allows the 10 worker scripts to work for several days.

But then something happens. Perhaps the database instance in AWS is unavailable, or needs to reboot. You eventually run into a situation where mysql_ping is not enough.

So what do you do then? There are two more options, one terrible and one good, but a bit long:

1.) write a cron script, set to run as root, which hits the 10 worker scripts with “kill”. We know that Supervisord will automatically restart them, and when they restart they will reconnect to MySQL again. So this is a seemingly easy way to get the 10 worker scripts to renew their database connection every day.

2.) Have the PHP code listen for connection errors to MySQL. Try to figure out which kinds of connection errors you are getting, and try to accommodate them all. This is the best option, but it can get verbose, depending on how many failure modes you need to support. Option #1, by contrast, looks simple and clean, which is exactly what makes it tempting.
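
To give a feel for option #2, here is a minimal sketch in Python rather than PHP. Everything in it is an assumption made for illustration: ConnectionLost is a hypothetical exception standing in for whatever your database driver raises when the link drops, and connect and operation are callables you would supply yourself. A real version would need one branch per failure mode, which is where the verbosity comes from.

```python
import time


class ConnectionLost(Exception):
    """Hypothetical error a database driver raises when the link drops."""


def run_with_reconnect(connect, operation, max_attempts=5,
                       base_delay=1.0, sleep=time.sleep):
    """Run operation(conn); on ConnectionLost, reconnect and retry.

    connect   -- zero-argument callable returning a fresh connection
    operation -- callable that takes a connection and does the write
    """
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()
            return operation(conn)
        except ConnectionLost:
            conn = None  # drop the dead connection; open a new one next time
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # back off: 1s, 2s, 4s, ...
```

The exponential delay matters when the database is rebooting: 10 workers hammering a server that is trying to come back up only makes things worse.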

But what exactly have we built? Can you imagine being the next programmer who has to figure this system out?

Assuming we go with option #1, what if this software does not get rebuilt, even after everything moves to NodeJS? What if some programmer has to figure out this system 2 years from now?

If something goes wrong, here are all the places the programmer of the future should check:

/var/log/supervisor.log

/var/log/rabbitmq.log

/var/log/php.error.log

/var/log/import-records.stdout.log

/var/log/import-records.stderr.log

The last two were specified as the stdout and stderr logs in the Supervisord conf entry that launches the 10 workers.

They could also check the MySQL logs, which are on a different server.

If they’d like to see what launches the 3 scripts, they need to look in 3 places:

The crontab for our standard user “ad-first”

The crontab for root

/etc/supervisord/conf.d/import-records.conf

Many smart people, including some of my smartest friends, insist that a monolithic framework can save this situation. Some love Ruby on Rails, some love Symfony and some love Django. And I 100% agree that using such a framework greatly improves the situation, at least in terms of handling the username, password, and host for connecting to the database: a framework can centralize that. Also, instead of using cron, a framework like Rails has some gems that offer all kinds of scheduling options.

But I personally would not use a big monolithic framework for this, unless the company was already using a big monolithic framework. Yes, totally, if the whole company uses Ruby on Rails, then using Rails for this becomes obvious. Rails has great functionality for this kind of thing (so do Symfony, Django, etc, etc, etc, please insert your favorite framework here). However, if the company isn’t using any of those frameworks, introducing something so massive for something so small is truly overkill.

Moreover, what we are talking about here is clearly a system, that is, a collection of functionality that has to work cohesively to accomplish the goal. So my preference for dealing with the situation would be a specialized app whose whole goal is accomplishing what these cron scripts do. The app can be in any language, so long as it all exists together. Since the company I’m currently at is moving to NodeJS, the appropriate place to start would be to build a Node app. However, Node has the problem that it does not handle unbounded concurrency very well. See “Backpressure and Unbounded Concurrency in Node.js” by Matt Ranney. This isn’t relevant in most situations, but can become an issue in this particular case, importing massive amounts of data.
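
For anyone who hasn't run into the term, here is a rough sketch of what backpressure means, written in Python purely as an illustration (it is not taken from Matt Ranney's piece, and the function and parameter names are my own). The only trick is that the queue has a maximum size, so a producer that reads records faster than the workers can write them is forced to wait instead of piling up unbounded work in memory. Error handling is left out to keep it short.

```python
import queue
import threading


def import_with_backpressure(records, handle, max_pending=100, num_workers=4):
    """Feed records to workers through a bounded queue.

    jobs.put() blocks while max_pending records are waiting, which
    slows the producer down to the pace of the workers.
    """
    jobs = queue.Queue(maxsize=max_pending)
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            record = jobs.get()
            if record is stop:
                break
            handle(record)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    peak = 0
    for record in records:
        jobs.put(record)                 # blocks when the queue is full
        peak = max(peak, jobs.qsize())   # track the deepest backlog seen
    for _ in threads:
        jobs.put(stop)
    for t in threads:
        t.join()
    return peak                          # never exceeds max_pending
```

Without the maxsize argument the same code would happily accept millions of pending records, and that is the unbounded case that causes trouble.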

If I were facing a situation where the data coming in was such that back pressure were a concern, I would write a Clojure app and use Supervisord to be sure it is running. The app could then provide the concurrency that’s needed to import all this data. I would use a library like at-at to ensure the timing of the imports.
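
To show what moving the timing into the app looks like, here is a small sketch, again in Python rather than Clojure, and not modelled on the at-at API. The names seconds_until_next_run and run_daily are my own, and the code uses naive datetimes, so it ignores time zones and daylight saving changes.

```python
import time
from datetime import datetime, timedelta


def seconds_until_next_run(now, hour, minute=0):
    """Seconds from `now` until the next occurrence of hour:minute."""
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return (target - now).total_seconds()


def run_daily(job, hour, minute=0, now=datetime.now, sleep=time.sleep,
              max_runs=None):
    """Call job() once a day at hour:minute, from inside the app itself.

    now and sleep are injectable so the loop can be tested without waiting;
    max_runs=None means run forever.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        sleep(seconds_until_next_run(now(), hour, minute))
        job()
        runs += 1
```

With something like this, the schedule lives in the same codebase as the import, so the programmer of the future has one fewer crontab to go looking for.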

Unix has the philosophy of small utilities that do one thing and do it well. And this would be in keeping with that philosophy. I would build a specialized app that was focused on this import, and which could handle every aspect of this import, including the various failures.

Cron scripts are really nice, up to a point. They are easy to write and very easy to launch. But the moment you find yourself writing cron scripts to process or help the work of another cron script, you should stop and remind yourself, “The road to hell is paved primarily with cron scripts”. And then you should think about finding another way forward.
