An architecture of small apps

(written by lawrence, however indented passages are often quotes)

I am currently working at a well known magazine company. For their website we use a monstrous PHP/Symfony CMS which we call “megatron”. There has been some discussion about adopting a better architecture. The current system is very slow: it takes an average of 10 seconds to render a page. The North American websites for this company get 23 million page views a month. We rely on several cache strategies, including Varnish and also the built-in cache system that is used by Symfony. Without cache it would take 230,000,000 seconds to render the web pages for everyone — that’s 2662 days (7.3 years) worth of rendering that needs to be done in 30 days. Obviously that can be handled if you spin up enough web servers, but at peak traffic the sysadmins find themselves struggling to manage the load. The tech team and the sysadmins are constantly dealing with certain problems:

1.) clearing the cache: because there are so many cache strategies in use, clearing the cache is very complicated. We have a script that is supposed to clear the cache, but it has been plagued with bugs.

2.) slow rendering times that lead to the pages timing out, and then these error messages get cached in Varnish, which forces us to clear the cache again

3.) warming up the cache after the cache has been cleared — when traffic is high and the cache is empty, the web servers are overwhelmed

4.) spinning up servers and then spinning servers down: sometimes an article goes viral and our traffic spikes, at which point the sysadmins must race to spin up new servers. When the traffic fades the sysadmins need to turn off the servers, so as to save money.

All of these problems have 2 sources:

a.) the software is very slow

b.) the framework is monolithic such that the full codebase must be copied to each server — you can not put the user system on one server and the slideshows on a different server — the monolithic nature of the codebase limits the freedom of the sysadmins to invent an intelligent server topology.
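To make the rendering numbers from the opening paragraph concrete, here is a small Python sketch of the arithmetic. It assumes a 30-day month and that every page view triggers a full 10-second render (no cache at all); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def render_load(page_views, seconds_per_page, window_days):
    """Return total uncached render time (seconds and days) and the average
    number of renders that must run at the same time to finish in the window."""
    total_seconds = page_views * seconds_per_page
    total_days = total_seconds / SECONDS_PER_DAY
    concurrency = total_seconds / (window_days * SECONDS_PER_DAY)
    return total_seconds, total_days, concurrency


total_seconds, total_days, concurrency = render_load(23_000_000, 10, 30)
print(total_seconds)          # 230000000
print(round(total_days))      # 2662
print(round(concurrency, 1))  # 88.7
```

In other words, without any cache the servers would need to keep roughly 89 page renders in flight at every moment of the month, and far more than that at peak traffic.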

Internally, the tech team uses Yammer to share articles and talk about issues. We have had several conversations about the correct architecture for the company’s needs. I wrote:

Many companies have moved to a model where they have many small apps, each specialized to do one thing well, and each running on a different port, and then they use Nginx to reverse proxy all of those apps to port 80, to create the illusion that the website is just a single app running on port 80.

The website Talking Points Memo had an excellent article about why the traditional CMS is broken, and why the new architecture, of small apps, is important. The article seems to be offline right now, but I found a cached version of it:

http://web.archive.org/web/20120428130433/http://labs.talkingpointsmemo.com/2011/07/the-twilight-of-the-cms.php

In my personal projects, I’ve moved to a model where I write a small Clojure app for each task that needs to happen, and run each app on a different port. For instance, user signup can be its own app, and user login can be its own app, and any background task can be its own app. For anything where you might write a cron job, I tend to write a separate app. Anything that handles email can be its own app. Slideshows can be handled by an app whose output is JSON, with a template handled by Angular.js. Any kind of admin dashboard can be its own app. For pages that do not have to update more frequently than every hour or two, it should be easy to have an app that grabs data from the database and writes static HTML files which can then be easily served in a traditional manner.

Small independent apps are less likely to have some feature that is unexpectedly broken by what was supposed to be an unrelated change. And small apps are easier for newly hired developers to learn about. Large, monolithic CMSs tend to impose a steep, up-front cost on developers: the developer needs to learn a lot before they can be productive. And small, independent apps give sysadmins far more freedom about how they spread load across servers.

One last thing: there is an important asymmetry between an architecture of small apps and an architecture of The Monolithic CMS. If you have small apps, and decide you want to move to a monolithic CMS, then you must do The Big Rewrite: the exhausting effort of reproducing all of your functionality so that it is handled by your one, all-consuming CMS. But when you move from the monolithic CMS to an architecture of small apps, there is no need for The Big Rewrite. Instead you take some small part of your CMS, rewrite it as an independent app, set the app on its own port, use your webserver to proxy the app-on-a-port to whatever URL the CMS was previously using for that functionality, and thus you’ve taken a step towards a new architecture without having to rewrite anything but the functionality covered by your new, small app.

One of my co-workers responded:

I think there is a disparity here between our API service providing a variety of views (REST: XML/JSON) on data gathered from a multitude of data sources (db, solr, feeds, 4d, etc.), versus the term CMS (which for us can refer to megatron/admin, a Symfony2 app served by our API service). In fact, quickly scanning the article, the closest match to the term CMS in the article we have is …ContentSectionGenerator. I agree, it needs replacing; it’s clunky and slow, and awkward to develop for, but I guess it met the business needs at the time. The web has moved on, and it seems to be bodging in features.

In my opinion, a multitude of small apps can lead to code duplication (“I didn’t know that code already existed”), developer/qa unfamiliarity (“I’ve never worked on that app”), and a confusing release strategy (“which version of Y is running?”) with potential integration incompatibility; a middle ground needs to be found.

Proxying multiple apps on the same port might allow a CMS user to change a small element at a time, but not construct a fully featured web page. Proxying is definitely more useful for front-end than admin, or instead the use of ESI (which Symfony supports) would allow a better inclusion of multiple components into a single page. However, there seems to be a confusion above between the use of proxying, which is great for front-end users, and the ability of an admin to build a page, say a Hub page. Also, admins don’t care about maintaining CMS URLs, and replicating the same URL structure/parameters of ContentSectionGenerator would be a fallacy.

I should note: the “ContentSectionGenerator” that he mentions is an “app” inside of our Symfony system which allows our staff to create new blocks of content for the site using either text or HTML or Javascript or PHP code, and then these sections can be made to appear either in the sidebar or in the nav bar or in the main section where we show articles.

I responded:

You speak as if apps-on-different-ports-proxied-to-port-80 is a theory, but most of the big sites use a variation on this architecture. In fact, most have moved much further, to extremely complex workflows that dynamically map various ports to various workflows. I don’t want to bombard you with links, but there are literally hundreds of articles I could point to regarding Facebook, Google, Yahoo, or Twitter. I will limit myself to 1 article about Twitter, which I think is an extreme case, in terms of the kinds of complex mappings that they describe:

https://blog.twitter.com/2011/twitter-search-now-3x-faster

The section under “MULTIPLEXING INCOMING REQUESTS” describes an advanced system that I’m sure goes beyond our needs, but it is still an interesting read.

Regarding code duplication, as I’m sure you know, the best way to handle that is with shared gems (in Ruby) or shared jars (on the JVM). Any code that is needed by multiple apps should become its own gem/jar, in its own repository. I have worked at companies that re-wrote their code from PHP to Ruby and used shared gems to limit code redundancy — this worked fairly well.

I agree that it is a bad idea to proxy multiple apps to “construct a fully featured web page”. I normally think of the different apps handling different functionality. Assuming you have the domain “example.com” then you might have an app that handles user login and you spin that up on, say, port 30000 and you map it to port 80 such that it appears as:

http://www.example.com/login

This same app would also be available at http://www.example.com:30000/ but we could use a firewall to block direct access to port 30000. I see no harm in allowing the app to appear at http://www.example.com:30000/ except that it might cause confusion for some users, if they ever stumbled upon that URL. The normal argument for mapping everything to port 80 is that many users will be behind firewalls that limit them to ports 80 and 443 (also, if your frontend makes Ajax calls, calls to a different port run into the Same-Origin policy, so everything needs to be proxied to port 80 so your Ajax code will see it as being on the same origin as the rest of your site). Indeed, here in New York City some of my favorite coffeeshops have firewalls that block everything except 80 and 443. I have written to some of them and explained to them the importance of leaving port 22 open, as I cannot get any work done without ssh.

You might also have an app that handles user signups which you spin up on port 30001 and you map to port 80 such that it appears as:

http://www.example.com/signup

You might also have an admin app that you spin up on port 30002 and you map to

http://admin.example.com/

You might also have an app that allows users to update their profile information and you spin that up on port 30003 and map that to

http://www.example.com/profile

And perhaps you have an app that allows users to engage in live chat, so you spin that up on port 30004 and map that to

http://www.example.com/chat

And you might have an app that publishes much of your content as static HTML files, which you spin up on no port, as it does not accept TCP/IP requests — instead it queries the database and then creates static html files, which you save to some directory such as /var/www/example.com/public_html/ and you map that to

http://www.example.com/

You can see how this gives the sysadmins a lot of freedom to spread load across servers in creative ways — if 1 app becomes especially popular, or resource hungry, the sysadmins can rather easily move it to its own server (or set of servers). This is one of the main reasons most big sites move to an architecture like this — it facilitates fine grained control of what sort of requests go to a particular server.
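The mappings listed above amount to a small routing table. As a rough Python sketch (the table layout and function names are assumptions made for illustration, not part of any real proxy), the decision a reverse proxy makes for each request looks something like this:

```python
# Hypothetical routing table mirroring the examples above:
# (host, path prefix, backend port on 127.0.0.1).
ROUTES = [
    ("www.example.com", "/login", 30000),
    ("www.example.com", "/signup", 30001),
    ("admin.example.com", "/", 30002),
    ("www.example.com", "/profile", 30003),
    ("www.example.com", "/chat", 30004),
]


def matches(path, prefix):
    """True if path is the prefix itself or lies underneath it."""
    if prefix == "/":
        return True
    return path == prefix or path.startswith(prefix + "/")


def resolve(host, path):
    """Return the backend port for a request, or None when no app is mapped
    (in which case the static HTML files would be served instead)."""
    candidates = [
        (prefix, port)
        for route_host, prefix, port in ROUTES
        if route_host == host and matches(path, prefix)
    ]
    if not candidates:
        return None
    # Prefer the most specific (longest) matching prefix.
    return max(candidates, key=lambda item: len(item[0]))[1]
```

Moving an app to its own server then only means changing one row of the table (a different address instead of a different port), which is the freedom described above.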

Above, I keep saying “proxy the app to port 80”. How is that done? On one server I have, running Apache, I opened this file:

/etc/apache2/sites-available/000-default.conf

And I added these lines:

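# Pass the original Host header through to the backend apps
ProxyPreserveHost on

# API app on port 34000, reachable at /api with or without a trailing slash
ProxyPass /api/ http://127.0.0.1:34000/
ProxyPassReverse /api/ http://127.0.0.1:34000/
ProxyPass /api http://127.0.0.1:34000/
ProxyPassReverse /api http://127.0.0.1:34000/

# User app on port 34002, reachable at /user with or without a trailing slash
ProxyPass /user/ http://127.0.0.1:34002/
ProxyPassReverse /user/ http://127.0.0.1:34002/
ProxyPass /user http://127.0.0.1:34002/
ProxyPassReverse /user http://127.0.0.1:34002/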

So my API app, on port 34000, gets proxied to “/api” on port 80, and my “user” app, on port 34002, gets proxied to “/user” on port 80.
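To show what the proxy is doing with those rules, here is a minimal reverse proxy sketched in Python with only the standard library. It is an illustration of the idea, not a replacement for Apache or Nginx: it handles GET requests only, and the names `BACKENDS`, `backend_url` and `make_proxy` are invented for this example.

```python
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Same mapping as the Apache lines above: URL prefix -> backend base URL.
BACKENDS = {
    "/api": "http://127.0.0.1:34000",
    "/user": "http://127.0.0.1:34002",
}


def backend_url(path, backends):
    """Map a public path to a backend URL, stripping the prefix."""
    for prefix, base in backends.items():
        if path == prefix or path.startswith(prefix + "/"):
            return base + (path[len(prefix):] or "/")
    return None


def make_proxy(port, backends):
    """Build (but do not start) a proxy server listening on the given port."""

    class ProxyHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            target = backend_url(self.path, backends)
            if target is None:
                self.send_error(404, "No app is mapped to this path")
                return
            try:
                with urllib.request.urlopen(target, timeout=10) as upstream:
                    status, body = upstream.status, upstream.read()
                    ctype = upstream.headers.get("Content-Type", "text/plain")
            except urllib.error.HTTPError as err:
                # The backend answered with an error status: relay it.
                status, body = err.code, err.read()
                ctype = err.headers.get("Content-Type", "text/plain")
            except OSError:
                # Connection refused, timeout, and similar failures.
                self.send_error(502, "Backend app is not reachable")
                return
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return ThreadingHTTPServer(("127.0.0.1", port), ProxyHandler)
```

Calling `make_proxy(8080, BACKENDS).serve_forever()` would start it on port 8080 (binding to port 80 normally requires root). With this table a request for /api/items is forwarded to http://127.0.0.1:34000/items.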

Finally, about this:

“developer/qa unfamiliarity (“I’ve never worked on that app”)”

That is a good point, but of course, at a certain size, this has to happen: companies reach a size where no developer can be expected to know all of the software in the company. Specialization occurs. Our company has already reached that point where the mobile apps are concerned: the iOS developers know nothing about the website and the web developers know nothing about the mobile apps. If in 5 years the company doubles in size, I’m sure the company will be at the point where no 1 developer knows all the software in use on the web site. At some point, you simply need to accept that as a fact of life. And when that day arrives, small apps at least offer the advantage that a new developer can learn one small part of the company’s code base, and learn it well, and quickly, and start making useful contributions to that part of the company’s software.

One last thought: having separate apps means you can have complete separation between your dynamic pages and your static pages. Frameworks like Rails and Symfony tend to assume you have a dynamic site, and then they throw in some caching to help with the static bits. Separate apps allow you to serve your static content as genuinely static content. And the real-time services you offer, and the interactions that you offer to logged in users, can all be handled by specialized apps.
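As a rough illustration of the “genuinely static content” idea, here is a Python sketch of an app that reads rows from a database and writes one HTML file per row. The `articles` table, its columns and the function name are assumptions made up for this example; SQLite stands in for whatever database is really in use.

```python
import html
import sqlite3
from pathlib import Path


def publish_static(conn, out_dir):
    """Write one HTML file per row of the (hypothetical) articles table
    and return the list of paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT slug, title, body FROM articles ORDER BY slug")
    for slug, title, body in rows:
        page = (
            "<!DOCTYPE html>\n"
            f"<html><head><title>{html.escape(title)}</title></head>\n"
            f"<body><h1>{html.escape(title)}</h1>\n"
            f"<p>{html.escape(body)}</p></body></html>\n"
        )
        path = out / f"{slug}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

Run from cron every hour or so, with `out_dir` pointed at the web root, a script like this never needs to listen on a port: the webserver simply serves the files it leaves behind.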

My co-worker wrote:

Another potential problem with this approach (small specialized apps) is, how do you set it up for a development environment? Since you have to set up several apps working together, what do you do? Do you set up just the app you are going to be working on and use some sort of shared development app for the others? Do you set up everything in your local environment? Any article/post on how someone has sorted this out? Thanks!

To which I responded:

Those are good questions. Before answering them, I would point out that at a certain size, it is no longer possible for each programmer in a company to set up each app used by the company. At some point the programmers specialize, and only focus on a few of the apps that are in use at the company. That is already happening at our company: the mobile team is mostly separate from the web team.

But, regarding your questions, how to set up multiple apps would depend on what language/eco-system you are working with. With PHP, you would set up a bunch of different vhosts, linked to a bunch of different directories (the various directories where you have your various apps). If you are using PHP-FPM and Nginx, most of the configuration, mapping ports to directories, would happen in the Nginx config file. In most Nginx config files, for each vhost you will see the line “listen 80” but you don’t have to listen on port 80. You can change that number to any port.

I’ve been writing Clojure apps, where the whole thing becomes easy. Instead of using Apache or Nginx, I have 1 line of code that embeds a webserver (Jetty) inside of my app, and that webserver takes its config information from what I set in the app. I can list the port number on the command line when I start the app. Because the webserver is inside of my app, if I have 10 apps running, I also have 10 different webservers running, each one differently configured.

I would start the app from the command line like this:

java -jar user-signup-app.jar 30000

That starts the signup app on port 30000. And then:

java -jar user-profile-app.jar 30001

That starts the user profile app on port 30001.

This has worked for me so far, though I have read that this is inefficient because I am running multiple JVM instances and they compete against each other for resources. The newest version of the JVM is supposed to be multi-tenant so I believe this problem will disappear in the future.
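The pattern described above, an app that embeds its own webserver and takes its port from the command line, is not specific to Clojure. Here is a sketch of the same idea in Python. It is a stand-in for illustration only, not the Clojure/Jetty code the text refers to; the handler, its response text and the default port are made up.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SignupHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a small single-purpose app."""

    def do_GET(self):
        body = b"signup app is running\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port):
    """The webserver lives inside the app; the port is just a parameter."""
    return ThreadingHTTPServer(("127.0.0.1", port), SignupHandler)


def main(argv):
    """Entry point: argv[1], if present, is the port to listen on."""
    port = int(argv[1]) if len(argv) > 1 else 30000
    server = make_server(port)
    print(f"listening on port {server.server_address[1]}")
    server.serve_forever()
```

Saved as, say, signup_app.py with `import sys` and `main(sys.argv)` added at the bottom, it would be started with `python signup_app.py 30000`, much like the `java -jar` commands above.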

I understand you are asking how does this scale? I have never had to start more than 6 apps at once, but I think you are asking “What if I was in a complex environment where I had 200 apps and I had to start them all because they all needed services from each other?” I would offer 2 answers to that: one, you can easily write a shell script that has 200 lines, each like the examples I gave above, but, two, you should not have 200 apps that all need each other, that would be spaghetti code. Typically I only have 2 or 3 apps that the other apps depend on — the login app and an API app being 2 common apps that I need running for everything else I do.
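The “shell script with one line per app” idea can also be sketched in Python. The list of apps below is hypothetical (the jar names are borrowed from the examples above), and the `launch` parameter exists only so the function can be exercised without actually starting any JVMs.

```python
import subprocess

# Hypothetical list of apps to start: (jar file, port).
APPS = [
    ("user-signup-app.jar", 30000),
    ("user-profile-app.jar", 30001),
]


def build_command(jar, port):
    """Return the argument list equivalent to: java -jar <jar> <port>"""
    return ["java", "-jar", jar, str(port)]


def start_all(apps, launch=subprocess.Popen):
    """Start every app in list order and return whatever launch() returns
    for each one (process handles when launch is subprocess.Popen)."""
    return [launch(build_command(jar, port)) for jar, port in apps]
```

Putting the 2 or 3 apps that everything else depends on (login, API) at the top of the list gives them a head start, since the apps are launched in list order.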

In the land of PHP one normally deals with a lot of different software. For instance, the webserver might be Apache or Nginx, and they are configured independently of PHP. Also you have databases that are usually independent: MySQL, Solr, MongoDB. And also, with PHP, if you want to store information in memory, you use Memcache, which has its own configuration. Also, with any of the interpreted scripting languages (PHP, Ruby, Perl, Python) you typically have to deal with thousands of files, which sometimes makes deployment a pain. With compiled languages things are different: the app manages its own memory, so it doesn’t need Memcache. In my Clojure apps, I’ve been compiling everything into 1 binary file: all of the HTML files, all of the CSS files, all of the Javascript files, all of the images, all of the code, and also the webserver, all get compiled into a single binary file that I can then put anywhere on the server.

When your whole app is only 1 file, deployment becomes very easy.

Using “java -jar” from the command line is an informal way to start an app, but it works fine when you are doing development. When you go into production, you need to daemonize your Java app. The simplest way to detach your command from the terminal (other than using the “screen” command, which is also informal) is something like:

java -jar user-profile-app.jar 30001 </dev/null >/dev/null 2>&1 &

There are 4 main ways of running JVM apps as daemons:

1.) Use the Java Service Wrapper

2.) Use the Apache Jakarta Commons Daemon package

3.) Use a shell script

4.) Start the daemon from init.d

I usually go with #4. I use start-stop-daemon to start my daemons, and I put this in init.d:

#!/bin/sh
### BEGIN INIT INFO
# Provides: login_service
# X-Interactive: true
# Short-Description: Start/stop login_service server
### END INIT INFO

WORK_DIR="/home/dega"
NAME="login_service"
JAR="login_service-0.1-standalone.jar"
USER="dega"
DAEMON="/usr/bin/java"
DAEMON_ARGS="-jar $WORK_DIR/$JAR"

#export LOGIN_SERVICE_TOKEN=""

start () {
    echo "Starting login_service..."
    # The pid file doubles as the "already running" marker
    if [ ! -f "$WORK_DIR/login_service.pid" ]; then
        # --background detaches the JVM; --make-pidfile records its PID
        start-stop-daemon --start --verbose --background --chdir "$WORK_DIR" --exec "$DAEMON" --pidfile "$WORK_DIR/login_service.pid" --chuid "$USER" --make-pidfile -- $DAEMON_ARGS
    else
        echo "login_service is already running..."
    fi
}

stop () {
    echo "Stopping login_service..."
    start-stop-daemon --stop --exec "$DAEMON" --pidfile "$WORK_DIR/login_service.pid"
    # -f: do not complain if the pid file is already gone
    rm -f "$WORK_DIR/login_service.pid"
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

But in production you need to worry about what happens when the server dies. There are many systems that handle monitoring and automatic restarts, but I do not know the details of how you might configure something like Supervisor or Chef to manage your apps. For Ruby apps there are services like Heroku that automatically restart your app if your dyno dies, and for Clojure there is a similar service from Amazon called AWS Elastic Beanstalk.
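To give a feel for what “automatic restart” means, here is a deliberately naive supervisor loop in Python. It is only a sketch of the idea under simple assumptions (restart on any non-zero exit, give up after a fixed number of restarts); real tools such as Supervisor do far more, and the function name and defaults are invented for this example.

```python
import subprocess
import sys
import time


def supervise(command, max_restarts=3, delay_seconds=1.0):
    """Run command and restart it whenever it exits with a non-zero status.
    Stops after a clean exit or after max_restarts restarts, and returns
    the number of restarts that were performed."""
    restarts = 0
    while True:
        exit_code = subprocess.call(command)  # blocks until the app exits
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        print(f"{command[0]} exited with {exit_code}, restarting", file=sys.stderr)
        restarts += 1
        time.sleep(delay_seconds)
```

For example, `supervise(["java", "-jar", "login_service-0.1-standalone.jar"])` would keep the login service from the init script above running through up to three crashes.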

These apps on different ports need to talk to each other. They could use TCP/IP but some would argue that in this context that can be a little slow, as it involves the whole network stack. They could also use a Linux primitive like a FIFO named pipe. I have started using ZeroMQ, which despite its name is actually a network library, one that puts beautiful abstractions over some of the older inter-process communication protocols.
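Of the options mentioned, the named pipe is the easiest to show with nothing but a standard library. The Python sketch below passes one message through a FIFO; to stay self-contained it uses two threads where a real setup would have two separate apps, and the pipe name and message are made up. `os.mkfifo` is available on Linux and other Unix systems, not on Windows.

```python
import os
import tempfile
import threading


def send_message(fifo_path, message):
    """Write one line into the pipe (what the sending app would do)."""
    # Opening for writing blocks until a reader has opened the pipe.
    with open(fifo_path, "w") as pipe:
        pipe.write(message + "\n")


def receive_message(fifo_path):
    """Read one line from the pipe (what the receiving app would do)."""
    # Opening for reading blocks until a writer has opened the pipe.
    with open(fifo_path) as pipe:
        return pipe.readline().rstrip("\n")


def demo():
    """Create a FIFO in a temporary directory and pass one message through it."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo_path = os.path.join(tmp, "login_events")
        os.mkfifo(fifo_path)
        writer = threading.Thread(
            target=send_message, args=(fifo_path, "user 42 logged in")
        )
        writer.start()
        received = receive_message(fifo_path)
        writer.join()
        return received
```

Calling `demo()` returns the message that went through the pipe. A library like ZeroMQ adds what this lacks: message framing, reconnection, and patterns such as request/reply and publish/subscribe.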

By the way, if you want to read about a really radically unusual kind of architecture, you should read what my friend Colin Steele wrote. He is the CTO of RoomKey.com, and at RoomKey they did something very extreme: they compiled everything, including their database, into their app. So all of their HTML, CSS, Javascript, code, webserver, and the database, all end up as one file, which they can deploy anywhere on their server. As Colin says, since the app has no outside dependencies of any kind, it is guaranteed to work anywhere there is a functioning JVM. Even if, years from now, you fetch an old copy of their app from years before, the app will simply work, since it has everything it needs inside of itself. Colin’s essay is here:

http://www.colinsteele.org/post/27929539434/60-000-growth-in-7-months-using-clojure-and-aws

The end result of RoomKey’s unusual decisions:

“…resulting in a system that has scaled to nearly 700,000 uniques/day with just a handful of machines.”

and:

“Last month, when traffic shot up by over 50%, no one noticed.”

Oh, and regarding this:

“how do you set it up for a development environment?”

I forgot to give you an answer for Rails. I’m sure there are a lot of ways to do this, but for me the easiest would be to use the Standalone version of Phusion Passenger.

cd to the directory of a Rails app and start it like this on the command line:

passenger start -p 80

You can set any port number, so if you had several apps you would cd to the directory of each app and start each of them up on different ports.

Of course if you were following my advice, and building small apps, you would probably be using something like Sinatra, rather than Rails.

My co-worker wrote:

But how would you do that for a PHP application like our CMS?

I responded:

In terms of running different apps on different ports, I think the PHP eco-system lags behind in terms of making this easy. The short answer is: Nginx config. It should be possible to have different vhosts that listen on different ports. That is the point of the line “listen 80” that you see in Nginx config files. You are specifying the port the given app should listen on.

A different co-worker wrote:

Seems like a bit of a contradiction to say that the monolithic application has issues because one change can have far reaching effects and then claim that the mini-apps can reduce duplication by sharing libraries.

To which I responded:

The advantage is that separate apps make explicit what functionality you really need to share.

I have worked at companies where the code base was an endless maze — this happens too often. If all the code is contained within a single framework, you will often find it difficult to track the dependencies. In our own code, megatron, we have a rough guess about where the dependencies are, and yet just this month we were surprised by bugs because a change in one place caused a bug in another place — a place we assumed would be unaffected. And I recently saw a big PHP framework pulled apart into several small Ruby apps, and the dependencies were made explicit in the shared gems — and I think we were surprised at how much the dependencies could be simplified.

Likewise, I suspect, if our CMS was rewritten as a bunch of separate apps, we would be surprised at how much we could minimize the shared dependencies, and we would also gain the advantage of knowing exactly what the shared dependencies really are.

Another co-worker (an advocate for Scala) wrote:

You complain about bugs and slowness but the real problem is the use of dynamic languages. We are using Scala for some projects here in London and I would like to see more use of Scala. The strictness of Scala protects from many of the things you are complaining about. Clojure is no better than PHP or Ruby.

To which I responded:

In terms of speed, benchmarks show that Clojure runs, on average, 20 to 30 times faster than Ruby or PHP. Moreover, with type hinting, Clojure can run as fast as the fastest Java code, which is very fast. And, if you really need to, you can use macros to generate code that is fully type-hinted. John Lawrence Aspden has written a very interesting article called “Clojure Faster than Machine Code?” in which he examines some very clever tricks you can do with macros and type hinting and he concludes:

This technique strikes me as very general, and very useful. All sorts of things can be represented as lookups in tables. I think that the program should be as fast as the equivalent java program would be, although I haven’t got around to actually testing that, so I may have dropped the ball somewhere. The JVM is widely thought to be around the same speed as native machine code or optimized C. I’m absolutely sure that I’m not able to write the equivalent program in Java, C, or assembler without code-generation. The code generation would be very very very much harder in Java, C or assembler. And so I wonder, is Clojure the fastest computer language in the world?

As for the strictness of Scala, there are attempts being made to add typing to Clojure. Also, I find most times the strictness of static languages goes far beyond what is practical: the language becomes verbose with very little gain in correctness. If strict typing actually eliminated all bugs then we would all be using strict typing, but of course, we all realize how far from true such an assertion is.

Added on 2014-03-31: I see that what I wrote about is now being called something like a “micro-services” approach. Check out this excellent piece on the benefits of Clojure/MongoDB and microservices.
