February 11th, 2017
(written by Lawrence Krubner; however, indented passages are often quotes). You can contact Lawrence at: email@example.com
I just had a project where I had to write an app that could fire several million requests against an API, with data pulled from a database. I wrote the whole thing in NodeJS and HapiJS, and it was an agony. The uncontrolled async of NodeJS tripped me up. If I need to fire 10 million HTTP requests, how do I do that without blowing the stack? As it was, the functions would pile up, and then I’d get a “stack exhausted” error. This was true even if I gave the app 8 gigs of RAM.
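In any async runtime, the usual way out of that pile-up is to cap the number of requests in flight. Instead of scheduling every request up front, you pull the next row only when a slot frees up. Below is a minimal sketch of that pattern in Python's asyncio. It is not the NodeJS code from the project, and it is not the Clojure rewrite. `fake_request` is a hypothetical stand-in for the HTTP call, and `rows` is assumed to be any iterable, such as a database cursor.

```python
import asyncio
from typing import Iterable


async def fake_request(row: int) -> int:
    # Hypothetical stand-in for the real HTTP call to the API.
    await asyncio.sleep(0)
    return row


async def fire_all(rows: Iterable[int], concurrency: int = 100) -> dict:
    """Send one request per row, never more than `concurrency` at a time."""
    rows_iter = iter(rows)  # pulled lazily, so rows can stream from a DB cursor
    stats = {"done": 0, "in_flight": 0, "max_in_flight": 0}

    async def worker() -> None:
        # Workers share one iterator; each takes the next row only after
        # its previous request has finished.
        for row in rows_iter:
            stats["in_flight"] += 1
            stats["max_in_flight"] = max(stats["max_in_flight"], stats["in_flight"])
            try:
                await fake_request(row)
                stats["done"] += 1
            finally:
                stats["in_flight"] -= 1

    # Only `concurrency` coroutines ever exist, regardless of how many rows there are.
    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return stats
```

For example, `asyncio.run(fire_all(range(10_000_000), concurrency=50))` keeps at most 50 requests alive at once. Memory then tracks the concurrency limit rather than the total number of rows.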
I eventually re-wrote the whole thing in Clojure.
So, I’m open to hearing some criticism of NodeJS.
The primary thing these applications crave is awesome JSON APIs (and Websockets… stay tuned). So why should you use Rails for a JSON API? Isn’t Rails designed for HTML/JS pages? What benefit does Rails give you for building JSON APIs? And isn’t Rails really slow?
Well, no. I’ve been through this before. If you are building API-only applications with a single-page HTML5/JS frontend, you should definitely check out Rails::API. Rails::API completely eliminates any ActionView-centrism you may be worried about in Rails, and gives you awesome tools for building JSON APIs, like ActiveModel::Serializers. But that alone can’t express what Rails brings to the table, so here is a list of features Rails provides which are useful for JSON APIs, courtesy of the Rails::API README:
Handled at the middleware layer:
Reloading: Rails applications support transparent reloading. This works even if your application gets big and restarting the server for every request becomes non-viable.
Development Mode: Rails applications come with smart defaults for development, making development pleasant without compromising production-time performance.
Test Mode: Ditto test mode.
Logging: Rails applications log every request, with a level of verbosity appropriate for the current mode. Rails logs in development include information about the request environment, database queries, and basic performance information.
Security: Rails detects and thwarts IP spoofing attacks and handles cryptographic signatures in a timing attack aware way. Don’t know what an IP spoofing attack or a timing attack is? Exactly.
Parameter Parsing: Want to specify your parameters as JSON instead of as a URL-encoded String? No problem. Rails will decode the JSON for you and make it available in params. Want to use nested URL-encoded params? That works too.
Conditional GETs: Rails handles conditional GET (ETag and Last-Modified), processing request headers and returning the correct response headers and status code. All you need to do is use the stale? check in your controller, and Rails will handle all of the HTTP details for you.
Caching: If you use dirty? with public cache control, Rails will automatically cache your responses. You can easily configure the cache store.
HEAD requests: Rails will transparently convert HEAD requests into GET requests, and return just the headers on the way out. This makes HEAD work reliably in all Rails APIs.
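Rails handles the conditional GET and HEAD items for you, in middleware and through `stale?`. The following is not Rails code. It is a framework-free Python sketch of the HTTP logic those two items describe. The `conditional_get` function and the `Response` class are hypothetical. The sketch covers only ETag/If-None-Match and skips Last-Modified. It also assumes header names arrive already normalized to the capitalization shown.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def conditional_get(method: str, request_headers: dict, body: bytes) -> Response:
    """Serve `body` for GET/HEAD, honoring If-None-Match (hypothetical helper)."""
    # Validator derived from the response bytes: same bytes, same ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {"ETag": etag, "Content-Type": "application/json"}

    # If the client's cached copy matches, answer 304 Not Modified with no body.
    if_none_match = request_headers.get("If-None-Match", "")
    candidates = [t.strip() for t in if_none_match.split(",") if t.strip()]
    if etag in candidates or "*" in candidates:
        return Response(304, headers)

    headers["Content-Length"] = str(len(body))
    # HEAD is processed exactly like GET; only the body is dropped on the way out.
    if method == "HEAD":
        return Response(200, headers)
    return Response(200, headers, body)
```

The client sends back the ETag it received in `If-None-Match`, and an unchanged resource then costs only a 304 with headers.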
A problem I am certain you have run into is the manual nature of serializing JSON. Exactly how should you translate from a domain object into a JSON representation? What if the client wants to avoid repeat requests by eagerly loading other domain objects which are associated with the one you want to retrieve and including them in the JSON result? And wouldn’t it be great if there were a single canonical representation for all of this that a standardized domain object abstraction running in the browser could automatically consume for us, so we don’t have to manually write a bunch of JSON serialization and deserialization logic for everything in our system?
Can we put JSON on Rails? Yes we can: it’s called ActiveModel::Serializers and Ember Data. All that glue code you’ve been writing over and over for serializing and deserializing JSON? Stop that. Seriously. You have better things to do than deal with the idiosyncrasies of whether you should wrap a particular array in an object, or return a literal string or number as opposed to an object for future-proofing. You are wasting your time with these minutiae, and chances are the ActiveModel::Serializers representation is better than the one you are using. Let’s take a look at why.
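Here is the eager-loading question made concrete. One common answer puts the object under a root key and refers to associations by id. It then sideloads the associated records once, at the top level. As far as I know, this is roughly the shape ActiveModel::Serializers produced for Ember Data at the time, but check the gem's documentation for your version. The Python sketch below writes that representation by hand. This is exactly the glue code the passage says you should stop writing. `Post`, `Comment`, and `serialize_post` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Comment:
    id: int
    body: str


@dataclass
class Post:
    id: int
    title: str
    comments: list = field(default_factory=list)


def serialize_post(post: Post, include_comments: bool = True) -> str:
    """Root-keyed post that references comments by id, optionally sideloading them."""
    doc = {
        "post": {
            "id": post.id,
            "title": post.title,
            "comment_ids": [c.id for c in post.comments],
        }
    }
    if include_comments:
        # Sideload associated records at the top level so the client can
        # resolve comment_ids without issuing another request.
        doc["comments"] = [{"id": c.id, "body": c.body} for c in post.comments]
    return json.dumps(doc)
```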
Celluloid solves every single problem you’re whining about better than Node
Node has a lot of problems, and I’m not just talking about the audience it attracts. Let me start by saying this: many of the things I have built in Celluloid are based off of technologies originally developed for Node. My web server Reel uses the Node HTTP parser, and it’s quite likely that the next iteration of nio4r I develop will be based off of libuv.
All that said, let me start with Node’s fundamental problem: callback-driven I/O. Celluloid::IO is one of many systems, including Erlang and Go, that demonstrate that “nonblocking” and “evented” I/O are orthogonal to callbacks. Celluloid uses Ruby’s coroutine mechanism to provide a synchronous I/O API on top of an underlying nonblocking system. However, where systems like Node force you to use nonblocking I/O for everything, Celluloid lets you mix and match blocking and nonblocking I/O as your needs demand.
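The paragraph above claims that nonblocking I/O does not require callbacks, and that claim is not specific to Ruby. In the minimal sketch below, each task is a Python generator that yields whenever it would block. A tiny scheduler built on the standard `selectors` module then resumes the task once its socket is ready. The sketch assumes nothing beyond the standard library, and `Loop`, `recv`, `send`, and `demo` are hypothetical names. This is not how Celluloid::IO is implemented. It only illustrates the general technique.

```python
import selectors
import socket
from collections import deque


def recv(sock: socket.socket, n: int):
    # Reads like a blocking call to the task, but suspends it until readable.
    yield ("read", sock)
    return sock.recv(n)


def send(sock: socket.socket, data: bytes):
    # Suspends the task until the socket is writable, then sends.
    yield ("write", sock)
    return sock.send(data)


class Loop:
    def __init__(self) -> None:
        self.sel = selectors.DefaultSelector()
        self.ready: deque = deque()

    def spawn(self, task) -> None:
        self.ready.append(task)

    def run(self) -> None:
        while self.ready or len(self.sel.get_map()) > 0:
            while self.ready:
                task = self.ready.popleft()
                try:
                    op, sock = task.send(None)  # run the task until it would block
                except StopIteration:
                    continue  # task finished
                event = selectors.EVENT_READ if op == "read" else selectors.EVENT_WRITE
                self.sel.register(sock, event, task)
            if len(self.sel.get_map()) > 0:
                for key, _ in self.sel.select():
                    self.sel.unregister(key.fileobj)
                    self.ready.append(key.data)  # socket is ready: resume its task


def demo() -> list:
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    received = []

    def writer():
        yield from send(a, b"hello")

    def reader():
        data = yield from recv(b, 1024)  # synchronous-looking, never blocks the loop
        received.append(data)

    loop = Loop()
    loop.spawn(reader())
    loop.spawn(writer())
    loop.run()
    a.close()
    b.close()
    return received
```

The task bodies contain no callbacks. Reading and writing look sequential, and the scheduler alone knows the sockets are nonblocking.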
If you have ever worked in a language like C(++) or Java, you probably know an amazing property of sockets: you can mix and match blocking and nonblocking I/O, even over the lifecycle of a single socket. Perhaps you will handle incoming sockets in a nonblocking manner at first, but if they make a complex request, you might change the socket to a blocking mode and hand it off to a worker thread.
Celluloid::IO makes this handoff completely transparent: simply by giving the socket to another Ruby thread which isn’t a Celluloid::IO actor, the socket automatically switches from nonblocking to blocking mode.
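The socket property described above can be shown with plain Python sockets, with no Celluloid involved. The same socket is read in nonblocking mode first, then switched to blocking mode and handed to a worker thread. Celluloid::IO performs that mode switch for you, but here it is explicit via `setblocking`. `handoff_demo` is a hypothetical name, and a `socketpair` stands in for a real client connection.

```python
import socket
import threading


def handoff_demo() -> tuple:
    server_side, client_side = socket.socketpair()

    # Phase 1: the event-loop side treats the socket as nonblocking.
    server_side.setblocking(False)
    would_block = False
    try:
        server_side.recv(1024)
    except BlockingIOError:
        would_block = True  # no data yet; the read returned immediately instead of waiting

    # Phase 2: switch the same socket to blocking mode and hand it to a worker thread.
    server_side.setblocking(True)
    result = {}

    def worker() -> None:
        result["data"] = server_side.recv(1024)  # blocks this thread until data arrives

    t = threading.Thread(target=worker)
    t.start()
    client_side.sendall(b"complex request")
    t.join()
    server_side.close()
    client_side.close()
    return would_block, result["data"]
```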
But let’s talk about Node’s real fundamental problem, one that is extremely difficult to solve in any callback-driven system: flow control. Unfortunately, the Node.js community has adopted the phrase “flow control” to mean “building abstractions around managing callbacks”; however, the phrase “flow control” has a very specific definition relating to the rates at which data is transmitted between systems.
In general, callback-driven systems can’t manage flow control effectively. The most notable pathological case is the producer-consumer problem, whereby a slow consumer might force a system like Node to unboundedly buffer data from an unchecked producer. There’s a clear and simple solution to this problem: make all I/O synchronous. Using coroutines that provide blocking-style APIs, you can easily compose producers and consumers in a manner that doesn’t result in unbounded writes to a buffer, because, simply by virtue of a virtual blocking API, the rate at which data is transferred from producer to consumer is kept in check.
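Below is a minimal sketch of that bounded-buffer approach using Python's asyncio coroutines rather than Celluloid. `await buf.put(...)` reads like a blocking call, and it suspends the producer whenever the bounded queue is full. A slow consumer therefore caps how much data is ever buffered. The names `run_pipeline`, `producer`, and `consumer`, along with the 1 ms consumer delay, are illustrative assumptions.

```python
import asyncio


async def run_pipeline(n_items: int, buffer_size: int = 4) -> tuple:
    buf: asyncio.Queue = asyncio.Queue(maxsize=buffer_size)
    consumed: list = []
    max_depth = 0
    done = object()  # sentinel marking the end of the stream

    async def producer() -> None:
        nonlocal max_depth
        for i in range(n_items):
            await buf.put(i)  # suspends while the buffer is full: backpressure
            max_depth = max(max_depth, buf.qsize())
        await buf.put(done)

    async def consumer() -> None:
        while True:
            item = await buf.get()
            if item is done:
                return
            await asyncio.sleep(0.001)  # a deliberately slow consumer
            consumed.append(item)

    await asyncio.gather(producer(), consumer())
    return consumed, max_depth
```

However fast the producer is, the queue never holds more than `buffer_size` items. The producer simply runs at the consumer's pace.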