Why is code so awful at Etsy?

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

Paul Graham says that startups need to beat the averages:

In a big company, you can do what all the other big companies are doing. But a startup can’t do what all the other startups do. I don’t think a lot of people realize this, even in startups.

The average big company grows at about ten percent a year. So if you’re running a big company and you do everything the way the average big company does it, you can expect to do as well as the average big company– that is, to grow about ten percent a year.

The same thing will happen if you’re running a startup, of course. If you do everything the way the average startup does it, you should expect average performance. The problem here is, average performance means that you’ll go out of business. The survival rate for startups is way less than fifty percent. So if you’re running a startup, you had better be doing something odd. If not, you’re in trouble.

And yet this is how the technology at Etsy has evolved, as reported by Ars Technica, and it does not sound like a healthy direction:

The first step down that path to escaping from the architectural hole Etsy had put itself in was stabilizing Sprouter and the rest of the infrastructure. That included improving metrics and monitoring of performance—which, Snyder joked, could be improved by “having any metrics and monitoring.” The engineering team also upgraded Etsy’s database hardware as much as practically possible. “We upgraded the master database to the limits of what was possible,” Snyder said. “It still wasn’t enough, but it bought us breathing room.”

With that little bit of breathing room (and the accompanying downtime for the upgrade), Etsy began to shift to a new architecture—still based on PHP on the front end, but now running on Apache web servers with connections to databases directly through object-relational mapping.

And the team started to shift feature by feature away from a semi-monolithic Postgres back-end to sharded MySQL databases. “It’s a battle-tested approach,” Snyder said. “Flickr is using it on an enormous scale. It scales horizontally, basically, to near infinity, and there’s no single point of failure—it’s all master to master replication.”

With frequent small releases, and incremental migration of features away from Sprouter, it took until spring of this year for Etsy to completely move off the middleware and turn it off for good. “I got to be the one to remove it from source control,” Snyder said. The Postgres database, however, still remains—and likely will for some time.

One of the lessons learned from Sprouter, Snyder said, was that “if you’re doing something ‘clever,’ you’re probably doing it wrong.”

Post external references

  1. http://www.paulgraham.com/avg.html
  2. http://arstechnica.com/business/2011/10/when-clever-goes-wrong-how-etsy-overcame-poor-architectural-choices/