The specialization that comes with growth

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at:, or follow me on Twitter.

Lethain’s experience at Uber. This is very good and worth reading:

When I joined Uber, the Infrastructure organization consisted of three teams (whose names were unhelpfully generic, so I’m renaming them a bit for clarity): developer productivity who worked on build and test (~4 engineers), storage engineering (~6 engineers) who worked on scaling real-time storage, and operations (~5 engineers) who did everything else to support the company’s ~200 engineers, ~2,000 employees, and ~400% YoY growth in both usage and engineering headcount.

The first two teams were focused on acute, critical projects: keeping the engineering team productive and sharding our data to ensure we didn’t exhaust the disk space on the largest-we-could-buy hardware supporting our primary database cluster. The third team, the one I joined as its engineering manager, was responsible for keeping everything else going while the first two teams addressed their urgent focus areas.

On operations, our immediate challenges were significant: our self-managed compute cluster ran out of capacity every Friday leading to reduced availability (and at that point we were in a managed datacenter with limited capacity), our Kafka cluster was experiencing significant challenges with load, our Graphite cluster was frequently going down under load, the recently introduced move to a service oriented architecture depended on our team doing one to two days of work for each additional service, with new service provisioning requests coming in daily, and we handled on-call for the entire company with literally hundreds of alerts coming in most on-call shifts (it was not unusual for your phone’s battery to die during the 12 hour, follow the sun shift).

This was, objectively, a pretty difficult situation. 

Post external references

  1. 1