December 2nd, 2015
(written by Lawrence Krubner, however indented passages are often quotes). You can contact Lawrence at: email@example.com
This was the first time we really learned about the thundering herd problem. The site went down 30 minutes before the broadcast was scheduled to start, because many fans had gone to the page in advance and started doing stuff on the site: signing up, logging in, favoriting the Jonas Brothers’ channel page. All these dynamic actions, plus the constant refreshing of the page to check whether the stream was up (our equivalent of processes “going back to sleep, only to wake up again”), put a massive strain on the application servers, and the site fell over. At the same time, the video system also fell over, because there were too many simultaneous requests and we couldn’t bring enough video servers online to serve them.
While Emmett and Kyle scrambled to statically cache the page, shut down any and all dynamic features on the site, and push live changes to the production video player so they could manually control when it made requests to the video system, Michael and I took turns on the phone with Jonas Brothers management explaining what was going on (or trying to). At first, while the site was down before the scheduled broadcast, we told them we had taken the site down for maintenance to make sure everything worked (I’m not proud of lying about this; I literally had no idea what to say). As time ticked by and the broadcast start time came and went, we ran out of excuses. Different people from management were calling us angrily every few minutes, and we started just telling whoever happened to be calling that they should call whichever of us wasn’t on the phone at the moment for the latest update (Michael and I were standing in the same room, sweating bullets).
Just at that moment, while Michael and I were at our peak freak-out, our office manager, Arram (who went on to found ZeroCater), walked by and casually said something that’s stuck with me to this day: “Officers don’t have morale problems.”
I wish I could say that I realized the truth of those words, immediately pulled myself together, and provided an example of calm and stability to a very stressed-out team. Instead, I think I screamed something along the lines of “What the fuck are you talking about?!” and went on bemoaning whatever I’d done in a previous life to deserve coming so close to startup success only to see our prospects swirling down the drain.
After what felt like decades, Kyle and Emmett got the site into a functional state and the broadcast went ahead, about 25 minutes late. In retrospect, not the end of the world. Hollywood Records lost all faith in us and did the rest of their promotional broadcasts on Ustream. Eventually, we got better at scaling.