Why are large companies so difficult to rescue (regarding bad internal technology)?

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow him on Twitter.

I worry there is a lot of glib, superficial rhetoric coming out of Silicon Valley about the importance of being agile in one’s development processes. There are too many assumptions being made about the ease of introducing agile techniques, and which problems are solved by agile techniques. This essay is my attempt to offer a corrective.

Over the last 20 years I was the technical co-founder at three startups, two of which grew to modest size and then were sold (I talked about this a little in the first book that I wrote). I have done some consulting at medium to large sized companies, by which I mean, companies that had a few offices and maybe, at most, 3,000 people. Especially after I moved to New York City (in 2009) I worked as a rescue agent, trying to save older media companies that had fallen behind the times and were trying to catch up.

Now I am working with the largest company that I have ever worked with. It has 11,000 employees and it operates in 180 countries. It is a famous company that most of you would know, and many of you have been its customers. In this essay, I will call it SuperRentalCorp.

A lot has been written on the theme “startups are agile but big companies are dinosaurs that can’t get anything done.” And a lot of business writers have written interesting books about how big companies can restructure to be a bit more agile. For instance, Eric Ries offers some interesting ideas in his book The Startup Way.

But why is it so hard to rescue the technology situation at a big company?

SuperRentalCorp is a great example of some of the problems that come up.

Here is how I was originally told that they needed help. A friend of mine who was already consulting with them said “Hey, this company needs help setting up an API. Do you know how to set up an API? Can you help them with that?”

He told me that the company had been trying to build an API for two years, but so far they had failed.

I thought to myself, “Two years to build an API! For God’s Sake, it is 2019, there are a million tools that make it easy to set up an API. How can they possibly think this is difficult? A good engineer can knock this out in a day. Why are they struggling?”

But my initial thinking assumed the type of scenario where you have one database, and you are just putting the API between the database and the outside world, which is a common scenario at the early-stage startups that I’ve worked at. With a greenfield Ruby on Rails project, you can seriously build this kind of API in an hour.

There are two big problems that plague rescue efforts at big companies: history and trust. I’ll talk about history first.

When I say “history” I don’t mean simply dealing with legacy apps, I also mean dealing with the consequences of decisions that different CEOs have made over the last 30 or 40 years, as the company adapted to the changing winds of each decade’s zeitgeist.

SuperRentalCorp got started more than 100 years ago, when large parts of the world were governed by a few empires run from Europe, and most corporations ran worldwide operations from their homes in some Western country. But then in the era between 1948 and 1980 all of the old empires broke apart, and were replaced by over a hundred new countries, each of which was protective of its new independence. So SuperRentalCorp decided to adopt a decentralized structure. It set up subsidiaries for most countries, and the subsidiaries operated as independent companies. Sometimes partial ownership of the subsidiary was sold. This meant that when the first wave of big databases arrived in the late 1970s, this company had no central database, no central IT department, no CTO or CIO.

A little later, as the political situation seemed safer, the benefits of consolidating some services became obvious, so some of the subsidiaries merged, creating regional companies. There was one corporation for the MidEast and North Africa, one for Europe, one for North America, one for Asia, one for South America. With this structure, the company entered the 1990s, and this is when the company got serious about managing all services through networked databases.

Facing intense global competition in the 1990s, the company decided to grow by buying its most successful competitors. If I told you all of the name brands, you would recognize most of them, but you’d probably be surprised to learn that all of them are now owned by a single corporation. I was surprised, as these were several companies that I thought of as competitors. But they are no longer competitors.

Then, around 2005, the Web 2.0 moment arrived, leading to an explosion of nimble competitors who were using the Internet to offer a service similar to SuperRentalCorp, but in new ways. Again, SuperRentalCorp bought several of these startups to squash the competition by absorbing it. Many of these startups only exist in one country, or a single market, such as the EU.

Very recently, the new CEO decided it would be best to unify the company. Several of the international subsidiaries have been 100% purchased and are now being restructured so they will operate as departments inside of the company, rather than independent corporations.

As part of this new focus on unification, SuperRentalCorp would like to create a single unified API, so the outside world thinks the company has an internally unified tech architecture. That is, SuperRentalCorp would like to create the illusion that the company has a single database, and interacting with that database is easy.

But what is the reality? SuperRentalCorp has 20 major databases, run by 20 different teams, in at least 10 different countries, many with a history of operating as an independent company, each team guarding their data, partly out of security fears, partly out of concerns about local laws regarding user privacy and the regulation of international transfers of user data, and partly out of sheer stubbornness.

As with any database operation, there are two concerns here, the reads and the writes. The reads are not that hard. We could pull the (necessary) data from 20 different databases, store the data in a centralized database that would act as a cache, and put an API between that database and the rest of the world. There would be minor issues about stale data, and we’d have to experiment to figure out which data is high priority and needs to be copied over in a matter of seconds. Less important data could be copied over every 5 minutes, or on a trigger when updated.

The reads are not that difficult (not easy, but not impossible).
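To make the read side concrete, here is a minimal sketch of the centralized cache idea, under stated assumptions. The names (Source, CentralCache, sync_due) are hypothetical and are not anything SuperRentalCorp actually runs. Each source database gets its own refresh interval, so high-priority data is copied every few seconds and less important data every 5 minutes. The clock is passed in explicitly so the behavior is easy to check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Source:
    """One regional database, reduced to a fetch function (hypothetical)."""
    name: str
    fetch: Callable[[], Dict[str, dict]]  # returns records keyed by record id
    refresh_seconds: float                # how stale this data is allowed to get


@dataclass
class CentralCache:
    """Central read cache that an API could sit in front of (illustrative only)."""
    sources: List[Source]
    records: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    last_sync: Dict[str, float] = field(default_factory=dict)

    def sync_due(self, now: float) -> List[str]:
        """Re-copy every source whose refresh interval has elapsed."""
        refreshed = []
        for src in self.sources:
            last = self.last_sync.get(src.name)
            if last is None or now - last >= src.refresh_seconds:
                for record_id, row in src.fetch().items():
                    self.records[(src.name, record_id)] = row
                self.last_sync[src.name] = now
                refreshed.append(src.name)
        return refreshed

    def read(self, source_name: str, record_id: str) -> Optional[dict]:
        """Serve a read from the cache; the result may be slightly stale."""
        return self.records.get((source_name, record_id))
```

The stale data issue shows up directly: a read between two syncs returns whatever was copied last, which is why the refresh interval per source has to be found by experiment.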

The writes are another thing. If a customer in London wants to rent a resource from the subsidiary we have in London, the rent request (the database write) needs to go to the central API, but does that mean the central API has to know which internal database that particular write is supposed to go to? Likewise the write requests happening in Nigeria, and Germany, and Brazil, each of which will go to different databases. This becomes a bit of a nightmare. Twenty years ago this line of thinking led to the creation of the Enterprise Service Bus architecture, but ESB is now going out of favor, because it was too complex and rigid and unwieldy.
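A small sketch shows why this gets painful. Assume, purely for illustration, that every write request carries a market code and that the central API keeps a routing table from market to regional database. WriteRouter, UnknownMarketError, and make_in_memory_writer are hypothetical names, and the in-memory lists stand in for the real databases.

```python
from typing import Callable, Dict

Writer = Callable[[dict], str]


class UnknownMarketError(Exception):
    """Raised when no regional database is registered for a market."""


class WriteRouter:
    """Central API component that must know where every write belongs."""

    def __init__(self) -> None:
        self._writers: Dict[str, Writer] = {}

    def register(self, market: str, writer: Writer) -> None:
        # Every new subsidiary or acquisition means another entry here.
        self._writers[market] = writer

    def route(self, request: dict) -> str:
        market = request["market"]
        writer = self._writers.get(market)
        if writer is None:
            raise UnknownMarketError(market)
        return writer(request)


def make_in_memory_writer(db_name: str, store: list) -> Writer:
    """Stand-in for one team's database: appends and returns a receipt."""
    def write(request: dict) -> str:
        store.append(request)
        return f"{db_name}:{len(store)}"
    return write


uk_rows: list = []
router = WriteRouter()
router.register("GB", make_in_memory_writer("uk_db", uk_rows))
receipt = router.route({"market": "GB", "resource": "car-1"})
```

The routing table is the coupling: the central API has to be updated whenever any of the 20 teams changes how it accepts writes.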

Two years ago, SuperRentalCorp decided to become a customer of MuleSoft, to help create their new API. They have so far spent about $25 million on their efforts involving MuleSoft. MuleSoft has some great tools for building APIs but those tools seem to help with the reads much more than with the writes. Which is to say, MuleSoft helps with the easy stuff, but not so much the hard stuff. (Having said that, I’ll add that there are engineers working for SuperRentalCorp who love MuleSoft.)

In terms of the best integration architecture, what seems to me the only long-term solution is something like the unified log architecture that Jay Kreps wrote about back in 2013. All incoming writes need to go into a centralized log, such as Kafka, and then from there the various databases can pull what they need, with each team making its own decisions about what it needs from that central log. However, SuperRentalCorp has retail outlets with POS (point of sale) systems which talk directly to specific databases, and the path of that write (straight from the POS to the database) is hardcoded in ways that will be difficult to change, so it will be a few years before the company can have a single write-point. For now, each database team needs to be accepting writes from multiple sources. But a unified log is the way to go in the long-term. And that represents a large change of process for every one of those 20 teams. Which helps explain why the company has spent 2 years and $25 million trying to build an API, and so far they have failed.
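The unified log idea can be sketched in a few lines. This is an in-memory stand-in, not the Kafka API: UnifiedLog, append, and poll are hypothetical names chosen for illustration. The point it shows is the inversion of control. The central system only appends, and each team pulls what it wants at its own pace, tracked by its own offset.

```python
from typing import Callable, Dict, List


class UnifiedLog:
    """Append-only log with one read offset per consumer (illustrative only)."""

    def __init__(self) -> None:
        self._entries: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: dict) -> int:
        """Record an incoming write and return its position in the log."""
        self._entries.append(event)
        return len(self._entries) - 1

    def poll(self, consumer: str, wanted: Callable[[dict], bool]) -> List[dict]:
        """Return the events this consumer has not seen yet and cares about."""
        start = self._offsets.get(consumer, 0)
        batch = [e for e in self._entries[start:] if wanted(e)]
        # Advance past everything scanned, including events the team skipped.
        self._offsets[consumer] = len(self._entries)
        return batch


log = UnifiedLog()
log.append({"market": "GB", "type": "rental", "resource": "car-1"})
log.append({"market": "BR", "type": "rental", "resource": "van-9"})

# Each team decides for itself what it needs from the central log.
uk_batch = log.poll("uk_team", lambda e: e["market"] == "GB")
```

Compared with a central router, the log does not need to know which database a write belongs to. That knowledge moves into each team's filter, which is exactly the change of process the teams have to accept.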

The problem is partly technical and partly psychological. Each team has to cede some power and then trust a process that is out of their control. And what happens when the company gets a new CEO? What if she decides to break up the company again? How much can the teams trust the durability of the current corporate strategy? Should they leave themselves the space to go back to the way they used to do things?

Having said all that, I am pleased to say that this new API project is now in beta testing. After two years, it is finally going to be rolled out to the public this year.

So far I’ve only talked about the problems arising from internal databases, and internal processes. In a sense, those are all easy, compared to the reliance on external suppliers of services, none of which are under the control of SuperRentalCorp. Starting in the 1990s there arose a management idea that argued that a company should focus on its “core competencies” and outsource everything else. The idea is if you are a newspaper, you don’t need to hire janitors to keep your office clean, but rather, you outsource cleaning to a company that specializes in cleaning corporate offices. Focus on what you are good at, and leave everything else to someone else. If you try to do everything yourself, then you are guilty of “Not Invented Here Syndrome”. There is a lot in this argument that I agree with, though I’ve come to see the downsides. Outsourcing limits your flexibility, since you end up in long-term relationships with external companies that may not evolve with your needs. And though it might seem easy to fire one cleaning service and hire another, there are some kinds of services that are very difficult to replace. Back in the 1990s SuperRentalCorp decided to outsource the management of its customer loyalty program, as that was seen as a financial function, and SuperRentalCorp is not a finance company. The company they outsourced to is astoundingly primitive and behind the times: that company does not offer a public API for their service. So now, when SuperRentalCorp would like to make the loyalty system embeddable in a variety of CRM (customer relationship management) and POS systems, SuperRentalCorp cannot do so, because it has no control over the technology decisions being made at the company that controls the loyalty program. Yes, SuperRentalCorp could end their relationship with that other company, and develop their own technology for managing their loyalty program, but they are already fighting a great many tech battles. For now, ending their relationship with the company that manages their loyalty program is not considered an option.

All of which helps explain why technology rescues at bigger, older companies are so difficult. One is constantly fighting against history.

There is one other issue, and it is all about trust. Multi-billion dollar companies are constantly dealing with both internal and external actors who are acting in bad faith. This isn’t theoretical; this is a daily reality. Glib rhetoric about how they should be more agile is not very helpful, as they are up against real questions of corporate structure, ownership, and strategy. As much as I love the startup community, I feel like too much of the writing that comes out of Silicon Valley simply assumes away the problems that bigger, older companies are facing. In particular, much of the writing assumes that issues of trust are silly, rather than important and real. Some of the superficial advice tends to “assume goodwill” as if multi-billion dollar companies are just like Wikipedia. The reality is that large companies are constantly facing the risk that they will be destroyed by the greed of people outside and inside the company. Startups have an easier time with the issue of trust, because when the whole team is 5 people, and you can all look each other in the eye, you can double-check your co-workers when they seem to be acting aberrantly. That isn’t possible when you have 11,000 employees in 180 countries. To a large extent “be agile” is almost synonymous with “trust each other.” If you’re wondering why large companies have trouble being agile, it is partly because it is impossible for 11,000 people to trust each other the way 5 people can. That is simply reality. Until someone can figure out the magic spell that allows vast groups of people, in different countries, with different cultures, speaking different languages, to all trust each other as if they were good friends, people in the startup community need to be a lot more careful about how carelessly they recommend that larger companies should be more agile.

[ [ UPDATE 2019-06-24 ] ]

Good conversation about this at Hacker News:

