How should your company handle data safety and version control?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

I’ve been writing software professionally since 1999. The first few years, I didn’t work at a company that used version control. Colin Steele mentioned Subversion to me in 2005. I used it till 2011. Since 2011, every company I’ve worked at has used Git.

From 2005 to 2011 we used Subversion to keep track of all projects. This was true at both startups that I helped run, and also for most of my client work. During these years, I had only one major client who didn’t use version control (M Shanken, who put out magazines like WineSpectator.com).

Most of the non-technical people I worked with thought that Subversion was fun. Most were working on Windows machines, and they would use TortoiseSVN as their Subversion client. TortoiseSVN has bright colors and buttons, a fun interface that gave it some of the advantages later enjoyed by apps like Slack.

While we used Subversion, everyone on the team was able to check stuff into version control. Managers, artists, data analysts, the QA team, everyone used Subversion. They treated it as an infinite undo, which thrilled them. They also found it useful for tracking down when a bug was introduced into the code. I recall that the graphic designer felt empowered when she realized an image had overwritten another image with the same name, and she was able to reach into Subversion and get the old image back. She could fix problems on her own.

Subversion has some problems, but a lot can be forgiven in software that is so easy to use that everyone on staff enjoys using it. I am frustrated that so many leaders fail to see the importance of this.

Here are some minor failure modes I’ve seen with Git:

1. a branch that stays open for many months, perhaps even a year (for instance, at Maternity Neighborhood)

2. data is erased for good because someone makes a mistake while using rebase

3. a developer introduces a new bug while trying to resolve a merge conflict

4. widespread but fine-grained cherry picking leaves the team unclear about what’s been merged and what has not been merged

5. a developer makes a change in the wrong branch because they forgot what branch they were in

6. a developer is unable to recover a stash because they forgot where they were when they created the stash, or they simply forget that they have work in a stash

7. developers get confused when they end up working in a detached HEAD, after a botched attempt to revert

8. a developer feels the need to delete the repo from their hard drive and clone it again, because the whole repo got into a state that they seemed unable to resolve

9. the “blame” command is nearly useless. Maybe it’s because we never know in which branch a given change was made, but finding who made a mistake is very difficult

10. developers get confused about which branch will be deployed (I see this especially in shops that have lots of repos for lots of small apps, and use different Git workflow strategies for the different repos, often because different programmers or different teams control the different repos)

11. developers push their changes to “origin” but forget to push to “upstream” or vice versa.

But all of that stuff is trivial compared to the major flaw:

Graphic designers, writers, HTML/CSS frontenders, managers, data analysts and QA staff can’t use Git, even though they all used Subversion.

(I am being unfair by picking on Git like this. I could write a similar list about Mercurial or Bazaar, though Mercurial is much better about keeping track of which change was made on which branch, and Bazaar is much better about tracking which items have been cherry-picked.)

There are a lot of problems with Subversion. I won’t list them all here. The ideal version control system does not exist. But Subversion gets a few big things right:

1. there is no doubt what the canonical repo is

2. there is no doubt what the canonical branch is

3. all merge conflicts can be resolved with “accept theirs” and “accept mine”

4. the “blame” command is easy to use, so it is easy to figure out who made a mistake. This can be useful for educational purposes, or if you need to justify firing someone.

5. the “revert” command is simple and straightforward and does exactly what you expect it to. When non-technical staff make a mistake, they can easily revert their own mistake.

6. Graphic designers, writers, HTML/CSS frontenders, managers, data analysts and QA staff can use it. I’ve worked with many who thought Subversion was fun.

When I list these complaints for developers, most of them respond “You are complaining about Git’s power. The stuff you list isn’t really a flaw, rather those are all examples of how amazing Git is. It is flexible enough that you can do almost anything with it.”

I agree, Git is amazing and very powerful. What I’m suggesting is that we should recognize that it has a very high cost. It might empower complex workflows for sophisticated teams of experienced computer programmers, but it exiles the rest of the staff, and this has significant productivity costs. And Git is intimidating, not just to non-technical staff, but also to inexperienced programmers. In How To Destroy A Tech Startup In Three Easy Steps I talk about Sital, and his unwillingness to commit things to Git. He was learning a great deal about many other technologies, and he didn’t have any spare energy to learn about Git. He went a month without making a commit, and then he only did so because I insisted. After I put a lot of pressure on him, he got to the point where he would make one commit a day, at night, when he was stopping for the day. He would commit to the master branch, because he was confused how to handle different branches. When there was a merge conflict, I would resolve it for him. We worked together for 6 months, and in that time he learned a great deal about a lot of important topics, but he never really learned how to use Git, because it was a low priority, for both him and our CEO.

Git is very powerful? I’m willing to go along with that line of thought so long as we all understand that using a tool that is more powerful than needed can lead to problems.

Git was created by Linus Torvalds to help him manage the development of Linux. It is designed for a situation where thousands of volunteers are committing work, which will be reviewed by Torvalds’s lieutenants, who will decide if Torvalds should see the code. Much of the code is rejected. In such a situation, it makes sense that developers should work in their own repo, rather than being given access to the repo controlled by Torvalds. Git does not enforce a canonical repo, rather, you can easily fork a repo and your new repo might become canon for whoever likes your fork more than they like the repo that you forked from.

I have never worked at a company that had the same needs as Torvalds. Never. I’ve never worked at a company that sponsored non-canon development. Every company I’ve worked at has a canonical repo for each app that is under development (or multiple apps in one canonical repo). And yet, I have worked at companies that, for some odd reason, went ahead and implemented very complex Git strategies. For instance, when I was at Timeout.com they insisted that every developer create their own repo for each app, and do code review on their own repos, and then after code review push their branch to the canonical repo and open a pull request. While this was all possible, and we all followed the rules, there was no gain. It was a lot of rituals and complexity without any benefit. Occasionally we would each make some minor mistake, such as pushing to “origin” but forgetting to push to “upstream”, and then telling the QA team that they should test our new branch, and the QA team replying with “We can’t find your new branch, are you sure it exists?” because of course they are looking at “upstream” and we only pushed to “origin”. Minor, but annoying, and what were we gaining for this extra trouble? The funny thing is, the company still had repos that were clearly canonical. We weren’t building Open Source software. We weren’t Torvalds. We were a company that had to deploy proprietary software to servers that we controlled. We gained nothing from the distributed nature of Git.
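To make the “origin” versus “upstream” mix-up concrete, here is a minimal sketch that reproduces it with throwaway local repos. Everything named here is an assumption for illustration (the directory names, the new-feature branch, the demo identity); it is not the actual setup at Timeout.com, only the general shape of a fork-per-developer workflow.

```shell
set -eu

# Work in a scratch directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

# "upstream" stands in for the company's canonical repo,
# "origin" for one developer's personal fork of it.
git init --quiet --bare upstream.git
git init --quiet --bare origin.git

# The developer clones their fork, then registers the canonical
# repo as a second remote.
git clone --quiet origin.git work 2>/dev/null
cd work
git remote add upstream "$demo/upstream.git"

# Do some work on a new branch.
git checkout --quiet -b new-feature
echo "hello" > feature.txt
git add feature.txt
git -c user.name="Demo Dev" -c user.email="dev@example.com" \
    commit --quiet -m "Add feature"

# The mistake: push to the fork only, then tell QA the branch is ready.
git push --quiet origin new-feature

# QA looks at upstream, where the branch does not exist.
echo "on origin:   $(git ls-remote --heads origin new-feature)"
echo "on upstream: $(git ls-remote --heads upstream new-feature)"
```

The last line prints nothing after the label, which is the “We can’t find your new branch” moment. The fix is a single extra push to “upstream”, but someone has to remember it every time.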

Nearly every place I’ve worked would have been better off with Subversion, as it would have allowed smoother workflows integrating the work of the whole team, in particular the non-technical staff (data analysts, graphic designers, QA team, HTML/CSS frontenders).

When I say all this, I am often misunderstood.

I posted some of these thoughts to Hacker News, and “rootlocus” misunderstood me completely, writing in response:

No. Just no. Please stop spreading FUD like it’s candy. Git only deletes commits after a GC, which won’t erase commits from reflog and will keep unreferenced commits for at least a month before deleting them. And rebasing generates new commits, leaving the old ones exactly how they were. If somebody lost a commit after a rebase, and nobody nearby could help them recover it, they should consider spending a few hours learning about git.

I’ve been using git for 4 years both at work (with a team of 40+ people) and at home, without ever having any of the problems listed here (except 3 which has nothing to do with git). It takes a few hours, maybe a few days to understand how git works and how to use it. Instead of blaming the tools, you (and your team) should probably learn how to use them.

After a few posts, back and forth, I responded:

It is frustrating that you continue to take your advanced skills for granted. It is frustrating that you can not see what should be an obvious fact: that your skills are above average and therefore it is a mathematical fact that most people have less skill than you, and their lack of skill is a real world business situation that needs to be dealt with realistically. And more so, for the rest of your career your skills will continue to develop, so the gap between you and the average will continue to grow, and therefore the damage that you can do will continue to grow, if you fail to recognize that you are above average.

I can assure you that I’ve seen data lost forever because of “git rebase”. It doesn’t matter that someone with your skills could have saved the situation. You were not there, therefore your skills don’t matter! It is very important that you see this, or you will never be able to give accurate advice to business leaders.

If the leadership of a company decided to hire people with a skill level of x, then they should not also use a technology that requires a skill level of x + 1. You can reasonably tell them “For what you are trying to do, you should hire people with a skill level of x + 1.” That is exactly what I did in the situation that I describe in my book How To Destroy A Tech Startup In Three Easy Steps.

But sometimes the business leadership will disagree with you. They may have terrible reasons, but if you can not get them to change their minds, then you need to deal with the consequences of their bad decisions. At which point it makes sense to advocate for a technology that only requires a skill level of x.

I’ll point out that you are demonstrating a classic case of the Dunning–Kruger effect. In particular:

“the miscalibration of the highly competent stems from an error about others.”

That is, you have above average IQ and skill, therefore you perceive things to be easy, which are in fact not easy for the average.

The essence of my argument, then, is this: the best version control system for your company is the one that is easiest for the people in your company to use. Your workers might be very intelligent and have important skills, but version control might not be among those skills, and you will still want to give them access to your version control system. So look for the easiest version control system that works for your whole team.
