January 20th, 2014
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: email@example.com
First, when I did my book of interviews with programmers, Coders at Work, I asked pretty much everyone about code reading. And while most of them said it was important and that programmers should do it more, when I asked them about what code they had read recently, very few of them had any great answers. Some of them had done some serious code reading as young hackers but almost no one seemed to have a regular habit of reading code. Knuth, the great synthesizer of computer science, does seem to read a lot of code and Brad Fitzpatrick was able to talk about several pieces of open source code that he had read just for the heck of it. But they were the exceptions.
If that wasn’t enough, after I finished Coders I had a chance to interview Hal Ableson, the famous MIT professor and co-author of the Structure and Interpretation of Computer Programs. The first time I talked to him I asked my usual question about reading code and he gave the standard answer—that it was important and we should do it more. But he too failed to name any code he had read recently other than code he was obliged to: reviewing co-workers’ code at Google where he was on sabbatical and grading student code at MIT. Later I asked him about this disconnect:
Seibel: I’m still curious about this split between what people say and what they actually do. Everyone says, “People should read code” but few people seem to actually do it. I’d be surprised if I interviewed a novelist and asked them what the last novel they had read was, and they said, “Oh, I haven’t really read a novel since I was in grad school.” Writers actually read other writers but it doesn’t seem that programmers really do, even though we say we should.
Abelson: Yeah. You’re right. But remember, a lot of times you crud up a program to make it finally work and do all of the things that you need it to do, so there’s a lot of extraneous stuff around there that isn’t the core idea.
Seibel: So basically you’re saying that in the end, most code isn’t worth reading?
Abelson: Or it’s built from an initial plan or some kind of pseudocode. A lot of the code in books, they have some very cleaned-up version that doesn’t do all the stuff it needs to make it work.
Seibel: I’m thinking of the preface to SICP, where it says, “programs must be written for people to read and only incidentally for machines to execute.” But it seems the reality you just described is that in fact, most programs are written for machines to execute and only incidentally, if at all, for people to read.
Abelson: Well, I think they start out for people to read, because there’s some idea there. You explain stuff. That’s a little bit of what we have in the book. There are some fairly significant programs in the book, like the compiler. And that’s partly because we think the easiest way to explain what it’s doing is to express that in code.
As I prepared my presentation, I found myself falling into my usual pattern when trying to really understand a piece of code—in order to grok it I have to essentially rewrite it. I’ll start by renaming a few things so they make more sense to me and then I’ll move things around to suit my ideas about how to organize code. Pretty soon I’ll have gotten deep into the abstractions (or lack thereof) of the code and will start making bigger changes to the structure of the code. Once I’ve completely rewritten the thing I usually understand it pretty well and can even go back to the original and understand it too. I have always felt kind of bad about this approach to code reading but it’s the only thing that’s ever worked for me.
My presentation to the code reading group started with stock backbone.js and then walked through the changes I would make to it to make it, by my lights, more understandable. At one point I asked if people thought we should move on to the group discussion but nobody seemed very interested. Hopefully seeing my refactoring gave people some of the same insights into the underlying structure of the original that I had obtained by doing the refactoring.
The second meeting of the Etsy code reading group featured Avi Bryant demonstrating how to use the code browsing capabilities of Smalltalk to navigate through some code. In that case, because few of the Etsy engineers had any experience with Smalltalk, we had no expectation that folks would read the code in advance. But the presentation was an awesome chance for folks to get exposed to the power of the Smalltalk development environment and for me to heckle Avi about Smalltalk vs Lisp differences.
…It was sometime after that presentation that I finally realized the obvious: code is not literature. We don’t read code, we decode it. We examine it. A piece of code is not literature; it is a specimen. Knuth said something that should have pointed me down this track when I asked him about his own code reading:
Knuth: But it’s really worth it for what it builds in your brain. So how do I do it? There was a machine called the Bunker Ramo 300 and somebody told me that the Fortran compiler for this machine was really amazingly fast, but nobody had any idea why it worked. I got a copy of the source-code listing for it. I didn’t have a manual for the machine, so I wasn’t even sure what the machine language was.
But I took it as an interesting challenge. I could figure out BEGIN and then I would start to decode. The operation codes had some two-letter mnemonics and so I could start to figure out “This probably was a load instruction, this probably was a branch.” And I knew it was a Fortran compiler, so at some point it looked at column seven of a card, and that was where it would tell if it was a comment or not.
After three hours I had figured out a little bit about the machine. Then I found these big, branching tables. So it was a puzzle and I kept just making little charts like I’m working at a security agency trying to decode a secret code. But I knew it worked and I knew it was a Fortran compiler—it wasn’t encrypted in the sense that it was intentionally obscure; it was only in code because I hadn’t gotten the manual for the machine.
Eventually I was able to figure out why this compiler was so fast. Unfortunately it wasn’t because the algorithms were brilliant; it was just because they had used unstructured programming and hand optimized the code to the hilt.
It was just basically the way you solve some kind of an unknown puzzle—make tables and charts and get a little more information here and make a hypothesis. In general when I’m reading a technical paper, it’s the same challenge. I’m trying to get into the author’s mind, trying to figure out what the concept is. The more you learn to read other people’s stuff, the more able you are to invent your own in the future, it seems to me.
He’s not describing reading literature; he’s describing a scientific investigation.