Mutable iterators are the work of the Devil

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.

The assignment from Hell

Assume you die, and then you wake up in a beautiful place, floating on a cloud, and you hear a trumpet in the distance, and you hear beautiful singing. Then a voice from up on high says to you, “I would like you to count all of the positive integers. I will give you all eternity to do it.” You assume you are now in heaven, and God is giving you your first heaven-based assignment. Can you do it? Sure. Though the positive integers are infinite, you are being given an infinite amount of time to count them, so you should be able to do it. You get started right away, and you make great progress, but just when you get to 8 septillion, the voice says to you, “I want you to count all of the negative integers. I will give you all of eternity to do it.” Okay, God must have changed plans. I guess that happens. Things come up. Can you still succeed? Of course, since you are again being given infinite time to finish an infinite task. No problem. You get going. You start counting all the negative integers, but just when you get to negative 3 googols, the voice speaks to you again and says “I want you to count all of the numbers that are not in the Happy Number Sequence. I will give you all eternity to do it.” Huh. Frustrating. Counting up to negative 3 googols took some real effort and you hate to throw it all away. But whatever. This is Heaven and you know you should try to be a good sport. So you start counting all the integers that are not in the Happy Sequence. A long era passes during which several universes are born, give birth to trillions and trillions of stars, and then slowly die away again, but you keep counting. Then the voice says to you “I want you to count all the Mersenne primes. I will give you all eternity to do it.” This really sucks, doesn’t it? Every time you think you are successfully iterating over a collection, the collection is changed. Can you count all the Mersenne primes? Sure, given all Eternity. And yet, you are not being given all of Eternity. Someone is lying to you. And that doesn’t seem like something God would do, right? What sort of lying, manipulative psychopath would repeatedly mutate the collection that you are trying to iterate over? And then it occurs to you, you don’t really resemble a guy living the good life in Heaven. You resemble Sisyphus, whose punishment in Hell was that he was never allowed to finish his task. But if you are in Hell, then you are not actually talking to God. That implies that you are actually talking to… Yes, that makes sense. For truly, mutable iterators are the work of the Devil.

A common mistake

When I talk to Javascript programmers about the dangers of mutability, the smarter ones reply that they already know the benefits of immutability, and thanks to closures, it is easy to create private fields on Javascript objects, and if they add no getters or setters, then they have what amounts to an immutable objects. (I can have a similar conversation with programmers of PHP or Ruby or any other highly mutable language.) And modern Javascript has imported most of the good ideas from Functional programming.

That is great, in so far as it goes. The more that Javascript programmers adopt the Functional Paradigm, the safer their code will be.

However, it is insufficient to simply make objects that are immutable. It’s important that, when we iterate over a collection, we do so in a way such that the iterator is itself not mutated.

I’ll start with a famous Java example and then I’ll go back to talking about Javascript.

This is from Joshua Bloch’s well-regarded book “Effective Java”, page 213.

This is Item 46 “Prefer for-each loops to traditional loops”. The goal here is to create a deck of cards.

  // Can you spot the bug?

  enum Suit { CLUB, DIAMOND, HEART, SPADE }
  enum Rank { ACE, DEUCE, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, JACK, QUEEN, KING  }

  Collection suits = Arrays.asList(Suit.values());
  Collection ranks = Arrays.asList(Rank.values());

  List deck = new ArrayList();
	    
  for (Iterator i = suits.iterator(); i.hasNext(); )
    for (Iterator j = ranks.iterator(); j.hasNext(); )
      deck.add(new Card(i.next(), j.next()));
  

Don’t feel bad if you didn’t spot the bug. Many expert programmers have made this mistake at one time or another. the problem is that the next method is called too many times on the iterator for the outer collection (suits). It should be called from the outer loop, so that it is called once per suit, but instead it is called from the inner loop, so it is called once per card. After you run out of suits, the loop throws a NoSuchElementException.

Note that this line does not protect you:

i.hasNext();

because you are not calling it each time you call:

i.next()

Bloch continues:

If you’re really unlucky and the size of the out collection is a multiple of the size of the inner collection the loop will terminate normally, but it won’t do what you want. For example, consider this ill-conceived attempt to print all of the possible rolls of a pair of dice:

    // Same bug, different symptom!

    enum Face { ONE, TWO, THREE, FOUR, FIVE, SIX }

    Collection faces = Array.asList(Face.values());

    for (Iterator i = faces.iterator(); i.hasNext(); )
      for (Iterator j = faces.iterator(); j.hasNext(); )
        System.out.println(i.next() + " " + j.next());
  

This program doesn’t throw an exception but it prints only the six “doubles” (from “ONE ONE” to “SIX SIX”), instead of the expected thirty-six combinations.

To fix the bugs in these examples, you must add a variable in the scope of the outer loop to hold the outer element:

    // Fixed, but ugly -- you can do better!

    for (Iterator i = suits.iterator(); i.hasNext()) {
      Suit suit = i.next();
      for (Iterator j = ranks.iterator(); j.hasNext(); )
	deck.add(new Card(suit, j.next())); 
    }
  

If instead you use a nested for-each loop, the problem simply disappears. The resulting code is as succinct as you could wish for:

    // Preferred idiom for nested iteration on collections and arrays

    for (Suit suit : suits)
      for (Rank rank : ranks)
        deck.add(new Card(suit, rank));
  

You can write beautiful code using Javascript

Just to be clear, a great programmer can write great code in Javascript.

I am about to show you some bad Javascript. All of my friends who favor Javascript will complain “You deliberately made Javascript look bad, by using atrocious code in your examples.”

But be patient. I’ll clean this up later, and in a later article we will focus on Functional Javascript. The following code is knowingly bad. But also, the sad truth is, this example of bad Javascript is very common. I still see abundant examples of Javascript written in the old style everything-is-mutable.

Mutable Javascript versus Local Bindings in Clojure

Let’s assume we have 2 arrays, each full of objects. In the first array, each object has the name of a user, and some prize money they won. In the second array, each object is about some future contest, what category it is in, and what the prize will be. Suppose we have all this code in one place, and we make a simple typo at the end, like this:

var array_of_users = [{"name" : "henry", "prize_money" : 200},
                      {"name" : "henry", "prize_money" : 30},
                      {"name" : "craig", "prize_money" : 340},
                      {"name" : "craig", "prize_money" : 100},
                      {"name" : "pasha", "prize_money" : 2330},
                      {"name" : "pasha", "prize_money" : 1130},
                      {"name" : "pasha", "prize_money" : 430},
                      {"name" : "eli", "prize_money" : 60},
                      {"name" : "eli", "prize_money" : 330},
                      {"name" : "eli", "prize_money" : 89}];

var array_of_contests = [{"category" : "housing", "prize_money" : 100},
                         {"category" : "housing", "prize_money" : 30},
                         {"category" : "housing", "prize_money" : 340},
                         {"category" : "housing", "prize_money" : 210},
                         {"category" : "crisis", "prize_money" : 100},
                         {"category" : "crisis", "prize_money" : 2330},
                         {"category" : "crisis", "prize_money" : 90},
                         {"category" : "restaurants", "prize_money" : 1130},
                         {"category" : "restaurants", "prize_money" : 430},
                         {"category" : "theater", "prize_money" : 60},
                         {"category" : "theater", "prize_money" : 90},
                         {"category" : "theater", "prize_money" : 130},
                         {"category" : "theater", "prize_money" : 30},
                         {"category" : "theater", "prize_money" : 270} ];

howMuchPrizeMoney = 0;
arrayOfMoneyPerCategory =[];

for (i=0; i < array_of_users.length; i++) {
  u = array_of_users[i];
  howMuchPrizeMoney += u["prize_money"];
}

for (j=0; j < array_of_contests.length; j++) {
  c = array_of_contests[j];
  if (typeof arrayOfMoneyPerCategory[c.category] === "undefined") {
    arrayOfMoneyPerCategory[c.category] = 0;
  }
  arrayOfMoneyPerCategory[c.category] = arrayOfMoneyPerCategory[c.category] + u.prize_money;
}

So now we have a bug, a simple typo, and yet I don't get an error message. When I do this:

for (var k in arrayOfMoneyPerCategory){
  document.write("
Key is " + k + ", value is " + arrayOfMoneyPerCategory[k]); }

I will get:

Key is housing, value is 356
Key is crisis, value is 267
Key is restaurants, value is 178
Key is theater, value is 445

and we know that cannot be correct. Each of these numbers is a multiple of 89, and that is simply because the last record in array_of_users had a prize of 89:

{"name" : "eli", "prize_money" : 89}

You might think it is easy to spot the bug in this example, but I have seen functions much larger than this, where finding the bug gets harder. A simple typo can be made in any language, but in Clojure I get an error message that tells me where the problem is. Assume I was typing something like this at the REPL (this example does not have a typo):

(def users [{:name :henry, :prize-money 200}
            {:name :henry, :prize-money 30}
            {:name :craig, :prize-money 340}
            {:name :craig, :prize-money 100}
            {:name :pasha, :prize-money 2330}
            {:name :pasha, :prize-money 1130}
            {:name :pasha, :prize-money 430}
            {:name :eli, :prize-money 60}
            {:name :eli, :prize-money 330}
            {:name :eli, :prize-money 89}])
              
(def contests [{:category :housing, :prize-money 100}
               {:category :housing, :prize-money 30}
               {:category :housing, :prize-money 340}
               {:category :housing, :prize-money 210}
               {:category :crisis, :prize-money 100}
               {:category :crisis, :prize-money 2330}
               {:category :crisis, :prize-money 90}
               {:category :restaurants, :prize-money 1130}
               {:category :restaurants, :prize-money 430}
               {:category :theater, :prize-money 60}
               {:category :theater, :prize-money 90}
               {:category :theater, :prize-money 130}
               {:category :theater, :prize-money 30}
               {:category :theater, :prize-money 270}])
            
(loop [u users total 0]
     (if (first u)
       (recur
           (rest u)
           (+ (:prize-money (first u)) total))
      total))

(loop
  [m {}
   c contests]
   (if (first c)
     (recur
       (assoc m (:category (first c))
         (+ (get m (:category (first c)) 0)
         (:prize-money (first c))))
        (rest c))
      m))

The loop over users correctly tells me that users have won $5,039. (I can see the return values at the REPL, so I don't feel the need to write a bunch of print statements here.) The second loop correctly returns:

{:theater 580, :restaurants 1560, :crisis 2520, :housing 680}

Note that, in Clojure, instead of getting the next element like this:

u = users[i];

I get it like this:

  (first u)

and then each time the loop recurs, I rebind u as (rest u), and (rest u) includes all of u except for , and therefore rebinding u to (rest u) ensures that (first u) is a different element each time we recur through this loop. But at no point do we create a variable that might interfere with code somewhere else in our function. Is there any risk of a typo? Sure, but I won't get a confusingly incorrect number back; instead, I'll be told that I'm using a var that doesn't exist. If I stupidly do this (typing “u” instead of “c”):

(loop
  [m {}
   c contests]
   (if (first u)
     (recur
       (assoc m (:category (first c))
         (+ (get m (:category (first c)) 0)
         (:prize-money (first c))))
        (rest c))
      m))

I will get:

CompilerException java.lang.RuntimeException: Unable to resolve symbol: u in this context, compiling:(NO_SOURCE_PATH:4:8)

Please don't get sidetracked about the fact that this compiled -- if this was interpreted like Javascript, we would still get an error. My point is that this would not pretend to work but give us the wrong answer.

The u is a local binding that only exists inside of that first loop, it does not pollute the rest of my code. This is one of the things that helps support “REPL Oriented Programming,” which is part of the culture of Clojure programming. Writing code directly into a live environment has the benefit that someone can see all of their mistakes as soon as they happen:

At this point a Javascript programmer (or PHP programmer, or Ruby programmer, or others) would say "Your example is stupid. Obviously variables should be restricted to their functions. You should never have more than one loop in any function. So long as the code is well written, then this kind of error will never happen."

That is true. So let me ask, have any of you ever seen any Javascript code that had more than one loop in a function? And I don't mean to single out Javascript. How about PHP? Ruby? It's a bad practice and I see it everywhere.

I once did a job interview at thumb.it. They took me into one of their conference rooms and they brought up some code on the screen. It was all PHP. They said, "This is some of our older code and it is a bit gnarly. This was written by our lead engineer, but he's planning to clean some of this up eventually. How would you clean this up?" What I saw was amazing, and not in a good way. I saw one function that had 7 loops. Even more amazing, is that the developer was aggressively using the PHP unset function to destroy variables after each loop. In other words:

$users = fetchAllUsers(); 
  
for ($i=0; $i < count($users); $i++) {
   removeInactiveUser($users[$i]);
}

unset($i);

for ($i=0; $i < count($users); $i++) {
   updateRecentLogins($users[$i]);
}

Imagine several loops like this, all in one function.

I asked, "Why is he using unset?"

They answered, "He does that because he's been bit too many times by variables that interact with the rest of his code."

I asked, "Why doesn't he just move each loop to its own function?"

They said, "He's worried about the code being too verbose, which happens when you move every action to its own function. Still, we are open-minded about refactoring this."

Is this too extreme a case for you? Are you ready to dismiss this an outlier? Good for you. I'm glad you have not had to run into this kind of code on a continual basis. But I have. I have seen vast oceans of bad Javascript code, bad PHP code, bad Ruby code and more.

And yes, for the millionth time, you can write great code in Javascript and PHP and Ruby. But I feel that highly mutable languages tend to encourage bad habits, whereas languages that default to immutability, such as Clojure, tend to encourage good habits.

By the way, I didn't get that job. I may have shown too much emotion when I saw how they were using "unset".

What should we name the variables?

An aside:

You might be wondering why I would do something stupid, like name a variable "array_of_users". The whole point of dynamic languages is that you can change the type of the variable and isn't this such a great thing?

If I'm working in a language that does not have a strongly established culture of naming conventions, I tend to assign names such as "array_of_users" which amounts to a note to myself about the type I eventually expect to enforce on that variable. Of course, if I work in an organization that has rules about naming conventions, then I follow those rules.

I don't put types in the names of vars in Clojure because the Clojure community does have some rules about how to name things. See Stuart Sierra's post How to name Clojure functions:

Pure functions which return values are named with nouns describing the value they return.

If I have a function to compute a user’s age based on their birthdate, it is called age, not calculate-age or get-age.

Think of the definition: a pure function is one which can be replaced with its value without affecting the result. So why not make that evident in the name?

The fancy term for this idea is "referential transparency". With a pure function you could replace the function with the value it returns. You can't do this if a function has side-effects, but you can and should do this with pure functions. But the implication is, if I should use "age" and not "calculate-age" then if I had a var with someone's age in it, I should call it "age" not "age-integer".

Bad Habits With Mutable Data

Let's move on to a more dangerous example. Assume we have these 2 data structures:

1.) A hashmap of users which holds a hashmap of history which holds a vector of each round of a contest

2.) A hashmap which holds the outcome of the newest round of the contest

Assume we need to do 3 things:

1.) Add up how much each user has won, in each category, during the newest round

2.) Update the history in the users hashmap, adding in the new round

3.) Delete any user who has not yet won at least $500

In Javascript, we might do this:

var users = {
             "henry": {
               "history": 
               [
                 {"housing": 25, "restaurants": 40, "theater": 930},
                 {"restaurants": 30, "crisis": 220}
               ]
              },
              "lisa":  {
               "history": 
               [
                 {"theater": 80},
                 {"housing": 445, "restaurants": 15, "theater": 35}
               ]
              },
              "pasha":  {
               "history": 
               [
                 {"restaurants": 5},
                 {"restaurants": 40, "theater": 60}
               ]
              },
              "eli":  {
               "history": 
               [
                 {"crisis": 135, "restaurants": 440, "theater": 65},
                 {"theater": 95}
               ]
              }
            };

var array_of_contests = [{"category" : "housing", "prize_money" : 100, "winner" : "eli"},
                         {"category" : "housing", "prize_money" : 30, "winner" : "henry"},
                         {"category" : "housing", "prize_money" : 340, "winner" : "henry"},
                         {"category" : "housing", "prize_money" : 45, "winner" : "susan"},
                         {"category" : "housing", "prize_money" : 15, "winner" : "henry"},
                         {"category" : "housing", "prize_money" : 10, "winner" : "pasha"},
                         {"category" : "housing", "prize_money" : 25, "winner" : "pasha"},
                         {"category" : "crisis", "prize_money" : 100, "winner" : "eli"},
                         {"category" : "crisis", "prize_money" : 2330, "winner" : "henry"},
                         {"category" : "crisis", "prize_money" : 90, "winner" : "henry"},
                         {"category" : "restaurants", "prize_money" : 1130, "winner" : "eli"},
                         {"category" : "restaurants", "prize_money" : 130, "winner" : "pasha"},
                         {"category" : "theater", "prize_money" : 60, "winner" : "eli"},
                         {"category" : "theater", "prize_money" : 90, "winner" : "pasha"},
                         {"category" : "theater", "prize_money" : 130, "winner" : "pasha"},
                         {"category" : "theater", "prize_money" : 830, "winner" : "susan"},
                         {"category" : "theater", "prize_money" : 90, "winner" : "susan"},
                         {"category" : "theater", "prize_money" : 270, "winner" : "eli"}];

var winners = {}

for (var i=0; i < array_of_contests.length; i++) {
  var c = array_of_contests[i];
  if (typeof winners[c.winner] === "undefined") {
    winners[c.winner] = {};
  }
  if (typeof winners[c.winner][c.category] === "undefined") {
    winners[c.winner][c.category] = 0;
  }
  winners[c.winner][c.category] = winners[c.winner][c.category] + c.prize_money;   
}

for (var k in winners) {
  if (typeof users[k] === "undefined") {
    users[k] = { k : { "history" : [] } }
  }
  if (typeof users[k]["history"] === "undefined") {
    users[k].history = [];
  }
  users[k].history.push(winners[k]);
}

If I do:

console.log(users);

I will get:

eli:{
    "history": [
        {"crisis": 135 "restaurants": 440 "theater": 65},
        {"theater": 95},
        {"crisis": 100 "housing": 100 "restaurants": 1130 "theater": 330},
    ]
},
henry:{
    "history": [
        {"housing": 25 "restaurants": 40 "theater": 930},
        {"crisis": 220 "restaurants": 30},
        {"crisis": 2420 "housing": 385}
    ]
},
lisa:{
    "history": [
        {"theater": 80},
        {"housing": 445 "restaurants": 15 "theater": 35}
    ]
},
pasha:{
    "history": [
        {"restaurants": 5},
        {"restaurants": 40 "theater": 60},
        {"housing": 35 "restaurants": 130 "theater": 220}
    ]
},
susan:{
    "history": [
        {"housing": 45 "theater": 920}
    ]
}

And then if I do:

for (var k in users) {
  var user_total = 0; 
  var user_history = users[k].history;
  for (var i=0; i < user_history.length; i++) {
    c = user_history[i]; 
    for (var user_history_key in c) {
      user_total += c[user_history_key];
    }
  }
  users[k].total = user_total; 
  if (user_total < 500) {
    delete(users[k]);
  }
}

and then:

console.log(users);

I will get:

"eli": {
  "history": [
    {"crisis": 135, "restaurants": 440, "theater": 65},
    {"theater": 95}, 
    {"crisis": 100, "housing": 100, "restaurants": 1130, "theater": 330} 
  ],
  "total": 2395
},
"henry": {
  "history": [
    {"housing": 25, "restaurants": 40, "theater": 930},
    {"crisis": 220, "restaurants": 30},
    {"crisis": 2420, "housing": 385}
  ],
  "total": 4050
},
"lisa": {
  "history": [
    {"theater": 80},
    {"housing": 445, "restaurants": 15, "theater": 35}
  ],
  "total": 575
},
"susan": {
  "history": [
    {"housing": 45, "theater": 920}
  ],
  "total": 965
}

What can we say about this kind of code? Let's start by admiring how direct it is. This is classic “procedural” code. It goes through its tasks, one step at a time, and it does everything in a straightforward manner. We could make this a bit more direct by consolidating some of the loops, but my goal here is to talk about mutability, not performance.

A Javascript developer, a Java developer, and a Clojure developer would all agree that this code is dangerous – the only disagreement would be about how to fix this code. The solutions to the problems in this code will end up sacrificing some speed in exchange for greater safety (though I feel most professional developers, no matter what language they prefer, would find this a wise sacrifice).

I'll briefly point out some of the easiest mistakes we could make. If instead of this:

for (var k in winners) {

we wrote:

for (var k in users) {

everything would seem to work. All the data we got back would be accurate, but the results would be missing Susan, since she is a new contestant. She is in the newest round but has never been in users before.

Likewise, if, instead of this:

  users[k].history.push(winners[k]);

we accidentally wrote:

  users[k].history = winners[k];

we would get no error, but we would have erased the past history. We would have the winnings from the newest round, but we are now storing the new round as a map, instead of a map inside of an array. This is especially problematic because we are changing the array_of_users variable itself, and we can do this because array_of_users is mutable. If this was a long-running app, and we were holding this data in memory, then we just wiped out real data that our app needs.

Finally, if, instead of this:

    delete(users[k]);

we accidentally wrote:

    delete(users[i]);

or:

    delete(users);

we would, again, be erasing some data critical to our app.

And yes, you can make mistakes in any language. But some languages make it easier to make mistakes.

So how do we fix this?

If you are any kind of Object Oriented programmer (anything, including Javascript and PHP and Ruby but also static languages such as Java), you probably have list of ideas for cleaning up this code:

1.) refactor functions into separate classes, grouping related functions together

2.) refactor variables, keeping them with the functions that modify them -- behavior and state should go together

3.) put each loop in its own method

4.) add guard clauses

5.) make all variables private

Obviously this can be made to work, since this is how most software is written nowadays. Given enough effort, you can work with highly mutable objects and yet build safe, reliable systems.

For those of you who love the paradigm, I doubt I can say anything that will convince you to change your ways. Still, you might want to ask yourself two questions:

1.) is there another approach that is more concise?

2.) is there another approach that imposes less of a conceptual burden?

On this second point, I would add, my work in languages such as Ruby have left me tired with certain conversations that never seem to resolve. Every time I have worked at a company that has an old Ruby On Rails app, I have been drawn into the conversation "How should we clean up our fat models?" Objects bring with them a certain amount of conceptual overhead, and we immediately get caught up in some fairly abstract debates. Should "user" be an object? Should contest be an object? What about history? What about category? What about restaurants? What about theater? How far down that road do we go? How many objects do we need to create? And then immediately we run into the problem, How do we share functionality? We surely want to obey the DRY principle (Don't Repeat Yourself), but how? We probably want an object for users and contests – that seems obvious. But we probably also want an object for category, and category is something that both users and contests make use of. So how do we make the decision? Object-Oriented Programming teaches us that we have 2 ways of doing this, either Inheritance or Composition. Is inheritance appropriate here? To answer that, we need to ask the question, Are users and contests both children of category? If we went down that road, we would end up with a tall inheritance tree, which typically becomes brittle over time. But the other approach is Composition, and that has its own problems. If we use Composition, does that mean we need to use a Dependency Injection system? The complexity piles up fast. (See Get rid of all Dependency Injection)

How is Immutability different?

So how would this look if we tackled the same problem with Clojure? A first attempt, somewhat naïve, might look like this:

(def users  {
             :henry {
               :history 
               [
                 {:housing  25, :restaurants 40, :theater 930},
                 {:restaurants  30, :crisis  220}
               ]
              },
              :lisa  {
               :history 
               [
                 {:theater  80},
                 {:housing  445, :restaurants  15, :theater  35}
               ]
              },
              :pasha  {
               :history 
               [
                 {:restaurants  5},
                 {:restaurants  40, :theater  60}
               ]
              },
              :eli  {
               :history 
               [
                 {:crisis  135, :restaurants  440, :theater  65},
                 {:theater  95}
               ]
              }
            })

(def contests [{:category :housing, :prize-money 100, :winner :eli},
               {:category :housing, :prize-money 30, :winner :henry},
               {:category :housing, :prize-money 340, :winner :henry},
               {:category :housing, :prize-money 45, :winner :susan},
               {:category :housing, :prize-money 15, :winner :henry},
               {:category :housing, :prize-money 10, :winner :pasha},
               {:category :housing, :prize-money 25, :winner :pasha},
               {:category :crisis, :prize-money 100, :winner :eli},
               {:category :crisis, :prize-money 2330, :winner :henry},
               {:category :crisis, :prize-money 90, :winner :henry},
               {:category :restaurants, :prize-money 1130, :winner :eli},
               {:category :restaurants, :prize-money 130, :winner :pasha},
               {:category :theater, :prize-money 60, :winner :eli},
               {:category :theater, :prize-money 90, :winner :pasha},
               {:category :theater, :prize-money 130, :winner :pasha},
               {:category :theater, :prize-money 830, :winner :susan},
               {:category :theater, :prize-money 90, :winner :susan},
               {:category :theater, :prize-money 270, :winner :eli}])

;; this adds an empty map to the start of each history, but is this what we truly want?
(loop [u users nu {}] 
  (if (first u) 
    (recur
      (rest u)
      (assoc nu (get (first u) 0) {:history (into [] (cons {} (:history (get (first u) 1))))}))
    nu))
   
(reduce 
  (fn [nu c] 
    (update-in 
      (if (nil? ((:winner c) nu))
          (assoc nu (:winner c) {:history [{}]})
        nu) 
      [(:winner c) :history 0 (:category c)] (fnil #(+ %1 (:prize-money c)) 0)))
      (loop [u users nu {}] 
        (if (first u) 
          (recur
            (rest u)
            (assoc nu (get (first u) 0) {:history (into [] (cons {} (:history (get (first u) 1))))}))
          nu))
      contests)

Regarding the "loop" inside of the "reduce", please don't ever write this code in real life -- these should really be two different functions. But I wrote the code this way to make a point. As we saw at the start of this article, a common source of bugs in a language like Javascript (or Java, or PHP, etc.) is a mutable iterator used in a loop inside of a loop. I wrote this kind of code for many years, and I made this mistake many times:

for(var i=0; i < array_of_users.length; i++) {
  for(var i=0; i < array_of_contests.length; i++) {
    // do some important work here
  }
}

Of course, the bug is not usually this obvious, but the repeated use of some mutable iterator such as "i" causes all kinds of trouble. The Clojure example is different, because the "nu" that exists inside of "loop" is not the same as the "nu" that exists outside of "loop". Again, this kind of Clojure code would seem a bit dirty in real life; but it is safe -- I wrote this to demonstrate that "nu" is a local binding inside of "loop" and it does not conflict with the code around it.

In real life, we would break up this code into functions:

		   
(defn add-placeholder-to-history [us]
      (loop [u us nu {}] 
        (if (first u) 
          (recur
            (rest u)
            (assoc nu (get (first u) 0) {:history (into [] (cons {} (:history (get (first u) 1))))}))
          nu)))
          
(defn update-history [nu c]          
    (update-in 
      (if (nil? ((:winner c) nu))
          (assoc nu (:winner c) {:history [{}]})
        nu) 
      [(:winner c) :history 0 (:category c)] (fnil #(+ %1 (:prize-money c)) 0)))

And so in the end we would simply call:

(reduce
  update-history
  (add-placeholder-to-history users)
  contests)

But what is this for?

      (if (nil? ((:winner c) nu))
          (assoc nu (:winner c) {:history [{}]})
        nu) 

We do this for Susan. She is certainly a new contestant who won some money in the newest round of contests; however, she does not yet exist in "users", so we need to create a space for her. Without the above 3 lines of code, we get this for her:

{:susan {:history {0 {:theater 920, :housing 45}}},

But with these 3 lines of code, we get the correct results for her, as well as for everyone else:

 
{
:susan {
  :history [{:theater 920, :housing 45}]
  }
:lisa {
  :history [{} 
            {:theater 80}
            {:housing 445, :restaurants 15, :theater 35}]
  }
:henry {
  :history [{:crisis 2420, :housing 385}
            {:housing 25, :restaurants 40, :theater 930}
            {:crisis 220, :restaurants 30}]
  }
:eli {
  :history [{:theater 330, :restaurants 1130, :crisis 100, :housing 100}
            {:crisis 135, :restaurants 440, :theater 65}
            {:theater 95}]
  }
:pasha {
  :history [{:theater 220, :restaurants 130, :housing 35}
            {:restaurants 5}
            {:restaurants 40, :theater 60}]
  }
}

All the same, guard clauses like this are ugly and reminiscent of the mutable style:

      (if (nil? ((:winner c) nu))
          (assoc nu (:winner c) {:history [{}]})
        nu) 

When I presented this issue to the Clojure group on Google Groups, Francis Avila offered this very clever code:

(defn winnings-by-user [contests]
  (reduce
    (fn [user+winning {:keys [category winner prize-money]}]
      (update-in user+winning [winner category] (fnil + 0) prize-money))
    {} contests))

(defn update-histories [users new-winnings]
  (let [all-user-keys (set (concat (keys users) (keys new-winnings)))]
    (reduce (fn [users ukey]
              (update-in users [ukey :history]
                #(into [(get new-winnings ukey {})] (take 2 %))))
      users all-user-keys)))

(update-histories users (winnings-by-user contests))

This avoids the need for the nil? check:

(set (concat (keys users) (keys new-winnings)))

We can check that line alone at the REPL, at which point we see that it returns:

#{:pasha :eli :henry :susan :lisa}

In other words, instead of a nil check for Susan, we simply concatenate all the keys from "users" and all the keys from "new-winnings", which gives us all the old winners, plus Susan, who is winning for the first time.

This then becomes "all-user-keys" which is the sequence that "reduce" reduces over. We no longer have to worry about Susan not being in users or Lisa not being in contests.

Some might feel this is a bit of a cheat:

(take 2 %)

Here we hardcode the 2 knowing that we only have two rounds of contests so far. This is the simplest way to say “Give me a sequence of the history, up to two rounds.” If we wanted to reuse this function for additional rounds of contests, this 2 would have to become a parameter of the function, something that could change every time we called it. A wonderful truth about take is that this does not throw an error:

(take 2 nil)

And nil is what it is fed when Lisa's name is iterated over (since she did not win anything in the newest round). Also, since Lisa did not win anything in the newest round, we need to add an empty hashmap when we don't find her name in new-winnings:

(get new-winnings ukey {})

Some might wonder why this returns an empty hashmap for Lisa, when, after all, update-in automatically returns an empty hashmap for levels that don't yet exist. Again, from the official Clojure documentation:

(update-in m [k & ks] f & args)

'Updates' a value in a nested associative structure, where ks is a
sequence of keys and f is a function that will take the old value
and any supplied args and return the new value, and returns a new
nested structure. If any levels do not exist, hash-maps will be
created.

http://clojuredocs.org/clojure.core/update-in

But we are returning a vector to update-in (the key is :history) so update-in has no reason to create an empty hashmap, and our real goal is to get an empty hashmap inside of an array:

#(into [(get new-winnings ukey {})] (take 2 %))

To which we also add the two previous rounds of contests, if they exist for this user.

How would we remove those users who have won less than $500 (i.e., Pasha)? If we were being lazy, we could do something obvious like this:

(defn total-this-user [users]
  (reduce 
    (fn [total next-contest]
      (+ total (apply + (vals next-contest))))
    0
    users))

(defn add-total-to-users []
    (reduce 
      (fn [survivors [k v]]
        (if (> (total-this-user (:history v)) 500)
          (assoc survivors k v))) 
      {} 
      users))

However, that causes us to do two extra reduce statements, which means a lot of iterating over the same data structure. There is a better way, using the Reduce/Combine Pattern, which we will look at later.

Slightly less mutable Javascript

Nowadays, Javascript has imported most of the good ideas that arose from Functional programming. So for instance, Javascript has a reduce function now that allows us to sum numbers without having to use mutable iterators. A slightly cleaner version of what we did before:

var winners = {};

contests.reduce(
    function(previousValue, currentValue, currentIndex, array) {
	
	if (typeof winners[currentValue.winner] === "undefined") {
 	    winners[currentValue.winner] = {};
	}
	if (typeof winners[currentValue.winner][currentValue.category] === "undefined") {
 	    winners[currentValue.winner][currentValue.category] = 0;
	}
	winners[currentValue.winner][currentValue.category] = winners[currentValue.winner][currentValue.category] + currentValue.prize_money;

	return previousValue;
    });

console.log(winners); 

I am cheating here, since I'm still mutating "winners" inside of reduce. Using Immutable.js, it is possible to elimnate the two "if" guard clauses and just do a straight merge. I'm going to show some examples of Immutable in a later article.

The beauty of immutable Javascript

John D Cook says:

I do most of my work in object-oriented languages and I don’t see that changing any time soon. I’m more interested in functional techniques than functional languages: using pure functions, using functions as arguments and return values, etc. As Joe Armstrong says, such code is easier to reuse.

I believe his point of view represents the great majority of software developers. Object Oriented Programming is deeply entrenched and there is no easy way to get rid of it. I am not especially happy about this, but this is the reality we face in 2016. So, as much as possible I try to put my own preferences aside, and I try to look at techniques that allow for beautiful Functional code in nominally Object Oriented languages.

Joshua Block's advice was "Prefer for-each loops to traditional loops" and Javascript now has a foreach(). It is much slower than Javascript's for(), but safety is typically worth the performance penalty.

Trying to recreate all the work that goes into enabling immutability is well beyond what I can do for this tutorial, but thankfully the crew at Facebook has already done this, as part of their work on React (which is an immutable framework for web pages).

The Facebook page for Immutable.js is well worth reading:

Immutable data cannot be changed once created, leading to much simpler application development, no defensive copying, and enabling advanced memoization and change detection techniques with simple logic. Persistent data presents a mutative API which does not update the data in-place, but instead always yields new updated data.

Immutable provides Persistent Immutable List, Stack, Map, OrderedMap, Set, OrderedSet and Record. They are highly efficient on modern JavaScript VMs by using structural sharing via hash maps tries and vector tries as popularized by Clojure and Scala, minimizing the need to copy or cache data.

Immutable also provides a lazy Seq, allowing efficient chaining of collection methods like map and filter without creating intermediate representations. Create some Seq with Range and Repeat.

This goes a long way to give Javascript the best ideas from languages like Clojure.

Much of what makes application development difficult is tracking mutation and maintaining state. Developing with immutable data encourages you to think differently about how data flows through your application.

Subscribing to data events throughout your application, by using Object.observe, or any other mechanism, creates a huge overhead of book-keeping which can hurt performance, sometimes dramatically, and creates opportunities for areas of your application to get out of sync with each other due to easy to make programmer error. Since immutable data never changes, subscribing to changes throughout the model is a dead-end and new data can only ever be passed from above.

This model of data flow aligns well with the architecture of React and especially well with an application designed using the ideas of Flux.

When data is passed from above rather than being subscribed to, and you're only interested in doing work when something has changed, you can use equality. Immutable always returns itself when a mutation results in an identical collection, allowing for using === equality to determine if something has changed.

  
  var map1 = Immutable.Map({a:1, b:2, c:3});
  var map2 = map1.set('b', 2);
  assert(map1 === map2); // no change
  var map3 = map1.set('b', 50);
  assert(map1 !== map3); // change

If an object is immutable, it can be "copied" simply by making another reference to it instead of copying the entire object. Because a reference is much smaller than the object itself, this results in memory savings and a potential boost in execution speed for programs which rely on copies (such as an undo-stack).

  var map1 = Immutable.Map({a:1, b:2, c:3});
  var clone = map1;

In a later article I'm going to use Immutable to build some fairly exact copies of what we've already done with Clojure.

Premature optimization is the root of all evil

I have not mentioned performance in all of this, because doing so can introduce great evil into a conversation. But of course, we could have that conversation.

Of all the highly mutable languages, I believe Javascript is now the fastest. Ruby and PHP are both as slow as crippled hamsters, but Google has put huge resources into optimizing the V8 engine, so nowadays people argue about whether Javascript is faster than Java. Python is somewhere in the middle of the pack. If you ever get to the point where performance is an issue for you, you should consider the JVM and V8 engines.

But don't ever make a tech decision based on performance speed, until you know for sure that you need performance speed.

Post external references

  1. 1
    http://javascript.crockford.com/private.html
  2. 2
    http://thumb.it/
  3. 3
    http://php.net/manual/en/function.unset.php
  4. 4
    http://stuartsierra.com/2016/01/09/how-to-name-clojure-functions
  5. 5
    http://clojuredocs.org/clojure.core/update-in
  6. 6
    http://www.johndcook.com/blog/2011/07/19/you-wanted-banana/
  7. 7
    https://facebook.github.io/immutable-js/
  8. 8
    https://www.cloudbees.com/blog/about-paypals-node-vs-java-%E2%80%9Cfight%E2%80%9D
Source