April 26th, 2015

New data structures in Clojure 1.8

(Written by Lawrence Krubner; indented passages are often quotes.) You can contact Lawrence at lawrence@krubner.com, or follow me on Twitter.

Apparently there are a lot of new data structures which may arrive in Clojure 1.8, thanks to Zach Tellman:

So, at the end of this exercise we have more than 5000 lines of Java, and we want to add them to the core implementation of Clojure. Ideally, we won’t introduce any bugs in the process. But the same unrolling that makes the code faster makes it significantly harder to simply read the code and verify it’s “correct”. The Clojure code which generates the Java, while more compact, is mostly concerned with string concatenating its way to proper syntax. The semantics of both codebases are a bit opaque.

But even if the code were perfectly clear, data structures are easy to get wrong and difficult to test. On a map, boundary conditions that need to be tested include key collisions, hash collisions, removing non-existent keys, conversions to and from transient collections, and so on. The exact boundary conditions depend on the underlying implementation, so a test suite that covers one implementation of a vector or map won’t necessarily cover another, even if they behave identically.
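
To make those boundary conditions concrete, here is a rough sketch in Python (not Clojure, and not taken from the quoted article). The names `CollidingKey` and `boundary_cases` are my own hypothetical helpers, and the transient conversions are left out because Python has no equivalent.

```python
class CollidingKey:
    """Hypothetical key type whose instances all share one hash value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # every instance lands in the same hash bucket

    def __eq__(self, other):
        return isinstance(other, CollidingKey) and self.name == other.name


def boundary_cases(make_map):
    """Run a few boundary conditions against any dict-like map factory."""
    results = {}

    # Key collision: writing the same key twice must overwrite, not duplicate.
    m = make_map()
    m["k"] = 1
    m["k"] = 2
    results["key_collision"] = (len(m), m["k"])

    # Hash collision: distinct keys with equal hashes must stay distinct.
    m = make_map()
    a, b = CollidingKey("a"), CollidingKey("b")
    m[a] = 1
    m[b] = 2
    results["hash_collision"] = (len(m), m[a], m[b])

    # Removing a key that was never added must leave the map unchanged.
    m = make_map()
    m["present"] = 1
    m.pop("absent", None)
    results["remove_missing"] = (len(m), m["present"])

    return results
```

Calling `boundary_cases(dict)` exercises the built-in dict. A different map implementation would need its own list of cases, which is exactly the problem the quote describes.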

Given this, it seems like writing exhaustive unit tests is a poor way for us to make sure our implementation is sound. A better approach would be to use property-based testing, which instead of defining both inputs and expected behavior, allows us to just define invariants which must hold true across all inputs. The testing framework will then search through the space of inputs for one which breaks the invariant, and then reduce the input down to the simplest possible reproducing case.
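
The search-then-reduce loop described above can be sketched in a few lines of Python. This is a toy under my own assumptions, not how test.check or any real framework is implemented: `find_counterexample`, `shrink`, and the deliberately broken `buggy_dedupe` are all hypothetical names invented for the illustration, and real frameworks use much smarter shrinking strategies.

```python
import random


def find_counterexample(prop, gen, trials=1000, seed=0):
    """Search random inputs for one where prop(input) is False."""
    rng = random.Random(seed)
    for _ in range(trials):
        candidate = gen(rng)
        if not prop(candidate):
            return candidate
    return None


def shrink(prop, failing):
    """Greedily drop single elements while the property still fails."""
    current = list(failing)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            smaller = current[:i] + current[i + 1:]
            if not prop(smaller):
                current = smaller
                changed = True
                break
    return current


def gen_int_list(rng):
    """Generate a short list of small integers."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]


def buggy_dedupe(xs):
    """Deliberately broken: only removes adjacent duplicates."""
    out = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return out


def dedupe_has_no_repeats(xs):
    """Invariant: the output contains each value at most once."""
    result = buggy_dedupe(xs)
    return len(result) == len(set(result))
```

Running `shrink(dedupe_has_no_repeats, find_counterexample(dedupe_has_no_repeats, gen_int_list))` reduces whatever failing list the search finds to a three-element list of the form `[a, b, a]`, the smallest input on which the invariant breaks.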

This is an especially nice approach for us, since both the inputs and invariants are straightforward. The inputs are all possible actions that can be performed on the data structure (conj, disj, assoc, dissoc, etc.), and the invariant is that it must behave just like the existing implementation, no matter what actions we take. Luckily there’s a readymade library for this purpose: collection-check. This library has been used to validate most (possibly all) of the alternate data structures in the Clojure ecosystem. It also uncovered a bug in Clojure’s own implementation of transient maps, which is discussed in more detail in this talk.
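
The idea of "random actions in, must match the existing implementation" might look roughly like this in Python. To be clear, this is not collection-check's API. `SmallMap` is a hypothetical, mutable stand-in for a candidate collection (Clojure's are persistent), and Python's built-in dict plays the role of the trusted reference.

```python
import random


class SmallMap:
    """Hypothetical candidate: linear-scan pairs, spilling to a dict when large."""

    LIMIT = 4

    def __init__(self):
        self._pairs = []      # used while the map holds at most LIMIT entries
        self._spilled = None  # a dict once the map has outgrown LIMIT

    def assoc(self, key, value):
        if self._spilled is not None:
            self._spilled[key] = value
            return
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))
        if len(self._pairs) > self.LIMIT:
            self._spilled = dict(self._pairs)
            self._pairs = []

    def dissoc(self, key):
        if self._spilled is not None:
            self._spilled.pop(key, None)
        else:
            self._pairs = [(k, v) for k, v in self._pairs if k != key]

    def to_dict(self):
        if self._spilled is not None:
            return dict(self._spilled)
        return dict(self._pairs)


def gen_actions(rng, max_len=30):
    """Generate a random sequence of (action, key, value) tuples."""
    actions = []
    for _ in range(rng.randint(0, max_len)):
        name = rng.choice(["assoc", "dissoc"])
        actions.append((name, rng.randint(0, 7), rng.randint(0, 99)))
    return actions


def behaves_like_dict(make_candidate, actions):
    """Invariant: after every action the candidate equals a reference dict."""
    candidate, reference = make_candidate(), {}
    for name, key, value in actions:
        if name == "assoc":
            candidate.assoc(key, value)
            reference[key] = value
        else:
            candidate.dissoc(key)
            reference.pop(key, None)
        if candidate.to_dict() != reference:
            return False
    return True


def check(make_candidate, trials=500, seed=1):
    """Return the first failing action sequence, or None if all trials pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        actions = gen_actions(rng)
        if not behaves_like_dict(make_candidate, actions):
            return actions
    return None
```

`check(SmallMap)` returns `None` when no random action sequence makes the candidate diverge from the dict. The keys are drawn from a small range on purpose, so that overwrites, removals of absent keys, and the spill past `LIMIT` all get hit often.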

But while collection-check is a useful tool for validating the unrolled collections, we still need to map it effectively onto the underlying implementation. The initial tests for the collections only checked maps of integers onto integers, which skipped over the special code paths dedicated to keywords. When an additional test was run for maps of keywords onto integers, an off-by-one error in the Duff’s device snippet discussed above was found.
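
Here is a small Python illustration of why the choice of key generator matters. It is not a reconstruction of the actual bug in the Duff's device code. `TwoPathMap` is a hypothetical map with a planted off-by-one error on a code path that only string keys reach (strings stand in for keywords here).

```python
import random


class TwoPathMap:
    """Hypothetical map with a separate, deliberately buggy path for str keys."""

    def __init__(self):
        self._str_keys, self._str_vals = [], []  # special path for string keys
        self._other = {}                          # general path for everything else

    def assoc(self, key, value):
        if isinstance(key, str):
            # Planted off-by-one: the scan skips the last stored key, so
            # re-inserting that key appends a duplicate instead of overwriting.
            for i in range(len(self._str_keys) - 1):
                if self._str_keys[i] == key:
                    self._str_vals[i] = value
                    return
            self._str_keys.append(key)
            self._str_vals.append(value)
        else:
            self._other[key] = value

    def count(self):
        return len(self._str_keys) + len(self._other)


def find_failure(gen_key, trials=200, seed=0):
    """Insert random keys; return a key list where count() disagrees with dict."""
    rng = random.Random(seed)
    for _ in range(trials):
        keys = [gen_key(rng) for _ in range(rng.randint(0, 10))]
        candidate, reference = TwoPathMap(), {}
        for k in keys:
            candidate.assoc(k, 0)
            reference[k] = 0
        if candidate.count() != len(reference):
            return keys
    return None


def gen_int_key(rng):
    """Integer keys never reach the special string path."""
    return rng.randint(0, 3)


def gen_str_key(rng):
    """String keys exercise the special path."""
    return rng.choice(["a", "b", "c", "d"])
```

With `gen_int_key` the check passes every trial, because the buggy branch is never entered. With `gen_str_key` it finds a failing key sequence. The invariant is the same in both runs; only the generator differs.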

And it might arrive in 1.8:

The results of this work can be found in the cambrian-collections library, the output of which has been submitted as a patch for Clojure, and is currently under review to be merged into the Clojure 1.8.0 release. Initial performance tests are promising: the cost of building collections when deserializing JSON in Cheshire is halved with unrolled collections, giving us a 33% overall improvement in performance. Since at Factual we spend quite a bit of time deserializing data from disk, this will have a meaningful and immediate impact on our daily operations. We hope it will have a similar benefit for others.
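
As a back-of-the-envelope check on those numbers, we can work out what share of deserialization time collection building must have taken. The symbols f, T, and T' are my own, and the quote does not say whether "33% overall improvement" refers to time saved or to throughput gained, so both readings are shown.

```latex
% f  = fraction of total deserialization time spent building collections (assumed symbol)
% T  = original total time, T' = total time with unrolled collections
\[
T' = (1 - f)\,T + \tfrac{1}{2} f\,T = \left(1 - \tfrac{f}{2}\right) T
\]
% Reading 1: "33% improvement" means total time drops by a third
\[
1 - \tfrac{f}{2} = \tfrac{2}{3} \quad\Longrightarrow\quad f = \tfrac{2}{3}
\]
% Reading 2: "33% improvement" means throughput rises by a third
\[
\frac{T}{T'} = \tfrac{4}{3} \quad\Longrightarrow\quad 1 - \tfrac{f}{2} = \tfrac{3}{4} \quad\Longrightarrow\quad f = \tfrac{1}{2}
\]
```

Either way, building collections would account for somewhere between half and two thirds of the original deserialization time, which is consistent with the claim that the change has a meaningful impact.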

Post external references

1. http://blog.factual.com/using-clojure-to-generate-java-to-reimplement-clojure
