If a language does not guarantee order in a hashmap can the hashmap be referentially transparent?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

It’s an interesting question. Clojure says it is referentially transparent, meaning you can remove any function and replace it with its return value, with no effect on the program, but can Clojure be referentially transparent if it does not guarantee the order of its hashmaps (especially since hashmaps are also functions in Clojure)?

Note that this issue is similar to ensuring that a hash function is consistent with equals in a language, i.e. if two values x and y are equal, then hash(x) is equal to hash(y). This property is necessary to ensure that hash table implementations using those values as keys will work correctly. Referential transparency is like saying “and do that not only for the hash function, but for all functions”.

Among other things, referential transparency for at least a particular function foo is necessary to memoize that function.

The most common example raised is that conj adds elements to lists at the beginning, but to vectors at the end. Despite this difference, = returns true if given a list and a vector, the items in them are equal, and in the same order.

;; The behavior below has been consistent for many versions of
;; Clojure, and seems unlikely to change. In case it does change,
;; here is the version used for this session.

user=> (clojure-version)

;; Lists and vectors are different types. Vectors provide O(log n)
;; time access to any item given an integer index (base 32 log), and
;; update of the value at any index, but lists do not.

user=> (class ‘(1 2 3))
user=> (class [1 2 3])

;; Despite the differences in these types, Clojure’s = has long
;; returned true when comparing lists and vectors, if they contain
;; equal items in the same order.

user=> (= ‘(1 2 3) [1 2 3])

;; conj is documented to add items at different ‘places’ for different
;; types of collections. At the beginning for lists, at the end for
;; vectors.

user=> (conj ‘(1 2 3) 0)
(0 1 2 3)
user=> (conj [1 2 3] 0)
[1 2 3 0]

;; Unsurprisingly, the return values of these two calls are unequal
;; according to =

user=> (= (conj ‘(1 2 3) 0) (conj [1 2 3] 0))
This occurs due to an explicit design choice in Clojure. conj pays attention to the particular type of collection it is working on, and behaves differently for different types. = was designed to be able to return true when comparing lists and vectors, I believe because it was considered useful and convenient. Many Clojure functions like map operate on vectors and lists, but return sequences. It would be inconvenient when comparing return values to do explicit type conversions, or use a different equality function depending upon the types of its arguments.

I am not aware of anything like the example above in Haskell. I would guess most Haskell developers would consider allowing equality between different types an unpleasant hack. Unless one created a sum type that included both lists and vectors, one would not be able to implement this in Haskell at all.

The root of this behavior: Clojure’s = is ignoring the type of its arguments when comparing lists, vectors, and sequences, by design choice.

Also, from Dave Sann:

“x is equal to y” to imply “f(x) is equal to f(y)” – can only be true where a function is really based only on its arguments. I hadn’t considered this before – while it seems simple, it is also a bit subtle.

for example: seq on a set cannot be pure unless you were to provide an argument to explicitly specify which order the items should be in. If you do not do this, the order is defined either by some random interaction of the of the data and function implementations – that you thought was irrelevant – or has to be literally picked at random by the implementation. The former of these will appear to be pure until you hit the special cases.

I speculate that only functions that retain or reduce the information content of their inputs can be pure. So if you rearrange or filter or map they can be pure. But if you implicitly add new information – as (seq set) does – then they cannot be.