February 13th, 2016
If you enjoy this article, see the other most popular articles
If you enjoy this article, see the other most popular articles
If you enjoy this article, see the other most popular articles
(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: email@example.com, or follow me on Twitter.
Duck-typing versus Label-typing
If it waddles like a duck, and it quacks like a duck, and it swims like a duck, is it a duck?
Suppose you get a box that has the label “Hammers”. Do you believe the contents are hammers?
Figuring out a duck is a duck can take some effort. After all, you have to check for waddling and quacking and swimming. That’s not too bad, really, but what if you had to check for 20 different features? Or 30? Or a 100? At some point it would take too much time to check for all those things, right? Then again, do you need 100% absolute certainty that you are dealing with a duck? What if you simply have a creature and you want to put it down in water? In that case, it might be enough to check to see that it can swim. If it can swim, then you can put it in water. If you only need to know about one or two features, and you check those one or two features, then you know for sure that your creature has the features you need. You just checked.
A box labeled “Hammers” sounds fairly specific. You know what you can do with “Hammers”, right? You can use them to drive nails into wood. But what if the box has a label such as “Tool”? That is fairly vague. It could be a “Hammer” but it could be a “Saw” or a “Screwdriver” or a “Wrench”.
What if you got a box that was labeled “Product”? It could be a Hammer or a Tool or a Saw or a Screwdriver or a Wrench or a Jacket or a Chair or a Kettle or Korres Greek Yoghurt Moisturizing Face Cream. “Product” is very vague. It could be many different things. How will you use it? Your boss might be upset with you if you promised to show up for work with a Hammer and instead you show up with Korres Greek Yoghurt Moisturizing Face Cream.
In dynamic languages we typically investigate the structure to see what features it has. This offers us certainty: we can never be wrong about using a feature because we just checked to be sure the feature was there. This works fine when we only need to be sure of a few features. But this can be very slow when we have to check many features.
In static data-type languages, we usually trust some kind of label. If the structure has a label such as “Hammer” then we assume it is a “Hammer”. This is safe because the compiler checks these things for us. We can be certain that a structure with the label of “Hammer” really is a Hammer. But we can get into trouble when the labels are vague. If the label says “Product” we might think we really have a Hammer, and we might try to use the Product as if it was a Hammer, but we will be embarrassed if the Product was really a Jacket or a Chair or a Kettle. Our co-workers will wonder why we are trying to drive nails with a Kettle. We might even hurt ourselves.
Likewise, we might order a “Fine Feathered Friend” hoping we will get a duck. If we get a different kind of Fine Feathered Friend we will be very disappointed. But if we can trust the labels then relying on labels is much less work than Duck Typing.
Then again, if we only need to check a few features, then Duck Typing is a very good process, and reasonably fast. But if we need to check a lot of features, we will hit a bottleneck.
The rage of the Functional programmers
In some camps, Functional programming is synonymous with static data-types. Certainly, Haskell and Scala programmers feel that way. They would argue:
1.) static data-types allow us to prove our programs correct, in the same way a mathematician can prove a math theorem correct
2.) if you are going to favor immutable values, then it is senseless to avoid immutable types
The argument for #1 is that we gain safety and reliability with static data-types. The simplest counter-argument I have #1 is to point to Java. It has always had static data-types, and yet there remains a great deal of buggy Java code in this world. If static data-types really did fix all bugs, then this debate would be over and we would all be using static data-types. But in fact, static data-types only take care of a small category of bugs, and to achieve even that we have to accept a large burden of inflexibility. As with everything else in computer programming, we are confronted with a series of trade-offs, and we have to carefully pick which benefits we want, and what we are willing to pay for them.
The argument for #2 is straightforward – if our values are immutable, then the data-types of the values should also be immutable. But the counter-argument is equally straightforward – if we are going to allow any changes to our state, then we might as well allow that state to vary in both value and type. We know we need to enforce contracts to be sure that our state only changes in very specific ways, and we can leave it to those contracts to determine what changes are safe. It’s possible that we do, in fact, wish to disallow any changes of type. But that should be optional; again, if we are going to allow any changes at all, then it is senseless to argue that values can change by types can not.
Gradual Typing gives us the best of both worlds
Gradual typing is a growing trend in many computer languages:
I am a fan of Gradual Typing for reasons that I wrote about in:
That is, when facing a new problem, I like to start off with the freedom to change my types, but as I become more certain about the shape of the code, I like to add in type checks.
The most obvious argument against static data-typing is that it claims a level of confidence that it can not possibly have. Developers using static data-typing specify all data-types at the beginning of their project, before they have learned anything about the problem they need to solve. The advocates of static data-typing will make the counter-argument that it is easy to change types as one learns more about the problem. But I would assert that it is misleading to pretend that you understand the system at the beginning. I want to be honest about how ignorant I am when I begin to work on a problem that I have never encountered before.
There is a group of programmers who start off saying “There is a great deal that I do not know about this particular problem, and I am unwilling to commit myself to absolute statements of truth, while I’m still groping in the dark.” They proceed with small experiments, they see where they fail, they learn about the problem, they discover the solutions, and they add in data-type information incrementally, as they feel more confident that they understand the problem and possess a workable solution. These are the dynamic data-type advocates.
Alan Kay invented Object Oriented Programming, but he was a strong proponent of dynamic programming, exactly because he felt that programmers should remain open to all they were going to learn during a project:
Mark Tarver is a professor of computer science. He is also the inventor of the Shen programming language. Shen is a fascinating language because it has a rich type system (the type system is itself Turing Complete) which is optional. Taver’s explains the benefits of this approach:
[Some programmers feel reluctance] in migrating to statically typed languages – that they are losing something, a degree of freedom that the writer identifies as hampering creativity.
Is this true? I will argue, to a degree – yes. A type checker for a functional language is in essence, an inference engine; that is to say, it is the machine embodiment of some formal system of proof. What we know, and have known since Godel’s incompleteness proof, is that the human ability to recognise truth transcends our ability to capture it formally. In computing terms our ability to recognise something as correct predates and can transcend our attempt to formalise the logic of our program. Type checkers are not smarter than human programmers, they are simply faster and more reliable, and our willingness to be subjugated to them arises from a motivation to ensure our programs work.
That said, not all type checkers are equal. The more rudimentary and limited our formal system, the more we may have to compromise on our natural coding impulses. A powerful type system and inference engine can mitigate the constraints placed on what Racketnoob terms our creativity. At the same time a sophisticated system makes more demands of the programmer in terms of understanding. The invitation of adding types was thus taken up by myself, and the journey to making this program type secure in Shen emphasises the conclusion in this paragraph.
Dynamic languages have long been criticised for being hard to maintain at scale. When you grow to a large team or a large code base, it becomes more difficult to refactor a code base, to understand how it works, and to make sure it does what it should.
The standard solution is great testing, and obviously we at CircleCI are big fans of great automated testing. However, what we’re really about is productivity, and optional typing can make you much more productive, because you it can can provide guarantees that testing can’t. This is amazing for hard-to-test features, and to increase your coverage without spending a massive amount of time doing so.
Technically, Typed Clojure is a different form of type-checking than Java and older languages. Those languages use nominal typing, where you type-check the names or classes of a value. Although that is still used in Typed Clojure, the major feature is about structural typing, where you can type-check the structure of an object or hash or list.
This allows you to have incredibly powerful type-checks, while staying lightweight and flexible. For example, you can guarantee that:
* nil isn’t allowed
* a hash must have a defined set of keys (each with their own types!)
* a hash must have only known keys
* a value must be one of a set of types
* a vector has a known size
* many other properties about set, lists, vectors, and object instances.
I think we can all agree that whenever software reaches a certain scale, we need to be able to enforce contracts on it. And yet when a project is young, we need to value flexibility, as we are still ignorant about the eventual ideal shape of the software. So clearly, the ideal path is one of Gradual Typing, where we start off without much enforcement of data-types, and yet we become more strict as time passes.
Do we need objects to check for data-types?
Over the past year I have used Prismatic Schema extensively on a large Clojure project. We really like the aid that if offers in understanding, debugging, and using our code. However, we regularly feel a bit of a let down when an argument to a function is another function. Prismatic Schema doesn’t allow you to say much in these cases beyond: this arg is a function.
To address this we extended Prismatic Schema to allow us to add type annotations to the function arguments in higher-order functions (in addition to several other small extensions). This is done by calling s/fn which expects [output-schema & input-schemas]
Types for functions takes us back to the world of object oriented programming, for better or for worse. In theory, so long as the arguments to the function were tested when the function was created, you don’t need to re-test when passing the function itself as an argument. But it is interesting that this need keeps appearing.
We don’t have to agree that “Types for functions” is the same as Object Oriented Programming, but it doesn’t matter. There are clearly important benefits for both Duck Typing and also Label Typing. Both needs keeps appearing. What we should crave, above all else, is the flexibility to use any good idea when it seems like it would be appropriate. Adding labels to functions, so the labels can be checked, is a worthwhile endeavor in some situations.
Should “Types for functions” or types for vars be mandatory?Clearly, no, there is no benefit to that level of strictness. We get most of the benefit of the Functional style from Immutability and pure functions. We can also do a great deal of data-type checking without the need to make it mandatory. As CircleCi said above, we can go beyond the limits of standard static data-type checking, which is merely Label Typing, and we can engage in Structural Typing, checking features deep inside the structures that given to our functions. This is a powerful style, and we get most of the benefits without having to give up the flexibility of dynamic data-types.
Functional programming is not the same as static data-type checking.Source