Does Python need strict data-type enforcement?

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com

Andrew Montalenti has a long post voicing his view of data-type enforcement in Python. He seems to lean against strictness (I will quote him at length below). I am mostly ignorant about Python and I would not want to match my knowledge of Python against his, but I have seen this debate occur in other languages, so I’m going to write about my own experience adding types to dynamic languages.

For the last 2 years I’ve been working with Clojure, which is a dynamic language. I found that it was helpful to me to do type checking while the software was under development. I’ll post an example below.

I wrote a multi-method for interacting with my database. This multi-method gives me polymorphic dispatch depending on whether I want to do an insert, an update, a delete, or a select. I found it was important to do a lot of checking on the dispatch function, so I ended up adding a lot of asserts. The asserts run while the software is under development. When I’m ready to go to production, I can pass a flag to the compiler that strips out all of the asserts:

(defn- which-query [context-wrapper-for-database-call]
{:pre [
(map? context-wrapper-for-database-call)
(string? (:name-of-collection context-wrapper-for-database-call))
(map? (:where-clause-map context-wrapper-for-database-call))
(map? (:document context-wrapper-for-database-call))
(keyword? (:query-name context-wrapper-for-database-call))
(if (:_id (:document context-wrapper-for-database-call))
(string? (:_id (:document context-wrapper-for-database-call)))
true)
(if (:_id (:where-clause-map context-wrapper-for-database-call))
(string? (:_id (:where-clause-map context-wrapper-for-database-call)))
true)
(if (:created-at (:document context-wrapper-for-database-call))
(= (type (:created-at (:document context-wrapper-for-database-call)) org.joda.time.DateTime))
true)
(if (:updated-at (:document context-wrapper-for-database-call))
(= (type (:updated-at (:document context-wrapper-for-database-call)) org.joda.time.DateTime))
true)
(if (:database-fields-to-return-vector context-wrapper-for-database-call)
(vector? (:database-fields-to-return-vector context-wrapper-for-database-call))
true)
]}
(:query-name context-wrapper-for-database-call))

When I go to production, the compiler shrinks this to:

(defn- which-query [context-wrapper-for-database-call]
(:query-name context-wrapper-for-database-call))

For those of you who don’t know Clojure “defn” creates a function, and “defn-” creates a private function that can only be called by other functions in this same namespace. In this case, the function lives in a namespace called “persistence” and it should only be called by code in that namespace.

Some Lisps have a data-type system, but Clojure started without a data-type enforcement system. Ambrose Bonnaire-Sergeant has built “Typed Clojure” . Peter Fraenkel has written some great posts about it.

I have found that, for me, writing asserts on functions does most of what people need from unit testing, with less ceremony. And I have become much more sympathetic to the use of strict typing. In my blog post “How ignorant am I, and how do I formally specify that in my code?” I wrote that I preferred an optional type system where I could add in types once I felt I understood a problem. That blog post was about the conflict between the strict-type advocates, who insist that all types be defined at the start of any work, versus those (including myself) who would prefer to begin with a dynamically typed system and then move to adding types as my certainty of the problem grows. As I wrote of my side of the debate:

“the other group of programmers start off saying “There is a great deal that I do not know about this particular problem, and I am unwilling to commit myself to absolute statements of truth, while I’m still groping in the dark.” They proceed with small experiments, they see where they fail, they learn about the problem, they discover the solutions, and they add in type information incrementally, as they feel more confident that they understand the problem and possess a workable solution. These are the optional type advocates.”

The Python project Mypy seems to be following a path very similar to Typed Clojure:

31 Aug 2014: Mypy no longer runs your programs, it just type checks them. Use a Python interpreter to run programs. -Jukka

Typed Clojure is written so the compiler will throw warnings at compile time, it does not do much else, so it is similar to Mypy. The syntax in Mypy seems straight forward to read (vaguely similar to Haskell).

If I read Andrew correctly, he dislikes the syntax suggested by Mypy, because he’d prefer “Some sort of composability of annotations.”

Andrew says:

I might suggest that the best we could do is let Python annotations support type annotations not for a type checking / linting / IDE use case, but merely for a documentation use case.

People make the same arguments for unit testing: it helps with documentation, and it catches certain kinds of mistakes. That is my attitude toward type checking in a dynamic language. Somewhere in my code, I want to be able to check the contract on a function, and I’m often annoyed by the labor that unit tests demand of me, in particular when I have to mock a great deal of fake functionality to test one function in isolation.

On a different topic, if Python moved to strict-typing, that would offer performance benefits, but only if Python adopted 2 other things:

1.) a compiler that understood the type enforcement language, and I am under the impression that Python does not currently have that.

2.) programmers who wrote code which avoided the dynamic use of variables (their data-types would not be free to change). I am under the impression that this would break most Python libraries.

My knowledge of Python is weak, but Andrew’s example of a way forward with annotations looks verbose (though when I look at PEP 3107, it appears the verbosity is a problem of 3107):

We could work around PEP 3107′s syntax by mostly ignoring it. For example, let’s say we created a class called “Annotation” and a corresponding “ann” function. Annotations could be declared thusly:

foo_args = Annotation(std_type_check=list, doc=”arguments”)
foo_kwargs = Annotation(std_type_check=dict, doc=”keyword arguments
foo_return = Annotation(std_type_check=Bar, doc=”a bar instance)

def foo(*args: foo_args, **kwargs: foo_kwargs) -> foo_return:

del foo_args, foo_kwargs, foo_return
… or …

def foo(*args: ann(std_type_check=list, doc=”arguments”),
**kwargs: ann(std_type_check=list, doc=”keyword arguments”)) \
-> ann(std_type_check=Bar, doc=”a Bar instance”):

The keyword arguments that the Annotation class could take could be registered dynamically by frameworks. For example, if you import foo.stdtypecheck, then std_type_check becomes available. This is due to a registration mechanism. Of course, these aren’t very heavily namespaced, but they could be if we were to tweak the design a bit.

I wish he had said how this made annotations more composable. I don’t know enough Python to understand the advantages of his system. He does make this point, which seems solid to me:

An aside: my personal opinion is that the function annotation syntax is very, very ugly, and will quickly clutter function definitions so as to make them totally unreadable. If you compare how annotations would look using the PEP 3107 syntax to the kinds of ‘annotation decorators’ you find in the Pyanno project, you can see exactly what I mean.

I click through to the Pyanno project and I see this, which is certainly readable:

@returnType(int, int)
@parameterTypes(int, int)
def divisionAndModulus(numerator, denominator):
   # return quotient and remainder of numerator and denominator
   quotient, remainder = divmod(numerator, denominator)
   return quotient, remainder

But is that composable? “Composable” can mean a few different things. I have not worked with composable data-type annotations (where the data-type specification is a language unto itself and can be composed as easily as functions are composed) but I know the experimental language Shen provides such abilities:

(datatype env
____________
[] : env;

Var : symbol; Val : expr; Env : env;
====================================
[[Var | Val] | Env] : env;)

The structure of this rule reflects the recursive nature of the datatype; a base case (embodied in the first rule) states that an empty list is an environment and a recursive case (embodied in an LR rule) states the recursive condition. The ==== is the keyboard substitute for the LR notation of Reeves and Clarke.

A truly composable type system is one where where you can use a rule for a type as part of the definition of a new rule. I understand Shen takes this very far (its type specification system is Turing complete language unto itself). I am looking for a reason to use Shen in a project, just so I can learn a bit more about it.

I am curious if that is part of what Andrew meant when he spoke of “composable”. I can easily believe the word has a specific meaning that would be easily understood by anyone in the Python community. He describes the problem somewhat here:

There is a huge flaw with the creation of Python annotations, IMO. Lack of composability. The problem only arises when you consider that at some point in the future, there may be more than one use case for function annotations (as the PEP suggests). For example, let’s say that in my code, I use function annotations both for documentation and for optional run-time type checking. If I have a framework that expects all the annotations on my function definition to be docstrings, and another framework that expects all the annotations to be classes, how do I annotate my function with both documentation and type checks? This amounts to lack of a standard for layering function annotations.

With Clojure, a var can have a “:doc” string that is part of the meta-data of the var, but this has nothing to do with the annotations on the var. For instance:

(defn
^{:doc “mymax [xs+] gets the maximum value in xs using > ”
:test (fn []
(assert (= 42 (mymax 2 42 5 4))))
:user/comment “this is the best fn ever!”}
mymax
([x] x)
([x y] (if (> x y) x y))
([x y & more]
(reduce mymax (mymax x y) more)))

I agree with Andrew, it is a bad idea to use annotations for both human-readable strings, and data-type specification. Clojure separates the 2 uses, and perhaps Python should follow likewise.

Source