Type Constraints as Contracts

To make our implementation of elem compile, we changed its signature from

elem :: a -> [a] -> Bool

to

elem :: Eq a => a -> [a] -> Bool

Note that the first arrow is =>, not ->. The arrow -> is used to separate function arguments and return values from each other in a function signature. The arrow => is used to separate type constraints from the function signature. Here, the signature is a -> [a] -> Bool. The type constraint is Eq a.

Without type constraints, the signature a -> [a] -> Bool is a function type that should work for any type a. The type constraint imposes restrictions on a; the function no longer works for just any type a. Here, we require that a is an instance of Eq.

For elem, it was enough to have equality tests. Other functions may need equality tests and arithmetic operations. Here's a (probably not particularly meaningful) example: It takes a list xs and a value end. It searches for the first occurrence of end in xs. If it finds such an occurrence, then it sums all the numbers up to and including this occurrence. If end is not contained in xs, we simply sum the elements in xs:

sumUpTo xs end = go 0 xs
  where
    go total []                 = total
    go total (x:xs) | x == end  = total'
                    | otherwise = go total' xs
      where
        total' = total + x

The type signature of sumUpTo is [a] -> a -> a. Its first argument, xs, is a list of as. Its second argument, end, is an a. And the return type is also an a. What type constraints are necessary for this function to compile. We could try to run this through GHCi and see which type classes it complains about. We'll do that to verify that we got it right below, but right now let's try to figure it out ourselves.

The first pattern guard x == end clearly needs equality testing of as, so we need Eq a. The calculation of the updated total, total' = total + x, needs addition, which is part of the Num type class. Beyond that, our implementation only performs pattern matching on the arguments of go. Thus, we need a to be an instance of Eq and Num.

Let's see whether we got that right:

GHCi

>>> :{
  | sumUpTo xs end = go 0 xs
  |   where
  |     go total []                 = total
  |     go total (x:xs) | x == end  = total'
  |                     | otherwise = go total' xs
  |       where
  |         total' = total + x
  | :}
>>> :t sumUpTo
sumUpTo :: (Eq p, Num p) => [p] -> p -> p

Yup. We defined sumUpTo without providing a type signature and let GHCi infer the most general type that works. Remember that I mentioned early on that type signatures for non-local functions are a good idea, but GHC(i) is perfectly capable of inferring the correct type. This is what we exploited here. The type signature says that sumUpTo has type [p] -> p -> p—GHCi chose a different name for the type variable, but that's irrelevant—and the type p must be an instance of both Eq and Num.

Whenever we have multiple type constraints, we separate them using commas. Due to what I consider a syntactic quirk of Haskell, multiple type constraints cannot simply be listed as, say, Eq a, Num a. They need to be enclosed in parentheses. The quirk is that we can do this also for individual type constraints, but we don't have to. Both

elem ::  Eq a  => a -> [a] -> Bool
elem :: (Eq a) => a -> [a] -> Bool

are fine.

Since sumUpTo has more than one type constraint, they must be enclosed in parentheses:

sumUpTo :: (Eq p, Num p) => [p] -> p -> p

sumUpTo works as expected:

GHCi

>>> [5,8,10,2,14,10,9] `sumUpTo` 10 == sum [5,8,10]
True
>>> [5,8,10,2,14,10,9] `sumUpTo` 15 == sum [5,8,10,2,14,10,9]
True

In the first expression, sumUpTo should stop when it encounters the first occurrence of 10 in the input, so the result should be sum [5,8,10], and it is. In the second expression, 15 does not occur in the list, so sumUpTo should just sum up the whole list. Our comparison with sum [5,8,10,2,14,10,9] confirms that it does.

Here is another function, one that's actually reasonably realistic, even though the implementation we discuss here isn't particularly efficient. We want a function accumById, which takes a list of records (student records on banner, sales records in some business application, athlete records in some training software, etc.). We don't care what type of record it is, so we use a type variable r to refer to the record type, and the input list has type [r]. Each record has some key we care about. It may be an item number in a sales database or an athlete ID in a list of training sessions of athletes. The second argument of accumById is a function of type r -> i that extracts this ID from a given record. Again, we don't care about the type of the ID. It can be a string (an athlete's name) or an Int (an item number) or any other type. We use i to refer to the ID type. Finally, there is some numerical field in each record. In a sales database at a car dealership, it could be the price for which a specific vehicle sold. Note that these price values could differ for different sales of the same kind of car: one customer may haggle harder than another, one customer may add some extras, etc. Again, we don't know anything about the record type r, so we need some function of type r -> v to gain access to the field we care about. We also don't care about whether our number type is an integer, rational number, floating point number or complex number. We just require that v is a number type. What accumById should produce is a list of pairs of type [(i, v)], that is, ID-value pairs. There should be exactly one entry for each unique ID in this list. The associated value should be the sum of the values of all records with this ID.

For example, if we have a sales database:

Item Number	Sales Price
3	$15
1	$12
2	$1
1	$11
3	$16
1	$9

Then the resulting list should be [(1,32),(2,1),(3,31)] because Item 1 was sold three times, at costs $12, $11, and $9. The total is $32. Item 2 was sold once, at cost $1. Item 3 was sold twice, at costs $15 and $16. The total is $31.

Before even implementing this function, we can deduce its type signature, including its type constraints. We make the two functions to extract the ID and value of a record the first two arguments. The list of records is the third argument. So the signature becomes:

accumById :: (r -> i) -> (r -> v) -> [r] -> [(i, v)]

If we want to group records with the same IDs together, to accumulate their associated values, the minimum requirement we need to impose is that we can test whether two IDs are the same: Eq i. Similarly, we want to add values of type v, so v better be a number type: Num v. Beyond that, based on the discussion above, we shouldn't require anything else. So the complete signature including type constraints becomes

accumById :: (Eq i, Num v) => (r -> i) -> (r -> v) -> [r] -> [(i, v)]

I chose this example to illustrate that we can have functions whose types involve multiple type variables, and then the list of type constraints may include constraints for each of these type variables. Here, we don't assume anything about r, but i must be equality-testable, and v must be a number type. With these constraints, we can implement accumById as follows:

accumById id val = go []
  where
    go totals []     = totals
    go totals (r:rs) = go totals' rs
      where
        i       = id  r
        v       = val r
        totals' = addToTotals totals

        addToTotals [] = [(i, v)]
        addToTotals (p@(i',t):ps) | i == i'   = (i, t + v) : ps
                                  | otherwise = p : addToTotals ps

go is a function we use to "iterate" over the list of records. The first argument is the current list of totals for all the IDs we have seen so far. For each record r, we add its (id r, val r) pair to the list of totals. Specifically, addToTotals iterates over the list of totals currently in the list. If it finds a total with the same ID, it adds the value of the current record to the total for its ID. That's the i == i' branch. Otherwise, we add the (id r, val r) pair to the tail of the list of totals. If we keep doing this without finding a total for r's ID, then we add (id r, val r) as a new pair to the list. That's the first equation for the empty list of totals.

If i were a type that can be ordered, an instance of Ord, we would be able to obtain a faster implementation of accumById. The one here takes quadratic time. For Ord types, we would be able to achieve $O(n \lg n)$ time. So there is a trade-off in choosing the right type constraints. An implementation that requires i to be an instance of Ord would be faster, but an implementation that requires i to be an instance of only Eq is applicable even to records whose IDs do not support ordering.¹

You should think of type classes as the language to write legal contracts between the implementer of a function and the user of a function. The compiler is the judge that checks whether both sides stick to the contract. By listing a set of type class constraints on the type variables of a function f, the implementer of f promises that f works for arguments of any types that satisfy these constraints. In other words, the implementer promises not to use any functions in the implementation of f that are not part of the listed type classes (or functions that themselves guarantee that they work for all instances of these type classes). If we require only that some type a is an instance of Eq, then we cannot inspect values of type a in any way other than testing whether they're equal or using functions that work for arguments of Eq types, such as elem. The compiler can check that the implementer upholds their side of the contract by verifying that only functions in the listed type classes are used in the implementation.

That's exactly what GHCi did when it rejected our earlier implementation of elem. That implementation had no type class constraints on the type a. Thus, we claimed that a does not need to support any operations for the implementation of elem to work. The compiler checked this claim and found it to be wrong because we did test values of type a for equality.

The user of f in turn needs to promise not to call f on any arguments that aren't instances of the required type classes. Again, the compiler can easily check whether the user keeps this promise. For example, as we have seen before, if we try to evaluate B 1 `elem` [B 1, B 2, B 3] without making BannerNumber an instance of the Eq type class, then the compiler complains because elem supports only types that are instances of Eq.

Type classes, just as interfaces in Java or traits in Rust and Go, allow us to write code that is applicable to arguments of many different types while restricting the permissible types to exactly the right subset that supports the operations we need in the implementation of our function.

Now let's look at some finer points of type classes.

I ran into this trade-off in some code I wrote and wished that Haskell allowed me to provide multiple implementations of the same function, with increasingly stringent type constraints. When applying my function to a particular type, I wished that the compiler would pick the implementation with the most stringent constraints that are satisfied by my type. The idea was that if more stringent constraints translate into a faster implementation, then the compiler would automatically pick the fastest implementation for this particular type. Unfortunately, this is not possible. The core of the problem is that when a function definition involves multiple type constraints, then different function definitions may be incomparable by their stringency, because one definition may impose tighter constraints on the first type and less tight constraints on the second type. ↩