Type Constraints as Contracts
To make our implementation of elem
compile, we changed its signature from
elem :: a -> [a] -> Bool
to
elem :: Eq a => a -> [a] -> Bool
Note that the first arrow is =>
, not ->
. The arrow ->
is used to separate
function arguments and return values from each other in a function signature.
The arrow =>
is used to separate type constraints from the function signature.
Here, the signature is a -> [a] -> Bool
. The type constraint is Eq a
.
Without type constraints, the signature a -> [a] -> Bool
is a function type
that should work for any type a
. The type constraint imposes restrictions on
a
; the function no longer works for just any type a
. Here, we require that
a
is an instance of Eq
.
For elem
, it was enough to have equality tests. Other functions may need
equality tests and arithmetic operations. Here's a (probably not particularly
meaningful) example: It takes a list xs
and a value end
. It searches for the
first occurrence of end
in xs
. If it finds such an occurrence, then it sums
all the numbers up to and including this occurrence. If end
is not contained
in xs
, we simply sum the elements in xs
:
sumUpTo xs end = go 0 xs
where
go total [] = total
go total (x:xs) | x == end = total'
| otherwise = go total' xs
where
total' = total + x
The type signature of sumUpTo
is [a] -> a -> a
. Its first argument, xs
, is
a list of a
s. Its second argument, end
, is an a
. And the return type is
also an a
. What type constraints are necessary for this function to compile.
We could try to run this through GHCi and see which type classes it complains
about. We'll do that to verify that we got it right below, but right now let's
try to figure it out ourselves.
The first pattern guard x == end
clearly needs equality testing of a
s, so we
need Eq a
. The calculation of the updated total, total' = total + x
, needs
addition, which is part of the Num
type class. Beyond that, our implementation
only performs pattern matching on the arguments of go
. Thus, we need a
to be
an instance of Eq
and Num
.
Let's see whether we got that right:
>>> :{
| sumUpTo xs end = go 0 xs
| where
| go total [] = total
| go total (x:xs) | x == end = total'
| | otherwise = go total' xs
| where
| total' = total + x
| :}
>>> :t sumUpTo
sumUpTo :: (Eq p, Num p) => [p] -> p -> p
Yup. We defined sumUpTo
without providing a type signature and let GHCi infer
the most general type that works. Remember that I mentioned early on that type
signatures for non-local functions are a good idea, but GHC(i) is perfectly
capable of inferring the correct type. This is what we exploited here. The type
signature says that sumUpTo
has type [p] -> p -> p
—GHCi chose a different
name for the type variable, but that's irrelevant—and the type p
must be an
instance of both Eq
and Num
.
Whenever we have multiple type constraints, we separate them using commas. Due
to what I consider a syntactic quirk of Haskell, multiple type constraints
cannot simply be listed as, say, Eq a, Num a
. They need to be enclosed in
parentheses. The quirk is that we can do this also for individual type
constraints, but we don't have to. Both
elem :: Eq a => a -> [a] -> Bool
elem :: (Eq a) => a -> [a] -> Bool
are fine.
Since sumUpTo
has more than one type constraint, they must be enclosed in
parentheses:
sumUpTo :: (Eq p, Num p) => [p] -> p -> p
sumUpTo
works as expected:
>>> [5,8,10,2,14,10,9] `sumUpTo` 10 == sum [5,8,10]
True
>>> [5,8,10,2,14,10,9] `sumUpTo` 15 == sum [5,8,10,2,14,10,9]
True
In the first expression, sumUpTo
should stop when it encounters the first
occurrence of 10 in the input, so the result should be sum [5,8,10]
, and it
is. In the second expression, 15 does not occur in the list, so sumUpTo
should
just sum up the whole list. Our comparison with sum [5,8,10,2,14,10,9]
confirms that it does.
Here is another function, one that's actually reasonably realistic, even though
the implementation we discuss here isn't particularly efficient. We want a
function accumById
, which takes a list of records (student records on banner,
sales records in some business application, athlete records in some training
software, etc.). We don't care what type of record it is, so we use a type
variable r
to refer to the record type, and the input list has type [r]
.
Each record has some key we care about. It may be an item number in a sales
database or an athlete ID in a list of training sessions of athletes. The second
argument of accumById
is a function of type r -> i
that extracts this ID
from a given record. Again, we don't care about the type of the ID. It can be a
string (an athlete's name) or an Int
(an item number) or any other type. We
use i
to refer to the ID type. Finally, there is some numerical field in each
record. In a sales database at a car dealership, it could be the price for which
a specific vehicle sold. Note that these price values could differ for different
sales of the same kind of car: one customer may haggle harder than another, one
customer may add some extras, etc. Again, we don't know anything about the
record type r
, so we need some function of type r -> v
to gain access to the
field we care about. We also don't care about whether our number type is an
integer, rational number, floating point number or complex number. We just
require that v
is a number type. What accumById
should produce is a list of
pairs of type [(i, v)]
, that is, ID-value pairs. There should be exactly one
entry for each unique ID in this list. The associated value should be the sum of
the values of all records with this ID.
For example, if we have a sales database:
Item Number | Sales Price |
---|---|
3 | $15 |
1 | $12 |
2 | $1 |
1 | $11 |
3 | $16 |
1 | $9 |
Then the resulting list should be [(1,32),(2,1),(3,31)]
because Item 1 was
sold three times, at costs $12, $11, and $9. The total is $32. Item 2 was
sold once, at cost $1. Item 3 was sold twice, at costs $15 and $16. The total
is $31.
Before even implementing this function, we can deduce its type signature, including its type constraints. We make the two functions to extract the ID and value of a record the first two arguments. The list of records is the third argument. So the signature becomes:
accumById :: (r -> i) -> (r -> v) -> [r] -> [(i, v)]
If we want to group records with the same IDs together, to accumulate their
associated values, the minimum requirement we need to impose is that we can test
whether two IDs are the same: Eq i
. Similarly, we want to add values of type
v
, so v
better be a number type: Num v
. Beyond that, based on the
discussion above, we shouldn't require anything else. So the complete signature
including type constraints becomes
accumById :: (Eq i, Num v) => (r -> i) -> (r -> v) -> [r] -> [(i, v)]
I chose this example to illustrate that we can have functions whose types
involve multiple type variables, and then the list of type constraints may
include constraints for each of these type variables. Here, we don't assume
anything about r
, but i
must be equality-testable, and v
must be a number
type. With these constraints, we can implement accumById
as follows:
accumById id val = go []
where
go totals [] = totals
go totals (r:rs) = go totals' rs
where
i = id r
v = val r
totals' = addToTotals totals
addToTotals [] = [(i, v)]
addToTotals (p@(i',t):ps) | i == i' = (i, t + v) : ps
| otherwise = p : addToTotals ps
go
is a function we use to "iterate" over the list of records. The first
argument is the current list of totals for all the IDs we have seen so far. For
each record r
, we add its (id r, val r)
pair to the list of totals.
Specifically, addToTotals
iterates over the list of totals currently in the
list. If it finds a total with the same ID, it adds the value of the current
record to the total for its ID. That's the i == i'
branch. Otherwise, we add
the (id r, val r)
pair to the tail of the list of totals. If we keep doing
this without finding a total for r
's ID, then we add (id r, val r)
as a new
pair to the list. That's the first equation for the empty list of totals.
If i
were a type that can be ordered, an instance of Ord
, we would be able
to obtain a faster implementation of accumById
. The one here takes quadratic
time. For Ord
types, we would be able to achieve \(O(n \lg n)\) time. So there
is a trade-off in choosing the right type constraints. An implementation that
requires i
to be an instance of Ord
would be faster, but an implementation
that requires i
to be an instance of only Eq
is applicable even to records
whose IDs do not support ordering.1
You should think of type classes as the language to write legal contracts
between the implementer of a function and the user of a function. The compiler
is the judge that checks whether both sides stick to the contract. By listing a
set of type class constraints on the type variables of a function f
, the
implementer of f
promises that f
works for arguments of any types that
satisfy these constraints. In other words, the implementer promises not to use
any functions in the implementation of f
that are not part of the listed type
classes (or functions that themselves guarantee that they work for all instances
of these type classes). If we require only that some type a
is an instance of
Eq
, then we cannot inspect values of type a
in any way other than testing
whether they're equal or using functions that work for arguments of Eq
types,
such as elem
. The compiler can check that the implementer upholds their side
of the contract by verifying that only functions in the listed type classes are
used in the implementation.
That's exactly what GHCi did when it rejected our earlier implementation of
elem
. That implementation had no type class constraints on the type a
.
Thus, we claimed that a
does not need to support any operations for the
implementation of elem
to work. The compiler checked this claim and found it
to be wrong because we did test values of type a
for equality.
The user of f
in turn needs to promise not to call f
on any arguments that
aren't instances of the required type classes. Again, the compiler can easily
check whether the user keeps this promise. For example, as we have seen before,
if we try to evaluate B 1 `elem` [B 1, B 2, B 3]
without making BannerNumber
an instance of the Eq
type class, then the compiler complains because elem
supports only types that are instances of Eq
.
Type classes, just as interfaces in Java or traits in Rust and Go, allow us to write code that is applicable to arguments of many different types while restricting the permissible types to exactly the right subset that supports the operations we need in the implementation of our function.
Now let's look at some finer points of type classes.
-
I ran into this trade-off in some code I wrote and wished that Haskell allowed me to provide multiple implementations of the same function, with increasingly stringent type constraints. When applying my function to a particular type, I wished that the compiler would pick the implementation with the most stringent constraints that are satisfied by my type. The idea was that if more stringent constraints translate into a faster implementation, then the compiler would automatically pick the fastest implementation for this particular type. Unfortunately, this is not possible. The core of the problem is that when a function definition involves multiple type constraints, then different function definitions may be incomparable by their stringency, because one definition may impose tighter constraints on the first type and less tight constraints on the second type. ↩