Type Families
This section is optional and looks at an advanced aspect of Haskell. It'll
introduce you to type families. There are actually two ways to define type
families. We'll look at only one of them, as it helps with solving a minor
problem with our Vec
type class that we have developed so far.
Here's the type class definition again, without all the comments:
class Vec v where
(|+|) :: v -> v -> v
(|-|) :: v -> v -> v
(|*|) :: Double -> v -> v
(|/|) :: v -> Double -> v
(|.|) :: v -> v -> Double
vecLen :: v -> Double
vecLen v = sqrt (v |.| v)
zeroVec :: v
The use of Double
as the return type of (|.|)
and vecLen
, and as an
argument type of (|*|)
and (|/|)
, essentially forces us to consider only
vector types whose coordinates are Double
s. But that's not always what we
want. We may just want single-precision coordinates, of type Float
, or we may
want to write some code that works with vector spaces over
\(\mathbb{Z}/2\mathbb{Z}\), the Galois-field of size 2. In that case, we'd
probably opt for coordinates of type Int
or Bool
.
Let's try to fix this. Our first attempt is to replace the coordinate type
Double
with a type variable. Let's call it c
for "coordinate":
class Vec v where
(|+|) :: v -> v -> v
(|-|) :: v -> v -> v
(|*|) :: c -> v -> v
(|/|) :: v -> c -> v
(|.|) :: v -> v -> c
vecLen :: v -> c
vecLen v = sqrt (v |.| v)
zeroVec :: v
But that doesn't quite work. The problem is that we're essentially saying that
to be a vector type, v
needs to support multiplication with any type c
.
Even multiplying a vector with a binary tree should be supported, and the result
should be a vector again. That's clearly not what we want. We only want to
support multiplication with whatever the coordinate type of our vector type v
is. So we need a way to link v
with its coordinate type c
.
The first step towards this goal is by enabling the MultiparamTypeClasses
extension of GHC. This is not exactly standard Haskell, but it might as well be
because GHC is the de facto standard Haskell compiler. To enable such a language
extension in a source code file, you have to add the pragma
{-# LANGUAGE MultiparamTypeClasses #-}
at the beginning of the file. In GHCi, you write
>>> :set -XMultiparamTypeClasses
With this, we can now define a type class Vec
that has two parameters:
class Vec v c where
(|+|) :: v -> v -> v
(|-|) :: v -> v -> v
(|*|) :: c -> v -> v
(|/|) :: v -> c -> v
(|.|) :: v -> v -> c
vecLen :: v -> c
vecLen v = sqrt (v |.| v)
zeroVec :: v
Where the old type class Vec
said that v
is a vector type, the new type
class with two parameters could be read as defining that v
is a vector type
with coordinate type c
.
This still doesn't quite work. The new problem we're running into is that nobody prevents us from defining, for example,
instance Vec Vector Double where
-- Function implementations omitted
instance Vec Vector Int where
-- Function implementations omitted
We're essentially defining that Vector
is a vector type with both
coordinates of type Double
and coordinates of type Int
. For some
multi-parameter type classes, this makes perfect sense. We would even be able to
provide sensible implementations for the two instances Vec Vector Double
and
Vec Vector Int
. The problem we run into is really just a technicality, but an
important one: Assume we have a vector v :: Vector
. Now we call vecLen v
.
What should the return type be? We have two implementations of vecLen
, one
that returns a Double
and another that returns an Int
. The compiler cannot
choose for us, so this is not allowed.
In general, to enable type checking, for any function with a polymorphic return type, the return type must be determined by the argument types of the function. That's a restriction that Haskell imposes. Let's look at a few examples:
-
foo :: [a] -> a
is okay because the element type of the list passed as an argument tofoo
tells us the return type. -
foo :: [a] -> b
is not okay. Given a list of type[a]
as argument, the return type can be anything. It's not determined by the function arguments. -
map :: (a -> b) -> [a] -> [b]
is okay. This was the type signature of ourmap
function for lists. The reason why this is okay is that the type of the elements of the return lists is determined by the return type of the function passed tomap
as the first argument.
In our definition of the Vec v c
type class, vecLen :: v -> c
is not okay
because the type c
is not determined by the argument type v
.
One approach in GHC to deal with this is to extend the syntax for
multi-parameter type classes using the FunctionalDependencies
extension:
>>> :set -XFunctionalDependencies
This allows us to write the following:
class Vec v c | v -> c where
(|+|) :: v -> v -> v
(|-|) :: v -> v -> v
(|*|) :: c -> v -> v
(|/|) :: v -> c -> v
(|.|) :: v -> v -> c
vecLen :: v -> c
vecLen v = sqrt (v |.| v)
zeroVec :: v
The v -> c
part in the class definition says that if we know v
, then we know
c
: c
is fully determined by v
. This certainly makes vecLen :: v -> c
legal because it just says what we need to be true: The return type c
of the
function is now determined by the argument type v
.
In practice, once we impose the restriction v -> c
on the Vec v c
type
class, we can no longer define two instance of Vec
for the same type v
.
So this no longer works:
instance Vec Vector Double where
-- Function implementations omitted
instance Vec Vector Int where
-- Function implementations omitted
We can only have one of these two instances. Thus, if we try to call vecLen v
for some value v :: Vector
, the compiler only needs to find an instance
Vec Vector c
. Due to the functional dependency, there can be only one such
instance. This instance tells us what c
is, so c
is indeed determined by the
argument type in this case.
Functional dependencies are syntactically light-weight, but in some scenarios,
type families are the better tool. Let's look at these next. Let's review what
we want to express. We don't want a type class Vec v c
that says v
is a
vector type with coordinate type c
. We really want our original version of
Vec
that simply says that v
is a vector type, period. Just one type
variable. This vector type v
has an associated coordinate type. This is the
crux: We want to be able to express that every instance v
of the Vec
type
class has some associated coordinate type. To do so, we need to enable the
TypeFamilies
extension. In fact, our implementation of the Vec
type class
below needs another extension, DefaultSignatures
:
>>> :set -XTypeFamilies -XDefaultSignatures
And now we can define
class Vec v where
type Coord v :: *
(|+|) :: v -> v -> v
(|-|) :: v -> v -> v
(|*|) :: Coord v -> v -> v
(|/|) :: v -> Coord v -> v
(|.|) :: v -> v -> Coord v
vecLen :: v -> Coord v
default vecLen :: Floating (Coord v) => v -> Coord v
vecLen v = sqrt (v |.| v)
zeroVec :: v
The important lines have been highlighted. First, we define that our vector type
v
has an associated coordinate type Coord v
. We also need to specify the
kind of the type. See the discussion of kinds in an
earlier aside to understand
what kinds are. As a reminder, *
is the kind of all types that can have
values. Any other kind is a type constructor kind, such as * -> *
for the
Maybe
type constructor.
A type class can have multiple associated types, and the associated types do not
have to have kind *
, but here we only need one associated type Coord v
,
which is just a plain old type, of kind *
.
With this type declaration in place, we changed the type signatures of (|*|)
and (|/|)
so they don't take a scalar of just any type c
as argument, but
one of whatever the coordinate type of v
is. Similarly, (|.|)
and vecLen
return a value of whatever the coordinate type of v
is.
With the changes discussed so far, we have maneuvered ourselves into another
pickle: The default implementation of vecLen
takes the square root of
v |.| v
. This works only for Floating
types:
>>> :info sqrt
type Floating :: * -> Constraint
class Fractional a => Floating a where
...
sqrt :: a -> a
...
-- Defined in ‘GHC.Float’
Floating
is a subclass of Fractional
that we haven't discussed before. It's
the class of all floating point number types. sqrt
can't be defined for all
Fractional
types because rational numbers are Fractional
and in general do
not support taking square roots. For this, we need real numbers, which include
irrational numbers, and floating point numbers are our approximation of real
numbers.
Since sqrt
is defined only for Floating
types, we need to impose the
constraint that Coord v
is a Floating
type. We could add the constraint
class Floating (Coord v) => Vec v where
...
but this means that we can no longer define vector types with Int
or Bool
coordinate types, as in our example of vectors over \(\mathbb{Z}/2\mathbb{Z}\)
above. So let's not do that.
The next possible fix is to add the constraint Fractional (Coord v)
only to
the type signature of vecLen
:
vecLen :: Fractional (Coord v) => v -> Coord v
This is better. We can now define vector types with Int
or Bool
coordinate
types, but they still don't support any vecLen
method.
The correct fix is enabled by the DefaultSignatures
extension, which allows us
to provide two type signatures for vecLen
:
vecLen :: v -> Coord v
default vecLen :: Floating (Coord v) => v -> Coord v
This says that vecLen
has type v -> Coord v
and that the default
implementation has type Floating (Coord v) => v -> Coord v
, that is, the
default implementation can be used only for vector types that satisfy
Floating (Coord v)
. Thus, if Coord v = Double
, we can use the default
implementation of vecLen
and don't need to provide a custom implementation of
vecLen
for our vector type v
. If Coord v = Bool
, then Floating Bool
is
not satisfied, so we cannot use the default implementation. In this case, the
instance definition of Vec v
must provide its own implementation of vecLen
.
Let's put all this into action now. We'll define three class instances as examples.
First, we consider our geographic location type again, but we change it to use
Float
coordinates:
data Location = Loc { lat, long :: Float }
instance Vec Location where
type Coord Location = Float
Loc lat1 long1 |+| Loc lat2 long2 = Loc (lat1 + lat2) (long1 + long2)
Loc lat1 long1 |-| Loc lat2 long2 = Loc (lat1 - lat2) (long1 - long2)
s |*| Loc lat long = Loc (s * lat ) (s * long )
Loc lat long |/| s = Loc (lat / s ) (long / s )
Loc lat1 long1 |.| Loc lat2 long2 = lat1 * lat2 + long1 * long2
zeroVec = Loc 0 0
The only change in the instance definition is the addition of the highlighted line.
Similarly, our Vector
instance for Point
s only needs to add the definition
of the coordinate type of Point
s:
data Point = P { x, y z :: Double }
instance Vec Point where
type Coord Point = Double
P x1 y1 z1 |+| P x2 y2 z2 = P (x1 + x2) (y1 + y2) (z1 + z2)
P x1 y1 z1 |-| P x2 y2 z2 = P (x1 - x2) (y1 - y2) (z1 - z2)
s |*| P x y z = P (s * x ) (s * y ) (s * z )
P x y z |/| s = P (x / s ) (y / s ) (z / s )
P x1 y1 z1 |.| P x2 y2 z2 = x1 * x2 + y1 * y2 + z1 * z2
zeroVec = P 0 0 0
And, finally, we can also support vector spaces over \(\mathbb{Z}/2\mathbb{Z}\):
data GF2Vec = GV { gvCoords :: [Bool] }
instance Vec GF2Vec where
type Coord GF2Vec = Bool
GV v1 |+| GV v2 = GV $ zipWithExtend (/=) v1 v2
(|-|) = (|+|)
s |*| GV v = GV $ map (s &&) v
GV v |/| False = error "Cannot divide by 0"
v |/| True = v
GV v1 |.| GV v2 = foldl (/=) False $ zipWith (&&) v1 v2
vecLen (GV v) = or v
zeroVec = GV []
Let's take this apart. Arithmetic over \(\mathbb{Z}/2\mathbb{Z}\) is integer
arithmetic modulo 2. If we use False
to represent 0, and True
to represent
1, then we have False + False = False
, False + True = True
,
True + False = True
, and True + True = False
(because 1 + 1 = 2, which is 0
modulo 2). Note that this means that x + y
produces the same result as
x /= y
, and we can use (/=)
as the addition operator over
\(\mathbb{Z}/2\mathbb{Z}\). Since vector addition should be coordinate-wise
addition, we obtain that GV v1 |+| GV v2 = GV $ zipWithExtend (/=) v1 v2
.
Since we have \(1 = -1 \pmod 2\) and subtracting 1 is the same as adding –1, we
conclude that adding two vectors over \(\mathbb{Z}/2\mathbb{Z}\) is the same as
subtracting them: (|-|) = (|+|)
. Boolean "and" is the same as multiplication
over \(\mathbb{Z}/2\mathbb{Z}\). Thus, we implement scalar multiplication using
(&&)
. Division is a bit weird: We can't divide by 0, and dividing by 1 gives
the original result back. That's what our implementation says. For the inner
product, we need to pairwise multiply the coordinates of the two vectors and
then sum the resulting products. Since (/=)
is addition for Booleans and
(&&)
is multiplication, we get
GV v1 |.| GV v2 = foldl (/=) False $ zipWith (&&) v1 v2
as the analog of the
definition V v1 |.| V v2 = foldl (+) 0 $ zipWith (*) v1 v2
for the
Vector
type. We can't really represent vector lengths over
\(\mathbb{Z}/2\mathbb{Z}\). The only thing we can do is distinguish between the
0-vector and a non-0 vector. So we want vecLen v = False
if v
is the
0-vector, and vecLen v = True
otherwise. In other words, we want that
veclen v = True
if there is at least one True
coordinate in v
:
vecLen (GV v) = or v
. The 0-vector is once again defined as GV []
because we
treat absent coordinates as zeroes.