Type Families

This section is optional and looks at an advanced aspect of Haskell. It'll introduce you to type families. There are actually two ways to define type families. We'll look at only one of them, as it helps with solving a minor problem with our Vec type class that we have developed so far.

Here's the type class definition again, without all the comments:

class Vec v where
    (|+|) :: v -> v -> v
    (|-|) :: v -> v -> v
    (|*|) :: Double -> v -> v
    (|/|) :: v -> Double -> v
    (|.|) :: v -> v -> Double

    vecLen :: v -> Double
    vecLen v = sqrt (v |.| v)

    zeroVec :: v

The use of Double as the return type of (|.|) and vecLen, and as an argument type of (|*|) and (|/|), essentially forces us to consider only vector types whose coordinates are Doubles. But that's not always what we want. We may just want single-precision coordinates, of type Float, or we may want to write some code that works with vector spaces over $\mathbb{Z}/2\mathbb{Z}$, the Galois-field of size 2. In that case, we'd probably opt for coordinates of type Int or Bool.

Let's try to fix this. Our first attempt is to replace the coordinate type Double with a type variable. Let's call it c for "coordinate":

class Vec v where
    (|+|) :: v -> v -> v
    (|-|) :: v -> v -> v
    (|*|) :: c -> v -> v
    (|/|) :: v -> c -> v
    (|.|) :: v -> v -> c

    vecLen :: v -> c
    vecLen v = sqrt (v |.| v)

    zeroVec :: v

But that doesn't quite work. The problem is that we're essentially saying that to be a vector type, v needs to support multiplication with any type c. Even multiplying a vector with a binary tree should be supported, and the result should be a vector again. That's clearly not what we want. We only want to support multiplication with whatever the coordinate type of our vector type v is. So we need a way to link v with its coordinate type c.

The first step towards this goal is by enabling the MultiparamTypeClasses extension of GHC. This is not exactly standard Haskell, but it might as well be because GHC is the de facto standard Haskell compiler. To enable such a language extension in a source code file, you have to add the pragma

{-# LANGUAGE MultiparamTypeClasses #-}

at the beginning of the file. In GHCi, you write

GHCi

>>> :set -XMultiparamTypeClasses

With this, we can now define a type class Vec that has two parameters:

class Vec v c where
    (|+|) :: v -> v -> v
    (|-|) :: v -> v -> v
    (|*|) :: c -> v -> v
    (|/|) :: v -> c -> v
    (|.|) :: v -> v -> c

    vecLen :: v -> c
    vecLen v = sqrt (v |.| v)

    zeroVec :: v

Where the old type class Vec said that v is a vector type, the new type class with two parameters could be read as defining that v is a vector type with coordinate type c.

This still doesn't quite work. The new problem we're running into is that nobody prevents us from defining, for example,

instance Vec Vector Double where
    -- Function implementations omitted

instance Vec Vector Int where
    -- Function implementations omitted

We're essentially defining that Vector is a vector type with both coordinates of type Double and coordinates of type Int. For some multi-parameter type classes, this makes perfect sense. We would even be able to provide sensible implementations for the two instances Vec Vector Double and Vec Vector Int. The problem we run into is really just a technicality, but an important one: Assume we have a vector v :: Vector. Now we call vecLen v. What should the return type be? We have two implementations of vecLen, one that returns a Double and another that returns an Int. The compiler cannot choose for us, so this is not allowed.

In general, to enable type checking, for any function with a polymorphic return type, the return type must be determined by the argument types of the function. That's a restriction that Haskell imposes. Let's look at a few examples:

foo :: [a] -> a is okay because the element type of the list passed as an argument to foo tells us the return type.
foo :: [a] -> b is not okay. Given a list of type [a] as argument, the return type can be anything. It's not determined by the function arguments.
map :: (a -> b) -> [a] -> [b] is okay. This was the type signature of our map function for lists. The reason why this is okay is that the type of the elements of the return lists is determined by the return type of the function passed to map as the first argument.

In our definition of the Vec v c type class, vecLen :: v -> c is not okay because the type c is not determined by the argument type v.

One approach in GHC to deal with this is to extend the syntax for multi-parameter type classes using the FunctionalDependencies extension:

GHCi

>>> :set -XFunctionalDependencies

This allows us to write the following:

class Vec v c | v -> c where
    (|+|) :: v -> v -> v
    (|-|) :: v -> v -> v
    (|*|) :: c -> v -> v
    (|/|) :: v -> c -> v
    (|.|) :: v -> v -> c

    vecLen :: v -> c
    vecLen v = sqrt (v |.| v)

    zeroVec :: v

The v -> c part in the class definition says that if we know v, then we know c: c is fully determined by v. This certainly makes vecLen :: v -> c legal because it just says what we need to be true: The return type c of the function is now determined by the argument type v.

In practice, once we impose the restriction v -> c on the Vec v c type class, we can no longer define two instance of Vec for the same type v. So this no longer works:

instance Vec Vector Double where
    -- Function implementations omitted

instance Vec Vector Int where
    -- Function implementations omitted

We can only have one of these two instances. Thus, if we try to call vecLen v for some value v :: Vector, the compiler only needs to find an instance Vec Vector c. Due to the functional dependency, there can be only one such instance. This instance tells us what c is, so c is indeed determined by the argument type in this case.

Functional dependencies are syntactically light-weight, but in some scenarios, type families are the better tool. Let's look at these next. Let's review what we want to express. We don't want a type class Vec v c that says v is a vector type with coordinate type c. We really want our original version of Vec that simply says that v is a vector type, period. Just one type variable. This vector type v has an associated coordinate type. This is the crux: We want to be able to express that every instance v of the Vec type class has some associated coordinate type. To do so, we need to enable the TypeFamilies extension. In fact, our implementation of the Vec type class below needs another extension, DefaultSignatures:

GHCi

>>> :set -XTypeFamilies -XDefaultSignatures

And now we can define

class Vec v where
    type Coord v :: *

    (|+|) :: v -> v -> v
    (|-|) :: v -> v -> v
    (|*|) :: Coord v -> v -> v
    (|/|) :: v -> Coord v -> v
    (|.|) :: v -> v -> Coord v

    vecLen :: v -> Coord v
    default vecLen :: Floating (Coord v) => v -> Coord v
    vecLen v = sqrt (v |.| v)

    zeroVec :: v

The important lines have been highlighted. First, we define that our vector type v has an associated coordinate type Coord v. We also need to specify the kind of the type. See the discussion of kinds in an earlier aside to understand what kinds are. As a reminder, * is the kind of all types that can have values. Any other kind is a type constructor kind, such as * -> * for the Maybe type constructor.

A type class can have multiple associated types, and the associated types do not have to have kind *, but here we only need one associated type Coord v, which is just a plain old type, of kind *.

With this type declaration in place, we changed the type signatures of (|*|) and (|/|) so they don't take a scalar of just any type c as argument, but one of whatever the coordinate type of v is. Similarly, (|.|) and vecLen return a value of whatever the coordinate type of v is.

With the changes discussed so far, we have maneuvered ourselves into another pickle: The default implementation of vecLen takes the square root of v |.| v. This works only for Floating types:

GHCi

>>> :info sqrt
type Floating :: * -> Constraint
class Fractional a => Floating a where
  ...
  sqrt :: a -> a
  ...
    -- Defined in ‘GHC.Float’

Floating is a subclass of Fractional that we haven't discussed before. It's the class of all floating point number types. sqrt can't be defined for all Fractional types because rational numbers are Fractional and in general do not support taking square roots. For this, we need real numbers, which include irrational numbers, and floating point numbers are our approximation of real numbers.

Since sqrt is defined only for Floating types, we need to impose the constraint that Coord v is a Floating type. We could add the constraint

class Floating (Coord v) => Vec v where
    ...

but this means that we can no longer define vector types with Int or Bool coordinate types, as in our example of vectors over $\mathbb{Z}/2\mathbb{Z}$ above. So let's not do that.

The next possible fix is to add the constraint Fractional (Coord v) only to the type signature of vecLen:

vecLen :: Fractional (Coord v) => v -> Coord v

This is better. We can now define vector types with Int or Bool coordinate types, but they still don't support any vecLen method.

The correct fix is enabled by the DefaultSignatures extension, which allows us to provide two type signatures for vecLen:

vecLen :: v -> Coord v
default vecLen :: Floating (Coord v) => v -> Coord v

This says that vecLen has type v -> Coord v and that the default implementation has type Floating (Coord v) => v -> Coord v, that is, the default implementation can be used only for vector types that satisfy Floating (Coord v). Thus, if Coord v = Double, we can use the default implementation of vecLen and don't need to provide a custom implementation of vecLen for our vector type v. If Coord v = Bool, then Floating Bool is not satisfied, so we cannot use the default implementation. In this case, the instance definition of Vec v must provide its own implementation of vecLen.

Let's put all this into action now. We'll define three class instances as examples.

First, we consider our geographic location type again, but we change it to use Float coordinates:

data Location = Loc { lat, long :: Float }

instance Vec Location where
    type Coord Location = Float

    Loc lat1 long1 |+| Loc lat2 long2 = Loc (lat1 + lat2) (long1 + long2)
    Loc lat1 long1 |-| Loc lat2 long2 = Loc (lat1 - lat2) (long1 - long2)
    s              |*| Loc lat  long  = Loc (s    * lat ) (s     * long )
    Loc lat  long  |/| s              = Loc (lat  / s   ) (long  / s    )
    Loc lat1 long1 |.| Loc lat2 long2 = lat1 * lat2 + long1 * long2
    zeroVec                           = Loc 0 0

The only change in the instance definition is the addition of the highlighted line.

Similarly, our Vector instance for Points only needs to add the definition of the coordinate type of Points:

data Point = P { x, y z :: Double }

instance Vec Point where
    type Coord Point = Double

    P x1 y1 z1 |+| P x2 y2 z2 = P (x1 + x2) (y1 + y2) (z1 + z2)
    P x1 y1 z1 |-| P x2 y2 z2 = P (x1 - x2) (y1 - y2) (z1 - z2)
    s          |*| P x  y  z  = P (s  * x ) (s  * y ) (s  * z )
    P x  y  z  |/| s          = P (x  / s ) (y  / s ) (z  / s )
    P x1 y1 z1 |.| P x2 y2 z2 = x1 * x2 + y1 * y2 + z1 * z2
    zeroVec                   = P 0 0 0

And, finally, we can also support vector spaces over $\mathbb{Z}/2\mathbb{Z}$:

data GF2Vec = GV { gvCoords :: [Bool] }

instance Vec GF2Vec where
    type Coord GF2Vec = Bool

    GV v1 |+| GV v2 = GV $ zipWithExtend (/=) v1 v2

    (|-|) = (|+|)

    s    |*| GV v = GV $ map (s &&) v

    GV v |/| False = error "Cannot divide by 0"
    v    |/| True  = v

    GV v1 |.| GV v2 = foldl (/=) False $ zipWith (&&) v1 v2

    vecLen (GV v) = or v

    zeroVec = GV []

Let's take this apart. Arithmetic over $\mathbb{Z}/2\mathbb{Z}$ is integer arithmetic modulo 2. If we use False to represent 0, and True to represent 1, then we have False + False = False, False + True = True, True + False = True, and True + True = False (because 1 + 1 = 2, which is 0 modulo 2). Note that this means that x + y produces the same result as x /= y, and we can use (/=) as the addition operator over $\mathbb{Z}/2\mathbb{Z}$. Since vector addition should be coordinate-wise addition, we obtain that GV v1 |+| GV v2 = GV $ zipWithExtend (/=) v1 v2. Since we have $1 = -1 \pmod 2$ and subtracting 1 is the same as adding –1, we conclude that adding two vectors over $\mathbb{Z}/2\mathbb{Z}$ is the same as subtracting them: (|-|) = (|+|). Boolean "and" is the same as multiplication over $\mathbb{Z}/2\mathbb{Z}$. Thus, we implement scalar multiplication using (&&). Division is a bit weird: We can't divide by 0, and dividing by 1 gives the original result back. That's what our implementation says. For the inner product, we need to pairwise multiply the coordinates of the two vectors and then sum the resulting products. Since (/=) is addition for Booleans and (&&) is multiplication, we get GV v1 |.| GV v2 = foldl (/=) False $ zipWith (&&) v1 v2 as the analog of the definition V v1 |.| V v2 = foldl (+) 0 $ zipWith (*) v1 v2 for the Vector type. We can't really represent vector lengths over $\mathbb{Z}/2\mathbb{Z}$. The only thing we can do is distinguish between the 0-vector and a non-0 vector. So we want vecLen v = False if v is the 0-vector, and vecLen v = True otherwise. In other words, we want that veclen v = True if there is at least one True coordinate in v: vecLen (GV v) = or v. The 0-vector is once again defined as GV [] because we treat absent coordinates as zeroes.