Skip to content

Our Own Type Classes

Just as we may develop libraries in Rust that provide their own traits or Java libraries that provide new interfaces, we can of course define our own type classes in Haskell. After all, the standard library provides a whole long list of type classes, and the standard library is just plain old Haskell code.

You've already seen a long list of standard type classes in this chapter. As an exercise, let's develop our own type class here.

Let's say we want to implement some code that works with vectors in \(d\)-dimensional space. It could be some graphics software or some machine learning code, such as a clustering algorithm or dimensionality reduction algorithm. Such code does a lot of linear algebra. We'll focus on a few simple operations here:

  • Adding two vectors, coordinate by coordinate.
  • Subtracting one vector from another, coordinate by coordinate.
  • Scaling a vector, that is, multiplying each of its coordinates with some coefficient.
  • Dividing each coordinate of a vector by some coefficient.
  • Taking the inner product of two vectors, that is, multiplying the first coordinates of both vectors with each other, multiplying the second coordinates with each other, and so on, and finally summing all these products.
  • Computing the length of a vector (not the number of coordinates, but the length in \(d\)-dimensional space).
  • Choosing the 0-vector, the vector where every coordinate is 0.

These primitives go a long way towards implementing simple linear algebra code. For example, these operations are more than enough to implement \(k\)-means clustering, which we will do as a project a little later in this book, and we will use our fancy new type class there.

The challenge is that some GIS software that needs these operations may work with a type

data Location = Loc { lat, long :: Double }

A 3-d graphics package may work with points in 3-d space:

data Point = P { x, y z :: Double }

A clustering algorithm in machine learning should work for vectors in arbitrary dimensions. We could implement such a type as

data Vector = V { coords :: [Double] }

Since each vector stores a list of coordinates, this type can be used to represent 3-d vectors or 200-d vectors. The only caveat is that the type does not enforce that all vectors in our program have the same number of coordinates. There is no good way to do this for such a variable-dimensionality vector type.

If our linear algebra package should work for all of these data types, then it clearly must be polymorphic. All its operations should be implemented in terms of the primitives listed above. These primitives should be provided by a type class Vec, and it is the responsibility of the user of our code to ensure that the vector type they want to use with our package is an instance of Vec.

All of this is fairly easy. Here's the Vec type class:

class Vec v where
    -- Addition of two vectors: u |+| v
    (|+|) :: v -> v -> v

    -- Subtraction of two vectors: u |-| v
    (|-|) :: v -> v -> v

    -- Multiplication of a vector with a scalar: s |*| v
    (|*|) :: Double -> v -> v

    -- Division of a vector by a scalar: v |/| s
    (|/|) :: v -> Double -> v

    -- Inner product of two vectors: u |.| v
    (|.|) :: v -> v -> Double

    -- Vector length
    vecLen :: v -> Double
    vecLen v = sqrt (v |.| v)

    -- The vector with all 0 coordinates
    zeroVec :: v

This type class uses the ability to define new operators. Remember, operators are simply functions whose names are made up of special characters. They don't already have to exist in the language or standard library. The reason why I surrounded the operators by vertical bars here is that normal addition, subtraction, etc. operators are already part of the Num type class, and we can't make vectors instances of Num because they're not numbers.

Given this type class, we can make all of our types above instances of Vec:

instance Vec Location where
    Loc lat1 long1 |+| Loc lat2 long2 = Loc (lat1 + lat2) (long1 + long2)
    Loc lat1 long1 |-| Loc lat2 long2 = Loc (lat1 - lat2) (long1 - long2)
    s              |*| Loc lat  long  = Loc (s    * lat ) (s     * long )
    Loc lat  long  |/| s              = Loc (lat  / s   ) (long  / s    )
    Loc lat1 long1 |.| Loc lat2 long2 = lat1 * lat2 + long1 * long2
    zeroVec                           = Loc 0 0

instance Vec Point where
    P x1 y1 z1 |+| P x2 y2 z2 = P (x1 + x2) (y1 + y2) (z1 + z2)
    P x1 y1 z1 |-| P x2 y2 z2 = P (x1 - x2) (y1 - y2) (z1 - z2)
    s          |*| P x  y  z  = P (s  * x ) (s  * y ) (s  * z )
    P x  y  z  |/| s          = P (x  / s ) (y  / s ) (z  / s )
    P x1 y1 z1 |.| P x2 y2 z2 = x1 * x2 + y1 * y2 + z1 * z2
    zeroVec                   = P 0 0 0

instance Vec Vector where
    V v1 |+| V v2 = V $ zipWithExtend (+) v1 v2
    V v1 |-| V v2 = V $ zipWithExtend (-) v1 v2
    s    |*| V v  = V $ map (s *) v
    V v  |/| V v  = V $ map (/ s) v
    V v1 |.| V v2 = V $ sum $ zipWith (*) v1 v2
    zeroVec       = V []

zipWithExtend :: (a -> a -> a) -> [a] -> [a] -> [a]
zipWithExtend f = go
  where
    go    []     ys  = ys
    go    xs     []  = xs
    go (x:xs) (y:ys) = f x y : go xs ys

The implementations for Location and Point shouldn't need any explanation. The implementation for Vector is more interesting because a Vector can have an arbitrary number of dimensions. This makes it difficult to generate a zeroVec because we don't know how many zeroes to fill the vector with. We solve this problem by defining zeroVec = V [] and then treating absent coordinates as zeroes in all the vector operations.

For scalar multiplication and division, multiplying or dividing coordinates that are 0 by a scalar does not do anything. So all we have to do is multiply or divide the coordinates that are present. map (s *) v and map (/ s) v do just that.

Similarly, when taking the inner product of two vectors, all the coordinate products where one coordinate is 0 are themselves 0 and thus do not contribute anything to the inner product. Thus, it suffices to multiply all those coordinate pairs for which we have a coordinate in both vectors. If one vector is longer than the other, then the additional coordinates of the longer vector beyond the length of the shorter vector are simply ignored. That's exactly what zipWith does. We discussed it before. Thus, we can simply implement the inner product as V $ sum $ zipWith (*) v1 v2. We multiply coordinate pairs from both vectors and then sum up the products using sum.

For adding and subtracting two vectors, zipWith isn't quite the right tool. For example, adding the 0-vector to any vector v should just produce v. However, we defined zeroVec = V []. Thus, if we defined V v1 |+| V v2 = V $ zipWith (+) v1 v2, then we would obtain V v |+| zeroVec = V v |+| V [] = V [], which in general is not the same as V v. What we want is a version of zipWith that doesn't discard the extra elements in the longer list. Instead, it should simply add these additional elements to the output list. Since no such version of zipWith exists, we implemented our own above, called zipWithExtend. We can now define V v1 |+| V v2 = V $ zipWithExtend (+) v1 v2 and V v1 |-| V v2 = V $ zipWithExtend (-) v1 v2.