Records (Structs)

The final twist in using data to define new data types comes in the form of record syntax. Consider our definition of a point once more:

data Point2D = P Double Double

This definition works in a pinch, but it has two shortcomings:

  • When using P to construct a point, we have to remember that the first argument is the \(x\)-coordinate and the second argument is the \(y\)-coordinate. For points, putting the \(x\)-coordinate before the \(y\)-coordinate is pretty standard, but if you have a more complex type, such as

    data StudentRecord = SR Int String String Transcript
    

    it's rather silly to have to remember that the Int comes first and represents the banner number, the first String is the student's name, the second String is the address, and the last argument, of type Transcript, is the student's transcript. It would be nice to be able to provide these arguments in any order and have the compiler rearrange them into the right order for us.

  • Once we have a Point2D, there is no way to access its coordinates except by pattern matching. Thus, to gain access to the coordinates of a point, we have to define accessor functions that use pattern matching:

    xCoord, yCoord :: Point2D -> Double
    xCoord (P x _) = x
    yCoord (P _ y) = y
    

    Again, this is tedious boilerplate code that we'd prefer not to have to write.
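
To see how quickly this boilerplate grows, here is a minimal sketch of the accessors we would have to write by hand for the positional StudentRecord type. The Transcript type is only a placeholder assumed here so that the example compiles, and the accessor names are illustrative choices.

```haskell
-- Placeholder transcript type, assumed here only so the example compiles.
data Transcript = T
    deriving Show

-- Positional definition: nothing in the type says which String is which.
data StudentRecord = SR Int String String Transcript

-- Hand-written accessors, one pattern match per field.
-- The names are illustrative; the positional definition does not provide them.
banner :: StudentRecord -> Int
banner (SR b _ _ _) = b

name :: StudentRecord -> String
name (SR _ n _ _) = n

address :: StudentRecord -> String
address (SR _ _ a _) = a

transcript :: StudentRecord -> Transcript
transcript (SR _ _ _ t) = t

main :: IO ()
main = do
    -- The caller must remember the argument order.
    let john = SR 1 "John Doe" "1 John's Lane, Johnstown" T
    print (banner john)
    putStrLn (name john)
    putStrLn (address john)
    print (transcript john)
```

Swapping the two String arguments in the call to SR would still type-check, which is exactly the kind of mistake the positional form invites.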

Record syntax solves both problems for us. It allows us to define our Point2D type as

data Point2D = P { x, y :: Double }

and our StudentRecord type as

data StudentRecord = SR
    { banner     :: Int
    , name       :: String
    , address    :: String
    , transcript :: Transcript
    }

This gives us four things:

  • First, we can use these data constructors just as if we had defined them as before. P takes two arguments. The first argument is the \(x\)-coordinate. The second argument is the \(y\)-coordinate. So we can still construct points by providing positional arguments:

    GHCi
    >>> data Point2D = P { x, y :: Double }
    >>> p = P 3 4
    
  • However, we can also use the field names of the Point2D structure we're constructing to make sure we assign each value to the right field. The following assignments both construct the same point:

    GHCi
    >>> p = P { x = 3, y = 4 }
    >>> q = P { y = 4, x = 3 }
    

    In both cases, we set the \(x\)-coordinate to 3, and the \(y\)-coordinate to 4. Order no longer matters. The field names matter.

  • The definition of Point2D using record syntax also introduces x and y as accessor functions:

    GHCi
    >>> :t x
    x :: Point2D -> Double
    >>> :t y
    y :: Point2D -> Double
    

    Both take a Point2D as argument and return a Double, because both fields have type Double.

    GHCi
    >>> x p
    3.0
    >>> y q
    4.0
    

    For the StudentRecord type, the accessor functions have different types, corresponding to the fields they access:

    GHCi
    >>> :{
      | data StudentRecord = SR
      |     { banner     :: Int
      |     , name       :: String
      |     , address    :: String
      |     , transcript :: Transcript
      |     }
      |
      | data Transcript = T
      | :}
    >>> :t banner
    banner :: StudentRecord -> Int
    >>> :t name
    name :: StudentRecord -> String
    >>> :t address
    address :: StudentRecord -> String
    >>> :t transcript
    transcript :: StudentRecord -> Transcript
    

    The fact that every field of a record becomes an accessor function leads to an important wrinkle that annoys many Haskell programmers, but it's just something we have to live with: We cannot have two types in the same module with fields of the same name. This, for example, wouldn't work:[1]

    data Point2D = P2 { x, y    :: Double }
    data Point3D = P3 { x, y, z :: Double }
    

    If you think about it, this makes sense. Both x and y would have to be accessor functions that can take either a Point2D or a Point3D as argument, and an ordinary function cannot do that in Haskell's type system.[2]

    Haskell programmers work around this by prefixing such conflicting accessors with the type name or some mnemonic related to the type name:

    data Point2D = P2 { p2X, p2Y      :: Double }
    data Point3D = P3 { p3X, p3Y, p3Z :: Double }
    
  • The final useful tool we gain by using record syntax to define our types is the ability to "update" records by changing only some of their fields. Now, updating a record in place is a destructive operation that replaces the old record contents with the new ones. That's not allowed in a functional language. Hence the quotes. "Updating" a record in Haskell still means that we construct a brand-new record, just as if we had used a data constructor to do so. Record "update" syntax merely allows us to say succinctly that the new record should look exactly like the old record, except that the specified set of fields should have different values. For example,

    GHCi
    >>> data Point2D = P { x, y :: Double }
    >>> p = P 3 4
    >>> q = p { y = 5 }
    >>> x q
    3.0
    >>> y q
    5.0
    

    The expression p { y = 5 } constructs a new Point2D whose fields have the same value as the corresponding fields of p, except y; y has the value 5. Querying the coordinates of q using our accessor functions x and y confirms this.

    For points, it would not have been the end of the world to have to explicitly specify both coordinates, but for larger records, it can be extremely handy to specify only the changed fields. For example,

    GHCi
    >>> :{
      | john = SR
      |     { banner     = 1
      |     , name       = "John Doe"
      |     , address    = "1 John's Lane, Johnstown"
      |     , transcript = T
      |     }
      |
      | :}
    >>> jane = john { banner = 2, name = "Jane Doe" }
    >>> name john
    "John Doe"
    >>> name jane
    "Jane Doe"
    >>> address john
    "1 John's Lane, Johnstown"
    >>> address jane
    "1 John's Lane, Johnstown"
    
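The four points above can be collected into one small file that compiles outside GHCi. This is only a sketch: the deriving clause is added here solely so that the values can be printed and compared, and mirrorY is a hypothetical helper that does not appear in the text.

```haskell
-- Record definition from the text; Show and Eq are derived here only so
-- that we can print and compare points in main.
data Point2D = P { x, y :: Double }
    deriving (Show, Eq)

-- Hypothetical helper: reflect a point across the x-axis.
-- Record update builds a new point; the argument p is left untouched.
mirrorY :: Point2D -> Point2D
mirrorY p = p { y = negate (y p) }

main :: IO ()
main = do
    let p = P 3 4                 -- positional construction
        q = P { y = 4, x = 3 }    -- construction by field name, in any order
        r = p { y = 5 }           -- record "update": a new point based on p
    print (p == q)                -- True
    print (x r, y r)              -- (3.0,5.0)
    print p                       -- P {x = 3.0, y = 4.0}, p is unchanged
    print (y (mirrorY p))         -- -4.0
```

Because an "update" returns a new value, p can still be used after r and mirrorY p have been built from it.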

As a final note, we can use record syntax to define parameterized types, even though the examples above don't do this. For example, when talking about monads later, we will encounter this type[3]

data Reader r a = Reader { runReader :: r -> a }

or this one

data State s a = State { runState :: s -> (a, s) }

Both types are parameterized by two types (r and a for Reader, s and a for State), and they use record syntax to name the value they contain: a function of type r -> a or a function of type s -> (a, s), respectively.
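
Since the single field of each of these types is itself a function, applying the accessor to a record yields a function that we can call right away. For instance, runReader has type Reader r a -> r -> a. The following is a small sketch of this; the values greeting and tick are hypothetical examples made up for illustration.

```haskell
data Reader r a = Reader { runReader :: r -> a }

data State s a = State { runState :: s -> (a, s) }

-- Hypothetical example: a Reader that wraps a function from String to String.
greeting :: Reader String String
greeting = Reader (\who -> "Hello, " ++ who)

-- Hypothetical example: a State that returns the current counter value
-- together with the counter incremented by one.
tick :: State Int Int
tick = State (\n -> (n, n + 1))

main :: IO ()
main = do
    -- runReader greeting is the wrapped function; we apply it to "world".
    putStrLn (runReader greeting "world")   -- Hello, world
    -- runState tick is the wrapped function; we apply it to the state 5.
    print (runState tick 5)                 -- (5,6)
```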


  1. PureScript is a programming language that is almost the same as Haskell with some minor differences. It allows Haskell programmers to survive in the web programming world: You write your code in PureScript, and the PureScript compiler translates your beautiful functional code into JavaScript, which can then be loaded into your webpage to do fancy stuff there.

    One more significant difference between PureScript and Haskell is the manner in which PureScript provides field access for record types. The Point2D and Point3D types defined here would be perfectly fine in PureScript. If p has type Point2D and q has type Point3D, then we would access their coordinates using the dot syntax familiar from object-oriented languages: p.x, p.y, q.x, q.y, q.z. On the surface, this seems like a good idea because it overcomes one of the bigger nuisances in Haskell. In practice, I have found that this field access syntax does not mesh very well with the rest of PureScript, where everything is a function, just as in Haskell. So I personally consider this design choice a mistake because it creates more problems than it solves. 

  2. In an upcoming chapter, we define type classes as a means to introduce polymorphic functions that work only for some types. We can use these to allow functions x and y that can take arguments of different types:

    data Point2D = P2 { p2X, p2Y      :: Double }
    data Point3D = P3 { p3X, p3Y, p3Z :: Double }
    
    class HasX t where
        x :: t -> Double
    
    class HasY t where
        y :: t -> Double
    
    class HasZ t where
        z :: t -> Double
    
    instance HasX Point2D where
        x = p2X
    
    instance HasY Point2D where
        y = p2Y
    
    instance HasX Point3D where
        x = p3X
    
    instance HasY Point3D where
        y = p3Y
    
    instance HasZ Point3D where
        z = p3Z
    

    With these definitions, the following now works:

    >>> p = P2 1 2
    >>> q = P3 3 4 5
    >>> x p
    1.0
    >>> x q
    3.0
    

    However, this is a lot of boilerplate code, and we'd write it only if the benefits of having polymorphic functions such as x, y, and z outweigh the downside of having to write all this code. Moreover, x, y, and z are only accessors for the coordinates of points. The definitions of our point types still need to use unnamed fields or give the fields in each type different names, as I did here. In particular, we still cannot create points using p = P2 { x = 3, y = 4 } and q = P3 { x = 3, y = 4, z = 5 } or use x, y, and z together with record update syntax. The type class trick works only to unify the accessor functions corresponding to field names, not the field names themselves.

    It would be convenient if the compiler generated these class definitions for us, to eliminate the tedium of writing this boilerplate code while still allowing us to define different types with fields of the same name. Standard Haskell does not do this. Recent versions of GHC offer opt-in language extensions, such as DuplicateRecordFields and OverloadedRecordDot, that relax the restriction, but they are not enabled by default. One reason for caution is that polymorphic functions are often more costly to call. The mechanism used to implement them under the hood is similar to the one used for overloaded methods in object-oriented languages. There, every object of a class with overloaded methods has a hidden field, its VTABLE, which stores information about which implementation of an overloaded method should be used for this particular object. Haskell compilers typically pass a comparable table of implementations as a hidden argument to functions that use a type class. Either way, calling an overloaded method, or calling a function in Haskell that is defined as part of a type class, may require a table lookup to find which version of the method or function should be called. A non-overloaded function does not require this and thus can be called faster. 

  3. Actually, these two types are defined as

    newtype Reader r a = Reader { runReader :: r -> a }
    

    and

    newtype State s a = State { runState :: s -> (a, s) }
    

    We'll talk about newtype next. For our purposes, it makes no difference whether we use newtype or data to define these types, but a type defined using newtype has no runtime overhead for its constructor and can therefore be manipulated more efficiently.