Data

Let's take C as a starting point, because other languages such as Java or Python obscure the structure of the data by layering object-oriented machinery on top of it. There are three ways to introduce new data types in C: structs, unions, and enums.

Structs allow us to create data types that represent objects whose description is composed of smaller parts. For example, a student record may include a banner number, student name, contact information, transcript, and a few other pieces of information. The struct representing a student record bundles all these pieces of information into a single entity, a student record.

struct student_record {
    int banner_number;
    char *student_name;
    char *address;
    transcript transcript;
};

The important thing to note is that we need all of the building blocks,

  • A banner number,
  • A name,
  • An address, and
  • A transcript

to build a student record.
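As a minimal sketch of what "needing all of the building blocks" looks like in code, the following builds a record from its four parts. The text above does not define the transcript type, so the layout used here (a course count and a GPA) is purely a placeholder assumption, and make_student is a hypothetical helper introduced for illustration.

```c
/* Placeholder: the text does not define transcript, so this layout is an assumption. */
typedef struct {
    int courses_completed;
    double gpa;
} transcript;

struct student_record {
    int banner_number;
    char *student_name;
    char *address;
    transcript transcript;
};

/* Hypothetical helper: a record is built only from all four building blocks. */
struct student_record make_student(int banner_number, char *name,
                                   char *address, transcript t) {
    struct student_record r = {
        .banner_number = banner_number,
        .student_name = name,
        .address = address,
        .transcript = t,
    };
    return r;
}
```

The function signature mirrors the list above: there is no way to call make_student without supplying a banner number, a name, an address, and a transcript.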

Unions allow us to represent objects that may store one of a number of different types of information:

union address_phone_number_or_email {
    char *email;
    char *address;
    int phone_number;
};

A value of type address_phone_number_or_email stores exactly one piece of information: an email address, an address, or a phone number.
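One caveat is worth making explicit: a C union does not record which of its members was written last, so code that reads it has to know by some other means. A common convention, sketched here under assumed names (contact_kind, contact, and describe_contact are hypothetical and not part of the definitions above), pairs the union with an enum tag.

```c
#include <stdio.h>

union address_phone_number_or_email {
    char *email;
    char *address;
    int phone_number;
};

/* Hypothetical tag recording which member of the union is currently valid. */
enum contact_kind { CONTACT_EMAIL, CONTACT_ADDRESS, CONTACT_PHONE };

/* Hypothetical wrapper: the tag and the union travel together. */
struct contact {
    enum contact_kind kind;
    union address_phone_number_or_email value;
};

/* Writes a description of c into buf, reading only the member named by the tag. */
int describe_contact(struct contact c, char *buf, size_t size) {
    switch (c.kind) {
    case CONTACT_EMAIL:
        return snprintf(buf, size, "email: %s", c.value.email);
    case CONTACT_ADDRESS:
        return snprintf(buf, size, "address: %s", c.value.address);
    case CONTACT_PHONE:
        return snprintf(buf, size, "phone: %d", c.value.phone_number);
    }
    return -1; /* only reached if the tag holds an invalid value */
}
```

Reading a member other than the one named by the tag would reinterpret whatever bytes happen to be stored, which is why the switch consults kind before touching value.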

Enums are types with a finite set of explicitly specified values:

enum color {
    RED,
    GREEN,
    BLUE
};

A value of type color can have one of three values: RED, GREEN or BLUE.

In C, enums are little more than syntactic sugar around integers and constant definitions. Under the hood, enum values are represented as integers, and C allows us to do arbitrary arithmetic with them. Thus, for practical purposes, the above definition of color is equivalent to

typedef int color;

#define RED   0
#define GREEN 1
#define BLUE  2
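To see what "arbitrary arithmetic" means in practice, here is a small sketch. The helpers next_color and past_the_end are hypothetical names introduced for illustration; the code relies on RED, GREEN, and BLUE being numbered 0, 1, and 2, which is what C assigns to enumerators without explicit values.

```c
enum color {
    RED,
    GREEN,
    BLUE
};

/* Hypothetical helper: cycles RED -> GREEN -> BLUE -> RED with plain integer arithmetic. */
enum color next_color(enum color c) {
    return (enum color)((c + 1) % 3);
}

/* Nothing stops arithmetic from leaving the set of named values:
   BLUE + 1 is simply the integer 3, which names no color. */
int past_the_end(void) {
    return BLUE + 1;
}
```

The compiler accepts both functions without complaint, which illustrates how thin the distinction between color and int is in C.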

In Haskell, enums are types distinct from Int or Integer, and we cannot do arithmetic with them at all. In fact, out of the box, we cannot do anything with them other than pattern matching.

Haskell does not distinguish between structs, unions, and enums, and lumps them all together under the term "algebraic data type" [1]. We use the data keyword to define such types.


  1. Algebraic data types get their name from the following correspondence. A struct type is isomorphic to the Cartesian product of the types of its individual fields. A union is isomorphic to the disjoint union of the types of its variants, and this disjoint union is the coproduct (sum) of those types in the category Set. An enumeration is a union of singleton types. Algebraic data types in Haskell are arbitrary combinations of unions and structs, so defining one amounts to building an expression in an algebra whose objects are types and whose operations are product and sum. That's the "algebraic" part.