Case

You may have seen C-style switch statements before. They are more restrictive than if then else statements because they don't allow us to check arbitrary conditions for each branch. Instead, they compare a value against a list of values associated with the branches of the switch statement and choose the first branch that matches. Haskell's case expression—again, it's an expression, not a statement—is more powerful but follows the same idea. We can use it to implement our sign function as follows:

sign :: Int -> Int
sign x = case x of
    0 -> 0
    y -> if y > 0 then 1 else -1

GHCi

>>> :{
  | sign x = case x of
  |     0 -> 0
  |     y -> if y > 0 then 1 else -1
  | :}
>>> sign 5
1
>>> sign (-10)
-1
>>> sign 0
0

I'm not sure this is exactly an improvement over the previous version, but that's because we haven't discovered the full power of case expressions yet (and because the sign function isn't the best example to illustrate the type of pattern matching supported by the case expression).

The argument between case and of can be an arbitrary expression. Each ... -> ... line is one branch of the case expression. To the left of the ->, there's a pattern. If the value of the expression after case matches this pattern, then the value of the case expression is the expression to the right of ->. If not, we inspect the next branch of the case expression.

So what's a pattern? I'll be able to fully explain this once we talk about defining data types in detail. For now, the following definition suffices:

A pattern can be a constant of the same type as the expression after case. (Note that a constant doesn't have to be just a number. 2 + 3 * 4 is also a constant, because it can be computed at compile time by the compiler.) In this case, the expression after case matches the pattern if it has exactly this value. The first branch in our definition of sign illustrates this. The pattern is 0 for that branch. Since the expression after the -> of the first branch is also 0, this says that if x is 0, then the whole case expression, and thus sign x, has the value 0.
A pattern can be a variable. This pattern always matches. Moreover, as part of matching the expression after case to a variable pattern, the value of the expression is assigned to the variable. This variable can be used in the expression to the right of the -> of this branch. The second branch of our definition of sign illustrates this type of pattern. The pattern is the variable y. Whatever value x has, it matches this pattern, and the value of x gets assigned to y. The value of the whole case expression is thus the value of the expression if y > 0 then 1 else -1. Note that this expression refers to the variable y introduced by the pattern match. If y > 0 (that is, if x > 0 because y has the same value as x), the value of sign x is thus 1. Otherwise, the value of sign x is –1.

Pattern matching is what distinguishes case expressions from C-style switch statements. The latter allow only the first type of pattern: constant values. Matching against variables, which always succeeds, is hardly an earth-shattering feature. However, as we will learn later, there are much more complex patterns that can be used in case expressions, and that does make case expressions a more powerful tool to write elegant code.

Apart from the fact that the above implementation of sign isn't particularly pretty (but no uglier than our previous implementation using if then else), it's also not particularly idiomatic Haskell. Introducing a variable y that has exactly the same value as x certainly doesn't help the clarity of the code. We have to remember that x = y to "see" that the condition y > 0 really checks whether x > 0. We can fix this by writing

sign :: Int -> Int
sign x = case x of
    0 -> 0
    y -> if x > 0 then 1 else -1

Since the case statement is inside the sign function, it has access to all arguments of sign. Thus, the if then else expression can refer directly to the value of x, thereby making it much clearer that the condition we're testing here is whether x > 0.

Now we have another problem: The second branch introduces the variable y as its pattern, as a means to match any value, but then the variable y is not used anywhere. Haskell has a mechanism to say "there should be a value here, but I don't care what it is". It's called the wildcard and is written as _. So the final version of our sign function in this section becomes:

sign :: Int -> Int
sign x = case x of
    0 -> 0
    _ -> if x > 0 then 1 else -1

GHCi

>>> :{
  | sign x = case x of
  |     0 -> 0
  |     _ -> if x > 0 then 1 else -1
  | :}
>>> sign 5
1
>>> sign (-10)
-1
>>> sign 0
0

This ensures that the last branch matches any value without assigning this value to a variable we don't need. The wildcard has much more interesting uses than this. As we will see when discussing lists, the pattern (_:_) matches any non-empty list; we don't care what its contents are. The pattern (_:_:_) would match any list containing at least two elements. The discussion at that point will also make it clear why you need two wildcards to ensure that the list has at least one element, and three wildcards to ensure that the list has at least two elements.

Exercise

Search for the isDigit function on Hoogle to learn what it does. Implement your own version of isDigit, called myIsDigit, that does exactly the same as isDigit.

Some of you may think, if myIsDigit should do exactly the same as isDigit, then why can't we just define myIsDigit = isDigit. If that was what you were thinking, then congratulations, you're on your way to becoming a great software engineer. Code reuse is always better than reinventing the wheel. As an exercise though, I do want you to reinvent the wheel here, so do implement myIsDigit from scratch, using a case expression.

Solution

myIsDigit :: Char -> Bool
myIsDigit ch = case ch of
    '0' -> True
    '1' -> True
    '2' -> True
    '3' -> True
    '4' -> True
    '5' -> True
    '6' -> True
    '7' -> True
    '8' -> True
    '9' -> True
    _   -> False

Okay, this was probably the worst use of a case expression in the history of programming. Clearly, a simple range check would be significantly more succinct here:

myIsDigit :: Char -> Bool
myIsDigit ch = ch >= '0' && ch <= '9'

Exercise

If you read the Haskell documentation on www.haskell.org—don't right now—then you will see that if then else is once again merely syntactic sugar for an equivalent case expression. In other words, the compiler first translates any if then else expression into an equivalent case expression and then translates this case expression into executable code.¹ Describe a general strategy for translating any if then else expression into an equivalent case expression.

Solution

Let's start with the behaviour of the expression:

if <cond> then <this> else <other>

It evaluates <cond>, which must be an expression of type Bool. If <cond> evaluates to True, then the whole expression should have the value <this>. Otherwise, it should have the value <other>. This can be expressed using the following case expression:

case <cond> of
    True -> <this>
    _    -> <other>

Indeed, this expression evaluates <cond>. If <cond> evaluates to True, then it matches the first branch of the expression, so the whole expression has the value <this>. Otherwise, the first branch doesn't match, and we try the second branch. Since this branch uses the wildcard, it always matches, and the whole expression has the value <other> then. This is exactly the same behaviour as that of the if then else expression.

In fact, a huge chunk of Haskell's syntax is just syntactic sugar. The compiler translates any piece of Haskell code into a minimal version of Haskell, called "Core Haskell" or simply "Core", and then the Core code gets optimized and translated into machine code. Core contains little more than let bindings (which we will discuss later), case expressions to implement case distinctions, function definitions, and function calls. This goes to show that pure functions are indeed enough to express arbitrary computations (which we know already, thanks to \(\lambda\)-calculus).

The advantage of using Core as an intermediate code representation is that it makes it easy to add new syntax constructs to the Haskell compiler. All one needs to figure out is how to translate the new syntax into Core, and the Core-to-machine code translation is completely unaffected by the new syntax. If GHC didn't use core as an intermediate representation, then every new syntax construct would have to be accompanied by a strategy to translate it directly into machine code. This would be more difficult and tedious and would lead to significant code duplication in the implementation of the compiler itself. ↩