Case
You may have seen C-style switch
statements before. They are more restrictive
than if then else
statements because they don't allow us to check arbitrary
conditions for each branch. Instead, they compare a value against a list of
values associated with the branches of the switch
statement and choose the
first branch that matches. Haskell's case
expression—again, it's an
expression, not a statement—is more powerful but follows the same idea. We can
use it to implement our sign
function as follows:
sign :: Int -> Int
sign x = case x of
0 -> 0
y -> if y > 0 then 1 else -1
>>> :{
| sign x = case x of
| 0 -> 0
| y -> if y > 0 then 1 else -1
| :}
>>> sign 5
1
>>> sign (-10)
-1
>>> sign 0
0
I'm not sure this is exactly an improvement over the previous version, but
that's because we haven't discovered the full power of case
expressions yet
(and because the sign
function isn't the best example to illustrate the type
of pattern matching supported by the case
expression).
The argument between case
and of
can be an arbitrary expression. Each
... -> ...
line is one branch of the case
expression. To the left of the
->
, there's a pattern. If the value of the expression after case
matches
this pattern, then the value of the case
expression is the expression to the right
of ->
. If not, we inspect the next branch of the case
expression.
So what's a pattern? I'll be able to fully explain this once we talk about defining data types in detail. For now, the following definition suffices:
-
A pattern can be a constant of the same type as the expression after
case
. (Note that a constant doesn't have to be just a number.2 + 3 * 4
is also a constant, because it can be computed at compile time by the compiler.) In this case, the expression aftercase
matches the pattern if it has exactly this value. The first branch in our definition ofsign
illustrates this. The pattern is0
for that branch. Since the expression after the->
of the first branch is also0
, this says that ifx
is 0, then the whole case expression, and thussign x
, has the value 0. -
A pattern can be a variable. This pattern always matches. Moreover, as part of matching the expression after
case
to a variable pattern, the value of the expression is assigned to the variable. This variable can be used in the expression to the right of the->
of this branch. The second branch of our definition ofsign
illustrates this type of pattern. The pattern is the variabley
. Whatever valuex
has, it matches this pattern, and the value ofx
gets assigned toy
. The value of the wholecase
expression is thus the value of the expressionif y > 0 then 1 else -1
. Note that this expression refers to the variabley
introduced by the pattern match. Ify > 0
(that is, ifx > 0
becausey
has the same value asx
), the value ofsign x
is thus 1. Otherwise, the value ofsign x
is –1.
Pattern matching is what distinguishes case
expressions from C-style switch
statements. The latter allow only the first type of pattern: constant values.
Matching against variables, which always succeeds, is hardly an earth-shattering
feature. However, as we will learn later, there are much more complex patterns
that can be used in case
expressions, and that does make case
expressions
a more powerful tool to write elegant code.
Apart from the fact that the above implementation of sign
isn't particularly
pretty (but no uglier than our previous implementation using if then else
),
it's also not particularly idiomatic Haskell. Introducing a variable y
that
has exactly the same value as x
certainly doesn't help the clarity of the
code. We have to remember that x = y
to "see" that the condition y > 0
really checks whether x > 0
. We can fix this by writing
sign :: Int -> Int
sign x = case x of
0 -> 0
y -> if x > 0 then 1 else -1
Since the case
statement is inside the sign
function, it has access to all
arguments of sign
. Thus, the if then else
expression can refer directly to
the value of x
, thereby making it much clearer that the condition we're
testing here is whether x > 0
.
Now we have another problem: The second branch introduces the variable y
as
its pattern, as a means to match any value, but then the variable y
is not
used anywhere. Haskell has a mechanism to say "there should be a value here,
but I don't care what it is". It's called the wildcard and is written as
_
. So the final version of our sign
function in this section becomes:
sign :: Int -> Int
sign x = case x of
0 -> 0
_ -> if x > 0 then 1 else -1
>>> :{
| sign x = case x of
| 0 -> 0
| _ -> if x > 0 then 1 else -1
| :}
>>> sign 5
1
>>> sign (-10)
-1
>>> sign 0
0
This ensures that the last branch matches any value without assigning this value
to a variable we don't need. The wildcard has much more interesting uses than
this. As we will see when discussing lists, the pattern (_:_)
matches any
non-empty list; we don't care what its contents are. The pattern (_:_:_)
would
match any list containing at least two elements. The discussion at that point
will also make it clear why you need two wildcards to ensure that the list has
at least one element, and three wildcards to ensure that the list has at
least two elements.
Exercise
Search for the isDigit
function on Hoogle to learn what it does. Implement
your own version of isDigit
, called myIsDigit
, that does exactly the same
as isDigit
.
Some of you may think, if myIsDigit
should do exactly the same as
isDigit
, then why can't we just define myIsDigit = isDigit
. If that was
what you were thinking, then congratulations, you're on your way to becoming
a great software engineer. Code reuse is always better than reinventing the
wheel. As an exercise though, I do want you to reinvent the wheel here, so
do implement myIsDigit
from scratch, using a case
expression.
Solution
myIsDigit :: Char -> Bool
myIsDigit ch = case ch of
'0' -> True
'1' -> True
'2' -> True
'3' -> True
'4' -> True
'5' -> True
'6' -> True
'7' -> True
'8' -> True
'9' -> True
_ -> False
Okay, this was probably the worst use of a case
expression in the
history of programming. Clearly, a simple range check would be
significantly more succinct here:
myIsDigit :: Char -> Bool
myIsDigit ch = ch >= '0' && ch <= '9'
Exercise
If you read the Haskell documentation on
www.haskell.org—don't right
now—then you will see that if then else
is once again merely syntactic
sugar for an equivalent case
expression. In other words, the compiler
first translates any if then else
expression into an equivalent case
expression and then translates this case
expression into executable
code.1 Describe a general strategy for translating any if then else
expression into an equivalent case
expression.
Solution
Let's start with the behaviour of the expression:
if <cond> then <this> else <other>
It evaluates <cond>
, which must be an expression of type Bool
. If
<cond>
evaluates to True
, then the whole expression should have the
value <this>
. Otherwise, it should have the value <other>
. This can
be expressed using the following case
expression:
case <cond> of
True -> <this>
_ -> <other>
Indeed, this expression evaluates <cond>
. If <cond>
evaluates to
True
, then it matches the first branch of the expression, so the whole
expression has the value <this>
. Otherwise, the first branch doesn't
match, and we try the second branch. Since this branch uses the
wildcard, it always matches, and the whole expression has the value
<other>
then. This is exactly the same behaviour as that of the
if then else
expression.
-
In fact, a huge chunk of Haskell's syntax is just syntactic sugar. The compiler translates any piece of Haskell code into a minimal version of Haskell, called "Core Haskell" or simply "Core", and then the Core code gets optimized and translated into machine code. Core contains little more than
let
bindings (which we will discuss later),case
expressions to implement case distinctions, function definitions, and function calls. This goes to show that pure functions are indeed enough to express arbitrary computations (which we know already, thanks to \(\lambda\)-calculus).The advantage of using Core as an intermediate code representation is that it makes it easy to add new syntax constructs to the Haskell compiler. All one needs to figure out is how to translate the new syntax into Core, and the Core-to-machine code translation is completely unaffected by the new syntax. If GHC didn't use core as an intermediate representation, then every new syntax construct would have to be accompanied by a strategy to translate it directly into machine code. This would be more difficult and tedious and would lead to significant code duplication in the implementation of the compiler itself. ↩