Characters and Strings
The character type in Haskell is Char. In C and C++, a char is typically a
single 8-bit byte, enough to represent ASCII or one of its 8-bit extensions.
Haskell's characters are Unicode characters (a Char holds one Unicode code
point), as in most modern programming languages. We again have our standard
comparison operators for characters:
>>> 'a' == 'b'
False
>>> 'a' < 'b'
True
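These comparisons follow the numeric order of the characters' Unicode code points, which is why every uppercase ASCII letter compares as smaller than every lowercase one. The following is a small sketch to make this concrete. The names are made up for illustration and are not library functions:

```haskell
-- Hypothetical helper names, chosen only for this illustration.

-- 'Z' has code point 90 and 'a' has code point 97, so this is True.
upperBeforeLower :: Bool
upperBeforeLower = 'Z' < 'a'

-- Char is an instance of Ord, so max, min and compare work on characters too.
laterOf :: Char -> Char -> Char
laterOf x y = max x y

-- Tests whether a character lies in the range 'a'..'z' using only comparisons.
isAsciiLowerLetter :: Char -> Bool
isAsciiLowerLetter c = 'a' <= c && c <= 'z'
```

If you save these definitions in a file and load it into GHCi, you can try them on a few characters of your own.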
Beyond this, there isn't all that much we can do with characters out of the box
in Haskell. If we want to use more interesting functions that allow us to
inspect and manipulate character types, we need to import the Data.Char
module. Then we can for example test whether a given character is a digit:
>>> import Data.Char
>>> isDigit 'a'
False
>>> isDigit '5'
True
>>> isDigit '+'
False
Or we can convert a character to uppercase or lowercase:
>>> toUpper 'a'
'A'
>>> toLower 'Z'
'z'
>>> toUpper '5'
'5'
Data.Char contains a wide range of character functions. Look this module up
on Hoogle and see which functions it provides.
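As a taste of what else is in there, here is a short sketch that uses ord, chr and digitToInt, all of which Data.Char exports. The helper names shiftChar and digitValueOr are hypothetical and exist only for this example:

```haskell
import Data.Char (chr, digitToInt, isDigit, ord)

-- Hypothetical helper: the character n code points after c.
-- Assumes the result is still a valid code point; chr fails otherwise.
shiftChar :: Int -> Char -> Char
shiftChar n c = chr (ord c + n)

-- Hypothetical helper: numeric value of a decimal digit, or a default
-- for any other character. digitToInt alone fails on a character like '+'.
digitValueOr :: Int -> Char -> Int
digitValueOr def c = if isDigit c then digitToInt c else def
```

For instance, shiftChar 1 'a' evaluates to 'b', and digitValueOr 0 '7' evaluates to 7.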
In Haskell, strings are just lists of characters. The type of a list that
contains elements of type a is [a]. So, the String type in Haskell is
literally defined as a type alias for [Char]:
type String = [Char]
>>> :info String
type String = [Char] -- Defined in ‘GHC.Base’
We'll learn later how to define type aliases with the type keyword.
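Because String is only another name for [Char], a string literal and a list of characters written out element by element are the same value. A minimal sketch, with names invented for this illustration:

```haskell
-- A string literal ...
greeting :: String
greeting = "Hi!"

-- ... and the same value written as an explicit list of characters.
-- The annotation says [Char] rather than String; the two are interchangeable.
greetingChars :: [Char]
greetingChars = ['H', 'i', '!']

-- True: both names refer to equal lists.
sameValue :: Bool
sameValue = greeting == greetingChars
```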
Given that strings are lists, we can manipulate them using all the standard
list functions we have at our disposal. Search for Data.List on Hoogle to
get an idea of the kinds of functions there are to manipulate lists. You can
of course also define your own. Here, I mention a few very basic ones. Other
commonly used list functions will be introduced when we need them.
First, there is the function that tests whether a list is empty:
null :: [a] -> Bool
Since strings are lists of characters, we can use it to test whether a string is empty or not:
>>> null ""
True
>>> null "Hello"
False
The length function tells us the length of a list:
length :: [a] -> Int
>>> length ""
0
>>> length "Hello"
5
The functions head and tail are partial: they are not defined for the empty
list, and applying them to one raises an exception at runtime. For a non-empty
list, head returns the first element of the list, that is, the first character
when applied to a string. tail returns the whole list with the first element
removed:
>>> head "Hello"
'H'
>>> tail "Hello"
"ello"
>>> head ""
*** Exception: Prelude.head: empty list
>>> tail ""
*** Exception: Prelude.tail: empty list
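One way to avoid these exceptions with only the functions introduced so far is to check null before calling head or tail. The following is a minimal sketch of that idea. firstCharOr and dropFirst are hypothetical names, not functions from the Prelude:

```haskell
-- Hypothetical helper: first character of a string, or a fallback when empty.
firstCharOr :: Char -> String -> Char
firstCharOr fallback s = if null s then fallback else head s

-- Hypothetical helper: like tail, but returns "" for the empty string.
dropFirst :: String -> String
dropFirst s = if null s then "" else tail s
```

With these, firstCharOr '?' "" evaluates to '?' and dropFirst "" evaluates to "" instead of raising an exception.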
Next, we have the (:) operator (sometimes called "cons" for "construct"),
which takes as arguments an element x of type t and a list xs of type [t]
and constructs a new list x:xs of type [t] whose first element (head) is x
and whose tail is xs:
>>> 1 : [2,3,4,5]
[1,2,3,4,5]
>>> 'H' : "ello"
"Hello"
>>> 'H' : 'e' : 'l' : 'l' : 'o' : ""
"Hello"
Note the subtle distinction between single quotes and double quotes here. Single
quotes are used to delimit individual characters. Double quotes are used to
delimit strings. So 'a' is a single character, of type Char. "a" is a string
of length one, of type String, that is, [Char]. Char and [Char] are two
different types. That's the same as in Java, where we use single quotes
to delimit characters, and double quotes to delimit strings. In Python, we can
use single or double quotes interchangeably to delimit strings, because Python
does not have a character type. Individual characters in Python are represented
as one-character strings, a questionable choice in my opinion.
The cons operator is the most fundamental operator to manipulate lists because,
as we will see soon, we can also use it in pattern matching expressions to
decompose a list into its head and tail. All the other list functions, such as
null, length, head, tail, and all the functions in Data.List, can be
implemented using cons and pattern matching. For example, length could be
defined like this:
length :: [a] -> Int
length [] = 0
length (_:xs) = 1 + length xs
It's okay if you don't understand this function definition yet. You will soon.
This definition says that the length of the empty list [] is 0, and the length
of any non-empty list is one more than the length of its tail, a recursive
definition to compute the length of any list.
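In the same spirit, here is a sketch of how null, head and tail could be written with pattern matching on the empty list and on cons. The names carry a "my" prefix so that they do not clash with the Prelude versions. They are illustrations only, and the actual library definitions may differ in detail:

```haskell
-- Hypothetical re-implementations, for illustration only.

-- A list is empty exactly when it matches [] rather than a cons pattern.
myNull :: [a] -> Bool
myNull []    = True
myNull (_:_) = False

-- Partial, like head: the empty list has no first element.
myHead :: [a] -> a
myHead []    = error "myHead: empty list"
myHead (x:_) = x

-- Partial, like tail: the empty list has no tail.
myTail :: [a] -> [a]
myTail []     = error "myTail: empty list"
myTail (_:xs) = xs
```

Each function has one equation per way a list can be built: either it is the empty list, or it is some element consed onto another list.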
Finally, there is a list concatenation operator, (++), which we can also use
to concatenate strings:
>>> "Hello" ++ ", world!"
"Hello, world!"
That's it for characters and strings for now. As you learn about more advanced list functions that transform, filter, partition, or otherwise process lists, remember that all of them can also be applied to strings because strings are lists of characters.
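As a small preview, the sketch below applies three standard list functions, map, filter and reverse, to strings. The helper names are hypothetical. map, filter and reverse come from the Prelude, and toUpper and isDigit from Data.Char:

```haskell
import Data.Char (isDigit, toUpper)

-- Hypothetical helper: uppercase every character, then append "!".
shout :: String -> String
shout s = map toUpper s ++ "!"

-- Hypothetical helper: keep only the characters that are digits.
digitsOnly :: String -> String
digitsOnly = filter isDigit

-- Hypothetical helper: a string is a palindrome if it equals its reverse.
isPalindrome :: String -> Bool
isPalindrome s = s == reverse s
```

None of these functions knows anything about strings in particular. They work here only because a String is a list of Char.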