Reading and Writing Binary Data
Lots of data we process is stored in text format. For those types of data, the
String I/O facilities offered by System.IO and the Text I/O facilities offered
by Data.Text.IO or Data.Text.Lazy.IO are sufficient. However, not
all data are stored in text form. Examples include PDF files, image files,
compiled programs, database files, etc. Every programming language worth its
salt can also read such binary data files.
To work with binary data, we have the Data.ByteString module. It provides the
ByteString type to store, well, a string of bytes, or rather a sequence of
bytes. Under the hood, ByteStrings are arrays of bytes, so they are once again
very space-efficient.
The way we work with ByteStrings is fairly similar to working with Text. The
Data.ByteString module provides counterparts to all the major list processing
functions, such as map, filter, etc., but they operate on ByteStrings instead
of lists. We can also unpack a ByteString to obtain a list of its elements, and
we can pack a list of bytes into a ByteString.
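As a minimal sketch of these list-like operations, the following definitions build a small ByteString and apply map, filter, and unpack to it. The names sample, evens, incremented, and evensAsList are made up for this illustration; the elements are bytes, represented by the Word8 type imported here.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- A small sample ByteString built from a list of bytes.
sample :: BS.ByteString
sample = BS.pack [1, 2, 3, 4, 5, 6]

-- Keep only the even bytes, analogous to filter on lists.
evens :: BS.ByteString
evens = BS.filter even sample

-- Add one to every byte, analogous to map on lists.
incremented :: BS.ByteString
incremented = BS.map (+ 1) sample

-- Convert the filtered ByteString back into an ordinary list of bytes.
evensAsList :: [Word8]
evensAsList = BS.unpack evens
```

The module is imported qualified because many of its function names clash with the list functions from the Prelude.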
In contrast to Data.Text, the I/O functions aren't provided by a separate
module but are defined right inside Data.ByteString.
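To illustrate the I/O functions, here is a small sketch that copies a file byte for byte using readFile and writeFile from Data.ByteString. The function name copyBinaryFile is hypothetical, and the sketch assumes the file is small enough to be held in memory in one piece.

```haskell
import qualified Data.ByteString as BS

-- Copy a file byte for byte and return the number of bytes copied.
copyBinaryFile :: FilePath -> FilePath -> IO Int
copyBinaryFile src dst = do
  contents <- BS.readFile src   -- read the whole file into memory
  BS.writeFile dst contents     -- write exactly the same bytes back out
  pure (BS.length contents)
```

Because no text decoding is involved, this works equally well for images, PDFs, or compiled programs.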
ByteStrings come in a number of variations. Basic ByteStrings store bytes.
Haskell's data type to represent bytes, that is, unsigned 8-bit integers, is
Word8. Thus, the elements of a ByteString are Word8s. In particular, unpack
produces a list of Word8s, and pack expects a list of Word8s.
When working with ASCII text, which encodes every character in a single byte,
using ByteStrings can be more efficient than using Text, which encodes
characters in UTF-8 under the hood. To facilitate this use of ByteStrings, the
Data.ByteString.Char8 module provides versions of the ByteString functions that
interpret the bytes in the ByteString as Chars. The ByteString type itself is
the same as in Data.ByteString. In particular, unpack produces a String and
pack expects a String as argument, just as when using Text.
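The following sketch contrasts the two views of the same value: built from a String with the Char8 pack, it can be unpacked either as bytes or as characters. The names greeting, greetingBytes, greetingString, and shouted are invented for this example, and the text is assumed to be pure ASCII.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)
import Data.Word (Word8)

-- Build a ByteString from a String using the Char8 interface.
greeting :: BS.ByteString
greeting = BC.pack "Hello"

-- Viewed through Data.ByteString, the elements are bytes.
greetingBytes :: [Word8]
greetingBytes = BS.unpack greeting   -- [72,101,108,108,111]

-- Viewed through Data.ByteString.Char8, the elements are Chars.
greetingString :: String
greetingString = BC.unpack greeting  -- "Hello"

-- The Char8 version of map takes a function on Chars.
shouted :: BS.ByteString
shouted = BC.map toUpper greeting
```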
Finally, we have lazy versions of ByteStrings, both for the Word8 and the Char
version. They are provided by the Data.ByteString.Lazy and
Data.ByteString.Lazy.Char8 modules. Under the hood, they are implemented as
lists of small arrays, analogously to the implementation of lazy Text.
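The chunked structure of lazy ByteStrings can be made visible with fromChunks and toChunks, which convert between a lazy ByteString and a list of strict ones. This is only an illustrative sketch; the names chunked, chunkContents, and flattened are made up, and real chunks are normally much larger than the two tiny ones used here.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- A lazy ByteString assembled from two strict chunks.
chunked :: BL.ByteString
chunked = BL.fromChunks [BS.pack [1, 2], BS.pack [3, 4, 5]]

-- Recover the strict chunks and look at the bytes in each one.
chunkContents :: [[Word8]]
chunkContents = map BS.unpack (BL.toChunks chunked)

-- Collapse all chunks into one strict ByteString.
flattened :: BS.ByteString
flattened = BL.toStrict chunked
```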
I don't think I want to say more about ByteStrings and working with binary
data here. You won't need these facilities in this course, but I wanted to
mention that they exist for when you do need them.