Reading and Writing Binary Data
Lots of data we process is stored in text format. For those types of data, the
String I/O facilities offered by System.IO and the Text I/O facilities
offered by Data.Text.IO or Data.Text.Lazy.IO are sufficient. However, not
all data is stored in text form. Examples include PDF files, image files,
compiled programs, and database files. Every programming language worth its
salt can also read such binary data files.
To work with binary data, we have the Data.ByteString module. It provides the
ByteString type to store, well, a string of bytes, or rather a sequence of
bytes. Under the hood, ByteStrings are arrays of bytes, so they are once again
very space-efficient.
The way we work with ByteStrings is fairly similar to working with Text. The
Data.ByteString module provides counterparts to all the major list processing
functions, such as map, filter, etc., but they operate on ByteStrings
instead of lists. We can also unpack a ByteString to obtain a list of its
elements, and we can pack a list of bytes into a ByteString.
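To make this concrete, here is a minimal sketch of pack, unpack, map, and filter on a ByteString. The names sample, evensDoubled, and evensDoubledList are made up for this illustration; the qualified import is the usual way to avoid clashes with the Prelude's list functions.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- A small ByteString built from a list of bytes.
sample :: BS.ByteString
sample = BS.pack [1, 2, 3, 4, 5]

-- Keep only the even bytes, then double each of them.
-- BS.filter and BS.map work like their list counterparts.
evensDoubled :: BS.ByteString
evensDoubled = BS.map (* 2) (BS.filter even sample)

-- Convert the result back to an ordinary list of bytes: [4, 8].
evensDoubledList :: [Word8]
evensDoubledList = BS.unpack evensDoubled
```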
In contrast to Data.Text, the I/O functions aren't provided by a separate
module but are defined right inside Data.ByteString.
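As a rough sketch of what this looks like, the following function writes four raw bytes to a file and reads them back, using BS.writeFile and BS.readFile from Data.ByteString itself. The function name roundTrip is invented for this example, and the caller chooses the path.

```haskell
import qualified Data.ByteString as BS

-- Write a few raw bytes to the given file and read them back.
-- The bytes 0x25 0x50 0x44 0x46 are the ASCII codes of "%PDF".
roundTrip :: FilePath -> IO BS.ByteString
roundTrip path = do
  BS.writeFile path (BS.pack [0x25, 0x50, 0x44, 0x46])
  BS.readFile path
```

BS.readFile reads the whole file into memory at once, so for very large files this sketch may not be the right tool.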
ByteStrings come in a number of variations. Basic ByteStrings store bytes.
Haskell's data type to represent bytes, that is, unsigned 8-bit integers, is
Word8. Thus, the elements of a ByteString are Word8s. In particular,
unpack produces a list of Word8s, and pack expects a list of Word8s.
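One consequence of the elements being Word8s is worth a small sketch: Word8 values range from 0 to 255, and arithmetic on them wraps around modulo 256. The names smallest, largest, and incremented below are made up for this illustration.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word8)

-- The smallest and largest byte values: 0 and 255.
smallest, largest :: Word8
smallest = minBound
largest = maxBound

-- Adding 1 to every byte wraps 255 around to 0,
-- so the result is [1, 128, 0].
incremented :: [Word8]
incremented = BS.unpack (BS.map (+ 1) (BS.pack [0, 127, 255]))
```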
When working with ASCII text, which encodes every character in a single byte,
using ByteStrings can be more efficient than using Text, which has to support
the full Unicode range (internally, it uses UTF-16 before version 2.0 of the
text package and UTF-8 since). To facilitate this use of ByteStrings, the
Data.ByteString.Char8 module provides versions of the ByteString functions
that interpret the bytes in a ByteString as Characters. The ByteString type
itself is the same one as in Data.ByteString. In particular, unpack produces a
String and pack expects a String as argument, just as when using Text. Be
aware that pack truncates every character to 8 bits, so characters with code
points above 255 are silently mangled.
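Here is a small sketch of the Char8 interface next to the Word8 one. The names greeting, greetingBytes, shout, and truncated are invented for this example. The last definition shows what happens to a character that does not fit into one byte: pack keeps only its lowest 8 bits.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)
import Data.Word (Word8)

-- Build a ByteString from a String via the Char8 interface.
greeting :: BS.ByteString
greeting = BC.pack "hello"

-- The same value viewed as raw bytes: [104, 101, 108, 108, 111].
greetingBytes :: [Word8]
greetingBytes = BS.unpack greeting

-- BC.map takes a function on Chars instead of Word8s: "HELLO".
shout :: String
shout = BC.unpack (BC.map toUpper greeting)

-- '\955' is the Greek letter lambda (code point 955 = 0x3BB).
-- BC.pack keeps only the low 8 bits, 0xBB = 187.
truncated :: [Word8]
truncated = BS.unpack (BC.pack "\955")
```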
Finally, we have lazy versions of ByteStrings, both for the Word8 and the
Char8 interface. They are provided by the Data.ByteString.Lazy and
Data.ByteString.Lazy.Char8 modules. Under the hood, they are implemented as
lazy lists of strict ByteString chunks, analogously to the implementation of
lazy Text.
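The chunked representation can be observed directly. The sketch below builds a lazy ByteString from two strict pieces with fromChunks, inspects the chunk sizes with toChunks, and flattens everything into one strict ByteString with toStrict. The names twoChunks, chunkSizes, and flattened are made up for this illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- A lazy ByteString assembled from two strict chunks.
twoChunks :: BL.ByteString
twoChunks = BL.fromChunks [BS.pack [1, 2, 3], BS.pack [4, 5]]

-- The sizes of the underlying chunks: [3, 2].
chunkSizes :: [Int]
chunkSizes = map BS.length (BL.toChunks twoChunks)

-- Copy all chunks into a single strict ByteString.
flattened :: BS.ByteString
flattened = BL.toStrict twoChunks
```

Note that BS.ByteString and BL.ByteString are different types, so converting between them with fromStrict or toStrict is an explicit step.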
I won't say more about ByteStrings and working with binary data here. You
won't need these facilities in this course, but I wanted to mention that they
exist for when you do need them.