Reading and Writing Binary Data

Much of the data we process is stored in text format. For this kind of data, the String I/O facilities offered by System.IO and the Text I/O facilities offered by Data.Text.IO or Data.Text.Lazy.IO are sufficient. However, not all data is stored in text form. Examples include PDF files, image files, compiled programs, database files, etc. Every programming language worth its salt can also read such binary data files.

To work with binary data, we have the Data.ByteString module. It provides the ByteString type to store, well, a string of bytes, or rather a sequence of bytes. Under the hood, ByteStrings are arrays of bytes, so they are once again very space-efficient.

The way we work with ByteStrings is fairly similar to working with Text. The Data.ByteString module provides counterparts to all the major list processing functions, such as map, filter, etc., but they operate on ByteStrings instead of lists. We can also unpack a ByteString to obtain a list of its elements, and we can pack a list of bytes into a ByteString.
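As a small illustration of these list-like functions, here is a minimal sketch. The sample bytes and the names sample, doubled, and evens are made up for this example; pack, unpack, map, filter, and length are the functions exported by Data.ByteString.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical sample data: the bytes 1 to 5, packed from a list.
sample :: BS.ByteString
sample = BS.pack [1, 2, 3, 4, 5]

-- Double every byte: the ByteString counterpart of map on lists.
doubled :: BS.ByteString
doubled = BS.map (* 2) sample

-- Keep only the even bytes: the counterpart of filter on lists.
evens :: BS.ByteString
evens = BS.filter even sample

main :: IO ()
main = do
  print (BS.unpack doubled)  -- [2,4,6,8,10]
  print (BS.unpack evens)    -- [2,4]
  print (BS.length sample)   -- 5
```

The module is imported qualified because many of its function names clash with the list functions from the Prelude.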

In contrast to Data.Text, the I/O functions aren't provided by a separate module but are defined right inside Data.ByteString.
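For instance, readFile and writeFile come straight from Data.ByteString. The following is only a sketch: the file name example.bin and the helper roundTrip are hypothetical, chosen for illustration.

```haskell
import qualified Data.ByteString as BS

-- Hypothetical helper: write some bytes to a file, then read them back.
roundTrip :: FilePath -> BS.ByteString -> IO BS.ByteString
roundTrip path bytes = do
  BS.writeFile path bytes
  BS.readFile path

main :: IO ()
main = do
  -- "example.bin" is a made-up file name; it is created in the current directory.
  contents <- roundTrip "example.bin" (BS.pack [0x89, 0x50, 0x4E, 0x47])
  print (BS.length contents)  -- 4
  print (BS.unpack contents)  -- [137,80,78,71]
```

Unlike the String functions in System.IO, these functions read and write the raw bytes of the file without applying any text encoding.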

ByteStrings come in a number of variations. Basic ByteStrings store bytes. Haskell's data type to represent bytes, that is, unsigned 8-bit integers, is Word8. Thus, the elements of a ByteString are Word8s. In particular, unpack produces a list of Word8s, and pack expects a list of Word8s.

ASCII text encodes every character in a single byte, so using ByteStrings to work with such text can be more efficient than using Text, which encodes characters in UTF-8 under the hood. To facilitate this use of ByteStrings, the Data.ByteString.Char8 module provides versions of the functions from Data.ByteString that interpret the bytes in the ByteString as Chars. The ByteString type itself is the same one in both modules. In particular, unpack produces a String and pack expects a String as argument, just as when using Text. Be aware that this only works for characters that fit into a single byte: pack truncates any character with a code point above 255.
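Here is a minimal sketch of the Char8 interface. The greeting text and the names greeting and shouted are invented for this example. Note that the value built with the Char8 pack can be inspected with the Word8-based unpack, because both modules operate on the same ByteString type.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Char (toUpper)

-- Hypothetical ASCII text, packed from an ordinary String.
greeting :: BS.ByteString
greeting = BC.pack "Hello, ByteString!"

-- The Char8 version of map works on Chars instead of Word8s.
shouted :: BS.ByteString
shouted = BC.map toUpper greeting

main :: IO ()
main = do
  BC.putStrLn greeting                 -- Hello, ByteString!
  print (BC.unpack shouted)            -- "HELLO, BYTESTRING!"
  print (take 5 (BS.unpack greeting))  -- [72,101,108,108,111]
```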

Finally, we have lazy versions of ByteStrings, both for the Word8 and the Char8 interface. They are provided by the Data.ByteString.Lazy and Data.ByteString.Lazy.Char8 modules. Under the hood, they are implemented as lazy lists of small arrays (each chunk is a strict ByteString), analogously to the implementation of lazy Text.
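The chunked structure can be made visible with fromChunks and toChunks from Data.ByteString.Lazy. This is only a sketch; the chunk contents and the name lazyBytes are made up for illustration.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Hypothetical lazy ByteString assembled from two strict chunks.
lazyBytes :: BL.ByteString
lazyBytes = BL.fromChunks [BS.pack [1, 2, 3], BS.pack [4, 5]]

main :: IO ()
main = do
  print (BL.length lazyBytes)                    -- 5
  print (map BS.length (BL.toChunks lazyBytes))  -- [3,2]
  -- toStrict copies all chunks into one contiguous strict ByteString.
  print (BS.unpack (BL.toStrict lazyBytes))      -- [1,2,3,4,5]
```

When a file is read with the lazy readFile, the chunks are produced on demand, so a large file doesn't need to be held in memory all at once.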

I don't think I want to say more about ByteStrings and working with binary data here. You won't need these facilities in this course, but I wanted to mention that they exist for when you do need them.