Reading and Writing Files
putChar, putStr, putStrLn, getChar, getLine, and getContents have
counterparts whose names are prefixed with an h. The h-versions of these
functions read from or write to a file handle (h for handle). If you import the
System.IO module, you also get access to handles for the standard input, output,
and error streams available to any program:
stdin, stdout, stderr :: Handle
putChar, putStr, and putStrLn are defined by calling hPutChar,
hPutStr, and hPutStrLn with stdout as their first arguments:
putChar = hPutChar stdout
putStr = hPutStr stdout
putStrLn = hPutStrLn stdout
Similarly, getChar, getLine, and getContents read from stdin using
hGetChar, hGetLine, and hGetContents:
getChar = hGetChar stdin
getLine = hGetLine stdin
getContents = hGetContents stdin
The types of hPutChar, hPutStr, hPutStrLn, hGetChar, hGetLine, and
hGetContents are
hPutChar :: Handle -> Char -> IO ()
hPutStr :: Handle -> String -> IO ()
hPutStrLn :: Handle -> String -> IO ()
hGetChar :: Handle -> IO Char
hGetLine :: Handle -> IO String
hGetContents :: Handle -> IO String
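Because every h-function takes the handle as an argument, one piece of output code can target the terminal, the error stream, or a file. Here is a minimal sketch; printReport, demo, and the file name report.txt are hypothetical names used only for illustration, and the file is opened and closed with openFile and hClose, which are described below. Load it into GHCi and run demo.

```haskell
import System.IO

-- Hypothetical helper: writes a small report to whatever handle it is given.
printReport :: Handle -> [(String, Int)] -> IO ()
printReport h entries = do
  hPutStrLn h "Report"
  mapM_ (\(name, n) -> hPutStrLn h (name ++ ": " ++ show n)) entries

-- The same code writes to stdout, to stderr, and to a file.
demo :: IO ()
demo = do
  let entries = [("apples", 3), ("pears", 5)]
  printReport stdout entries
  printReport stderr entries
  h <- openFile "report.txt" WriteMode
  printReport h entries
  hClose h
```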
A Handle is what we use in Haskell to hold on to an open file from which we
can read and to which we can write. stdin, stdout, and stderr are handles
that are always available, without the need to open them. If we want to read
from a file on our file system, then we need to open this file using
openFile :: FilePath -> IOMode -> IO Handle
FilePath is the type used to represent file names. It's an alias for String.
type FilePath = String
IOMode indicates whether we want to open the file for reading, writing, both,
or appending.
data IOMode
= ReadMode -- File can only be read
| WriteMode -- File can only be written
| AppendMode -- File can only be written past its original content
| ReadWriteMode -- File can be read and written
Note that opening a file in WriteMode erases its original content.
ReadMode, WriteMode, and ReadWriteMode position the file cursor at
the beginning of the file. AppendMode does not alter the file when opening it
and positions the file cursor at the end of the file, so that new content is
added after the existing content.
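The difference between WriteMode and AppendMode is easy to check. The sketch below writes to the same file three times; writeWithMode and modesDemo are hypothetical names, and the handles are closed with hClose, introduced next. After modesDemo "modes.txt", the file should contain only second and third, one per line.

```haskell
import System.IO

-- Opens `path` in the given mode, writes `txt`, and closes the handle.
writeWithMode :: IOMode -> FilePath -> String -> IO ()
writeWithMode mode path txt = do
  h <- openFile path mode
  hPutStr h txt
  hClose h

-- Hypothetical demo: WriteMode discards earlier content,
-- AppendMode keeps it and adds to the end.
modesDemo :: FilePath -> IO ()
modesDemo path = do
  writeWithMode WriteMode path "first\n"
  writeWithMode WriteMode path "second\n"   -- erases "first\n"
  writeWithMode AppendMode path "third\n"   -- keeps "second\n"
```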
A file opened with openFile can be closed using
hClose :: Handle -> IO ()
For example, if we have the file
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
then we can open it using openFile, read its contents using hGetContents,
print them on screen using putStr, and finally close the file using hClose:
>>> import System.IO
>>> :{
| do
|   h <- openFile "mantra.txt" ReadMode
|   txt <- hGetContents h
|   putStr txt
|   hClose h
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
Here is an important pitfall, which we will explore more closely shortly. In other languages, the following variant would work just fine:
>>> :{
| do
|   h <- openFile "mantra.txt" ReadMode
|   txt <- hGetContents h
|   hClose h
|   putStr txt
| :}
*** Exception: mantra.txt: hGetContents: illegal operation (delayed read on closed handle)
Apparently, GHCi doesn't like what we're doing. We were trying to read the
contents of the file into txt using hGetContents. Being good programmers who
clean up after themselves as soon as possible, we closed the handle h, and
then we printed the file contents using putStr.
The reason why this doesn't work so well is Haskell's lazy evaluation. In most
languages, if we read the entire file using the equivalent of hGetContents, we
get a string object that stores the contents of the whole file. In Haskell, we
also get a String, but a String is a list of Chars, and this list is
produced lazily, as we consume its characters. Most of the time, this is a good
thing because it avoids the dance necessary in other languages, where we read
large files in chunks and process these chunks one at a time to avoid using too
much memory. In Haskell, we read the file contents as we need them, and the
garbage collector disposes of data read earlier unless we hold on to it in a
variable.
Here, lazy evaluation bit us in our behind. We closed the file using hClose
before we retrieved all its contents: We didn't inspect txt in any way before
closing the file, so no data was read before closing the file. Then we tried to
print txt, so the runtime system tried to retrieve the contents of h, and it
refused to do this because h was already closed. We'll explore this more
closely in the next subsection.
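If we want to keep the explicit hClose, one way out is to force the entire string before closing the handle. Here is a minimal sketch, assuming we are happy to hold the whole file in memory; readStrict is a hypothetical name, not a library function. It uses evaluate from Control.Exception: computing length txt walks the entire list, which makes hGetContents read the file to the end.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Reads the whole file before closing the handle, so the returned
-- string no longer depends on the handle being open.
readStrict :: FilePath -> IO String
readStrict path = do
  h <- openFile path ReadMode
  txt <- hGetContents h
  _ <- evaluate (length txt)  -- forces the entire list, i.e. reads the whole file
  hClose h                    -- harmless even if the handle was already closed at EOF
  return txt
```

With this helper, readStrict "mantra.txt" >>= putStr prints the file even though the handle is closed before printing.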
You can avoid these pitfalls of lazy I/O by simply never calling hClose on a
handle whose contents you read using hGetContents. Such a handle gets closed
automatically once the whole file contents have been read, or when the handle
gets garbage collected. The main reason to call hClose on it anyway is that
your program opens lots of files in rapid succession. In that case, the garbage
collector may not be fast enough to collect the handles of previously opened
files before you open new files, and your program may run out of file handles.[1]
Before having a closer look at lazy I/O, let's discuss four more useful functions for reading and writing files. First, there's
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
In Java, a common idiom to read a file is
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
try {
    // Do something with br
} finally {
    br.close();
}
The key point is that things can go wrong while reading from br, in which case
an exception is thrown. By closing br in a finally clause, we ensure that the
file gets closed whether we read it successfully or encountered an exception.
withFile behaves the same way. The first argument is a file path, the
second argument is the mode in which we want to open the file. The third
argument is what you can think of as the body of the try-block in the
equivalent Java code. It's a function that takes a handle to the file opened by
withFile and returns an IO action producing a value of type r. withFile runs
this function on the file it opened and returns the result this function
produces, but before doing so, it closes the file, even if the function threw
an exception. So our little code example above can be expressed more
idiomatically as
>>> :{
| withFile "mantra.txt" ReadMode $ \h -> do
|   txt <- hGetContents h
|   putStr txt
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
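To see how closely this mirrors the Java try/finally idiom, here is a simplified stand-in for withFile built from bracket in Control.Exception, which acquires a resource, runs a body, and always runs a release action afterwards. myWithFile is a hypothetical name; the withFile in base may differ in details such as how it reports errors.

```haskell
import Control.Exception (bracket)
import System.IO

-- Open the file, run the body, and close the handle afterwards,
-- even if the body throws an exception (like a Java finally clause).
myWithFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
myWithFile path mode body = bracket (openFile path mode) hClose body
```

For example, myWithFile "mantra.txt" ReadMode (\h -> hGetLine h) returns the first line of the file.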
We need to make sure that we consume the contents of the file within the
withFile block. Otherwise, the file gets closed before we are finished reading
it, and we run into the same problem as before:
>>> :{
| do
|   txt <- withFile "mantra.txt" ReadMode hGetContents
|   putStr txt
| :}
*** Exception: mantra.txt: hGetContents: illegal operation (delayed read on closed handle)
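If what we really want is a value computed from the contents, rather than the contents themselves, we can do the computation inside the withFile block and force the result before withFile closes the handle. A sketch, assuming we want the number of lines; countLines is a hypothetical name:

```haskell
import Control.Exception (evaluate)
import System.IO

-- Counts the lines of a file. The count is forced with evaluate while
-- the handle is still open; returning the unevaluated expression
-- length (lines txt) instead would run into the same problem as above.
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  txt <- hGetContents h
  evaluate (length (lines txt))
```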
The fact that file handles get closed automatically once the whole file has been
read or the handle gets garbage collected means that we rarely need withFile
just to ensure that files get closed, unless we need to carefully manage our
program's open file handles.
For reading or writing the contents of a whole file in one go, we have the following convenient functions:
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
readFile reads the whole file and returns its contents in a string.
writeFile writes the given string to the file. No need to fiddle with file
handles. readFile is lazy, as can be seen from its implementation:
readFile name = openFile name ReadMode >>= hGetContents
It therefore relies on the file getting closed eventually, once the contents have
been read or the handle gets garbage collected. This is why we cannot use readFile
if we need to manage open file handles carefully.
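Because readFile keeps the file open until its contents have been consumed, reading a file and then overwriting it needs care: Haskell implementations such as GHC enforce multiple-reader, single-writer locking on files within a process, so opening the file for writing while the lazy read is unfinished typically fails with a "resource busy (file is locked)" error. Here is a sketch of one workaround, forcing the whole string first so that the handle gets closed at the end of the file; upcaseFile is a hypothetical name.

```haskell
import Control.Exception (evaluate)
import Data.Char (toUpper)

-- Converts a file to upper case in place. Forcing length txt reads the
-- file to the end, which closes readFile's handle, so writeFile can then
-- open the same file for writing.
upcaseFile :: FilePath -> IO ()
upcaseFile path = do
  txt <- readFile path
  _ <- evaluate (length txt)
  writeFile path (map toUpper txt)
```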
You can use our old friend :sprint to verify that readFile is indeed lazy:
>>> txt <- readFile "mantra.txt"
>>> :sprint txt
txt = _
>>> putStr txt
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
>>> :sprint txt
txt = "I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
"
Note the output of the first :sprint txt. It says that txt = _, that is,
txt has not been evaluated at all. Printing txt using putStr txt forces
txt to be evaluated fully, that is, to be read fully from the input file.
After that, :sprint txt prints the complete contents of the file.
writeFile is not lazy because there is no need to keep the file open: It
consumes the string to be written fully, and then it is done writing to the
file. Thus, the implementation of writeFile uses withFile to make sure that
the file gets closed immediately before writeFile returns:
writeFile name txt = withFile name WriteMode (\h -> hPutStr h txt)
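The same withFile pattern works when the output is produced piece by piece rather than as one string. A sketch that writes a list of lines one at a time; writeLines is a hypothetical helper, not part of System.IO.

```haskell
import System.IO

-- Writes each string on its own line. withFile closes the handle
-- once all lines have been written.
writeLines :: FilePath -> [String] -> IO ()
writeLines path ls = withFile path WriteMode $ \h -> mapM_ (hPutStrLn h) ls
```

writeLines path ls should produce the same file as writeFile path (unlines ls), since both end every line with a newline.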
The final function I should mention here is hSeek. When reading a file whose
contents are effectively a data structure, it is often necessary to follow
pointers between different file locations. Thus, the file is no longer read
sequentially. This is enabled using
hSeek :: Handle -> SeekMode -> Integer -> IO ()
This behaves much like the fseek function in C: it moves the file cursor.
The third argument, an Integer, is an offset that specifies the new position
of the file cursor. The SeekMode argument determines what this offset is
relative to:
data SeekMode
= AbsoluteSeek -- Position is relative to start of the file
| RelativeSeek -- Position is relative to the current cursor position
| SeekFromEnd -- Position is relative to the end of the file
So an offset of 5 with AbsoluteSeek as the SeekMode places the cursor on
byte 5 of the file, counting from 0, that is, on the 6th byte. With
RelativeSeek as the SeekMode, the new position is 5 bytes after the current
position. An offset of -5 with SeekFromEnd as the SeekMode positions the
cursor on the 5th byte from the end of the file.
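To make the three seek modes concrete, here is a small sketch that writes a 10-byte file containing 0123456789, so that each byte's offset equals its digit, and reads a single character after each seek. charAt and seekDemo are hypothetical names; the file path is whatever you pass in, and the file is overwritten.

```haskell
import System.IO

-- Moves the file cursor with hSeek, then reads one character.
charAt :: Handle -> SeekMode -> Integer -> IO Char
charAt h mode offset = do
  hSeek h mode offset
  hGetChar h

-- Hypothetical demo on a 10-byte file "0123456789" (no trailing newline).
seekDemo :: FilePath -> IO (Char, Char, Char)
seekDemo path = do
  writeFile path "0123456789"
  withFile path ReadMode $ \h -> do
    a <- charAt h AbsoluteSeek 2    -- offset 2 from the start: '2'
    b <- charAt h RelativeSeek 4    -- cursor was at 3 after reading '2'; 3 + 4 = 7: '7'
    c <- charAt h SeekFromEnd (-1)  -- 10 - 1 = 9: '9'
    return (a, b, c)
```

seekDemo "seek-demo.txt" should return ('2','7','9').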
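>>> :{
| -- Rewind to the start of the file, then read it one character at a time
| conts :: Handle -> IO String
| conts file = do
|     hSeek file AbsoluteSeek 0
|     go
|   where
|     go = do
|       atEnd <- hIsEOF file
|       if atEnd
|         then return []
|         else do
|           x <- hGetChar file
|           xs <- go
|           return (x:xs)
| :}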
>>> :{
| do
|   file <- openFile "mantra.txt" ReadMode
|   txt1 <- conts file
|   txt2 <- conts file
|   putStr $ txt1 ++ txt2
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I had to write my own function conts to read the contents of a file here.
That's because hGetContents file puts the file handle file into a
"semi-closed" state. This state allows further data to be retrieved from the
handle by inspecting the lazy list returned by hGetContents, but no other file
operations are permitted anymore. conts does not put the handle into a
semi-closed state, so we are able to call conts on the same handle a second
time. conts resets the file cursor to the beginning of the file and then reads
the characters in the file one character at a time.
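We can observe the semi-closed state directly: after hGetContents, seeking on the same handle fails with an illegal-operation error. A sketch using try from Control.Exception and isIllegalOperation from System.IO.Error; seekAfterGetContents is a hypothetical name that returns True when the expected error occurs.

```haskell
import Control.Exception (IOException, try)
import System.IO
import System.IO.Error (isIllegalOperation)

-- Returns True if hSeek is rejected after hGetContents has put the
-- handle into the semi-closed state.
seekAfterGetContents :: FilePath -> IO Bool
seekAfterGetContents path = do
  h <- openFile path ReadMode
  _ <- hGetContents h  -- h is now semi-closed
  r <- try (hSeek h AbsoluteSeek 0) :: IO (Either IOException ())
  hClose h             -- closing a semi-closed handle is allowed
  return (either isIllegalOperation (const False) r)
```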
There are many more functions for working with files and file handles. Look up
System.IO on Hoogle to learn which ones there are. The functions discussed here
cover all the I/O you will need for the projects in this course, but if you
decide to continue programming in Haskell, it will help to have an overview of
the different functions offered by the standard library.
[1] On most operating systems, file handles are a limited resource. A program can never have more than a certain number of handles open at any given time. The default limit on Linux systems is 1024, but this can be changed by the system administrator. The default limit on macOS is 256. I don't have a Windows system to test the limit there.