Reading and Writing Files
putChar, putStr, putStrLn, getChar, getLine, and getContents have counterparts
whose names are prefixed with an h. The h-versions of these functions read from
or write to a file handle (h for handle). In fact, if you import the System.IO
module, then you get access to the standard input/output streams available to
any program:
stdin, stdout, stderr :: Handle
putChar, putStr, and putStrLn are defined by calling hPutChar, hPutStr, and
hPutStrLn with stdout as their first arguments:
putChar = hPutChar stdout
putStr = hPutStr stdout
putStrLn = hPutStrLn stdout
Similarly, getChar, getLine, and getContents read from stdin using hGetChar,
hGetLine, and hGetContents:
getChar = hGetChar stdin
getLine = hGetLine stdin
getContents = hGetContents stdin
The types of hPutChar, hPutStr, hPutStrLn, hGetChar, hGetLine, and hGetContents
are
hPutChar :: Handle -> Char -> IO ()
hPutStr :: Handle -> String -> IO ()
hPutStrLn :: Handle -> String -> IO ()
hGetChar :: Handle -> IO Char
hGetLine :: Handle -> IO String
hGetContents :: Handle -> IO String
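Because the handle is an ordinary first argument, the same functions can write to stderr just as easily as to stdout. The following is a minimal sketch; the helper name warn is hypothetical and only serves as an illustration.

```haskell
import System.IO

-- Hypothetical helper (not part of System.IO): print a warning on stderr.
warn :: String -> IO ()
warn msg = hPutStrLn stderr ("warning: " ++ msg)

main :: IO ()
main = do
  putStrLn "regular output"             -- writes to stdout
  hPutStrLn stdout "also regular output" -- same effect, handle made explicit
  warn "this line goes to stderr"
```

If you redirect the program's standard output to a file, only the first two lines end up in that file; the warning still appears on the terminal.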
A Handle is what we use in Haskell to hold on to an open file from which we
can read and to which we can write. stdin, stdout, and stderr are files that
are always available, without the need to open them. If we want to read from a
file on our file system, then we need to open this file using
openFile :: FilePath -> IOMode -> IO Handle
FilePath is the type used to represent file names. It's an alias for String.
type FilePath = String
IOMode indicates whether we want to open the file for reading, writing, both,
or appending.
data IOMode
= ReadMode -- File can only be read
| WriteMode -- File can only be written
| AppendMode -- File can only be written past its original content
| ReadWriteMode -- File can be read and written
In particular, opening a file in WriteMode erases its original content.
ReadMode, WriteMode, and ReadWriteMode position the file cursor at the
beginning of the file. AppendMode does not alter the file when opening it and
positions the file cursor at the end of the file, to allow adding more content
at the end.
A file opened with openFile can be closed using
hClose :: Handle -> IO ()
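The difference between WriteMode and AppendMode is easiest to see by writing to the same file several times. This is a minimal sketch; the helper writeWith and the file name modes-demo.txt are made up for this illustration. It also uses hIsEOF, which tests whether the cursor is at the end of the file.

```haskell
import System.IO

-- Hypothetical helper: open the file in the given mode, write the string,
-- and close the file again.
writeWith :: IOMode -> FilePath -> String -> IO ()
writeWith mode path txt = do
  h <- openFile path mode
  hPutStr h txt
  hClose h

main :: IO ()
main = do
  writeWith WriteMode  "modes-demo.txt" "first\n"   -- creates or truncates
  writeWith AppendMode "modes-demo.txt" "second\n"  -- adds at the end
  h <- openFile "modes-demo.txt" ReadMode
  l1 <- hGetLine h
  l2 <- hGetLine h
  hClose h
  print [l1, l2]                                    -- ["first","second"]

  writeWith WriteMode  "modes-demo.txt" "third\n"   -- erases both lines
  h' <- openFile "modes-demo.txt" ReadMode
  l <- hGetLine h'
  atEnd <- hIsEOF h'
  hClose h'
  print (l, atEnd)                                  -- ("third",True)
```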
For example, if we have the file
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
then we can open it using openFile, read its contents using hGetContents,
print them on screen using putStr, and finally close the file using hClose:
>>> import System.IO
>>> :{
| do
| h <- openFile "mantra.txt" ReadMode
| txt <- hGetContents h
| putStr txt
| hClose h
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
An important point, which we will explore more closely shortly: in other languages, the equivalent of the following code would work just fine, but in Haskell it fails:
>>> :{
| do
| h <- openFile "mantra.txt" ReadMode
| txt <- hGetContents h
| hClose h
| putStr txt
| :}
*** Exception: mantra.txt: hGetContents: illegal operation (delayed read on closed handle)
Apparently, GHCi doesn't like what we're doing. We were trying to read the
contents of the file into txt using hGetContents. Being good programmers who
clean up after themselves as soon as possible, we closed the handle h, and
then we printed the file contents using putStr.
The reason why this doesn't work so well is Haskell's lazy evaluation. In most
languages, if we read the entire file using the equivalent of hGetContents, we
get a string object that stores the contents of the whole file. In Haskell, we
also get a String, but a String is a list of Chars, and this list is produced
lazily, as we consume its characters. Most of the time, this is a good thing
because it avoids the dance necessary in other languages, where we read large
files in chunks and process these chunks one at a time to avoid using too much
memory. In Haskell, we read the file contents as we need them, and the garbage
collector disposes of data read earlier unless we hold on to it in a variable.
Here, lazy evaluation came back to bite us. We closed the file using hClose
before we retrieved any of its contents: we didn't inspect txt in any way
before closing the file, so no data was read before closing the file. Then we
tried to print txt, so the runtime system tried to retrieve the contents of h,
and it refused to do this because h was already closed. We'll explore this
more closely in the next subsection.
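If we do want to close the handle ourselves, one way to make the failing example work is to force the whole string before calling hClose. The following is a minimal sketch under that assumption, not the only possible fix. The helper name readStrictly and the file name strict-demo.txt are hypothetical. evaluate comes from Control.Exception, and writeFile, discussed later in this section, is used only to create the demo file.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: read a whole file and close the handle only after
-- every character has actually been read.
readStrictly :: FilePath -> IO String
readStrictly path = do
  h   <- openFile path ReadMode
  txt <- hGetContents h
  _   <- evaluate (length txt)  -- walking the whole list reads the whole file
  hClose h                      -- safe now: nothing is left to read lazily
  return txt

main :: IO ()
main = do
  writeFile "strict-demo.txt" "one\ntwo\n"  -- create a small demo file
  txt <- readStrictly "strict-demo.txt"
  putStr txt
```

The price is that the whole file now sits in memory at once, which is exactly what lazy reading avoids.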
You can avoid these pitfalls of lazy I/O by simply never calling hClose. A
file handle gets closed automatically when the whole file contents have been
read or when the handle gets garbage collected. The only reason why you may
want to call hClose is if your program reads or writes lots of files in rapid
succession. In that case, the garbage collector may not be fast enough to
collect all the handles of previously opened files before you open new files,
and this may lead to your program running out of file handles.[1]
Before having a closer look at lazy I/O, let's discuss four more useful functions for reading and writing files. First, there's
withFile :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
In Java, a common idiom to read a file is
BufferedReader br = new BufferedReader(new FileReader("file.txt"));
try {
// Do something with br
} finally {
br.close();
}
The key point is that things can go wrong while reading from br, in which case
an exception is thrown. By closing br in a finally clause, the file gets closed
whether we read it successfully or encountered an exception.
withFile behaves the same way. The first argument is a file path, the second
argument is the mode in which we want to open the file. The third argument is
what you can think of as the body of the try-block in the equivalent Java code.
It's a function that takes a handle to the file opened by withFile and produces
some value of type r. withFile runs this function on the file it opened, closes
the file, and then returns the result the function produced. The file gets
closed even if the function throws an exception. So our little code example
above can be expressed more idiomatically as
>>> :{
| withFile "mantra.txt" ReadMode $ \h -> do
| txt <- hGetContents h
| putStr txt
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
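The analogy with Java's try/finally can be made concrete. The following is a minimal sketch, not necessarily how the library defines withFile. It builds a function with the same type, given the hypothetical name withFile', from bracket in Control.Exception. bracket runs an acquire action, then a body, and finally a release action, whether or not the body throws an exception. The file name bracket-demo.txt is also made up.

```haskell
import Control.Exception (bracket)
import System.IO

-- Sketch of a withFile-like function: open the file, run the body on the
-- handle, and close the handle afterwards, even if the body throws.
withFile' :: FilePath -> IOMode -> (Handle -> IO r) -> IO r
withFile' path mode body = bracket (openFile path mode) hClose body

main :: IO ()
main = do
  writeFile "bracket-demo.txt" "hello\n"                -- create a demo file
  line <- withFile' "bracket-demo.txt" ReadMode hGetLine
  putStrLn line                                         -- prints: hello
```

hGetLine reads its line eagerly, so it is safe to use the result after the handle has been closed.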
We need to make sure that we consume the contents of the file within the
withFile block. Otherwise, the file gets closed before we are finished reading
it, and we run into the same problem as before:
>>> :{
| do
| txt <- withFile "mantra.txt" ReadMode hGetContents
| putStr txt
| :}
*** Exception: mantra.txt: hGetContents: illegal operation (delayed read on closed handle)
The fact that file handles get closed automatically once the whole file has
been read or the handle gets garbage collected makes it less necessary to use
withFile to ensure files get closed unless we need to carefully manage our
program's open file handles.
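When a computation only needs a summary of the file, it can produce that summary inside the withFile block and return a fully evaluated result. The following is a minimal sketch; the function name countLines and the file name count-demo.txt are hypothetical.

```haskell
import System.IO

-- Hypothetical helper: count the lines of a file. The count is forced with
-- ($!) inside the withFile block, so the file has been read completely
-- before withFile closes the handle.
countLines :: FilePath -> IO Int
countLines path = withFile path ReadMode $ \h -> do
  txt <- hGetContents h
  return $! length (lines txt)

main :: IO ()
main = do
  writeFile "count-demo.txt" "a\nb\nc\n"  -- create a demo file
  n <- countLines "count-demo.txt"
  print n                                  -- prints: 3
```

Replacing return $! with a plain return would hand back an unevaluated count, and forcing it later would fail with the same "delayed read on closed handle" exception as above.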
For reading or writing the contents of a whole file in one go, we have the following convenient functions:
readFile :: FilePath -> IO String
writeFile :: FilePath -> String -> IO ()
readFile reads the whole file and returns its contents in a string. writeFile
writes the given string to the file. No need to fiddle with file handles.
readFile is lazy, as can be seen from its implementation:
readFile name = openFile name ReadMode >>= hGetContents
Thus, it relies on the file getting closed eventually, once the contents have
been read or the handle gets garbage collected. As a consequence, we cannot use
readFile if we need to manage open file handles carefully.
You can use our old friend :sprint to verify that readFile is indeed lazy:
>>> txt <- readFile "mantra.txt"
>>> :sprint txt
txt = _
>>> putStr txt
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
>>> :sprint txt
txt = "I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
"
Note the output of the first :sprint txt. It says that txt = _, that is, txt
has not been evaluated at all. Printing txt using putStr txt forces txt to be
evaluated fully, that is, to be read fully from the input file. After that,
:sprint txt prints the complete contents of the file.
writeFile is not lazy because there is no need to keep the file open: it
consumes the string to be written fully, and then it is done writing to the
file. Thus, the implementation of writeFile uses withFile to make sure that
the file gets closed before writeFile returns:
writeFile name txt = withFile name WriteMode (\h -> hPutStr h txt)
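Together, readFile and writeFile make simple file transformations one-liners. The following is a minimal sketch; the function name upperCaseFile and the file names in.txt and out.txt are made up for this illustration. It assumes that source and destination are different files: with GHC, trying to write to a file that is still open for lazy reading fails because the file is locked.

```haskell
import Data.Char (toUpper)

-- Hypothetical helper: copy src to dst, converting the text to upper case.
-- readFile produces the contents lazily and writeFile consumes them, so the
-- text streams from one file to the other.
-- Assumption: src and dst are different files.
upperCaseFile :: FilePath -> FilePath -> IO ()
upperCaseFile src dst = readFile src >>= writeFile dst . map toUpper

main :: IO ()
main = do
  writeFile "in.txt" "I will learn to program in Haskell!\n"
  upperCaseFile "in.txt" "out.txt"
  readFile "out.txt" >>= putStr  -- prints: I WILL LEARN TO PROGRAM IN HASKELL!
```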
The final function I should mention here is hSeek. When reading a file whose
contents are effectively a data structure, it is often necessary to follow
pointers between different file locations. Thus, the file is no longer read
sequentially. This is enabled using
hSeek :: Handle -> SeekMode -> Integer -> IO ()
This behaves much like the fseek function in C. It allows the file cursor to
be moved. The third argument, an Integer, is the position to which to set the
file cursor. The SeekMode argument determines how to interpret this position:
data SeekMode
= AbsoluteSeek -- Position is relative to start of the file
| RelativeSeek -- Position is relative to the current cursor position
| SeekFromEnd -- Position is relative to the end of the file
So a position of 5 with AbsoluteSeek as the SeekMode places the cursor 5 bytes
past the start of the file, which means that the next byte read is the 6th
byte of the file. With RelativeSeek as the SeekMode, the new position is 5
bytes after the current position. A position of -5 with SeekFromEnd as the
SeekMode positions the cursor on the 5th byte from the end of the file.
>>> :{
| conts file = do
| hSeek file AbsoluteSeek 0
| go
| where
| go = do
| atEnd <- hIsEOF file
| if atEnd then
| return []
| else do
| x <- hGetChar file
| xs <- go
| return (x:xs)
| :}
>>> :{
| do
| file <- openFile "mantra.txt" ReadMode
| txt1 <- conts file
| txt2 <- conts file
| putStr $ txt1 ++ txt2
| :}
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I will learn to program in Haskell!
I had to cook my own function conts to read the contents of a file here.
That's because hGetContents file puts the file handle file into a
"semi-closed" state. This state allows further data to be retrieved from the
handle by inspecting the lazy list returned by hGetContents, but no other file
operations are permitted anymore. conts does not put the handle into a
semi-closed state, so we are able to call conts on the same handle a second
time. conts resets the file cursor to the beginning of the file and then reads
the characters in the file one character at a time.
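The three seek modes described earlier can be tried out on a small file. The following is a minimal sketch; the helper charAt and the file name alphabet.txt are hypothetical. The file contains the 26 lowercase letters, so the letter a sits at offset 0 and the letter z at offset 25.

```haskell
import System.IO

-- Hypothetical helper: move the cursor as requested, then read one character.
charAt :: Handle -> SeekMode -> Integer -> IO Char
charAt h mode pos = do
  hSeek h mode pos
  hGetChar h

main :: IO ()
main = do
  writeFile "alphabet.txt" ['a' .. 'z']
  withFile "alphabet.txt" ReadMode $ \h -> do
    c1 <- charAt h AbsoluteSeek 5     -- offset 5 from the start: 'f'
    c2 <- charAt h RelativeSeek 5     -- cursor was at 6, so offset 11: 'l'
    c3 <- charAt h SeekFromEnd (-5)   -- offset 26 - 5 = 21: 'v'
    putStrLn [c1, c2, c3]             -- prints: flv
```

Note that reading c1 advances the cursor by one character, which is why the relative seek starts from offset 6 rather than 5.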
There are many more functions for working with files and file handles. Search
Hoogle for System.IO to learn which ones there are. You won't need any I/O
functions not discussed here for any project in this course, but if you decide
to continue programming in Haskell, it will help to have an overview of the
different functions offered by the standard library.
[1] On most operating systems, file handles are a limited resource. A program can never have more than a certain number of handles open at any given time. The default limit on Linux systems is 1024, but this can be changed by the system administrator. The default limit on macOS is 256. I don't have a Windows system to test the limit there.