Variables vs Constants, Functions vs Operators
Let us get a lexical quirk out of the way first. Most programming languages
come with certain conventions concerning coding style. C programmers tend to
give all variables, functions and types lowercase names, with the parts of a
name composed of multiple words separated by underscores, as in
a_really_long_function_name. In Python, class names usually start with
uppercase letters and function names start with lowercase letters. In most
programming languages, these are just conventions, and sticking to them helps
other programmers to understand your code. However, you are completely free to
break these conventions if you really want to. Maybe the compiler or
interpreter will give you a warning, telling you that you really shouldn't
break the conventions, but your code still works.
In Haskell, some rules surrounding capitalization are once again simply
convention. For example, Haskell programmers tend to use "camelcase" to separate
words in long function or type names, as in aReallyLongFunctionName or
SomeType, but once again you are welcome to write a_really_long_function_name
or Some_tYPE. However, there are some
capitalization rules that are part of the syntax. Breaking them changes the
meaning of your program and will almost always prevent it from compiling or make
it do something you didn't intend.
There are three kinds of identifiers in Haskell.
Constants must start with uppercase letters. Examples include True, False, Nothing, and many more.
Variables must start with a lowercase letter. This includes function names. Examples include aFunction, partition, null, and many more.
Finally, we have operators such as +, -, / or >=>. Operator names are composed of special characters.
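As a minimal sketch of these three kinds of identifiers, the following program defines a type with two constants, a few variables, and an operator. The names Colour, favourite, describe and |+| are made up for illustration and do not come from any library. The describe function also hints at why the capitalization rule is syntax rather than convention: in a pattern, an uppercase name refers to an existing constant, while a lowercase name introduces a new variable that matches anything.

```haskell
module Main where

-- Constants (here: Red and Green) start with an uppercase letter.
data Colour = Red | Green
  deriving (Show, Eq)

-- Variables, including function names, start with a lowercase letter.
favourite :: Colour
favourite = Green

-- In a pattern, the uppercase name Red matches only that constant.
-- The lowercase name other is a new variable that matches any value.
describe :: Colour -> String
describe Red   = "red"
describe other = "not red: " ++ show other

-- A hypothetical operator whose name consists of special characters only.
(|+|) :: Int -> Int -> Int
x |+| y = abs x + abs y

main :: IO ()
main = do
  print (3 + 4)        -- operator, infix by default
  print ((+) 3 4)      -- the same operator in prefix form, in parentheses
  print (div 7 2)      -- ordinary function, prefix by default
  print (7 `div` 2)    -- the same function in infix form, in backticks
  print ((-3) |+| 4)   -- user-defined operator
  putStrLn (describe favourite)
```

Running main should print 7, 7, 3, 3, 7 and then not red: Green.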
Operators are in fact just regular functions. However, as explained before, operators are used in infix notation by default and need to be enclosed in parentheses if we want to use them in prefix notation, whereas "normal" functions are used in prefix notation by default and need to be enclosed in backticks if we want to use them in infix notation.
The name "variable" is a bit of a misnomer in Haskell because a variable in Haskell is not variable as in imperative languages. You cannot define a variable and change its value over time, as you can in Java or C.
Variables in Haskell are like variables in mathematics. We think about variables in Java or C as names for memory locations whose contents we can modify. In mathematics, a variable is a name, a shorthand, for some potentially complex or even unknown expression or object, as in "let \(x\) be the smallest prime number greater than 1,000,000". The value of \(x\) does not change over time; we simply use it as a shorthand to refer to some object. Haskell variables are the same.
A Haskell variable is a name that refers to the value of an expression, not a memory location.
What we can do, however, is have multiple variables \(x\) that have different values in different contexts.
In mathematics, this "context" can be a particular proof or paper. The same variable may be used to refer to different objects in different proofs or different papers. In a paper about graph theory, \(F\) is likely to refer to a forest, and it will likely refer to an arbitrary forest, possibly to different forests in different parts of the paper. In a paper about category theory, \(F\) most likely refers to a functor, again an arbitrary one. Within the same proof or paper, however, it always refers to the same object.
In Haskell, the "context" is a particular function or function call: we can have local variables with the same name but with different values and even different types in different functions. Across recursive calls of the same function, the type of a given variable must stay the same because Haskell is a statically typed language, but its value may differ from call to call, since each recursive call has its own copies of all local variables of the function.
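The idea of a variable as a name bound to a value within some context can be sketched as follows. The functions squarePlusOne, twice and sumTo are hypothetical examples, and sumTo assumes a non-negative argument. The first two functions each have a local variable x, with different types; the third shows a local variable whose value differs between recursive calls.

```haskell
module Main where

-- In this function, x is a name for an Int value.
squarePlusOne :: Int -> Int
squarePlusOne n = x + 1
  where
    x = n * n

-- In this function, x is a name for a String.
-- It is a different context, so this x is unrelated to the one above.
twice :: String -> String
twice s = x ++ x
  where
    x = s ++ "!"

-- Each recursive call has its own n and its own rest.
-- Their types are the same in every call, but their values differ.
-- Assumes n >= 0; a negative argument would recurse forever.
sumTo :: Int -> Int
sumTo 0 = 0
sumTo n = n + rest
  where
    rest = sumTo (n - 1)

main :: IO ()
main = do
  print (squarePlusOne 4)  -- 17
  putStrLn (twice "hi")    -- hi!hi!
  print (sumTo 4)          -- 10
```

None of these functions ever changes the value of x, n or rest after it has been defined. Adding a second definition of x to the same where block would be rejected by the compiler instead of being treated as an assignment.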
Haskell also distinguishes between constants and variables at the type level.
Concrete types are like type constants. Int is Int; it doesn't change its
meaning in different contexts. Thus, if a function expects an argument of type
Int, you can only call it with a value of type Int as argument.
In contrast, a type variable a can refer to the type Int in one context, to
the type Char in a different context, and maybe even to a list of lists of
Int-Bool pairs in yet another context. That would be the type [[(Int, Bool)]].
What this means is that if a function argument is specified to have
a variable type, such as a, then you can call it with, say, an integer
argument in some part of your code and with an argument of a completely
different type in a different part of your code.
The naming rules for types are the same as for values.
The names of concrete types must start with uppercase letters. The names of type variables must start with lowercase letters.
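The difference between a concrete type and a type variable in a function signature can be illustrated with a short sketch. The functions increment and pairUp are hypothetical names chosen for this example.

```haskell
module Main where

-- Concrete type: the argument must be an Int.
increment :: Int -> Int
increment n = n + 1

-- Type variable a: the argument may be of any type,
-- and the result is a pair of values of that same type.
pairUp :: a -> (a, a)
pairUp x = (x, x)

main :: IO ()
main = do
  print (increment 41)
  -- Here a is Int.
  print (pairUp (3 :: Int))
  -- Here a is Char.
  print (pairUp 'c')
  -- Here a is [[(Int, Bool)]].
  print (pairUp [[(1 :: Int, True)]])
```

Calling increment 'c' would be rejected by the type checker, because Char is not Int, whereas pairUp accepts all three arguments above.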
The notion of a type variable may seem bewildering at this point. If you've used
or implemented a generic function in Java or Rust, or a template in C++ before,
then you should be familiar with the idea that a given function may be
applicable to arguments of different types. Possibly one of the most
ubiquitous generic types in Java is ArrayList. ArrayList isn't really a
type. Leaving aside legacy raw types, you can't have a variable of type
ArrayList. You can only have a variable of type ArrayList<Integer> or
ArrayList<String>. ArrayList is a parameterized type or, in Haskell parlance,
a type constructor. ArrayList is defined as ArrayList<T>, where T can be
instantiated to any type; T is a type variable. The equivalent type in Haskell
would be Array Int t, an array that can store values of any type t, referenced
using integer indices.
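As a rough sketch of how Array Int t behaves, the following program instantiates t to two different types and then defines a function that leaves t open. It assumes the Data.Array module from the array package, which ships with GHC; the names names, flags and firstElement are invented for this example, and firstElement assumes a non-empty array.

```haskell
module Main where

import Data.Array (Array, bounds, listArray, (!))

-- Array Int t with t instantiated to String.
names :: Array Int String
names = listArray (0, 2) ["Ada", "Grace", "Alan"]

-- Array Int t with t instantiated to Bool.
flags :: Array Int Bool
flags = listArray (0, 1) [True, False]

-- The type variable t is left open, so this function works for
-- integer-indexed arrays with any element type.
-- Assumes the array is non-empty.
firstElement :: Array Int t -> t
firstElement arr = arr ! fst (bounds arr)

main :: IO ()
main = do
  putStrLn (firstElement names)
  print (firstElement flags)
  print (names ! 2)
```

Just as ArrayList needs a type argument in Java, Array Int on its own is not the type of any value here; it becomes one only once an element type such as String or Bool is supplied.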