2.6.1. Canonical Form
An LP over a set of variables \(x_1, \ldots, x_n\) is in canonical form if
- It is a maximization LP,
- Every variable \(x_i\) is required to be non-negative: The LP contains a non-negativity constraint
  \[x_i \ge 0\]
  for every variable \(x_i\), and
- All other constraints are upper bound constraints: They provide a constant upper bound on some linear combination of the variables. For example:
  \[x_1 + 3x_2 - 2x_4 \le 20.\]
We can represent such an LP as an \(n\)-element row vector
\[c = (c_1, \ldots, c_n),\]
an \(m\)-element column vector
\[b = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix},\]
and an \(m \times n\) matrix
\[A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n}\\ a_{2,1} & a_{2,2} & \cdots & a_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}.\]
\(A\), \(b\), and \(c\) represent the following LP in canonical form:
\[\begin{gathered} \text{Maximize } \sum_{j=1}^n c_jx_j\\ \begin{aligned} \text{s.t. }\sum_{j=1}^n a_{ij}x_j &\le b_i && \forall 1 \le i \le m && \text{(upper bound constraints)}\\ x_j &\ge 0 && \forall 1 \le j \le n && \text{(non-negativity constraints)} \end{aligned} \end{gathered}\]
If we define \(x \le y\) for two \(m\)-element vectors to mean that \(x_i \le y_i\) for all \(1 \le i \le m\), and we write \(0\) to denote the \(0\)-vector \((0, \ldots, 0)^T\),[^1] then this LP can be written more compactly as
\[\begin{gathered} \text{Maximize } cx\\ \begin{aligned} \text{s.t. } Ax &\le b\\ x &\ge 0, \end{aligned} \end{gathered}\tag{2.18}\]
where \(cx\) is the inner product of the two vectors \(c\) and \(x\) and \(Ax\) is the usual matrix-vector product. (Throughout the remainder of this chapter and throughout the next two chapters, we treat \(x\) as a column vector.)
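For concreteness, here is a minimal sketch of this representation in Python. It stores \(A\), \(b\), and \(c\) as NumPy arrays and solves a tiny canonical-form LP with scipy.optimize.linprog; the numbers are arbitrary. Note that linprog minimizes, so maximizing \(cx\) amounts to minimizing \(-cx\), and its default variable bounds \((0, \infty)\) are exactly the non-negativity constraints of canonical form.

```python
# A canonical-form LP "maximize cx s.t. Ax <= b, x >= 0" stored as (A, b, c).
import numpy as np
from scipy.optimize import linprog

c = np.array([3.0, 2.0])          # objective coefficients (row vector c)
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])        # one row per upper bound constraint
b = np.array([4.0, 6.0])          # right-hand sides

# linprog minimizes and imposes x >= 0 by default, so pass -c to maximize.
res = linprog(-c, A_ub=A, b_ub=b)
print(res.x, -res.fun)            # optimal solution and objective value cx
```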
Throughout most of these notes, we adopt the following definition of equivalence of two LPs:
Two LPs are equivalent if
- They have the same variables,
- They have the same set of feasible solutions, and
- Their objective functions assign the same value to each feasible solution.
In the following two lemmas, we use a more general notion of equivalence of LPs, which considers two LPs \(P\) and \(P'\) over variables \(x = (x_1, \ldots, x_n)^T\) and \(y = (y_1, \ldots, y_r)^T\) equivalent if there exist two functions \(\phi : \mathbb{R}^n \rightarrow \mathbb{R}^r\) and \(\psi : \mathbb{R}^r \rightarrow \mathbb{R}^n\) such that
- \(\hat x\) is a feasible solution of \(P\) if and only if \(\phi\bigl(\hat x\bigr)\) is a feasible solution of \(P'\),
- \(\hat y\) is a feasible solution of \(P'\) if and only if \(\psi\bigl(\hat y\bigr)\) is a feasible solution of \(P\),
- If \(\hat x\) is an optimal solution of \(P\), then \(\phi\bigl(\hat x\bigr)\) is an optimal solution of \(P'\), and
- If \(\hat y\) is an optimal solution of \(P'\), then \(\psi\bigl(\hat y\bigr)\) is an optimal solution of \(P\).
Less formally, this says that \(\phi\) and \(\psi\) are mappings between solutions of \(P\) and \(P'\) that preserve feasible solutions and optimal solutions.
Equivalence of LPs is useful because it allows us to obtain an optimal solution of an LP by solving any of its equivalent LPs. By the following lemma, every LP has an equivalent LP in canonical form:
Lemma 2.11: Every linear program can be transformed into an equivalent linear program in canonical form.
Proof: An LP \(P\) may not be in canonical form for four reasons:
- It may be a minimization LP.
- Some of its constraints may be equality constraints.
- Some of its inequality constraints may be lower bounds, as opposed to upper bounds.
- Some variables may not have non-negativity constraints.
We define a sequence of LPs \(P = P_0, P_1, P_2, P_3, P_4 = P'\) such that \(P'\) is in canonical form and for all \(1 \le i \le 4\), \(P_{i-1}\) and \(P_i\) are equivalent. Thus, \(P'\) is an LP in canonical form that is equivalent to \(P\).
From minimization LP to maximization LP: First we construct a maximization LP \(P_1\) that is equivalent to \(P_0 = P\). If \(P_0\) is itself a maximization LP, then \(P_1 = P_0\). Otherwise, the objective of \(P_0\) is to minimize \(cx\). This is the same as maximizing \(-cx\). Thus, \(P_1\) has the objective function \(-cx\) and asks us to maximize it. \(P_0\) and \(P_1\) have the same variables and the same constraints, so they have the same set of feasible solutions. As just observed, any feasible solution that minimizes \(cx\) also maximizes \(-cx\). Thus, \(P_0\) and \(P_1\) are equivalent.
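In code, this step is a single sign flip on the cost vector; the constraints, and hence the feasible region, stay untouched. A minimal sketch, assuming the objective is stored as a NumPy array:

```python
# Minimization-to-maximization: "minimize cx" becomes "maximize (-c)x".
import numpy as np

c = np.array([2.0, -1.0, 4.0])    # objective of P0: minimize c @ x
c_max = -c                        # objective of P1: maximize c_max @ x
```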
From equalities to inequalities: The next LP \(P_2\) we construct from \(P_1\) is a maximization LP without equality constraints. Clearly,
\[\sum_{j=1}^n a_{ij}x_j = b_i\]
if and only if
\[\begin{aligned} \sum_{j=1}^n a_{ij}x_j &\ge b_i \text{ and}\\ \sum_{j=1}^n a_{ij}x_j &\le b_i. \end{aligned}\]
Thus, we construct \(P_2\) by replacing every equality constraint in \(P_1\) with these two corresponding inequality constraints. \(P_1\) and \(P_2\) have the same variables and the same objective function, and every inequality constraint in \(P_1\) is also a constraint in \(P_2\). As just observed, any solution \(\hat x = \bigl(\hat x_1, \ldots, \hat x_n\bigr)\) satisfies an equality constraint in \(P_1\) if and only if it satisfies the two corresponding inequality constraints in \(P_2\). Thus, \(P_1\) and \(P_2\) have the same set of feasible solutions and the same objective function. They are therefore equivalent.
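In matrix terms, this step replaces the equality block \(A_{\text{eq}}x = b_{\text{eq}}\) with twice as many inequality rows. A small sketch with illustrative data:

```python
# Each equality row a x = b becomes the pair a x <= b and -a x <= -b.
import numpy as np

A_eq = np.array([[1.0, 2.0],
                 [0.0, 1.0]])
b_eq = np.array([5.0, 3.0])

A_ub = np.vstack([A_eq, -A_eq])          # the <= half and the negated >= half
b_ub = np.concatenate([b_eq, -b_eq])
```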
From lower bounds to upper bounds: The next LP \(P_3\) we construct from \(P_2\) is a maximization LP with only upper bound constraints. This transformation is similar to the transformation from a minimization LP to a maximization LP: Every lower bound constraint
\[\sum_{j=1}^n a_{ij}x_j \ge b_i\]
in \(P_2\) is satisfied if and only if the upper bound constraint
\[\sum_{j=1}^n -a_{ij}x_j \le -b_i\]
is satisfied, so we replace it with this constraint in \(P_3\). Since \(P_2\) and \(P_3\) have the same variables and the same objective function, and it is easy to see that \(\hat x = \bigl(\hat x_1, \ldots, \hat x_n\bigr)\) is a feasible solution of \(P_2\) if and only if it is a feasible solution of \(P_3\), \(P_2\) and \(P_3\) are equivalent.
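This step is again just a sign flip, now applied to constraint rows instead of the objective. A small sketch with placeholder data:

```python
# A lower bound constraint a x >= b becomes the upper bound -a x <= -b.
import numpy as np

A_lb = np.array([[1.0, -1.0, 0.0]])   # rows of lower bound constraints
b_lb = np.array([2.0])

A_ub = -A_lb                          # equivalent upper bound constraints
b_ub = -b_lb
```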
Introducing non-negativity constraints: \(P_3\) is in canonical form, except that it does not include non-negativity constraints for its variables. To obtain the final LP \(P_4 = P'\) in canonical form, we replace every variable in \(P_3\) with a pair of non-negative variables in \(P_4\).
\(P_4\) has the set of variables
\[\bigl\{x_j', x_j'' \mid 1 \le j \le n\bigr\}.\]
The objective function and constraints of \(P_4\) are obtained from the objective function and constraints of \(P_3\) by replacing every occurrence of \(x_j\) with \(x_j' - x_j''\) for all \(1 \le j \le n\) and by adding non-negativity constraints for all variables of \(P_4\). The resulting LP \(P_4\) is in canonical form.
The transformations \(\phi\) and \(\psi\) that establish the equivalence between \(P_3\) and \(P_4\) are defined as follows (\(\psi\) is linear; \(\phi\) is only piecewise linear):
\[\phi(x_1, \ldots, x_n) = \bigl(x_1', x_1'', \ldots, x_n', x_n''\bigr),\]
where
\[\begin{aligned} x_j' &= \begin{cases} x_j & \text{if } x_j \ge 0\\ 0 & \text{if } x_j < 0 \end{cases}\\ x_j'' &= \begin{cases} 0 & \text{if } x_j \ge 0\\ -x_j & \text{if } x_j < 0 \end{cases} \end{aligned}\]
for all \(1 \le j \le n\), and
\[\psi\bigl(x_1', x_1'', \ldots, x_n', x_n''\bigr) = (x_1, \ldots, x_n),\]
where
\[x_j = x_j' - x_j''\]
for all \(1 \le j \le n\).
Given a feasible solution \(\bigl(\hat x_1, \ldots, \hat x_n\bigr)\) of \(P_3\), let \(\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr) = \phi\bigl(\hat x_1, \ldots, \hat x_n\bigr)\). By the definition of \(\phi\), all values \(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\) are non-negative. Moreover, we have \(\hat x_j' - \hat x_j'' = \hat x_j\) for all \(1 \le j \le n\). Since \(\bigl(\hat x_1, \ldots, \hat x_n\bigr)\) satisfies all constraints of \(P_3\) and every occurrence of \(x_j\) in \(P_3\) is replaced with \(x_j' - x_j''\) in \(P_4\), for all \(1 \le j \le n\), \(\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr)\) satisfies all constraints of \(P_4\) and achieves the same objective function value as \(\bigl(\hat x_1, \ldots, \hat x_n\bigr)\).
Conversely, given a feasible solution \(\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr)\) of \(P_4\), let \(\bigl(\hat x_1, \ldots, \hat x_n\bigr) = \psi\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr)\). Then, once again, \(\hat x_j' - \hat x_j'' = \hat x_j\) for all \(1 \le j \le n\). Since \(\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr)\) satisfies all constraints of \(P_4\) and every occurrence of \(x_j\) in \(P_3\) is replaced with \(x_j' - x_j''\) in \(P_4\), for all \(1 \le j \le n\), \(\bigl(\hat x_1, \ldots, \hat x_n\bigr)\) satisfies all constraints of \(P_3\) and achieves the same objective function value as \(\bigl(\hat x_1', \hat x_1'', \ldots, \hat x_n', \hat x_n''\bigr)\).
Since \(\phi\) and \(\psi\) map feasible solutions to feasible solutions and preserve their objective function values, it follows that \(\phi\bigl(x_1, \ldots, x_n\bigr)\) is an optimal solution of \(P_4\) if \(\bigl(x_1, \ldots, x_n\bigr)\) is an optimal solution of \(P_3\), and \(\psi\bigl(x_1', x_1'', \ldots, x_n', x_n''\bigr)\) is an optimal solution of \(P_3\) if \(\bigl(x_1', x_1'', \ldots, x_n', x_n''\bigr)\) is an optimal solution of \(P_4\). Thus, \(P_3\) and \(P_4\) are equivalent. ▨
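The following sketch implements the variable-splitting step together with the maps \(\phi\) and \(\psi\), for an LP of the form "maximize \(cx\) subject to \(Ax \le b\)" with unconstrained variables. The interleaving convention (output variable \(2j\) is \(x_j'\), output variable \(2j + 1\) is \(x_j''\)) is a choice made here for illustration, not something fixed by the proof.

```python
# Variable splitting: represent each unconstrained x_j as x_j' - x_j''.
import numpy as np

def split_variables(c, A):
    """Build the data of P4; column 2j is x_j', column 2j+1 is x_j''."""
    c4 = np.empty(2 * len(c))
    c4[0::2], c4[1::2] = c, -c              # cx becomes c(x' - x'')
    A4 = np.empty((A.shape[0], 2 * A.shape[1]))
    A4[:, 0::2], A4[:, 1::2] = A, -A        # Ax becomes A(x' - x'')
    return c4, A4

def phi(x):
    """Map a solution of P3 to one of P4 via positive and negative parts."""
    y = np.empty(2 * len(x))
    y[0::2] = np.maximum(x, 0.0)            # x_j'  = max(x_j, 0)
    y[1::2] = np.maximum(-x, 0.0)           # x_j'' = max(-x_j, 0)
    return y

def psi(y):
    """Map a solution of P4 back to one of P3: x_j = x_j' - x_j''."""
    return y[0::2] - y[1::2]

c = np.array([1.0, 2.0])
A = np.array([[1.0, -1.0]])
c4, A4 = split_variables(c, A)

x = np.array([3.0, -2.0])
assert np.allclose(psi(phi(x)), x)          # psi undoes phi
assert np.isclose(c4 @ phi(x), c @ x)       # objective value is preserved
assert np.allclose(A4 @ phi(x), A @ x)      # constraint values are preserved
```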
Here is an illustration of the transformation in the proof of Lemma 2.11. It uses a slightly modified version of an LP used later when discussing minimum-cost flows in Chapter 6. This LP describes a flow through a graph \(G\) whose edges are partitioned into two subsets \(E_b\) and \(E_u\). The LP has one variable \(p_v\) per vertex \(v \in V\) and one variable \(s_{u,v}\) for every edge \((u, v) \in E_b\). There is no variable corresponding to any edge in \(E_u\). The values \(b_u\), \(c_{u,v}\), and \(q_{u,v}\) are constants:
\[\begin{gathered} \text{Minimize } \sum_{(u,v) \in E_b} c_{u,v}s_{u,v} - \sum_{u \in V} b_up_u\\ \begin{aligned} \text{s.t. } p_u - p_v - s_{u,v} &\le q_{u,v} && \forall (u,v) \in E_b\\ p_v - p_u &\ge q_{u,v} && \forall (u,v) \in E_u\\ s_{u,v} &\ge 0 && \forall (u,v) \in E_b \end{aligned} \end{gathered}\]
I will first illustrate the process outlined in the above proof verbatim, without modifications. Then I will discuss some shortcuts one can take to arrive at a simpler result faster.
First we turn the LP into a maximization LP:
\[\text{Maximize } \sum_{u \in V} b_up_u - \sum_{(u,v) \in E_b} c_{u,v}s_{u,v}\]
There are no equality constraints, so the second transformation is not needed. Next, we turn lower bounds into upper bounds:
\[\begin{aligned} p_u - p_v - s_{u,v} &\le q_{u,v} && \forall (u,v) \in E_b\\ p_u - p_v &\le -q_{u,v} && \forall (u,v) \in E_u\\ -s_{u,v} &\le 0 && \forall (u,v) \in E_b \end{aligned}\]
Finally, we replace every variable \(p_v\) with the difference \(p'_v - p''_v\) of two non-negative variables \(p'_v\) and \(p''_v\), and we do the same for variables \(s_{u,v}\). This gives the following LP in canonical form:
\[\begin{gathered} \text{Maximize } \sum_{u \in V} b_up_u' - \sum_{u \in V} b_up_u'' - \sum_{(u,v) \in E_b} c_{u,v}s_{u,v}' + \sum_{(u,v) \in E_b} c_{u,v}s_{u,v}''\\ \begin{aligned} \text{s.t. } p_u' - p_u'' - p_v' + p_v'' - s_{u,v}' + s_{u,v}'' &\le q_{u,v} && \forall (u,v) \in E_b\\ p_u' - p_u'' - p_v' + p_v'' &\le -q_{u,v} && \forall (u,v) \in E_u\\ s_{u,v}'' - s_{u,v}' &\le 0 && \forall (u,v) \in E_b\\ p_v' &\ge 0 && \forall v \in V\\ p_v'' &\ge 0 && \forall v \in V\\ s_{u,v}' &\ge 0 && \forall (u,v) \in E_b\\ s_{u,v}'' &\ge 0 && \forall (u,v) \in E_b \end{aligned} \end{gathered}\]
While reading through this example, I hope you thought it silly to turn the constraint \(s_{u,v} \ge 0\) into the constraint \(-s_{u,v} \le 0\) as part of the elimination of lower bound constraints, only to then split \(s_{u,v}\) into two non-negative variables \(s'_{u,v}\) and \(s''_{u,v}\). Avoiding this detour is the primary shortcut we take in practice:
We leave any lower bound constraint of the form \(x_i \ge 0\) alone during the conversion of lower bound constraints into upper bound constraints, because it is simply a non-negativity constraint. Any variable for which the input LP has such a non-negativity constraint can be included unaltered in the output LP, exactly because it is already constrained to be non-negative: We do not need to split it. In our example, this means that we leave the variables \(s_{u,v}\) and their non-negativity constraints alone and only split each variable \(p_v\) into the two variables \(p'_v\) and \(p''_v\):
\[\begin{gathered} \text{Maximize } \sum_{u \in V} b_up_u' - \sum_{u \in V} b_up_u'' - \sum_{(u,v) \in E_b} c_{u,v}s_{u,v}\\ \begin{aligned} \text{s.t. } p_u' - p_u'' - p_v' + p_v'' - s_{u,v} &\le q_{u,v} && \forall (u,v) \in E_b\\ p_u' - p_u'' - p_v' + p_v'' &\le -q_{u,v} && \forall (u,v) \in E_u\\ p_v' &\ge 0 && \forall v \in V\\ p_v'' &\ge 0 && \forall v \in V\\ s_{u,v} &\ge 0 && \forall (u,v) \in E_b \end{aligned} \end{gathered}\]
A similar simplification concerns variables that are constrained to be non-positive in the input LP: \(x_i \le 0\). Instead of keeping this upper bound constraint and then replacing \(x_i\) with \(x_i' - x_i''\), where \(x_i' \ge 0\) and \(x_i'' \ge 0\), we observe that \(x_i \le 0\) is really a non-negativity constraint on \(-x_i\). Thus, we can simply replace every occurrence of \(x_i\) in the input LP with \(-x_i'\) and then convert the resulting constraint \(-x_i' \le 0\) into the non-negativity constraint \(x_i' \ge 0\).
To summarize:
- Any unconstrained variable \(x_i\) is replaced by the difference \(x_i' - x_i''\) of two non-negative variables \(x_i'\) and \(x_i''\).
- Any variable with a non-negativity constraint in the input LP is included without change in the output LP, as is its non-negativity constraint.
- Any variable \(x_i\) with a non-positivity constraint in the input LP is replaced with \(-x_i'\), and its non-positivity constraint \(x_i \le 0\) is replaced with the non-negativity constraint \(x_i' \ge 0\).
This reduces the number of variables in the output LP: we introduce two output variables only for each unconstrained variable of the input LP.
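Putting the three rules together, here is a sketch of the shortcut transformation. The per-variable tags 'free', 'nonneg', and 'nonpos' are an assumed bookkeeping convention introduced for this illustration:

```python
# Shortcut splitting: split only free variables, keep non-negative ones,
# and negate non-positive ones; every output variable then satisfies x >= 0.
import numpy as np

def shortcut_split(c, A, signs):
    objs, cols = [], []
    for j, sign in enumerate(signs):
        if sign == 'free':                  # x_j -> x_j' - x_j''
            objs += [c[j], -c[j]]
            cols += [A[:, j], -A[:, j]]
        elif sign == 'nonneg':              # keep x_j unchanged
            objs.append(c[j])
            cols.append(A[:, j])
        else:                               # 'nonpos': x_j -> -x_j'
            objs.append(-c[j])
            cols.append(-A[:, j])
    return np.array(objs), np.column_stack(cols)

# One free, one non-negative, and one non-positive variable yield
# 2 + 1 + 1 = 4 output variables instead of 6 under the naive splitting.
c = np.array([1.0, 2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
c_out, A_out = shortcut_split(c, A, ['free', 'nonneg', 'nonpos'])
```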
[^1]: Recall that \(A^T\) denotes the transpose of the matrix \(A\). A column vector is simply an \(m \times 1\) matrix, and a row vector is a \(1 \times n\) matrix. Here, transposing the row vector \((0, \ldots, 0)\) produces a column vector.
