7.5.3. A Faster Implementation*

Our goal in this section is to reduce the running time of the Hungarian Algorithm to \(O\bigl(n^3\bigr)\). We do not change the basic algorithm at all: Each iteration computes an alternating forest and uses it to either find a tight augmenting path for \(M\) or update \(\pi\) so that more vertices in \(W\) are reachable from unmatched vertices in \(U\) via tight alternating paths.

The key to reducing the running time of the algorithm to \(O\bigl(n^3\bigr)\) is to reduce the running time of each iteration to \(O(n)\).

The main steps that each iteration performs are

Construct an alternating forest \(F^\pi\) and the sets \(X\), \(Y\), \(\bar X\), and \(\bar Y\).
Trace an augmenting path in \(F^\pi\) and augment the matching (if there exists an unmatched vertex in \(Y\)).
Compute \(\delta\) (if all vertices in \(Y\) are unmatched).
Update \(\pi\) (if all vertices in \(Y\) are unmatched).

Steps 2 and 4 already take \(O(n)\) time. Steps 1 and 3 are the ones that take quadratic time. To reduce their cost to \(O(n)\) per iteration, we observe that neither \(F_\pi\) nor \(\delta\) needs to be computed from scratch in each iteration. We do need to compute them from scratch in each phase, every time the matching \(M\) changes. However, updating \(\pi\) only grows \(X\) and \(Y\), as shown in the proof of Lemma 7.20. The proof of Lemma 7.18 shows that every edge in \(F^\pi\) remains tight when we update \(\pi\) (because its endpoints are in \(X\) and \(Y\)). Thus, instead of recomputing \(F^\pi\) in each iteration, we simply add to the forest \(F^\pi\) from the previous iteration, and we update \(\delta\) in the process to maintain that it is the minimum slack of the edges between \(X\) and \(\bar Y\).

An elegant implementation of this idea embeds the adjustment of \(\pi\) directly into the construction of \(F^\pi\). The construction of \(F^\pi\) proceeds in up to \(2n\) iterations, during which the algorithm maintains the following information:

Every vertex \(z \in X \cup Y\) stores its parent \(p_z\) in \(F^\pi\). Any unmatched vertex \(u \in U\) satisfies \(p_u = \textrm{NULL}\). We use these parent pointers to trace an augmenting path \(P_w\) at the end of the phase, as soon as we discover an unmatched vertex \(w\) in \(F^\pi\).
Every vertex \(w \in \bar Y\) also stores a "parent" \(p_w\). This is the vertex \(p_w \in X\) such that \(c_{p_w,w} - \pi_{p_w} - \pi_w = \min_{u \in X} (c_{u,w} - \pi_u - \pi_w)\). In other words, \(p_w\) is the other endpoint of the edge with minimum slack among all edges connecting \(w\) to a vertex in \(X\). For \(w \in \bar Y\), \((p_w,w)\) is not an edge in \(F^\pi\) yet. Think about \(p_w\) as the vertex that becomes \(w\)'s parent in \(F^\pi\) once we raise \(\pi_{p_w}\) sufficiently.
Every vertex \(w \in \bar Y\) stores its slack \(\delta_w = \min_{u \in X}(c_{u,w} - \pi_u - \pi_w)\). This is exactly the slack of the edge \((p_w, w)\).
We store the overall slack \(\delta = \min_{w \in \bar Y} \delta_w\). Note that this is exactly the value \(\delta\) we use to adjust \(\pi\), only we now maintain \(\delta\) as we change \(F^\pi\) instead of computing it from scratch when we cannot grow \(F^\pi\) anymore.

The construction of \(F^\pi\) maintains the sets \(X\), \(Y\), \(\bar X\), and \(\bar Y\) explicitly. Initially, \(X\) contains all unmatched vertices in \(U\), \(\bar X = U \setminus X\), \(Y = \emptyset\), and \(\bar Y = W\). This corresponds to initializing the queue of the alternating BFS with the unmatched vertices in \(U\), but we do not explicitly maintain this queue and may not discover the vertices in \(F^\pi\) in breadth-first order. With this initial definition of \(X\), \(Y\), \(\bar X\), and \(\bar Y\), we initialize parent pointers by setting \(p_u = \textrm{NULL}\) for all \(u \in U\). For every vertex \(w \in W = \bar Y\), we set \(\delta_w = \min_{u \in X} (c_{u,w} - \pi_u - \pi_w)\) and \(p_w = \textrm{argmin}_{u \in X} (c_{u,w} - \pi_u - \pi_w)\). We set \(\delta = \min_{w \in W} \delta_w\). This initialization is easily seen to take \(O\bigl(n^2\bigr)\) time. Since we perform it only once per phase, these initialization steps take \(O\bigl(n^3\bigr)\) time across all \(n\) phases of the algorithm.

Now, as long as \(Y\) does not contain any unmatched vertex, we repeat the following steps:

If \(\delta = 0\), then there exists a vertex \(w \in \bar Y\) with \(\delta_w = 0\), that is, the edge \((p_w, w)\) is tight. We add \(w\) and the edge \((p_w, w)\) to \(F^\pi\), by moving \(w\) from \(\bar Y\) to \(Y\).
- If \(w\) is unmatched, then \(F^\pi\) now contains an unmatched vertex in \(W\). The construction of \(F^\pi\) ends and the phase concludes by replacing \(M\) with \(M \oplus P_w\).
- If \(w\) is matched to a vertex \(u \in U\), then \(u \in \bar X\) because we always add mates to \(F^\pi\) together. In this case, we set \(p_u = w\) and add \(u\) and the edge \((p_u, u)\) to \(F_\pi\) by moving \(u\) from \(\bar X\) to \(X\). This makes \(u\) a new neighbour in \(X\) of every vertex \(w' \in \bar Y\), so we update the slack of \(w'\) to \(\delta_{w'} = \min(\delta_{w'}, c_{u,w'} - \pi_u - \pi_{w'})\) and set \(p_{w'} = u\) if \(\delta_{w'} = c_{u,w'} - \pi_u - \pi_{w'}\) now. Finally, we set \(\delta = \min_{w' \in W} \delta_{w'}\) and proceed to the next iteration.
Note that moving \(w\) from \(\bar Y\) to \(Y\) and \(u\) from \(\bar X\) to \(X\) is easily accomplished in constant time. Updating the slack values of all vertices in \(\bar Y\) and updating \(\delta\) take constant time per vertex in \(\bar Y\), \(O(n)\) time in total. Thus, the cost of each iteration in which \(\delta = 0\) is \(O(n)\).
If \(\delta > 0\), we increase \(\pi_u\) by \(\delta\) for every vertex \(u \in X\), decrease \(\pi_w\) by \(\delta\) for every vertex \(w \in Y\), decrease \(\delta_w\) by \(\delta\) for every vertex \(w \in \bar Y\), decrease \(\delta\) by \(\delta\), that is, set \(\delta = 0\), and then proceed to the next iteration. This clearly takes \(O(n)\) time.

Note that all edges in \(F^\pi\) remain tight after this update of \(\pi\), that increasing \(\pi_u\) by \(\delta\) for all \(u \in X\) does indeed decrease the slack of every vertex \(w \in \bar Y\) by \(\delta\), and that \(\delta\) does indeed become \(0\) as a result. Also, since all vertices in \(X\) have their potentials decreased by \(\delta\), the minimum-slack edge \((p_w, w)\) connecting \(w\) to a vertex \(p_w \in X\) remains the same for every vertex \(w \in \bar Y\). Thus, we do not need to update \(p_w\) for any vertex \(w \in \bar Y\).

We have argued that each iteration takes \(O(n)\) time. Every iteration in which \(\delta = 0\) adds a vertex in \(\bar Y\) to \(Y\), so there can be at most \(n\) such iterations per phase. If \(\delta > 0\), then the iteration sets \(\delta = 0\), so every iteration in which \(\delta > 0\) is followed by an iteration in which \(\delta = 0\), and there are also at most \(n\) iterations in which \(\delta > 0\). Thus, there are at most \(2n\) iterations per phase. Since the cost per iteration is \(O(n)\) and the initialization of each phase takes \(O\bigl(n^2\bigr)\) time, the cost per phase is thus \(O\bigl(n^2\bigr)\). Since the algorithm runs for \(n\) phases, its running time is therefore \(O\bigl(n^3\bigr)\) and we obtain the following result:

Theorem 7.22: A minimum-weight perfect matching in a complete bipartite graph can be computed in \(O\bigl(n^3\bigr)\) time.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Algorithms II

7.5.3. A Faster Implementation*