22. A Simple Union-Find Data Structure

In this appendix, I describe a simple union-find data structure that can be used in the implementation of Gabow's algorithm in Section 7.7.2.

A union-find data structure maintains a collection of sets and allows us to identify the set a given element is contained in. At any point in time, every element is contained in at most one set. Specifically, it supports three operations:

\(\boldsymbol{\textbf{MakeSet}(x)}\) creates a new set containing only the element \(x\) and adds it to the set collection.
\(\boldsymbol{\textbf{Union}(A, B)}\) Removes the sets \(A\) and \(B\) from the set collection and adds their union \(A \cup B\) to the set collection.
\(\boldsymbol{\textbf{Find}(x)}\) returns the set in the set collection that contains the element \(x\). This operation may be called only after calling \(\boldsymbol{\textbf{MakeSet}(x)}\) to ensure that there exists a set in the set collection that contains \(x\).

We represent each set \(S\) in the set collection as an object that stores

The size of \(S\) and
Pointers to the head and tail of a doubly-linked list \(L_S\) of the elements in \(S\).

Each node in \(L_S\) stores a pointer back to the object representing \(S\), which we will simply refer to as \(S\) from here on, as it represents the set \(S\).

MakeSet

A \(\boldsymbol{\textbf{MakeSet}(x)}\) operation creates a one-element doubly-linked list \(L_S\) storing \(x\) and a new set \(S\) that stores the size \(|S| = 1\) as well as pointers to the head and tail of \(L_S\). Clearly, this implementation of MakeSet takes constant time.

Find

Given an element \(x\), the set \(S\) that contains \(x\) can be found by following the pointer from \(x\)'s list node to the object \(S\) such that \(x \in L_S\). Thus, \(\boldsymbol{\textbf{Find}(x)}\) takes constant time.

Union

To perform a \(\boldsymbol{\textbf{Union}(A, B)}\) operation, we compare \(|A|\) and \(|B|\) and swap \(A\) and \(B\) if necessary to ensure that \(|A| \ge |B|\). We destroy the object representing \(B\) and add the elements in \(L_B\) to \(L_A\). This turns \(A\) into an object that now represents the union \(A \cup B\).

In detail, we concatenate the two lists \(L_A\) and \(L_B\), which takes constant time because \(L_A\) and \(L_B\) are represented as doubly-linked lists. We update the tail pointer associated with \(A\) to point to the tail of \(L_B\). We add \(|B|\) to \(|A|\) to ensure that \(A\) now stores the combined size \(|A| + |B|\) of the two sets \(A\) and \(B\). Every element in \(L_A\) already points to \(A\). Every element in \(L_B\) points to \(B\). Thus, we need to traverse \(L_B\) and make every element in \(L_B\) point to \(A\). The object \(A\) now represents the union \(A \cup B\).

The cost of this operation is \(O(1 + |B|)\). Thus, in the worst case, the cost per union operation can be linear in the size of the two sets being union. Next we prove that the amortized cost of Union operations is much lower:

Theorem 22.1: Consider a sequence of \(m\) operations on a union-find data structure that includes \(n\) MakeSet operations. Then the total cost of all operations in the sequence is \(O(n \lg n + m)\).

Proof: Every MakeSet or Find operation contributes \(O(1)\) to the total cost of all operations. Thus, their total cost is \(O(m)\). To bound the cost of all Union operations, assume that we apply them to set pairs \((A_1, B_1), \ldots, (A_k,B_k)\), and assume that the two sets in each pair are already ordered so that \(|A_i| \ge |B_i|\) for all \(1 \le i \le k\). Then the total cost of all all Union operations is

\[\sum_{i=1}^k O(1 + |B_i|) = O(m) + \sum_{i=1}^k O(|B_i|)\]

because \(k \le m\).

Next we prove that \(\sum_{i=1}^k |B_i| \le n \lg n\). Thus, the total cost of all Union operations is \(O(n \lg n + m)\), and the theorem follows.

Let \(x_1, \ldots, x_n\) be the elements to which we apply MakeSet operations. For each such element \(x_j\), let \(s_j\) be the number of sets among \(B_1, \ldots, B_k\) that contain \(x_j\). Then

\[\sum_{i=1}^k |B_i| = \sum_{j=1}^n s_j.\]

It suffices to prove that \(s_j \le \lg n\) for all \(1 \le j \le n\). To show this, fix an arbitrary element \(x_j\) and consider the sequence of \(\boldsymbol{\textbf{Union}(A_i, B_i)}\) operations such that \(x_j \in B_i\). We prove that the set that contains \(x_j\) after

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Algorithms II

22. A Simple Union-Find Data Structure

MakeSet

Find

Union