9.2.2. Proof of Approximation Ratio

Next, let us prove that the greedy set cover algorithm does indeed achieve an approximation ratio of \(O(\lg n)\).

Theorem 9.2: The greedy set cover algorithm computes an \(O(\lg n)\)-approximation of an optimal set cover.

Proof: We prove that the algorithm computes a \(H_n\)-approximation of an optimal set cover, where \(H_n = \sum_{i=1}^n \frac{1}{i}\) is the \(\boldsymbol{n}\)th harmonic number. It is easy to prove that \(\ln (n+1) \le H_n \le \ln n + 1\),1 that is, \(H_n = \Theta(\lg n)\).

To prove this, we start by charging the elements in \(U\) for the weight of the computed cover \(\mathcal{C}\). Consider the moment we add a set \(S\) to \(\mathcal{C}\). \(S \setminus C\) is the set of elements in \(S\) not yet covered by \(\mathcal{C}\). We charge each element \(e \in S \setminus C\) a price

\[p_e = \frac{w_S}{|S \setminus C|}.\]

The prices charged to the elements in \(S \setminus C\), which are the elements newly covered by adding \(S\) to \(\mathcal{C}\), therefore pay for the cost of adding \(S\) to \(\mathcal{C}\). Overall, this gives

\[w(\mathcal{C}) = \sum_{S \in \mathcal{C}} w_S = \sum_{e \in U} p_e.\]

Now order the elements in \(U\) in the order in which we cover them, breaking ties arbitrarily. In other words, we start with the elements in the first set we add to \(\mathcal{C}\), followed by the elements in the second set we add to \(\mathcal{C}\) that are not also in the first set, and so on. Let \(\langle e_1, \ldots, e_n \rangle\) be the resulting sequence of elements. We prove that each element \(e_i\) satisfies

\[p_{e_i} \le \frac{\textrm{OPT}}{n - i + 1}.\]

This implies that

\[w(\mathcal{C}) = \sum_{i=1}^n p_{e_i} \le \sum_{i=1}^n \frac{\textrm{OPT}}{n - i + 1} = \textrm{OPT} \cdot \sum_{i=1}^n \frac{1}{i} = \textrm{OPT} \cdot H_n,\]

which proves the theorem.

Consider any element \(e_i\), and consider the states of \(\mathcal{C}\) and \(C\) immediately before adding the first set \(S\) to \(\mathcal{C}\) that contains \(e_i\). Since \(p_{e_i} = \frac{w_S}{|S \setminus C|}\) and \(S\) is the set with lowest amortized weight \(\frac{w_S}{|S \setminus C|}\) among all sets in \(\S \setminus \mathcal{C}\), it suffices to prove that there exists a set \(S' \in \mathcal{S} \setminus \mathcal{C}\) of amortized weight

\[\frac{w_{S'}}{|S' \setminus C|} \le \frac{\textrm{OPT}}{n - i + 1}.\]

Let \(\mathcal{C}^*\) be an optimal set cover: \(w(\mathcal{C}^*) = \textrm{OPT}\). Since \(\mathcal{C}^*\) is a set cover of \(U\) and \(\mathcal{C}\) covers exactly the elements in \(C\), \(\mathcal{C}^* \setminus \mathcal{C}\) is a set cover of \(U \setminus C\). Let \(\mathcal{C}' \subseteq \mathcal{C}^* \setminus \mathcal{C}\) be a minimal set cover of \(U \setminus C\), that is, there is no proper subset of \(\mathcal{C}'\) that covers \(U \setminus C\). Note that \(\mathcal{C}' \subseteq \mathcal{S} \setminus \mathcal{C}\), so it suffices to prove that there exists a set \(S' \in \mathcal{C}'\) with amortized weight

\[\frac{w_{S'}}{|S' \setminus C|} \le \frac{\textrm{OPT}}{n - i + 1}.\]

We assign each element \(e \in U \setminus C\) to some set \(S'' \in \mathcal{C}'\) that contains it. Since \(\mathcal{C}'\) is minimal, there exists an element \(e \in U \setminus C\) for every set \(S'' \in \mathcal{C}'\) such that \(e \in S''\) and no other set in \(\mathcal{C}'\) contains \(e\). Indeed, if there were no such element \(e\), then we could remove \(S''\) from \(\mathcal{C}'\) and \(\mathcal{C}'\) would still be a set cover of \(U \setminus C\), that is, \(\mathcal{C}'\) would not be a minimal set cover of \(U \setminus C\).

Since there exists an element \(e \in U \setminus C\) for every set \(S'' \in \mathcal{C}'\) that is contained only in \(S''\), every set \(S'' \in \mathcal{C}'\) is assigned at least one element in \(U \setminus C\). To pay for each set \(S'' \in \mathcal{C}'\) with \(a \ge 1\) assigned elements, we charge each element \(e\) assigned to \(S''\) a price \(p'_e = \frac{w_{S''}}{a}\). This ensures that

\[\sum_{e \in U \setminus C} p'_e = w(\mathcal{C}') \le w(\mathcal{C}^*) = \textrm{OPT}.\]

Therefore, there exists an element \(e \in U \setminus C\) whose price \(p'_e\) satisfies

\[p'_e \le \frac{\textrm{OPT}}{|U \setminus C|}.\]

Since \(C\) is the set of elements covered by \(\mathcal{C}\) and \(e_i\) is not covered by \(\mathcal{C}\) yet, none of the elements \(e_i, \ldots, e_n\) is in \(C\). Thus, \(|U \setminus C| \ge n - i + 1\), that is,

\[p'_e \le \frac{\textrm{OPT}}{n - i + 1}.\]

Now let \(S'\) be the set in \(\mathcal{C}'\) to which \(e\) is assigned. Let \(a\) be the number of elements assigned to \(S'\). Then \(a \le |S' \setminus C|\) because only elements in \(S' \setminus C\) can be assigned to \(S'\). This gives

\[\frac{w_{S'}}{|S' \setminus C|} \le \frac{w_{S'}}{a} = p'_e \le \frac{\textrm{OPT}}{n - i +1},\]

that is, \(S'\) is the desired set in \(\mathcal{C}'\). ▨

If you paid attention, then you noticed that the proof of the approximation factor of the greedy set cover algorithm does not explicitly provide a lower bound on \(\textrm{OPT}\), even though we stated in Section 8.2 that, in general, we prove bounds on the approximation ratios of approximation algorithms by comparing the objective function value of the computed solution with a lower bound or upper bound on \(\textrm{OPT}\). We will revisit this algorithm in Chapter 11 and will use the method of dual fitting to provide an explicit lower bound on \(\textrm{OPT}\). By comparing the solution computed by the algorithm to this lower bound, we arrive at the above result a bit more elegantly, and we will be able to obtain a slightly better approximation ratio for inputs where every set in \(\mathcal{S}\) is small.

1

Indeed, we have \[H_n = \sum_{i=1}^n \frac{1}{i} = \int_{1}^{n+1}\frac{1}{\lfloor x\rfloor} dx \ge \int_{1}^{n+1}\frac{1}{x}dx = \ln (n+1) - \ln 1 = \ln (n+1)\] and \[H_n = 1 + \sum_{i=2}^n \frac{1}{i} = 1 + \int_{1}^{n}\frac{1}{\lceil x\rceil} dx \le 1 + \int_{1}^{n}\frac{1}{x}dx = 1 + \ln n - \ln 1 = 1 + \ln n.\]


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.