11.1. Set Cover Revisited

The greedy set cover algorithm discussed in Section 9.2 computes an \(O(\lg n)\)-approximation of an optimal set cover, but we proved this only using a fairly complicated structural analysis of an optimal set cover. We did not explicitly establish a lower bound on the cost of an optimal set cover. In this section, we explicitly construct such a lower bound to obtain a more natural proof of the \(O(\lg n)\) approximation ratio of the algorithm.

To apply dual fitting, we need an ILP formulation of the set cover problem:

\[\begin{gathered} \text{Minimize } \sum_{S \in \mathcal{S}} w_Sx_S\\ \begin{aligned} \text{s.t. } \sum_{e \in S} x_S &\ge 1 && \forall e \in U\\ x_S &\in \{0,1\} && \forall S \in \mathcal{S} \end{aligned} \end{gathered}\tag{11.1}\]

This says that we need to decide for every set \(S \in \mathcal{S}\) whether to include it in the set cover \(\mathcal{C} \subseteq \mathcal{S}\) (\(x_S = 1\)) or not (\(x_S = 0\)). The weight of the chosen set cover \(\mathcal{C}\) is then \(\sum_{S \in \mathcal{S}} w_Sx_S\), which is the objective function value we aim to minimize. The constraints \(\sum_{e \in S} x_S \ge 1\) for all \(e \in U\) ensure that every element \(e \in U\) is contained in at least one set \(S \in \mathcal{C}\), that is, that \(\mathcal{C}\) is a set cover.

The LP relaxation of (11.1) is

\[\begin{gathered} \text{Minimize } \sum_{S \in \mathcal{S}} w_Sx_S\\ \begin{aligned} \text{s.t. } \sum_{e \in S} x_S &\ge 1 && \forall e \in U\\ x_S &\ge 0 && \forall S \in \mathcal{S}. \end{aligned} \end{gathered}\tag{11.2}\]

The constraints \(x_S \le 1\) for all \(S \in \mathcal{S}\) are omitted because no optimal solution sets \(x_S > 1\) for any \(S \in \mathcal{S}\). (Remember, we assumed that all sets have strictly positive costs.)

The dual LP of (11.2) is

\[\begin{gathered} \text{Maximize } \sum_{e \in U} y_e\\ \begin{aligned} \text{s.t. } \sum_{e \in S} y_e &\le w_S && \forall S \in \mathcal{S}\\ y_e &\ge 0 && \forall e \in U. \end{aligned} \end{gathered}\tag{11.3}\]

An intuitive way of thinking about this dual is that we are "packing" the values \(y_e\) where \(e \in S\) into the set \(S\) without "overpacking" \(S\), that is, without exceeding its "capacity" \(w_S\).

The analysis of the greedy algorithm in Section 9.2 charges a price \(p_e\) to every element and proves that the sum of the prices charged to all elements in \(U\) pays for the cost of the computed set cover \(\mathcal{C}\):

\[w(\mathcal{C}) = \sum_{e \in U} p_e.\]

In general, it is possible that \(\sum_{e \in S} p_e > w_S\) for some \(S \in \mathcal{S}\). Indeed, if \(S \in \mathcal{C}\), then the prices of the elements in \(S\) that were not covered yet at the time we added \(S\) to \(\mathcal{C}\) pay for \(S\), that is, they sum to \(w_S\): That's exactly how we defined the price paid by each element \(e \in U\). Elements in \(S\) that belong to sets added to \(\mathcal{C}\) before \(S\) are also charged positive prices and thus increase the total price charged to the elements in \(S\) above \(w_S\).

What this shows is that the solution \(\tilde y\) that sets \(\tilde y_e = p_e\) for all \(e \in U\) may overpack some sets in \(\mathcal{S}\) and is not a feasible solution of (11.3). Then again, we should not have expected this solution \(\tilde y\) to be a feasible dual solution because, by LP duality, this would imply that the computed set cover \(\mathcal{C}\) is an optimal set cover, which we cannot compute in polynomial time unless \(\textrm{P} = \textrm{NP}\).

Next we show that the solution \(\hat y\) that sets \(\hat y_e = \frac{p_e}{H_n}\) for all \(e \in U\) does not overpack any set and therefore is a feasible solution of (11.3). The objective function value of this dual solution is

\[\sum_{e \in U} \frac{p_e}{H_n} = \frac{w(\mathcal{C})}{H_n}.\]

Thus,

\[w(\mathcal{C}) \le H_n \cdot \textrm{OPT}_f \le H_n \cdot \textrm{OPT},\]

where \(\textrm{OPT}\) is the cost of an optimal set cover and \(\textrm{OPT}_f \le \textrm{OPT}\) is the objective function value of an optimal solution of the LP relaxation (11.2).

Lemma 11.1: The vector \(\hat y\) is a feasible solution of the dual set cover LP (11.3).

Proof: Consider any set \(S \in \mathcal{S}\) and assume that \(S\) contains \(k\) elements. We arrange these elements in the order in which they are covered by the algorithm, breaking ties arbitrarily. Let \(\langle e_1, \ldots, e_k \rangle\) be this ordering.

Before the algorithm chooses a set \(S'\) that covers the \(i\)th element \(e_i\), at least the elements \(e_i, \ldots, e_k\) in \(S\) are uncovered. Thus, the iteration that covers \(e_i\) can cover the elements in \(S\) at an amortized cost of at most \(\frac{w_S}{k - i + 1}\) by adding \(S\) to \(\mathcal{C}\). Since the iteration that covers \(e_i\) picks the set with minimum amortized cost, this proves that \(e_i\) is covered at an amortized cost of \(p_{e_i} \le \frac{w_S}{k - i + 1}\). By summing over all elements in \(S\), we obtain that

\[\sum_{i=1}^k p_{e_i} \le w_S \cdot \sum_{i=1}^k \frac{1}{k - i + 1} = w_S \cdot H_k.\]

Thus,

\[\sum_{i=1}^k \hat y_{e_i} \le w_S \cdot \frac{H_k}{H_n} \le w_S,\]

that is, \(S\) is not overpacked.

Since this analysis holds for all sets \(S \in \mathcal{S}\), \(\hat y\) is a feasible solution of (11.3). ▨

In this example, the analysis via dual fitting has another advantage over the analysis given in Section 9.2: It allows us to prove a stronger approximation ratio than \(O(\lg n)\) if all sets in \(\mathcal{S}\) are small:

Exercise 11.1: Adapt the argument from Lemma 11.1 to prove that the greedy set cover algorithm from Section 9.2 computes an \(O(\lg k)\)-approximation of an optimal set cover whenever every set in \(\mathcal{S}\) contains no more than \(k\) elements.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.