11.2. Set Multicover*

As another example of the above idea, we consider a generalization of the set cover problem. In the set multicover problem, we are given a universe \(U\), a collection \(\mathcal{S}\) of subsets of \(U\), and an integral cover requirement \(r_e\) for each element \(e \in U\). The task is to pick a collection \(\mathcal{C}\) of sets in \(\mathcal{S}\) such that \(c(\mathcal{C}) = \sum_{S \in \mathcal{C}} c_S\) is minimized and every element \(e \in U\) is contained in at least \(r_e\) sets in \(\mathcal{C}\). In general, we are allowed to include a set in \(\mathcal{S}\) more than once in \(\mathcal{C}\), that is, \(\mathcal{C}\) is allowed to be a multiset. Here, we consider the constrained set multicover problem, which does not allow us to pick any set more than once. In other words, \(\mathcal{C}\) is a set, not a multiset, just as in the set cover problem; the only change is that elements may need to be covered more than once by the sets in \(\mathcal{C}\). While it is easy to see that, as long as \(\bigcup_{S \in \mathcal{S}} S = U\), there always exists a feasible set multicover, there may not always exist a feasible constrained set multicover.

Exercise 11.2: Given a universe \(U\), a collection \(\mathcal{S}\) of subsets of \(U\), and a function \(r : U \rightarrow \mathbb{N}\) specifying cover requirements for the elements in \(U\), develop an algorithm that takes \(O(|U| + \sum_{S \in \mathcal{S}} |S|)\) time to decide whether there exists a constrained set multicover that meets the cover requirements specified by \(r\).

From now on, we assume that we have verified that a constrained set multicover exists. The algorithm for the constrained set multicover problem is very similar to the greedy set cover algorithm from Section 9.2. Once again, we start with the empty set \(\mathcal{C} = \emptyset\). As we add sets to \(\mathcal{C}\), we maintain a set \(C\) of elements covered by \(\mathcal{C}\). This time, we add an element \(e \in U\) to \(C\) only once there are at least \(r_e\) elements in \(\mathcal{C}\) that cover \(e\), that is, once \(\mathcal{C}\) meets \(e\)'s cover requirement. We call the elements in \(U \setminus C\) active. Since \(\mathcal{C} = \emptyset\) initially, we also have \(C = \emptyset\) initially. The amortized cost of adding a set \(S \in \mathcal{S} \setminus \mathcal{C}\) to \(\mathcal{C}\) is once again defined as \(\frac{w_S}{|S \setminus C|}\), that is, we distribute the cost of adding \(S\) to \(\mathcal{C}\) evenly among the elements in \(S\) that are still active and therefore benefit from adding \(S\) to \(\mathcal{C}\). Once again, each iteration adds the set in \(\mathcal{S} \setminus \mathcal{C}\) with lowest amortized cost to \(\mathcal{C}\). We repeat this process until \(\mathcal{C}\) meets the cover requirements of all elements in \(U\), that is, until \(C = U\). At this point, \(\mathcal{C}\) is a valid set multicover.

To pay for the cost of \(\mathcal{C}\), we charge \(r_e\) prices \(p^{S^1_e}_e, \ldots, p^{S^{r_e}_e}_e\) to every element \(e \in U\), where \(S^1_e, \ldots, S^{r_e}_e\) are the first \(r_e\) sets added to \(\mathcal{C}\) that contain \(e\), arranged in the order in which we add them to \(\mathcal{C}\). In other words, \(e\) is active until we add \(S^{r_e}_e\) to \(\mathcal{C}\) and becomes inactive after adding \(S^{r_e}_e\) to \(\mathcal{C}\). For all \(1 \le i \le r_e\), \(p^{S^i_e}_e\) is the amortized cost with which we add \(S^i_e\) to \(\mathcal{C}\), that is,

\[p^{S^i_e}_e = \frac{w_{S^i_e}}{|S^i_e \setminus C|},\]

where \(C\) is the set of elements already sufficiently covered before adding \(S^i_e\) to \(\mathcal{C}\). This ensures that

\[\sum_{e \in U} \sum_{i=1}^{r_e} p^{S^i_e}_e = w(\mathcal{C}).\]

To prove that \(\mathcal{C}\) is an \(H_n\)-approximation of an optimal constrained set multicover of \(U\), we turn the price function \(p\) into a feasible solution of the dual constrained set multicover LP with objective function value at least \(\frac{w(\mathcal{C})}{H_n}\).

The ILP formulation of the constrained set multicover problem is very similar to the ILP formulation of the set cover problem (11.1):

\[\begin{gathered} \text{Minimize } \sum_{S \in \mathcal{S}} w_Sx_S\\ \begin{aligned} \text{s.t. } \sum_{e \in S} x_S &\ge r_e && \forall e \in U\\ x_S &\in \{0,1\} && \forall S \in \mathcal{S} \end{aligned} \end{gathered}\tag{11.4}\]

The only change from (11.1) is that every element \(e\) needs to be covered at least \(r_e\) times, not only once.

The LP relaxation of (11.4) is

\[\begin{gathered} \text{Minimize } \sum_{S \in \mathcal{S}} w_Sx_S\\ \begin{aligned} \text{s.t. } \sum_{e \in S} x_S &\ge r_e && \forall e \in U\\ -x_S &\ge -1 && \forall S \in \mathcal{S}\\ x_S &\ge 0 && \forall S \in \mathcal{S}. \end{aligned} \end{gathered}\tag{11.5}\]

Note that the constraint \(x_S \le 1\)—that is, \(-x_S \ge -1\)—is no longer redundant because it could be beneficial for minimizing the objective function to include a particularly cheap set \(S\) multiple times in \(\mathcal{C}\) if some element \(e \in S\) has a cover requirement \(r_e > 1\). Thus, to obtain a valid LP relaxation of the constrained set multicover ILP, we need to ensure explicitly that no set is picked more than once. Indeed, if we were to omit this constraint, we would obtain the LP relaxation of the unconstrained set multicover problem discussed briefly at the beginning of this section. Given the added set of constraints \(-x_S \ge -1\) for all \(S \in \mathcal{S}\), the dual LP of (11.5) is slightly more complicated than that of (11.3):

\[\begin{gathered} \text{Maximize } \sum_{e \in U} r_ey_e - \sum_{S \in \mathcal{S}} z_S\\ \begin{aligned} \text{s.t. } \sum_{e \in S} y_e - z_S &\le w_S && \forall S \in \mathcal{S}\\ y_e &\ge 0 && \forall e \in U\\ z_S &\ge 0 && \forall S \in \mathcal{S}. \end{aligned} \end{gathered}\tag{11.6}\]

If we rewrite the constraints \(\sum_{e \in S} y_e - z_S \le w_S\) as \(\sum_{e \in S} y_e \le w_S + z_S\) and we interpret this dual LP as representing the problem of packing values \(y_e\) into sets \(S \in \mathcal{S}\) such that no set is overpacked again, then we can interpret the dual variable \(z_S\) as additional capacity of the set \(S\) that we can buy at the expense of decreasing the objective function value by \(z_S\). This is worthwhile if the increase in \(\sum_{e \in U}r_ey_e\) from packing greater values \(y_e\) into the sets in \(\mathcal{S}\) outweighs the increase in \(\sum_{S \in \mathcal{S}}z_S\) needed to pay for the required increases in the capacities of the sets in \(\mathcal{S}\).

To bound the approximation ratio of the constrained set multicover \(\mathcal{C}\) computed by the algorithm, we now define two functions \(\alpha : U \rightarrow \mathbb{R}\) and \(\beta : \mathcal{S} \rightarrow
\mathbb{R}\) and use them to define a good feasible solution \(\bigl(\hat y, \hat z\bigr)\) of the dual.

For each element \(e \in U\), let

\[\alpha_e = p^{S^{r_e}_e}_e\]

be the amortized cost of the last set \(S^{r_e}_e\) that we add to \(\mathcal{C}\) before \(e\) becomes inactive. This ensures that \(\alpha_e \ge 0\).

For each set \(S \in \mathcal{S}\), we set

\[\beta_S = 0 \text{ if } S \notin \mathcal{C}.\]

If \(S \in \mathcal{C}\), then consider the set \(C\) of elements already sufficiently covered before adding \(S\) to \(\mathcal{C}\). For every element \(e \in S \setminus C\), \(S\) is among the first \(r_e\) sets in \(\mathcal{C}\) that contain \(e\), so \(e\) pays its share \(p_e^S\) of the cost of adding \(S\) to \(\mathcal{C}\). We define

\[\beta_S = \sum_{e \in S \setminus C} \Bigl(p^{S^{r_e}_e}_e - p^S_e\Bigr).\]

Note that every term \(p^{S^{r_e}_e}_e - p^S_e\) in the sum is non-negative: Let \(C'\) be the set of elements in \(U\) sufficiently covered immediately before adding \(S^{r_e}_e\) to \(\mathcal{C}\). Then \(C \subseteq C'\) because we add \(S\) to \(\mathcal{C}\) no later than \(S^{r_e}_e\). Thus,

\[ \frac{w_{S^{r_e}_e}}{|S^{r_e}_e \setminus C|} \le \frac{w_{S^{r_e}_e}}{|S^{r_e}_e \setminus C'|} = p^{S^{r_e}_e}_e. \]

Since \(S\) is the set with minimum amortized cost at the time we add it to \(\mathcal{C}\), we have

\[p^S_e = \frac{w_S}{|S \setminus C|} \le \frac{w_{S^{r_e}_e}}{|S^{r_e}_e \setminus C|}.\]

Thus, \(p^{S^{r_e}_e}_e- p^S_e \ge 0\). This implies that \(\beta_S \ge 0\) for all \(S \in \mathcal{S}\).

Now let

\[\hat y_e = \frac{\alpha_e}{H_n} \quad \text{and} \quad \hat z_S = \frac{\beta_S}{H_n}.\]

We prove that

\[w(\mathcal{C}) \le \sum_{e \in U} r_e \alpha_e - \sum_{S \in \mathcal{S}} \beta_S\]

and that \(\bigl(\hat y, \hat z\bigr)\) is a feasible solution of the dual LP (11.6). Thus, since

\[\sum_{e \in U} r_e \alpha_e - \sum_{S \in \mathcal{S}} \beta_S = H_n \left( \sum_{e \in U} r_e \hat y_e - \sum_{S \in \mathcal{S}} \hat z_S \right),\]

we have

\[w(\mathcal{C}) \le H_n \cdot \mathrm{OPT}_f \le H_n \cdot \mathrm{OPT},\]

that is, \(\mathcal{C}\) is an \(H_n\)-approximation of an optimal constrained set multicover of \(U\).

Lemma 11.2: \(w(\mathcal{C}) \le \sum_{e \in U} r_e \alpha_e - \sum_{S \in \mathcal{S}} \beta_S\).

Proof: Let \(\mathcal{C} = \{S_1, \ldots, S_t\}\) be the computed set cover, where the indices of the sets \(S_1, \ldots, S_t\) refer to the order in which we add them to \(\mathcal{C}\). For each \(1 \le j \le t\), let \(C_j\) be the set of elements already sufficiently covered before adding \(S_j\) to \(\mathcal{C}\).

We already observed that \(w(\mathcal{C}) = \sum_{e \in U} \sum_{i=1}^{r_e} p^{S^i_e}_e\). Thus,

\[\begin{aligned} w(\mathcal{C}) &= \sum_{e \in U} \sum_{i=1}^{r_e} p^{S^i_e}_e\\ &= \sum_{e \in U} \sum_{i=1}^{r_e} \left[\left(\left(p^{S^i_e}_e - p^{S^{r_e}_e}_e\right)\right) + p^{S^{r_e}_e}_e\right]\\ &= \sum_{e \in U} r_e\alpha_e - \sum_{e \in U} \sum_{i=1}^{r_e} \left(p^{S^{r_e}_e}_e - p^{S^i_e}_e\right)\\ &= \sum_{e \in U} r_e\alpha_e - \sum_{j = 1}^t \sum_{e \in S_j \setminus C_j} \left(p^{S^{r_e}_e}_e - p^{S_j}_e\right)\\ &= \sum_{e \in U} r_e\alpha_e - \sum_{j = 1}^t \beta_{S_j}\\ &= \sum_{e \in U} r_e\alpha_e - \sum_{S \in \mathcal{S}} \beta_S. \end{aligned}\]

The fourth line equals the third line because any element \(e \in U\) belongs to \(S_j \setminus C_j\) if and only if there exists an index \(1 \le i \le r_e\) such that \(S_j = S^i_e\). Thus, the term \(p^{S^{r_e}_e}_e - p^{S^i_e}_e\) in the third line equals the term \(p^{S^{r_e}_e}_e - p^{S_j}_e\) in the fourth line. ▨

To prove that \(\bigl(\hat y, \hat z\bigr)\) is a feasible solution of the dual LP, we need to verify that \(\hat y \ge 0\), \(\hat z \ge 0\), and \(\sum_{e \in S} \hat y_e - \hat z_S \le w_S\) for all \(S \in \mathcal{S}\).

We already observed that \(\alpha_e \ge 0\) for all \(e \in U\) and \(\beta_S \ge 0\) for all \(S \in \mathcal{S}\). Since \(\hat y_e = \frac{\alpha_e}{H_n}\) for all \(e \in U\) and \(\hat z_S = \frac{\beta_S}{H_n}\) for all \(S \in \mathcal{S}\), this implies that \(\hat y \ge 0\) and \(\hat z \ge 0\). It remains to prove that

Lemma 11.3: For every set \(S \in \mathcal{S}\), \(\sum_{e \in S} \hat y_e - \hat z_S \le w_S\).

Proof: Assume that \(S\) contains \(k\) elements and sort the elements \(e_1, \ldots, e_k\) in \(S\) in the order in which they become inactive, breaking ties arbitrarily.

First assume that \(S \notin \mathcal{C}\). Before \(e_i\) becomes inactive, \(S\) contains at least \(k - i + 1\) active elements, so \(S\) has amortized cost at most \(\frac{w_S}{k - i + 1}\). Since each iteration adds the set of minimum amortized cost to \(\mathcal{C}\), this implies that \(\alpha_{e_i} = p^{S^{r_{e_i}}_{e_i}}_{e_i} \le \frac{w_S}{k - i + 1}\). Since \(S \notin \mathcal{C}\), we have \(\beta_S = 0\). This gives

\[\begin{aligned} \sum_{e \in S} \hat y_e - \hat z_S &= \frac{1}{H_n} \left[ \sum_{i=1}^k \alpha_{e_i} - \beta_S \right]\\ &\le \frac{1}{H_n} \cdot \sum_{i=1}^k \frac{w_S}{k - i + 1}\\ &= \frac{H_k}{H_n} \cdot w_S\\ &\le w_S. \end{aligned}\]

If \(S \in \mathcal{C}\), then assume that the elements \(e_1, \ldots, e_h\) are already sufficiently covered when we add \(S\) to \(\mathcal{C}\) while \(e_{h+1}, \ldots, e_k\) are still active. Since we add \(S\) to \(\mathcal{C}\), we have \(h < k\), that is, at least one element in \(S\) is still active at the time we add \(S\) to \(\mathcal{C}\). Then

\[\begin{aligned} \sum_{e \in S} \alpha_e - \beta_S &= \sum_{i=1}^h p^{S^{r_{e_i}}_{e_i}}_{e_i} + \sum_{i=h+1}^k p^{S^{r_{e_i}}_{e_i}}_{e_i} - \sum_{i=h+1}^k \left[p^{S^{r_{e_i}}_{e_i}}_{e_i} - p^S_{e_i}\right]\\ &= \sum_{i=1}^h p^{S^{r_{e_i}}_{e_i}}_{e_i} + \sum_{i=h+1}^k p^S_{e_i}\\ &= \sum_{i=1}^h p^{S^{r_{e_i}}_{e_i}}_{e_i} + w_S. \end{aligned}\]

For \(1 \le i \le h\), \(S\) contains \(k - i + 1\) active elements when we cover \(e_i\) for the \(r_{e_i}\)th time, so \(S\) has amortized cost at most \(\frac{w_S}{k - i + 1}\) at that time. Since \(S\) is not in \(\mathcal{C}\) yet at the time we cover \(e_i\) for the \(r_{e_i}\)th time, the set \(S^{r_{e_i}}_{e_i}\) that we use to cover \(e_i\) for the \(r_{e_i}\)th time also has amortized cost at most \(\frac{w_S}{k - i + 1}\), that is, \(p^{S^{r_{e_i}}_{e_i}}_{e_i} \le \frac{w_S}{k - i + 1}\). Therefore,

\[\begin{aligned} \sum_{i=1}^h p^{S^{r_{e_i}}_{e_i}}_{e_i} + w_S &\le \sum_{i=1}^h \frac{w_S}{k - i + 1} + w_S\\ &\le \sum_{i=1}^{k-1} \frac{w_S}{k - i + 1} + w_S\\ &= \sum_{i=1}^k \frac{w_S}{k - i + 1}\\ &= w_S \cdot H_k \end{aligned}\]

and

\[\begin{aligned} \sum_{e \in S} \hat y_e - \hat z_S &= \frac{1}{H_n} \left[ \sum_{e \in S} \alpha_e - \beta_S \right]\\ &\le \frac{H_k}{H_n} \cdot w_S\\ &\le w_S.\ \text{▨} \end{aligned}\]

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Algorithms II

11.2. Set Multicover*