9.2.1. The Right Greedy Choice
So what is a good greedy choice for the set cover problem? Two natural choices come to mind:
- Pick the cheapest set (set of minimum weight) in each iteration or
- Pick the set that covers the most elements.
Clearly, picking cheap sets seems to be a good strategy because our goal is to minimize the total weight of the sets in the set cover. This strategy fails when there is a more expensive set that covers many elements. For example, consider the universe \(U = \{1, \ldots, n\}\) and a collection of sets \(\mathcal{S} = \{S_1, \ldots, S_{n+1}\}\), where \(S_i = \{i\}\) for \(1 \le i \le n\), and \(S_{n+1} = U\). The sets \(S_1, \ldots, S_n\) have weight \(1\) each, while the set \(S_{n+1}\) has weight \(2\). If we pick the cheapest set in each iteration, we end up with the set cover \(\mathcal{C} = \{S_1, \ldots, S_n\}\). This set cover has weight \(n\). The optimal set cover consists of the single set \(S_{n+1}\) and has weight \(2\). Thus, we obtain only a \(\frac{n}{2}\)-approximation using the strategy of always picking the cheapest set in this example.
Figure 9.5: Coming soon!
The reason why always picking the cheapest set isn't a good strategy in this example is that \(S_{n+1}\) is only slightly more expensive than the other sets but covers many more elements, all elements in fact. This is the motivation for the second strategy. We hope that by always picking the set that covers the most elements, we include only few sets in the set cover. Thus, if the weights of the sets do not differ too much, this should be a reasonable strategy to minimize the weight of the computed set cover. We can defeat this strategy using the same example that we used to defeat the strategy of always picking the cheapest set, only this time we assign a very large weight to \(S_{n+1}\), say \(w_{S_{n+1}} = n^2\). In this case, picking the largest set produces a set cover of weight \(n^2\) whereas the set cover consisting of the sets \(S_1, \ldots, S_n\) has weight only \(n\).
Figure 9.6: Coming soon!
Picking large sets seemed to be a good strategy when these large sets are cheap. Otherwise, picking the cheapest sets works better. It seems that we need a strategy that takes both the size of the sets and their weights into account. What we really want is to cover all elements in \(U\) at a low cost per element. In the above example, when \(w_{S_{n+1}} = 2\), \(S_{n+1}\) covers all \(n\) elements at a cost of \(2\). The cost per element is thus only \(\frac{2}{n}\), whereas each set \(S_i\) with \(1 \le i \le n\) covers a single element \(i\) at cost \(1\) per element. In an amortized sense, \(S_{n+1}\) is the cheaper set. When \(w_{S_{n+1}} = n^2\), \(S_{n+1}\) covers the elements in \(U\) at a cost of \(n\) per element. This is much greater than the cost of covering each element \(i\) using its corresponding set \(S_i\). Thus, the set cover \(\{S_1, \ldots, S_n\}\) is the better choice in this case.
So let's try another strategy:
- Pick the set \(S_i\) that has minimum weight per element it covers, that is, pick the set \(S_i\) that minimizes the ratio \(\frac{w_{S_i}}{|S_i|}\).
This strategy certainly achieves the goal of taking both the weights of the chosen sets and their sizes into account, striking a balance between choosing cheap sets and large sets. It's still not quite the right strategy though: Consider the universe \(U = \{1, \ldots, 2n\}\) and sets \(S_1, \ldots, S_{n+2}\), where \(S_i = \{i, n+1, \ldots, 2n\}\) for all \(1 \le i \le n\), \(S_{n+1} = \{1, \ldots, n\}\), and \(S_{n+2} = \{n+1, \ldots, 2n\}\). Each of the sets \(S_1, \ldots, S_{n+1}\) has weight \(2\). The set \(S_{n+2}\) has weight \(1\). The set \(S_{n+2}\) has minimum weight per element, \(\frac{1}{n}\), so we choose this set first. Each of the sets \(S_1, \ldots, S_n\) has weight \(\frac{2}{n+1}\) per element, whereas the set \(S_{n+1}\) has weight \(\frac{2}{n}\) per element. Thus, after picking \(S_{n+2}\), we pick all of the sets \(S_1, \ldots, S_n\), which results in a set cover of weight \(2n + 1\). The optimal set cover is \(\{S_{n+1}, S_{n+2}\}\), which has weight \(3\). So what went wrong? Picking \(S_{n+2}\) is certainly a reasonable choice. After picking \(S_{n+2}\), however, picking \(S_{n+1}\) is a much better choice than picking any of the sets \(S_1, \ldots, S_n\): \(S_{n+1}\) contains \(n\) elements not already covered by \(S_{n+2}\), and it covers these elements at a cost of \(\frac{2}{n}\) per element. Each of the sets \(S_1, \ldots, S_n\) contains \(n + 1\) elements, but \(n\) of them are already covered by \(S_{n+2}\); we don't gain anything by covering them again. By adding such a set \(S_i\) to the set cover, we increase the weight of the set cover by \(2\) but increase the number of covered elements only by \(1\). Thus, this new element is covered at an amortized cost of \(2\), which is rather expensive.
Figure 9.7: Coming soon!
So let's adjust our strategy to obtain the final greedy algorithm:
- Let \(\mathcal{C}\) be the collection of sets chosen so far, and let \(C = \bigcup_{S \in \mathcal{C}}S\) be the set of elements covered by \(\mathcal{C}\). Then pick the set \(S_i \in \mathcal{S} \setminus \mathcal{C}\) that minimizes the cost per newly covered element, \(\frac{w_{S_i}}{|S_i \setminus C|}\).
Figure 9.8: Coming soon!
As we prove next, this is the right strategy. Here is the complete algorithm:
Greedy Set Cover:
- Set \(\mathcal{C} = \emptyset\) and \(C = \emptyset\).
- While \(C \ne U\), pick the set \(S_i \in \mathcal{S} \setminus \mathcal{C}\) that minimizes \(\frac{w_{S_i}}{|S_i \setminus C|}\), add \(S_i\) to \(\mathcal{C}\), and add the elements in \(S_i\) to the set of covered elements \(C\).
- Return the set \(\mathcal{C}\).

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.