PnP: Parallel And External Memory Iceberg Cube Computation. |
Ying Chen, Frank Dehne, Todd Eavis, Andrew Rau-Chaplin |
Abstract: Motivated by the work of Ng et.al., and the recent success of Star-Cubing (Han et.al.), we further investigate the use of hybrid approaches for the parallel computation of very large iceberg-cube queries. We present “Pipe ’n Prune” (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up Apriori data pruning. A particular strength of PnP is that it is very efficient for all of the following scenarios: (1) Sequential iceberg-cube queries. (2) External memory iceberg-cube queries. (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks. We performed an extensive performance analysis of PnP for all of the above scenarios with the following main results: In the first scenario, PnP performs very well for both, dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario, PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario, PnP scales very well, providing near linear speedup for a larger number of processors, thereby solving an open scalability problem observed by Ng et.al |
paper.pdf |
|