Building Large ROLAP Data Cubes in Parallel
Ying Chen, Frank Dehne, Todd Eavis, Andrew Rau-Chaplin
Abstract:
The pre-computation of data cubes is critical to improving the response
time of On-Line Analytical Processing (OLAP) systems and can be
instrumental in accelerating data mining tasks in large data warehouses.
However, as the size of data warehouses grows, the time it takes to
perform this pre-computation becomes a significant performance bottleneck.
This paper presents a fast parallel method for generating ROLAP data cubes
on a shared-nothing multiprocessor, based on a novel, optimized data
partitioning technique. Since no shared disk is required, the method can
be applied to highly scalable processor clusters consisting of standard
PCs with local disks, connected via a data switch. The approach, which
uses a ROLAP representation of the data cube, is well suited to large
data warehouses with high-dimensional data and supports the generation of
both fully and partially materialized cubes. Compared with previous
approaches, our method significantly improves scalability with respect
to both the number of processors and the I/O bandwidth (number of
parallel disks).
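To make "full" versus "partial" materialization concrete, the following is a minimal Python sketch, assuming a small in-memory fact table with a SUM measure. It is not the authors' algorithm and does not implement their optimized partitioning technique. In ROLAP form, each subset of the d dimensions is one group-by (view), so a full cube has 2^d views, while a partial cube computes only a chosen subset. The function names (`all_views`, `build_cube`, `partitioned_cube`) and the sample data are hypothetical. `partitioned_cube` is a deliberately naive shared-nothing simulation: it hash-partitions rows over p workers, lets each worker build a local cube, and merges the partial sums.

```python
from collections import defaultdict
from itertools import combinations

def all_views(dims):
    """Every subset of the dimensions is one group-by (view): 2**d views in total."""
    return [v for k in range(len(dims) + 1) for v in combinations(dims, k)]

def compute_view(rows, dims, view, measure):
    """SUM the measure column, grouped by the dimensions named in `view`."""
    idx = [dims.index(d) for d in view]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[measure]
    return dict(out)

def build_cube(rows, dims, measure, views=None):
    """Full materialization if `views` is None, otherwise partial (only the listed views)."""
    if views is None:
        views = all_views(dims)
    return {v: compute_view(rows, dims, v, measure) for v in views}

def partitioned_cube(rows, dims, measure, p, views=None):
    """Naive shared-nothing simulation (hypothetical, not the paper's method):
    hash-partition rows over p workers, build local cubes, merge partial sums."""
    parts = [[] for _ in range(p)]
    for row in rows:
        parts[hash(row[:len(dims)]) % p].append(row)
    merged = {}
    for part in parts:
        for v, table in build_cube(part, dims, measure, views).items():
            target = merged.setdefault(v, defaultdict(int))
            for key, val in table.items():
                target[key] += val  # SUM is distributive, so partial sums add up
    return {v: dict(t) for v, t in merged.items()}

# Hypothetical fact table: (product, store, month, sales)
dims = ["product", "store", "month"]
rows = [
    ("p1", "s1", "jan", 10),
    ("p1", "s2", "jan", 5),
    ("p2", "s1", "feb", 7),
]
cube = build_cube(rows, dims, measure=3)
print(len(cube))            # 8
print(cube[()])             # {(): 22}
print(cube[("product",)])   # {('p1',): 15, ('p2',): 7}
print(partitioned_cube(rows, dims, 3, p=2) == cube)  # True
```

The merge step works here only because SUM is distributive. Holistic aggregates such as MEDIAN would need a different combination strategy, and this naive row partitioning makes no attempt to balance work or reduce communication, which are the concerns a real parallel cube method must address.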