Parallel Data Mining and OLAP  

 

 

 

Project Overview

In recent years, there has been tremendous growth in the data
warehousing market. Despite the sophistication and maturity of
conventional database technologies, the ever-increasing size of
corporate databases, coupled with the emergence of the new global
Internet "database", suggests that new computing models may soon
be required to fully support many crucial data management tasks.
In particular, the exploitation of parallel algorithms and
architectures holds considerable promise, given their inherent
capacity for both concurrent computation and data access.

We are interested in the design of parallel data mining and OLAP algorithms and their implementation on coarse-grained parallel multicomputers and PC-based clusters. To date we have focused on the parallelization of data cube methods. Data cube queries represent an important class of On-Line Analytical Processing (OLAP) queries in decision support systems. The precomputation of the different group-bys of a data cube (i.e., the forming of aggregates for every combination of GROUP BY attributes) is critical to improving the response time of these queries [Gray et al. 1997]. The resulting data structures can then be used to dramatically accelerate visualization and query tasks associated with large information sets.
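To make "every combination of GROUP BY attributes" concrete, the following is a minimal sequential sketch in Python. It is not the project's C++/MPI code. The function name `data_cube`, the toy `sales` table, and the choice of SUM as the aggregate are all assumptions made for illustration. For d dimensions it materializes all 2^d group-bys by rescanning the input once per group-by, which is exactly the naive cost that shared-sort and parallel methods aim to avoid.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Return {attrs: {key: sum}} for every subset `attrs` of `dims`.

    Naive approach: one full scan of `rows` per group-by (2^d scans).
    """
    cube = {}
    for k in range(len(dims) + 1):
        for attrs in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[a] for a in attrs)
                agg[key] += row[measure]
            cube[attrs] = dict(agg)
    return cube


# Hypothetical toy fact table (illustrative data only).
sales = [
    {"store": "A", "product": "x", "year": 2001, "amount": 10},
    {"store": "A", "product": "y", "year": 2001, "amount": 5},
    {"store": "B", "product": "x", "year": 2002, "amount": 7},
]

cube = data_cube(sales, ("store", "product", "year"), "amount")
print(len(cube))          # 8 group-bys for 3 dimensions
print(cube[("store",)])   # {('A',): 15, ('B',): 7}
print(cube[()])           # {(): 22}  (the grand total)
```

A query such as "total amount per store" can then be answered from `cube[("store",)]` without touching the raw table, which is the source of the response-time improvement described above.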

 

  Software

We have developed a parallel data cube system in C++, using MPI for communication. This software can generate both complete and partial data cubes, and we are working on a parallel query engine. Our generation methods are based on a parallel version of Pipesort and have been extensively tested on both PC-based clusters and a SUN Sunfire parallel multicomputer.
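The sort-sharing idea commonly associated with Pipesort can be illustrated with a small sequential sketch: once the data is sorted on an attribute order such as (store, product), a single scan can emit the aggregates for every prefix of that order, here (store, product), (store) and the grand total. The Python below is an assumption-laden illustration only. The name `pipeline_scan` and the toy data are hypothetical, SUM is assumed as the aggregate, and the sketch omits both the planning step that chooses sort orders and the parallel distribution of work over MPI that the actual system performs.

```python
def pipeline_scan(rows, order, measure):
    """Sort once on `order`, then aggregate every prefix of `order` in one pass.

    Returns {prefix_attrs: [(key, sum), ...]} with keys in sorted order.
    """
    order = tuple(order)
    d = len(order)
    rows = sorted(rows, key=lambda r: tuple(r[a] for a in order))
    results = {order[:j]: [] for j in range(d + 1)}
    totals = [0] * (d + 1)   # running sum for each prefix length 0..d
    current = None           # full key of the previous row

    for r in rows:
        key = tuple(r[a] for a in order)
        if current is not None:
            # Find the shortest prefix that changed; every longer prefix
            # changed too, so flush those groups and reset their sums.
            for k in range(1, d + 1):
                if key[:k] != current[:k]:
                    for j in range(d, k - 1, -1):
                        results[order[:j]].append((current[:j], totals[j]))
                        totals[j] = 0
                    break
        current = key
        for j in range(d + 1):
            totals[j] += r[measure]

    if current is not None:
        # Flush the groups still open at the end of the scan.
        for j in range(d, -1, -1):
            results[order[:j]].append((current[:j], totals[j]))
    return results


# Hypothetical toy fact table (illustrative data only).
facts = [
    {"store": "B", "product": "x", "amount": 7},
    {"store": "A", "product": "y", "amount": 5},
    {"store": "A", "product": "x", "amount": 10},
]

out = pipeline_scan(facts, ("store", "product"), "amount")
print(out[("store",)])   # [(('A',), 15), (('B',), 7)]
print(out[()])           # [((), 22)]
```

The design point is that one sort serves several group-bys at once; group-bys that are not a prefix of the chosen order (for example, product alone) would need a different sort order or another pass.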

 

  Project Team

This project represents ongoing joint work with a number of researchers, including Frank Dehne (Carleton, Canada) and Susanne Hambrusch (Purdue, USA). Todd Eavis, a Ph.D. student at Dalhousie, has also played a major role.

The following undergraduate and Master's students in Computer Science have been involved in the implementation effort: Zimin Chen, Steven Blimkie, Khoi Manh Nguyen, Thomas Pehle, and Suganthan Sivagnanasundaram.

 

 

Publications

Papers in Refereed Journals
F. Dehne, M. Lawrence, and A. Rau-Chaplin, "Cooperative Caching for Grid-Enabled OLAP" , Int. Journal of Grid and Utility Computing, Volume 1, Number 2, Dec 2009, pages 169-18.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "RCUBE: Parallel Multi-Dimensional ROLAP Indexing" , Strategic Advancements in Utilizing Data Mining and Warehousing Technologies: New Concepts and Developments, Jan 2009, pages 47-61.
C. Hamilton and A. Rau-Chaplin, "Compact Hilbert indices: Space-filling curves for domains with unequal side lengths" , Information Processing Letters, Volume 105, Number 5, Aug 2008, pages 155-163.
M. Lawrence and A. Rau-Chaplin, "Dynamic View Selection for OLAP" , Journal of Data Warehousing and Mining, Volume 4, Number 1, Jan 2008, pages 47-61.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "PnP: Sequential, external memory, and parallel iceberg cube computation" , Distributed and Parallel Databases, Volume 23, Number 2, Jan 2006, pages 99-126.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "The cgmCUBE Project: Optimizing Parallel Data Cube Generation For ROLAP" , Distributed and Parallel Databases, Sep 2004.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "Improved Data Partitioning For Building Large ROLAP Data Cubes in Parallel" , International Journal of Data Warehousing and Mining, Volume 2, Number 1, Aug 2004, pages 1-26.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "Parallel ROLAP Data Cube Construction On Shared-Nothing Multiprocessors" , Distributed and Parallel Databases, Volume 15, Number 3, May 2004, pages 219-236.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "RCUBE: Parallel Multi-Dimensional ROLAP Indexing" , Data Mining and Knowledge Discovery, Mar 2003.
F. Dehne, T. Eavis, S. Hambrusch and A. Rau-Chaplin, "Parallelizing The Data Cube" , Distributed and Parallel Databases (Special Issue on Parallel and Distributed Data Mining), Volume 11, Number 2, Sep 2001, pages 181-201.
Papers in Refereed Conference Proceedings
O. Baltzer, F. Dehne, S. Hambrusch, and A. Rau-Chaplin, "OLAP for Trajectories" in Proceedings of the 19th International Conference on Database and Expert Systems Application (DEXA 2008), Turin, Italy, Sep 2008.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "Efficient Computation of View Subsets" in Proceedings of the 10th ACM International Workshop on Data Warehousing and OLAP (DOLAP 2007), Lisbon, Portugal, Nov 2007.
J.-P. Deveaux, A. Rau-Chaplin, and N. Zeh, "Adaptive Tuple Differential Coding" in Proceedings of the 18th International Conference on Database and Expert Systems Application (DEXA 2007), Regensburg, Germany, Sep 2007.
F. Dehne, M. Lawrence, and A. Rau-Chaplin, "Cooperative Caching for Grid Based Data Warehouses" in Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid 2007 (CCGRID '07), Rio de Janeiro, Brazil, May 2007.
C. H. Hamilton and A. Rau-Chaplin, "Compact Hilbert Indices for Multi-Dimensional Data" in Proceedings of the First International Conference on Complex, Intelligent and Software Intensive Systems (CISIS'07), Vienna, Austria, Apr 2007.
M. Lawrence, F. Dehne, and A. Rau-Chaplin, "Implementing OLAP Query Fragment Aggregation and Recombination for the OLAP Enabled Grid" in Proceedings of the 2007 IEEE International Parallel and Distributed Processing Symposium, Long Beach, CA, USA, Mar 2007.
M. Lawrence and A. Rau-Chaplin, "Dynamic View Selection for OLAP" in Proceedings of the 8th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2006), Krakow, Poland, Sep 2006.
M. Lawrence and A. Rau-Chaplin, "The OLAP-Enabled Grid: Model and Query Processing Algorithms" in Proceedings of the 20th International Symposium on High Performance Computing Systems and Applications (HPCS'06), IEEE, Eds. R. Deupree, St. Johns, Canada, May 2006.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "cgmOLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes" in Proceedings of the 22nd International Conference on Data Engineering, IEEE, Atlanta, USA, Apr 2006.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "Building Large ROLAP Data Cubes in Parallel" in Proceedings of the 8th International Database Engineering and Applications Symposium (IDEAS '04), IEEE, pages 367-377, Coimbra, Portugal, Jul 2004.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "PnP: Parallel And External Memory Iceberg Cube Computation" in Proceedings of the 21st International Conference on Data Engineering (ICDE 2005) (Short paper), IEEE, Tokyo, Japan, Jun 2004.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "Computing Partial Data Cubes" in Data Warehousing and Business Intelligence Minitrack of the Thirty-Seventh Hawaii International Conference on System Sciences (HICSS-37), Jan 2004.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "Parallel Multi-Dimensional ROLAP Indexing" in Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid2003), pages 86-93, Tokyo, Japan, Oct 2002.
Y. Chen, F. Dehne, T. Eavis, and A. Rau-Chaplin, "Parallel ROLAP Data Cube Construction On Shared-Nothing Multiprocessors" in International Parallel and Distributed Processing Symposium (IPDPS2003), Nice, France, Oct 2002.
F. Dehne, T. Eavis and A. Rau-Chaplin, "Computing Partial Data Cubes for Parallel Data Warehousing Applications" in Proceedings of PVM-MPI 01, Volume 2131, Lecture Notes in Computer Science, Springer Verlag, pages 319-326, Santorini, Greece, Sep 2001.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "Coarse Grained Parallel On-Line Analytical Processing (OLAP) For Data Mining" in Proceedings of the 2001 International Conference on Computational Science (ICCS 2001), San Francisco, USA, May 2001.
F. Dehne, T. Eavis, and A. Rau-Chaplin, "A Cluster Architecture for Parallel Data Warehousing" in Proceedings of the 2001 IEEE International Symposium of Cluster Computing and the Grid (CCGRid'01), May 2001.
F. Dehne, S. Hambrusch, T. Eavis, and A. Rau-Chaplin, "Parallelizing The Data Cube" in Proceedings of the 8th International Conference on Database Theory (ICDT'01), London, UK, Jan 2001.