Research

Research Interests

Algorithms and data structures, especially fast and space-efficient algorithms and data structures, including succinct data structures, string algorithms and text indexing, I/O-efficient algorithms, implicit data structures, and adaptive algorithms.
Computational Geometry, especially efficient algorithms and data structures for computational geometry.
Efficiency Issues in databases, text retrieval systems, bioinformatics and geographic information systems.

Professional Activities

Program committee member, the 31st International Symposium on String Processing and Information Retrieval (SPIRE 2024), Puerto Vallarta, Mexico, 2024.
Program committee member, the 22nd Workshop on Approximation and Online Algorithms (WAOA 2024), Egham, United Kingdom, 2024.
Program committee member, the 35th Canadian Conference on Computational Geometry (CCCG 2024), St. Catharines, Canada, 2024.
Program committee member, the 40th International Symposium on Computational Geometry (SoCG 2024), Athens, Greece, 2024.
Steering committee member, 2014-2017 and 2020-2024, the Canadian Conference on Computational Geometry
Steering committee member, 2021-2023, the Algorithms and Data Structures Symposium.
Program committee member, the 35th Canadian Conference on Computational Geometry (CCCG 2023), Montreal, Canada, 2023.
Program committee member, the 21st Symposium on Experimental Algorithms (SEA 2023), Barcelona, Spain, 2023.
Program committee member, the 29th International Symposium on String Processing and Information Retrieval (SPIRE 2022), Concepción, Chile, 2022.
Conference co-chair, the 33rd Canadian Conference on Computational Geometry (CCCG 2021), Halifax, Canada, 2021.
Conference co-chair, the 17th Algorithms and Data Structures Symposium (WADS 2021), Halifax, Canada, 2021.
Program committee member, the 27th Canadian Conference on Computational Geometry (CCCG 2015), Kingston, Ontario, Canada, 2015.
Program committee member, the 31st European Workshop on Computational Geometry (EuroCG 2015), Ljubljana, Slovenia, 2015.
Program committee co-chair, the 26th Canadian Conference on Computational Geometry (CCCG 2014), Halifax, Canada, 2014.
Program committee member, the 24th Annual Symposium on Combinatorial Pattern Matching (CPM 2013), Bad Herrenalb, Germany, June 17–19, 2013

Research Program

The problem of efficiently storing and retrieving information is an essential topic in computer science. During the past decades, researchers have been developing techniques to handle large data sets. Back in the late 80s, a text corpus of several hundred megabytes, such as the digital version of the 20-volume Oxford English Dictionary, was considered large data, and the research programs, e.g. the New OED Project, on managing data of this scale attracted many world-class researchers. These programs resulted in many well-known techniques including indexing data structures that facilitate text search, and companies such as OpenText (the largest Canadian software company) were started as spin-offs of these projects.

Recently, as a result of the rapid growth of large data sets, such as text databases and spatial databases, on the World Wide Web (think of Google), and in bioinformatics applications, modern applications often deal with data measured in terabytes or even petabytes. This has led to new issues and challenges in the way data are organized. Many algorithms and data structures designed twenty years ago are no longer feasible for these data sets. In many cases, it is because these data structures occupy too much storage space.

My main research contributions are in the design of efficient algorithms and data structures for large data sets. This involves studies of many algorithmic techniques, including succinct data structures, I/O-efficient algorithms and data structures, implicit data structures and adaptive algorithms.

The following are some of my current research projects.

Space Efficient Data Structures

Succinct data structures were proposed to represent data structures in a space-efficient manner, so that the information in large systems can be retrieved quickly, but the space requirement is little more than that of the raw data. Our work in this project improved previous results on succinct representations of many fundamental data structures. These include strings which are key structures used in the design of many succinct data structures, space-efficient text indexes that facilitate text search, and representations of unlabeled and labeled trees and graphs. These problems are fundamental, and they also have applications in text retrieval systems, XML databases and GIS applications such as Google Maps.

We also started several new research directions in this area, such as designing succinct indexes to achieve data abstraction in succinct data structure design, combining succinct data structures with other algorithmic approaches such as I/O-efficient algorithms and data structures, as well as using succinct data structures to improve the query efficiency of classic data structures.

Indexing Techniques for Fast Retrieval of Massive Textual Data

We designed a collection of space-efficient text indexes that support various types of text search, including substring search and position-restricted substring search. These indexes typically occupy space close to a compressed version of the textual data, and they can answer queries without requiring the raw data. This saves storage space drastically, as standing text indexes such as suffix trees typically occupy space that is 10-20 times the size of the raw data.

Algorithms for Queries in Large Geometric and Geographic Databases

Geometric data structures are used extensively in geographic information systems, computer graphics and databases. Thus it is important to improve their space efficiency for large applications. We designed space-efficient data structures that support queries such as range search, majority and point location. We also designed approaches to represent triangular terrains compactly to support queries such as terrain profiles and trickle paths.

Algorithm Engineering for Large Data Sets

Previous work on implementing succinct trees and text indexes has shown that succinct data structures are very useful in practice. Despite such work, there are many theoretical results that have not been evaluated in practice. Thus this project engineers the implementation of succinct data structures.