This class discusses algorithms and data structures that can be used to efficiently solve fundamental problems on data stes of terabyte and even petabyte size. It starts with a discussion of why standard data structures and algorithms are not suitable for solving problems on data sets of such massive size. This discussion will demonstrate the need to develop algorithms that fully exploit the memory hierarchies of state-of-the-art computers. The main part of this class discusses paradigms, algorithms, and data structures for a two-level hierarchy. These algorithms are efficient for solving problems on massive data sets. We will also discuss the concept of cache obliviousness and the most important paradigms for designing cache-oblivious algorithms. In a nutshell, cache-oblivious algorithms are standard internal-memory algorithms that lay out their data structures and access them in a fashion that helps the standard paging algorithm to perform few page transfers (between virtual memory and main memory, or between main memory and L2 cache, or ...). This leads to a clean model to design algorithms that automatically adapt to all levels of a multi-level memory hierarchy. Specific topics covered in this course include:
Paradigms for batched processing
- Multi-way merging
- Multi-way distribution
- Distribution sweeping
- Batched filtering
- Data structuring
- Multi-way divide-and-conquer
- Paradigms for online processing (storage and retrieval)
- Paradigms for graph algorithms
- Paradigms for cache-oblivious algorithms
The following list of references covers all topics covered in class. References made in class will refer to this list. I may add more references to this list as we cover things in class. The first three references are survey papers, which cover (more than) the topics covered on online data structures, graph algorithms, and cache-oblivious algorithms.
The remaining references are research papers that present the algorithms discussed in class. These papers often provide more detail than the lecture notes but are often more difficult to read.
- 3 assignments, each of equal weight (A)
- Class participation (C)
- Project (P)
Final grade = 50% * A + 50% * P + ε * C
The assignments will consist of algorithm design questions; that is, they will be of a theoretical nature. The course project can be of three types: (1) A thorough review of the literature on solving a certain problem or class of problems I/O-efficiently/cache-obliviously, including a critical discussion of the advantages and shortcomings of the proposed approaches. (2) An attempt to make progress on a theoretical open problem in the field, including a brief review of the relevant literature. (3) Implementation and experimental evaluation of an I/O-efficient or cache-oblivious algorithm and a write-up that includes a brief discussion of the relevant literature, a discussion of the design choices made in the implementation process, and a discussion of the experimental results.
The ε for class participation should be interpreted as follows: If you are between two grades (e.g., between B+ and A−), I will hold active class participation in your favour to tip the grade towards the better mark. I teach my course in a relatively interactive style, pointing you in the right direction and asking you to develop the right idea to approach a problem. Active class participation means that you make a conscious effort, that you voice your ideas about how to approach a problem. The answers do not have to be correct: massaging your brain is important. If you do nto follow, asking for clarification also counts as class participation.
Collaboration: I encourage you to form study groups to discuss the course material and work on exercises. However, I do not allow any form of collaboration on assignments. Collaboration on assignments will be treated as plagiarism. Sufficiently similar assignments will be handed over to the Faculty's Academic Integrity Officer.
Plagiarism: I will not tolerate any form of plagiarism (copying from your classmates, copying solutions from the web, etc.). According to university regulations, I have to report you to the Faculty's Integrity Officer if I suspect you of academic dishonesty, and I will. The Integrity Officer may refer your case to the Senate Committee on Plagiarism. The penalties for plagiarism can range from failing the course to expulsion from the university. So please save me and yourself the aggravation. For more information, make sure you read the university policy on intellectual honesty and the Dalhousie Academic Integrity Page.
Late assignments: I do not accept late assignments without a doctor's note or any other official document that certifies a good reason why you could not hand your assignment in on time. In case of important events in your personal life, I am willing to make an exception to this rule if you discuss these circumstances with me and apply for a reasonable extension of the deadline in advance.