The course focuses on the design of algorithms and data structures. Its aim is to teach you how to work your way from a problem to an efficient algorithm that solves this problem. Even though algorihm design is a very creative progress—there is no general recipe how to go about it—I try to help you to become a proficient algorithm designer by presenting the basic design principles, called paradigms, in an organized manner and working through a variety of problems that can be solved using a given problem.
The course is divided into 8 topics. The first one gives a general introduction to algorithm design, motivates it, and highlights a few central questions one can (and should) ask about a problem and an algorithm solving it. Along the way, we will review the mathematical skills you will need in this course and which you should have acquired in CSCI 2112. The next six topics introduce different paradigms for the design of algorithms, more or less in increasing order of difficulty. The final topic revolves around teh question whether all problems can be solved efficiently using computers. More precisely, we will provide evidence that certain problems probably cannot be solved efficiently. The following is a more detailed description of the topics we cover.
We use the problem of finding a stable matching to highlight fundamental questions one can ask about computational problems. These questions include: Is there always a solution to a given instance of the problems? How quickly can we find such a solution? In our attempts to answer these questions, we will review basic proof techniques, such as induction and proof by contradiction. Once we have a bound on the running time of the algorithm, we will ask what this bound tells us. More precisely, we will introduce the concept of asymptotic growth of a function, and we will agree on when we consider a problem computationally tractable.
We review the concept of a graph. Then we discuss graph exploration as a general technique for gaining information about the structure of a graph. We will introduce a general graph exploration framework, which will be underspecified. Depending on how we fill in the details, we will obtain classical algorithms, such as depth-first search, breadth-first search, Dijkstra's algorithm or Prim's algorithm. We will then focus on depth-first search as a simple but very powerful technique to solve a wide variety of graph problems, including connectivity, biconnectivity, strong connectivity, and topological sorting.
Greedy algorithms are a simple paradigm to solve optimization problems, where we have to choose the best among a possibly large number of possible solutions to the given problem. These algorithms, as the name suggests, greedily and myopically make decisions that, from the current point in the computation, bring them closest to an optimal solution. Good heuristics are often greedy. We will learn how to prove that certain greedy algorithms find an optimal solution. Problems we consider include scheduling, minimum-length codes, minimum spanning tree, and single-source shortest paths.
Divide and Conquer
Divide and conquer is an algorithm design paradigm that is very much like induction. Consequently, the proof technique most often used to establish the correctness and running time of these algorithms is induction. The idea of a divide-and-conquer algorithm is the following: If we have a small instance of the problem (a base case), we solve the problem in a straightforward fashion. Otherwise (the inductive step, in a sense), we break up the instance into smaller instances, solve these recursively, and combine the results to obtain a solution to the current instance. To analyze the running time of such algorithms, we will use recurrence relations, that is, recursive formulas that express a function's value for a given argument in terms of its values for smaller arguments. Then we will learn techniques to "solve" these recurrences, that is, to transform them into a non-recursive form that uses O-notation. The problems we study here are sorting, selection, counting inversions, and integer multiplication.
Dynamic programming is another technique for solving optimization problems. Many problems that can be solved greedily can also be solved using dynamic programming. If the problem has a greedy solution, the greedy solution is often simpler and faster. If it does not, dynamic programming is the next more complicated technique to try. The creative part in designing such as algorithm is to find a recurrence relation that expresses the cost of the solution in terms of the cost of the solutions to subproblems. Using this recurrence, we use a tabular method to compute and combine the solutions to all possible subproblems of the given problem instance; that is, we perform a rather exhaustive search, but we are clever about it to obtain an efficient algorithm. The problems we study are scheduling, RNA secondary structure, sequence alignment, and shortest paths.
Data structuring is an idea we have used already in Prim's and Dijkstra's algorithms: use one or more data structures to encapsulate all or some of the hard work of the algorithm. This helps to keep the algorithm clean and promotes code reuse if the data structure is general enough. To illustrate this, we will discuss (a, b)-trees as a dictionaly structure and use them to solve a number of geometric problems, such as range searching and line segment intersection. We will then see how to build new structures from (a, b)-trees, that is, how to augment (a, b)-trees, in order to solve more problems that cannot be solved using standard (a, b)-trees. These structures include a dynamic order statistics structure, a priority search tree, and range trees. In the discussion of priority search trees, we will touch on the concept of amortized analysis.
Sometimes, it is difficult or impossible to make guaranteed good choices in an algorithm, for example, to find a good way to break a problem instance in a divide-and-conquer algorithm into smaller instances. A randomized algorithm flips a coin to amke such decisions and hopes that most of these random decisions are good. The crux is to prove that this is indeed the case. We will review basic concepts in probability theory and use them to analyze the expected running time of randomized algorithms. The problems we will study include sorting, selection, and binary space partitions.
Intractability and NP-completeness
We will introduce the concepts of NP-completeness and NP-hardness as measures for the computational intractability of certain problems. The discussion of these concepts will lead us on a short excursion into the realm of formal languages, which are studied in much more depth in CSCI 4112. Classical NP-hard problems we will discuss are 3-SAT, vertex cover, Hamiltonian cycle, and subset sum. The main technique we will learn to establish the NP-hardness of a problem are polynomial-time reductions.
These are the slides I use for my lectures.
|Topic||Approximate dates when covered|
|Divide and conquer||TBA|
These are notes from when I experimented with teaching this course without slides, using only the blackboard. I don't use them in class, but they may still be helpful, so I make them available.
|Topic||Approximate dates when covered|
|Divide and conquer||TBA|
Here I provide implementations of the algorithms taught in class. I have two goals with this: To dispel the myth that algorithms are hard and difficult to implement; if you know what you're doing, they can be an almost one-to-one translation of the abstract ideas in class. To allow you to play with the code and observe what the algorithm does.
This section is still heavily under construction and not all algorithms are implemented yet. I'll add more code over time.
|Rooted tree representations|
|Basic graph traversal framework|
|A simple stack|
|A simple queue|
|Sorting and selection|
T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. 3rd edition, MIT Press, 2009.
N. Zeh. Data Structures. CSCI 3110 Lecture Notes, 2005.
10 assignments (A)
The 8 best assignments count
- Midterm (M)
- Final (F)
Grade = max(40% * A + 20% * M + 40% * F, 40% * A + 60% * F, F)
I encourage you to form groups of up to 3 students who collaborate on assignments. Each group submits a single assignment; each member of the group receives the same marks. It is your responsibility to find students with whom you form a group. Group composition may change between assignments. No collaboration between groups is allowed on assignments.
Late assignments are accepted only if you were sick on the day the assignment was due or if there is an important event in your personal life that prevents you from completing the assignment on time. In the latter case, you must discuss these circumstances with me in advance to apply for a reasonable extension.
If you miss an exam, you will be given the opportunity to sit the exam at a later date. You must, however, present a good reason why you were not able to sit the exam at the regular scheduled time. Also, unless there are a large number of students who missed the regularly scheduled exam, make-up exams are oral exams.
I will not tolerate any form of plagiarism (copying from your classmates, copying solutions from the web, etc.). According to university regulations, I have to report you to the Faculty's Integrity Officer if I suspect you of academic dishonesty, and I will. The Integrity Officer may refer your case to the Senate Committee on Plagiarism. The penalty for plagiarism can range from failing the course to expulsion from the university. So please save me and yourself the aggravation. For more information, make sure you read the Dalhousie Academic Integrity Page.