# Outline

The course focuses on the design of algorithms and data structures. Its aim is to teach you how to work your way from a problem to an efficient algorithm that solves this problem. Even though algorihm design is a very creative progress—there is no general recipe how to go about it—I try to help you to become a proficient algorithm designer by presenting the basic design principles, called paradigms, in an organized manner and working through a variety of problems that can be solved using a given problem.

The course is divided into 8 topics. The first one gives a general introduction to algorithm design, motivates it, and highlights a few central questions one can (and should) ask about a problem and an algorithm solving it. Along the way, we will review the mathematical skills you will need in this course and which you should have acquired in CSCI 2112. The next six topics introduce different paradigms for the design of algorithms, more or less in increasing order of difficulty. The final topic revolves around teh question whether all problems can be solved efficiently using computers. More precisely, we will provide evidence that certain problems probably cannot be solved efficiently. The following is a more detailed description of the topics we cover.

## 1. Introduction

We use the problem of finding a stable matching to highlight fundamental questions one can ask about computational problems. These questions include: Is there always a solution to a given instance of the problems? How quickly can we find such a solution? In our attempts to answer these questions, we will review basic proof techniques, such as induction and proof by contradiction. Once we have a bound on the running time of the algorithm, we will ask what this bound tells us. More precisely, we will introduce the concept of asymptotic growth of a function, and we will agree on when we consider a problem computationally tractable.

## 2. Graph algorithms

We review the concept of a graph. Then we discuss graph exploration as a general technique for gaining information about the structure of a graph. We will introduce a general graph exploration framework, which will be underspecified. Depending on how we fill in the details, we will obtain classical algorithms, such as depth-first search, breadth-first search, Dijkstra's algorithm or Prim's algorithm. We will then focus on depth-first search as a simple but very powerful technique to solve a wide variety of graph problems, including connectivity, biconnectivity, strong connectivity, and topological sorting.

## 3. Greedy algorithms

Greedy algorithms are a simple paradigm to solve optimization problems, where we have to choose the best among a possibly large number of possible solutions to the given problem. These algorithms, as the name suggests, greedily and myopically make decisions that, from the current point in the computation, bring them closest to an optimal solution. Good heuristics are often greedy. We will learn how to prove that certain greedy algorithms find an optimal solution. Problems we consider include scheduling, minimum-length codes, minimum spanning tree, and single-source shortest paths.

## 4. Divide and conquer

Divide and conquer is an algorithm design paradigm that is very much like induction. Consequently, the proof technique most often used to establish the correctness and running time of these algorithms is induction. The idea of a divide-and-conquer algorithm is the following: If we have a small instance of the problem (a base case), we solve the problem in a straightforward fashion. Otherwise (the inductive step, in a sense), we break up the instance into smaller instances, solve these recursively, and combine the results to obtain a solution to the current instance. To analyze the running time of such algorithms, we will use recurrence relations, that is, recursive formulas that express a function's value for a given argument in terms of its values for smaller arguments. Then we will learn techniques to "solve" these recurrences, that is, to transform them into a non-recursive form that uses O-notation. The problems we study here are sorting, selection, counting inversions, and integer multiplication.

## 5. Dynamic programming

Dynamic programming is another technique for solving optimization problems. Many problems that can be solved greedily can also be solved using dynamic programming. If the problem has a greedy solution, the greedy solution is often simpler and faster. If it does not, dynamic programming is the next more complicated technique to try. The creative part in designing such as algorithm is to find a recurrence relation that expresses the cost of the solution in terms of the cost of the solutions to subproblems. Using this recurrence, we use a tabular method to compute and combine the solutions to all possible subproblems of the given problem instance; that is, we perform a rather exhaustive search, but we are clever about it to obtain an efficient algorithm. The problems we study are scheduling, RNA secondary structure, sequence alignment, and shortest paths.

## 6. Data structures

Data structuring is an idea we have used already in Prim's and Dijkstra's algorithms: use one or more data structures to encapsulate all or some of the hard work of the algorithm. This helps to keep the algorithm clean and promotes code reuse if the data structure is general enough. To illustrate this, we will discuss (a, b)-trees as a dictionaly structure and use them to solve a number of geometric problems, such as range searching and line segment intersection. We will then see how to build new structures from (a, b)-trees, that is, how to augment (a, b)-trees, in order to solve more problems that cannot be solved using standard (a, b)-trees. These structures include a dynamic order statistics structure, a priority search tree, and range trees. In the discussion of priority search trees, we will touch on the concept of amortized analysis.

## 7. Randomization

Sometimes, it is difficult or impossible to make guaranteed good choices in an algorithm, for example, to find a good way to break a problem instance in a divide-and-conquer algorithm into smaller instances. A randomized algorithm flips a coin to amke such decisions and hopes that most of these random decisions are good. The crux is to prove that this is indeed the case. We will review basic concepts in probability theory and use them to analyze the expected running time of randomized algorithms. The problems we will study include sorting, selection, and binary space partitions.

## 8. Intractability and NP-completeness

We will introduce the concepts of NP-completeness and NP-hardness as measures for the computational intractability of certain problems. The discussion of these concepts will lead us on a short excursion into the realm of formal languages, which are studied in much more depth in CSCI 4112. Classical NP-hard problems we will discuss are 3-SAT, vertex cover, Hamiltonian cycle, and subset sum. The main technique we will learn to establish the NP-hardness of a problem are polynomial-time reductions.

# Slides

 Slides Dates Sep 11 Sep 14, 16, 18 Sep 21, 23, 25, 28 Sep 28, 30; Oct 2, 5, 7 Oct 9, 12, 14, 16, 19, 21, 23 Oct 26, 28; Nov 2 TBD TBD TBD

# Notes

 Note Dates Sep 11, 14, 16, 18 Sep 14, 16, 18 Sep 21, 23, 25, 28 Sep 28, 30; Oct 2, 5, 7 Oct 9, 12, 14, 16 TBD TBD TBD

# Code

 Algorithm Scala Haskell Complete code Introduction to Haskell Stable matching Static dictionaries Graph representations Rooted tree representations Basic graph traversal framework Depth-first search A simple stack Breadth-first search A (not so) simple queue Connected components Bipartiteness testing Topological sorting Sorting and selection

# Textbook

T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. 3rd edition, MIT Press, 2009.
N. Zeh. Data Structures. CSCI 3110 Lecture Notes, 2005.

# Evaluation

• 10 assignments (A)

The best 8 assignments count; each has equal weight.

• Midterm (M)
• Final (F)

Final grade = max(F, 40% * A + 60% * F, 40% * A + 20% * M + 40% * F)

# Policies

Collaboration: I encourage you to form study groups to discuss the course material and work on exercises. However, I do not allow any form of collaboration on assignments, except within groups of students handing in a joint assignment. Collaboration between groups will be treated as plagiarism. Sufficiently similar assignments will be handed over to the Faculty's Academic Integrity Officer.

Plagiarism: I will not tolerate any form of plagiarism (copying from your classmates, copying solutions from the web, etc.). According to university regulations, I have to report you to the Faculty's Integrity Officer if I suspect you of academic dishonesty, and I will. The Integrity Officer may refer your case to the Senate Committee on Plagiarism. The penalties for plagiarism can range from failing the course to expulsion from the university. So please save me and yourself the aggravation. For more information, make sure you read the university policy on intellectual honesty and the Dalhousie Academic Integrity Page.

Late assignments: I do not accept late assignments without a doctor's note or any other official document that certifies a good reason why you could not hand your assignment in on time. In case of important events in your personal life, I am willing to make an exception to this rule if you discuss these circumstances with me and apply for a reasonable extension of the deadline in advance.

Missed exams: If you miss an exam, I may, at my discretion, agree to give you the opportunity to sit the exam at a later date. You must, however, present a good reason why you were not able to sit the exam at the regularly scheduled time. Also, unless there are a large number of students who missed the regularly scheduled exam, make-up exams are oral exams.

Instructor
Norbert Zeh
TAs
David Samuel
(A1, A5, A9)
Mengdu Li
(A2, A4, A6, A8)
(A3, A7, A10)
Office hours
MW 13:30–15:00
Goldberg 313
Lecture
MWF 12:30–13:30
Goldberg 127
Tutorial
W 8:30–9:30
LSC C236
Midterm
Oct 30
In class
Goldberg 127
Final
TBD
Assignments
Posted:
Sep 18
Due:
Sep 25
Posted:
Sep 25
Due:
Oct 2
Posted:
Oct 2
Due:
Oct 9
Posted:
Oct 9
Due:
Oct 19
Posted:
Oct 19
Due:
Oct 26
Posted:
Oct 30
Due:
Nov 6
Posted:
Nov 6
Due:
Nov 13
Posted:
Nov 14
Due:
Nov 20
Posted:
Nov 20
Due:
Nov 27
Posted:
Nov 24
Due:
Dec 4
News
Dec 7
Here's a sample final exam and its solutions.
Nov 4
Here are the sample solutions for this term's midterm.
Oct 19
Sep 11
The fun begins.