BIOC 4010/5010 - Bioinformatics

Contact Information

Christian Blouin - cblouin@dal.ca
Skype : drcblouin
Office : Goldberg 206

How to use this website

What is a Learning Outcome?

A learning outcome is a task that will be used to evaluate your learning. There will be no surprise evaluation: if something is not covered by a learning outcome, it will not be on the examination.


Preparatory Guide for the final Examination

Guidelines

Content

Sample Test

Here is a sample test from a previous year. We had less time this year and covered less material. All irrelevant questions are crossed out.

Lecture 1 - A graphical abstraction to solve the Genome assembly problem

Learning Outcomes

  1. Create an overlap and a de Bruijn graphical abstraction from a collection of short DNA reads.
  2. Solve the sequence of a contig using the OLC or Eulerian path methods.
  3. Compute the coverage of a shotgun sequencing project when provided with the appropriate parameters.
  4. Contrast strengths and limitations of the Greedy, Overlap-Layout-Consensus and Eulerian path algorithms.
  5. Explain how sequence repeats and sequencing errors affect genome assembly.
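As an illustration of outcome 3, here is a minimal sketch of a coverage calculation. It assumes the common definition of average coverage, C = N × L / G (number of reads times read length, divided by genome length). The function name and the example numbers are hypothetical and are not taken from the course handout.

```python
def coverage(num_reads, read_length, genome_length):
    """Average coverage C = N * L / G (assumed definition)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


# Hypothetical project: 1,000,000 reads of 100 bp from a 5 Mbp genome.
print(coverage(1_000_000, 100, 5_000_000))  # 20.0
```

A coverage of 20 means that, on average, each base of the genome is expected to appear in 20 reads.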

Here is what you should be focussing on:
  1. What is the genome assembly problem?
  2. How do sequence repeats and sequencing errors influence genome assembly?
  3. What is an n-mer, and how are n-mers defined from a given DNA sequence?
  4. Contrast the Overlap-Layout-Consensus approach with the de Bruijn (Eulerian path) approach.
You may also want to look at the Wikipedia entry on Genome Assembly. More fundamentally, here is the entry about directed graphs. After the lecture, you can find an introduction to Eulerian paths here as well as a more whimsical introduction to Eulerian paths with the Bridges of Königsberg problem. The corresponding entry for the Hamiltonian path is here.
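To make the n-mer and de Bruijn graph ideas concrete, here is a small sketch written for Python 3. It assumes the usual construction in which every k-mer becomes a directed edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function names and the two reads are made up for illustration and do not come from the handout.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return every k-mer of `sequence`, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the list of nodes it points to.

    Each k-mer contributes one directed edge: prefix -> suffix.
    Repeated k-mers give repeated edges, so this is a multigraph.
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


reads = ["ATGGC", "GGCGT"]  # made-up overlapping reads
print(kmers("ATGGC", 3))    # ['ATG', 'TGG', 'GGC']

graph = de_bruijn_graph(reads, 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

The 3-mer GGC occurs in both reads, so the edge GG -> GC appears twice. Following every edge exactly once (an Eulerian path) spells out the contig.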

Course Material

Please be aware that there will be no slides/course notes for this lecture. Bring note-taking material.
A copy of the handout can be downloaded here.

Lecture 2 - Signal-based Gene detection

Learning Outcomes

  1. Interpret algorithms expressed as pseudocode.
  2. Describe the limitations of assuming an equivalence between long ORFs and expressed gene products in prokaryotes.
  3. Compute and interpret the following prediction performance measures: sensitivity, precision and F-score.
  4. Describe how GeneMark learns the structure of genes in prokaryotes.
  5. Explain why GLIMMER attempts to consider data from longer k-mers.
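For outcome 3, the three measures can be sketched from counts of true positives (TP), false positives (FP) and false negatives (FN). This sketch assumes the standard definitions, with the F-score taken as the harmonic mean of precision and sensitivity (often called F1). The textbook may use a different weighting, and the function name and counts below are hypothetical.

```python
def performance(tp, fp, fn):
    """Return (sensitivity, precision, f_score) from prediction counts.

    sensitivity = TP / (TP + FN)   fraction of real genes that were found
    precision   = TP / (TP + FP)   fraction of predictions that are real
    f_score     = harmonic mean of the two (assumed to be F1)
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    total = precision + sensitivity
    f_score = 2 * precision * sensitivity / total if total else 0.0
    return sensitivity, precision, f_score


# Hypothetical result: 6 correct predictions, 2 spurious ones, 10 genes missed.
print(performance(tp=6, fp=2, fn=10))  # (0.375, 0.75, 0.5)
```

In this made-up case the predictor is fairly precise but misses most genes, and the F-score reflects that imbalance.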

Reading Material

The relevant material to this lecture can be found in section 10.3 of the textbook.

Course Material


Lab 4 - Introduction to Scripting

This laboratory introduces key concepts in scripting. The language we'll use is Python, a versatile and intuitive programming language. Python is commonly used in Bioinformatics, as well as across the IT industry, because it is easy to use and debug.

Learning Outcomes

  1. Create Python scripts and run them.
  2. Perform arithmetic operations.
  3. Create and manipulate string variables.
  4. Read from and write to files on disk.
  5. Explain how an operating system knows where a line of text ends.
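Outcomes 4 and 5 can be illustrated together. A line of text ends where an end-of-line character is stored in the file: "\n" on Unix-like systems and "\r\n" on Windows. The sketch below is written for Python 3 and uses a throwaway temporary file. The file name and contents are made up, and the lab manual may present this differently.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # throwaway file

# Write two lines; newline="" stops Python from translating "\n" on write.
with open(path, "w", newline="") as handle:
    handle.write("ATGC\n")     # Unix-style line ending
    handle.write("GGTA\r\n")   # Windows-style ending, written on purpose

# Read the raw bytes to see the end-of-line characters themselves.
with open(path, "rb") as handle:
    raw = handle.read()
print(raw)    # b'ATGC\nGGTA\r\n'

# In default text mode, both endings are read back as "\n".
with open(path) as handle:
    lines = handle.readlines()
print(lines)  # ['ATGC\n', 'GGTA\n']
```

Calling `line.rstrip("\n")` or `line.strip()` removes the line ending before further processing.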

Getting Python

To get Python for your own computer, simply download the appropriate package from the Python home page. There is a tutorial here if you want to know more. However, this series of labs is designed to be all-inclusive.

Lab Material

The lab material can be found here. It is meant to be semi-standalone. I strongly recommend attending the session although the lab itself can be completed at home. A reference cheatsheet for Python is available here.

Lab 5 - Gene Prediction I

This laboratory is about implementing a simple algorithm for predicting genes in a prokaryotic genome. It is meant to give you a taste of running in silico experiments and testing a hypothesis in a data-related environment. You will extend your knowledge of programming with a few syntactical features, but the bulk of the code is provided. You must ensure that you understand the logic behind the code.
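The lab's own algorithm is given in the lab manual and is not reproduced here. As a rough idea of what a simple predictor can look like, the sketch below assumes a long-ORF approach: scan the three forward reading frames and report every stretch from an ATG to the first in-frame stop codon. The function name, the `min_codons` parameter and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs for forward-strand ORFs.

    Coordinates are 0-based and `end` is exclusive. An ORF runs from an ATG
    to the first in-frame stop codon (stop included). `min_codons` is the
    minimum number of codons before the stop, counting the ATG.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


dna = "CCATGAAATTTTAGGG"   # made-up sequence
print(find_orfs(dna))       # [(2, 14)]
print(dna[2:14])            # ATGAAATTTTAG
```

A real predictor would also scan the reverse complement and use a much larger length threshold. This sketch leaves both out to stay short.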

Learning Outcomes

Lab Material

The lab material can be found here. It is meant to be semi-standalone. I recommend attending the session although the lab itself can be completed at home.

Code Solution (NEW!)

The solution to the code is now provided to you to compensate for some errors introduced in the lab manual.

Lab 6 - Gene Prediction II, analysis

In this lab, you will be analysing the classification performance for START and STOP codons, exact ORF boundaries and nucleotide-level prediction. The key challenge today is to compute these performance measures on thousands of predicted genes. The key programming concept this week is the design of general-purpose functions.
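As a sketch of what general-purpose functions for this analysis could look like, the code below compares predicted and annotated genes given as (start, end) pairs. It assumes 0-based coordinates with an exclusive end. All function names and coordinates are hypothetical, and the lab manual may organize the analysis differently.

```python
def covered_positions(intervals):
    """Return the set of nucleotide positions covered by the intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_counts(predicted, annotated):
    """Return (TP, FP, FN) counted nucleotide by nucleotide."""
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    return len(pred & true), len(pred - true), len(true - pred)


def exact_matches(predicted, annotated):
    """Count predictions whose start and end both match an annotation."""
    return len(set(predicted) & set(annotated))


# Made-up coordinates: one exact match, one wrong start, one missed gene.
predicted = [(0, 90), (200, 260)]
annotated = [(30, 90), (200, 260), (400, 430)]

print(nucleotide_counts(predicted, annotated))  # (120, 30, 30)
print(exact_matches(predicted, annotated))      # 1
```

Because the functions take plain lists of coordinates, the same code works unchanged on two genes or on thousands.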

Learning Outcomes

Lab Material

The lab material can be found here. It is meant to be semi-standalone. I recommend attending the session although the lab itself can be completed at home.

Lecture 5 - Molecular Simulations

Learning Outcomes

  1. Describe what a force field is.
  2. Explain the problems related to the parametrization of Force Fields.
  3. Explain why the quality of a simulation is more dependent on the speed of a calculation rather than its accuracy.
  4. Explain why most of the computation time is used for non-bonded interactions (van der Waals, electrostatics) rather than for bonded interactions (stretching, bending, etc.).
  5. Contrast the process of protein folding in vivo versus the formulation of protein folding as an energy minimization problem.
  6. Contrast local and global minima in an energy “landscape”.
  7. Explain the concept of greedy optimization, and explain why minimization alone cannot fully search the conformational space.
  8. Explain the role of temperature in molecular dynamics, and how this affects the search for the global minimum.
  9. Explain the theoretical limitation of parallelizing a calculation.
  10. Explain how ensemble dynamics get around this theoretical limitation.
  11. Explain why ab initio protein folding, or the prediction of tertiary structure from a sequence, is considered one of the most challenging problems in computational biology.
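Outcomes 6 and 7 can be illustrated with a toy example. The sketch below treats an energy landscape as a list of energies along a single made-up coordinate and applies greedy minimization: always step to the lower-energy neighbour and stop when neither neighbour is lower. The landscape values and the function name are hypothetical, and a real force field has many thousands of dimensions.

```python
def greedy_minimize(energy, start):
    """Step to the lower-energy neighbour until no neighbour is lower.

    `energy` is a list with at least two values; returns the final index.
    """
    position = start
    while True:
        neighbours = [p for p in (position - 1, position + 1)
                      if 0 <= p < len(energy)]
        best = min(neighbours, key=lambda p: energy[p])
        if energy[best] >= energy[position]:
            return position        # no downhill move left: a minimum
        position = best


# Toy landscape: local minimum at index 1, global minimum at index 5.
landscape = [5, 3, 4, 6, 2, 0, 1, 7]

print(greedy_minimize(landscape, start=0))  # 1
print(greedy_minimize(landscape, start=3))  # 5
```

Starting at index 0, the search stops in the local minimum because leaving it requires an uphill step. Accepting some uphill moves is the role that temperature plays in molecular dynamics.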

Reading Material

Appendices B and C of the Zvelebil and Baum textbook.

Course Material