CSCI 4152/6509 — Course Calendar

  Part I: Introduction
1 We Sep  4 Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule; Introduction to NLP (Reading: [JM] Ch.1): natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. NLP methodology overview. Levels of NLP. Why NLP is hard in general.
Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1
2 Mo Sep  9 Ambiguities in NLP; Course Project
Ambiguities at different levels of NLP. About the course project: topics and teams; deliverables (P0, P1, P, R); project types, choosing a topic, resources, themes, and previous topics.
Files: slides, lecture notes.
  Part II: Stream-based Text Processing
3 We Sep 11 Finite Automata Review
Part II: Stream-based Text Processing: Deterministic and Non-deterministic Automata. (Reading: Chapter 2 [JM]) Review of Deterministic Finite Automata (DFA). Review of Non-deterministic Finite Automata (NFA), and their use in NLP. NFA-to-DFA conversion.
Files: slides, lecture notes. Reading: [JM] Ch.2
A0 out
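The NFA-to-DFA conversion reviewed above is usually done by subset construction: each DFA state stands for the set of NFA states the automaton could currently be in. A minimal Python sketch follows, assuming an NFA without epsilon transitions and a transition relation encoded as a dictionary from (state, symbol) to a set of states; the function names and the example automaton are hypothetical, not taken from the lecture notes.

```python
from itertools import chain


def nfa_to_dfa(alphabet, delta, start, finals):
    """Subset construction for an NFA without epsilon moves (sketch).

    delta maps (state, symbol) -> set of next states; a missing key means
    "no transition". Every DFA state is a frozenset of NFA states.
    """
    start_set = frozenset([start])
    states, todo, trans = {start_set}, [start_set], {}
    while todo:
        current = todo.pop()
        for sym in alphabet:
            # Union of the NFA moves from every state in the current subset
            nxt = frozenset(chain.from_iterable(delta.get((q, sym), ()) for q in current))
            trans[(current, sym)] = nxt
            if nxt not in states:
                states.add(nxt)
                todo.append(nxt)
    # A subset is accepting if it contains at least one NFA final state
    dfa_finals = {s for s in states if s & finals}
    return states, trans, start_set, dfa_finals


def dfa_accepts(trans, start, finals, word):
    state = start
    for sym in word:
        state = trans[(state, sym)]
    return state in finals


# Hypothetical example NFA: strings over {a, b} that end in "ab"
ALPHABET = "ab"
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
states, trans, start, finals = nfa_to_dfa(ALPHABET, DELTA, 0, {2})
```

Only reachable subsets are generated, so this example yields three DFA states ({0}, {0, 1}, {0, 2}) rather than all eight subsets of {0, 1, 2}.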
L1 Fr Sep 13 Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language: basic syntax, variables, string literals, subroutines.
Files: lab notes, slides.
4 Mo Sep 16 Regular Expressions and Perl
Files: slides, lecture notes. Reading: `man perlretut' and `man perlre' on the timberlea server, or the perlretut and perlre documentation pages.
  Tu Sep 17 Last day to add/drop courses
5 We Sep 18 Basic NLP in Perl
Regular expressions in Perl and basic text processing; text processing examples: tokenization, counting letters. Elements of Morphology (Reading: [JM] Section 3.1): morphemes, stems, affixes, tokenization, stemming.
Files: slides, lecture notes. Reading: Section 3.1 [JM]
A0 due
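The lecture does tokenization and letter counting in Perl. For comparison with the Python used later in the course, here is a rough Python equivalent; the token pattern (a word with optional internal apostrophes, or a single punctuation character) is an assumption for illustration, not the course's definition of a token.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed token definition: word characters with optional internal
    # apostrophes (e.g. "Don't"), or any single non-space punctuation mark
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


def letter_counts(text):
    # Case-insensitive letter frequencies; digits, spaces, punctuation ignored
    return Counter(ch for ch in text.lower() if ch.isalpha())


print(tokenize("Don't stop, Perl!"))
print(letter_counts("Hello").most_common(1))
```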
L2 Fr Sep 20 Lab 2: Perl Tutorial 2
Perl: regular expressions, basic I/O.
Files: lab notes, slides.
6 Mo Sep 23 Counting N-grams
Elements of Morphology (continued): lemmatization, morphological processes; Characters, Words, and N-grams: counting words, Zipf's law. Perl examples of n-gram collection. Elements of Information Retrieval: Vector Space Model.
Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval)
A1 out
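Collecting n-grams amounts to sliding a window of n tokens over the text and counting each window; the vector space model then compares documents as term-frequency vectors, often by cosine similarity. The Python sketch below shows both steps under simple assumptions (whitespace-tokenized input, raw counts rather than tf-idf weights); the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # All contiguous windows of n tokens, as tuples
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tokens, n):
    return Counter(ngrams(tokens, n))


def cosine(c1, c2):
    # Cosine similarity between two sparse term-frequency vectors (Counters)
    dot = sum(c1[t] * c2[t] for t in c1 if t in c2)
    norm = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return dot / norm if norm else 0.0


tokens = "the cat sat on the mat".split()
bigrams = count_ngrams(tokens, 2)
```

For a quick Zipf's law check, `count_ngrams(tokens, 1).most_common()` lists unigrams by rank; on a large corpus, rank times frequency should stay roughly constant.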
7 We Sep 25 Elements of Information Retrieval and Text Mining
Some interesting links: Lucene; IR book by Manning, Raghavan, and Schütze. IR evaluation: precision, recall, F-measure, precision-recall curve, interpolated precision-recall curve. Text mining. Text classification: evaluating classifiers with precision, recall, and F-measure. Evaluation methods for classification: training error, train-and-test, and n-fold cross-validation. Similarity-based text classification.
Files: slides, lecture notes.
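Precision is the fraction of retrieved (or predicted-positive) items that are relevant, recall is the fraction of relevant items that are retrieved, and the F-measure combines them as F_beta = (1 + beta^2) P R / (beta^2 P + R). The sketch below computes all three from two sets and also shows one simple way to split data for n-fold cross-validation (round-robin assignment of items to folds); both function names and the round-robin split are assumptions for illustration.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)  # true positives
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    if p == 0 and r == 0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)


def kfold_indices(n_items, k):
    # Fold i tests on items i, i+k, i+2k, ... and trains on all the others
    for i in range(k):
        test = list(range(i, n_items, k))
        train = [j for j in range(n_items) if j % k != i]
        yield train, test


p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 4, 5})
```

Here 2 of the 4 retrieved items are relevant and 2 of the 3 relevant items are retrieved, so P = 1/2, R = 2/3, and F1 = 4/7.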
L3 Fr Sep 27 Lab 3: Perl Tutorial 3
Perl: arrays (lists), associative arrays (hashes), references.
Files: lab notes, slides.
  Fr Sep 27 P0 (Project Topic Proposal) due
  Mo Sep 30 National Day for Truth and Reconciliation, University closed
  We Oct  2 Last day to drop classes without "W" and to change from audit to credit or vice versa
8 We Oct  2 Similarity-based Classification
CNG classification method for authorship attribution. Edit distance: introduction, dynamic programming approach, example, algorithm.
Files: slides, lecture notes.
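The dynamic programming approach to edit distance fills a table d in which d[i][j] is the minimum cost of turning the first i characters of one string into the first j characters of the other. A minimal Python sketch, assuming unit insertion and deletion costs and a configurable substitution cost (setting it to 2 gives the variant used in the intention/execution example in [JM]):

```python
def edit_distance(s, t, sub_cost=1):
    m, n = len(s), len(t)
    # d[i][j]: cheapest way to turn s[:i] into t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = s[i - 1] == t[j - 1]
            d[i][j] = min(d[i - 1][j] + 1,                            # deletion
                          d[i][j - 1] + 1,                            # insertion
                          d[i - 1][j - 1] + (0 if same else sub_cost))  # match/substitution
    return d[m][n]


print(edit_distance("kitten", "sitting"))  # 3
```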
L4 Fr Oct  4 Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git: setting up an SSH key; adding and modifying files; the add, commit, push, and checkout commands; creating branches and working collaboratively: pull, merge, resolving conflicts.
Files: lab notes, slides.
  Part III: Probabilistic and Machine Learning Approach to NLP
9 Mo Oct  7 P0 Topics Discussion; Introduction to Probabilistic Modeling
Projects discussion: P-01, P-03, P-04, P-05, P-06. Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory. Bayesian inference, generative models. Probabilistic modeling: random variables, configurations, and models; computational tasks.
Files: P0 slides, slides, lecture notes.
10 We Oct  9 Basic Probabilistic Models; P0 Topics Discussion (2)
Joint distribution model; simulation task; other tasks in joint distribution model; spam example. Fully independent model. Note on efficient sum-product computation and max-product computation. Project discussion: P-02.
Files: slides, lecture notes, P0 slides #12.
  Th Oct 10 A1 due
L5 Fr Oct 11 Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification.
Files: lab notes, slides.
  Mo Oct 14 Thanksgiving Day, University closed
11 We Oct 16 Naive Bayes Model; P0 Topics Discussion (3)
Fully-independent model finished. Naive Bayes model: definition, assumption, graphical model, computational tasks with example, number of parameters, pros and cons, variations and practical issues. Project discussions: P-07, P-08, P-09, P-11, P-12, P-13.
Files: slides, lecture notes, P0 slides #14.
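Under the Naive Bayes assumption the features are conditionally independent given the class, so a prediction is the class c maximizing log P(c) + sum_i log P(x_i | c). The Python sketch below estimates these probabilities by counting, with add-one smoothing to avoid zero probabilities; the class name, the tuple-of-features input format, and the toy spam data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over discrete features with add-one smoothing (sketch)."""

    def fit(self, examples, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # values[i]: set of values seen for feature i (used in smoothing)
        self.values = [set() for _ in examples[0]]
        # feature_counts[c][i][v]: number of class-c examples with x_i == v
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        for x, c in zip(examples, labels):
            for i, v in enumerate(x):
                self.feature_counts[c][i][v] += 1
                self.values[i].add(v)
        return self

    def log_score(self, x, c):
        # log P(c) + sum_i log P(x_i | c), with add-one smoothed estimates
        score = math.log(self.class_counts[c] / self.total)
        for i, v in enumerate(x):
            count = self.feature_counts[c][i][v]
            score += math.log((count + 1) / (self.class_counts[c] + len(self.values[i])))
        return score

    def predict(self, x):
        return max(self.class_counts, key=lambda c: self.log_score(x, c))


# Hypothetical toy data: (keyword, has_link) -> label
model = NaiveBayes().fit(
    [("free", "yes"), ("free", "no"), ("meeting", "no"), ("meeting", "no")],
    ["spam", "spam", "ham", "ham"],
)
```

Working in log space avoids underflow when there are many features; with n binary features and two classes the model needs only about 2n + 1 parameters, compared with 2^(n+1) - 1 for a full joint distribution.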
L6 Fr Oct 18 Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub.
Files: lab notes, slides.
12 Mo Oct 21 P0 Topics Discussion (4); N-gram Model
Project discussions: P-14, P-15, P-16, P-17, P-18, P-19, P-20, P-21, P-23, P-24. N-gram model: language modeling, N-gram model assumption, graphical representation.
Files: P0 slides, slides, lecture notes. Reading: [JM] Ch.4 N-Grams
13 We Oct 23 N-gram Model Smoothing
P0 discussion (5): P-22, P-25. N-gram model as Markov Chain; language model evaluation: perplexity; language modeling in classification. N-gram model smoothing: Laplace and Witten-Bell smoothing.
Files: slides, lecture notes, P0 slides (#32).
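Laplace (add-one) smoothing for a bigram model estimates P(w | w') = (C(w' w) + 1) / (C(w') + |V|), and perplexity is the inverse probability of the test data normalized by the number N of predicted tokens, PP = exp(-(1/N) sum log P). A minimal Python sketch, assuming sentences are token lists padded with hypothetical <s> and </s> markers and that |V| counts every token that can follow a history; unknown test words are not treated specially.

```python
import math
from collections import Counter


def train_bigram(sentences):
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens[1:])        # tokens that can appear as a next word
        histories.update(tokens[:-1])   # C(w'): counts of bigram histories
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams, vocab


def laplace_prob(w_prev, w, histories, bigrams, vocab):
    # Add-one smoothing: (C(w' w) + 1) / (C(w') + |V|)
    return (bigrams[(w_prev, w)] + 1) / (histories[w_prev] + len(vocab))


def perplexity(sentences, histories, bigrams, vocab):
    log_prob, n = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w_prev, w in zip(tokens, tokens[1:]):
            log_prob += math.log(laplace_prob(w_prev, w, histories, bigrams, vocab))
            n += 1
    return math.exp(-log_prob / n)


model = train_bigram([["a", "b"], ["a", "c"]])
pp = perplexity([["a", "b"]], *model)
```

For the test sentence "a b" the three predictions have probabilities 1/2, 1/3, and 2/5, so the perplexity is 15^(1/3), about 2.47.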
L7 Fr Oct 25 Lab 7: P1 Submission help; TeX/LaTeX Tutorial (not marked)
Files: slides.
14 Mo Oct 28 POS Tagging and Hidden Markov Model
Witten-Bell smoothing (finished). POS tagging. POS tags: introduction, open and closed word categories. Reading: [JM] Ch.5 Part-of-Speech Tagging. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications. Example with HMM for POS tagging.
Files: slides, lecture notes. Reading: [JM] Ch.5 Part-of-Speech Tagging.
P1 due
15 We Oct 30 Inference with HMMs
POS tagging with HMM example (continued): brute-force tagging, Viterbi algorithm. HMM as a Bayesian Network. Bayesian Networks review. Bayesian Network inference: general brute-force method; Sum-product message passing algorithms for efficient inference (started): factor graph, order of message calculation.
Files: slides, lecture notes.
A2 out
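The Viterbi algorithm replaces the sum in the forward computation by a max, keeping for each position and tag the probability of the best tag sequence ending there, together with a back-pointer. A small Python sketch with a made-up two-tag HMM (all probabilities are hypothetical, chosen only so the example can be checked by hand):

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    # best[t][s]: probability of the best tag sequence for words[:t+1] ending in s
    best = [{s: start_p[s] * emit_p[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor tag r for tag s at position t
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda pair: pair[1])
            best[t][s] = p * emit_p[s].get(words[t], 0.0)
            back[t][s] = prev
    # Follow back-pointers from the best final tag
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


STATES = ["N", "V"]
START = {"N": 0.8, "V": 0.2}
TRANS = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"fish": 0.6, "swim": 0.1}, "V": {"fish": 0.2, "swim": 0.5}}
path, prob = viterbi(["fish", "swim"], STATES, START, TRANS, EMIT)
```

This version multiplies raw probabilities for readability; for real sentences one would add log probabilities instead to avoid underflow.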
  Th Oct 31 Last day to drop classes with "W"
16 Mo Nov  4 Efficient Inference for Bayesian Networks and HMMs
Sum-product message-passing algorithms (continued): cases in message calculation, inference computational tasks. BN example with message passing. HMM tagging example with message passing.
Files: slides, lecture notes.
17 We Nov  6 Neural Networks and NLP
Deep Learning Models in NLP: applications, recent history; ideas behind artificial neural networks; elements: perceptron or neuron, activation function, similarity to logistic regression, simple neural language model, RNNs.
Files: slides, lecture notes.
L8 Fr Nov  8 Lab 8: Python Tutorial with PyTorch
Lab 8 PyTorch instruction notebook (on Google Colab)
  Mo Nov 11 Remembrance Day, University closed
  Mo Nov 11 Fall Study Break (Nov 11-15): no classes; University open except on Monday
  Part IV: Parsing (Syntactic Processing)
18 Mo Nov 18 Deep Learning and NLP; DCG and PCFG
Stacked and bidirectional RNNs, LSTM, self-attention and transformers. Parsing (Syntactic Processing): a brief introduction to Prolog, unification, and backtracking; Natural Language Syntax: phrase structure, clauses, sentences; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations, and other concepts.
Files: slides, lecture notes.
A2 due
19 We Nov 20 DCG and PCFG Grammars
Files: slides, lecture notes.
A3 out
L9 Fr Nov 22 Lab 9: Prolog Tutorial 1
IMPORTANT NOTICE: The morning lab is CANCELLED due to technical issues. Update (8:43am): the technical issues have been resolved. The morning lab remains cancelled, but the afternoon lab will be held. Topics: general Prolog tutorial.
Files: lab notes, slides.
20 Mo Nov 25 Syntax of Natural Languages; CKY Algorithm
Typical phrase structure rules in English (continued): NP, VP, PP, ADJP, ADVP. Dependency structure: heads and dependencies, dependency tree. CKY parsing algorithm (started).
Files: slides, lecture notes.
21 We Nov 27 CKY Algorithm and PCFGs
CKY algorithm (continued), CKY for PCFGs, issues with PCFGs, ideas for probabilistic lexicalized CFGs.
Files: slides, lecture notes.
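CKY for a PCFG in Chomsky Normal Form fills a table over spans, combining two adjacent sub-spans with each binary rule A -> B C and keeping the maximum probability per nonterminal. The sketch below returns only the probability of the best parse (no back-pointers, so no tree); the rule encoding and the toy grammar are hypothetical.

```python
from collections import defaultdict


def pcky(words, lexical, binary, start="S"):
    """Probabilistic CKY for a CNF PCFG (sketch).

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns the probability of the best parse rooted in `start`, or 0.0.
    """
    n = len(words)
    # table[i][j][A]: best probability that A derives words[i:j]
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                table[i][i + 1][a] = max(table[i][i + 1][a], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in binary.items():
                    pb, pc = table[i][k].get(b, 0.0), table[k][j].get(c, 0.0)
                    if pb and pc:
                        table[i][j][a] = max(table[i][j][a], p * pb * pc)
    return table[0][n].get(start, 0.0)


LEXICAL = {("NP", "she"): 0.4, ("NP", "fish"): 0.6, ("V", "eats"): 1.0}
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
prob = pcky(["she", "eats", "fish"], LEXICAL, BINARY)
```

The only parse here is S -> NP(she) VP(V(eats) NP(fish)), with probability 1.0 * 0.4 * 1.0 * 1.0 * 0.6 = 0.24.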
  Th Nov 28 Assignment 4 (A4) out
L10 Fr Nov 29 Lab 10: Prolog Tutorial 2
Files: lab notes, slides.
  Part V: Student Presentations
  Mo Dec  2 Student Presentations
All presentations are in room 430, Goldberg CS Building. Presentation time slots:
10:00-10:30: PT-01, PT-02;
10:30-11:15: Session 1: PT-03* (Samson), PT-04, PT-05* (Jarrod);
11:30-12:45: PT-06, PT-07, PT-08, PT-09, PT-10;
15:00-15:30: PT-11, PT-12;
15:30-16:00: Session 2: PT-13* (Ye, Xu, Baike, Xuelian), PT-14;
16:15-17:00: Session 3: PT-15* (Usama, Konstantin, Pallavi, Rutvik), PT-16* (Rashik, Ying), PT-17;
  Tu Dec  3 Student Presentations
All presentations are in room 430, Goldberg CS Building (rooms 429/430 booked 10am-5pm). Presentation time slots:
10:00-11:00: PT-21, PT-22, PT-23, PT-24;
11:15-12:30: Session 4: PT-25* (Jenna), PT-26* (Shrey, Kavya), PT-27* (Jay), PT-28, PT-29* (Sibi);
13:00-13:45: PT-30, PT-31, PT-32;
14:00-15:00: Session 5: PT-33* (Logan), PT-34* (Aryan), PT-35, PT-36;
15:15-16:15: Session 6: PT-37* (Alb), PT-38* (Reginald), PT-39, PT-40* (Surya, Sudhan, Karthik, Allotei);
16:30-17:00: Session 7: PT-41* (Bryce), PT-42* (Shubham, Inderdeep);
  We Dec  4 Student Presentations
All presentations are in room 430, Goldberg CS Building. Presentation time slots:
12:30-13:30: Session 8: PT-51, PT-52* (Ridhi), PT-53* (Aditya), PT-54* (Alrashdi, Alsalmim, Yahya, Omar);
13:45-14:45: Session 9: PT-55, PT-56* (Hinda), PT-57* (Ajaykumar, Manish), PT-58;
15:00-16:00: Session 10: PT-59* (Rifat, Mukarrom, Priyadharshini, Syanthan), PT-60* (Omar SA, Keenan), PT-61* (Arash, Aniq, Sebastian, Udhaya), PT-62* (Tasjid);
16:15-16:45: Session 11: PT-63* (Gobind), PT-64* (Kanav);
A3 due, A4 due
  We Dec  4 Classes end; Report due
  Final Exam
  Th Dec 12 Final Exam (8:30-10:30am)
Final exam: 2 hours, starting at 8:30am, in the Dalplex. Exam schedule: https://www.dal.ca/exams/halifax-exam-schedule.html

