Dalhousie University    [  http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html  ]
Fall 2023 (Sep 5-Dec 6)
Faculty of Computer Science
Dalhousie University

CSCI 4152/6509 — Course Calendar

  Part I: Introduction
1 Tu Sep  5 Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule; Introduction to NLP (reading Ch.1 [JM]): natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. NLP methodology overview. Levels of NLP.
Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1
2 Th Sep  7 Course Project
Why is NLP generally hard; Ambiguities at different levels of NLP. About Course Project: topics and teams, deliverables, P0, P1, P, R; project types, choosing topic, resources, themes and previous topics.
Files: slides, lecture notes.
  Part II: Stream-based Text Processing
3 Tu Sep 12 Finite Automata Review
Part II: Stream-based Text Processing: Deterministic and Non-deterministic Automata. (Reading: Chapter 2 [JM]) Review of Deterministic Finite Automata (DFA). Review of Non-deterministic Finite Automata (NFA), and their use in NLP.
Files: slides, lecture notes. Reading: [JM] Ch.2
A0 out
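As a small illustration of the DFA review above, here is a minimal Python sketch of a table-driven DFA. The automaton (strings over {a, b} that end in "ab"), the state names, and the function name `dfa_accepts` are made up for this example and are not taken from the lecture notes.

```python
# Hypothetical DFA over the alphabet {a, b} accepting strings that end in "ab".
DFA = {
    "start": "q0",
    "accept": {"q2"},
    "delta": {
        ("q0", "a"): "q1", ("q0", "b"): "q0",
        ("q1", "a"): "q1", ("q1", "b"): "q2",
        ("q2", "a"): "q1", ("q2", "b"): "q0",
    },
}


def dfa_accepts(dfa, text):
    """Run the DFA over text; reject on any symbol with no transition."""
    state = dfa["start"]
    for symbol in text:
        key = (state, symbol)
        if key not in dfa["delta"]:
            return False  # symbol outside the alphabet
        state = dfa["delta"][key]
    return state in dfa["accept"]
```

An NFA could be simulated in a similar way by tracking a set of current states instead of a single state.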
L1 Tue/Wed Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language: basic syntax, variables, string literals, subroutines.
Files: lab notes, slides.
4 Th Sep 14 NFA and RegEx Review, Perl
NFA-to-DFA conversion. Review of regular expressions. Introduction to Perl, main Perl features, syntactic elements, program examples.
Files: slides, lecture notes. Reading: On timberlea server `man perlretut' and `man perlre', or perlretut and perlre
  Tu Sep 19 Last day to add/drop courses
5 Tu Sep 19 RegEx and Basic NLP in Perl
Regular expressions in Perl and basic text processing; Text processing examples: tokenization, counting letters.
Files: slides, lecture notes.
A1 out
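The lecture covers tokenization and letter counting in Perl; the following is a rough Python equivalent, given only as a sketch. The token definition (lowercased runs of ASCII letters and apostrophes) and the function names are assumptions for illustration, not the course's definitions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def letter_counts(text):
    """Count ASCII letters, ignoring case and every other character."""
    return Counter(ch for ch in text.lower() if "a" <= ch <= "z")
```

For example, `tokenize("The cat's hat.")` yields the tokens `the`, `cat's`, and `hat` under this simplified definition.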
L2 Tue/Wed Lab 2: Perl Tutorial 2 Files: lab notes, slides. 
6 Th Sep 21 Elements of Morphology
Letter frequencies (finished). Elements of Morphology: reading: Section 3.1 [JM]; morphemes, stems, affixes, tokenization, stemming, lemmatization; morphological processes. Characters, Words, and N-grams: counting words, Zipf's law.
Files: slides, lecture notes. Reading: Section 3.1 [JM]
A0 due
7 Tu Sep 26 Elements of Information Retrieval and Text Mining
Project ideas discussion, P0; Perl examples with n-gram collection. Elements of information retrieval: typical IR system architecture, vector space model.
Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval)
L3 Tue/Wed Lab 3: Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references.
Files: lab notes, slides.
8 Th Sep 28 Text Mining Review
Guest speaker regarding project ideas. Information Retrieval review (continued). Some interesting links: Lucene, IR book by Manning, Raghavan, and Schütze. IR evaluation: precision, recall, F-measure, precision-recall curve, interpolated precision-recall curve. Text mining. Text classification: classifier evaluation, evaluation measures and evaluation methods for text classification. Text clustering.
Files: slides, lecture notes.
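The IR evaluation measures listed above can be sketched in a few lines of Python. This is a set-based illustration assuming the balanced F-measure (F1); the function name and the convention of returning 0.0 for empty sets are choices made here, not taken from the course notes.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With retrieved documents {1, 2, 3, 4} and relevant documents {2, 4, 5}, two of four retrieved documents are relevant (precision 0.5) and two of three relevant documents are found (recall 2/3).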
  Fr Sep 29 P0 Project Topic Proposal due
  Mo Oct  2 National Day for Truth and Reconciliation, University closed
9 Tu Oct  3 Similarity-based Text Classification, CNG Classification
Discussion about classifier evaluation (training error, train-and-test, and n-fold cross-validation). Similarity-based text classification: classification using vector space model and cosine similarity, Euclidean distance; CNG classification method for authorship attribution. Edit distance: introduction, dynamic programming approach, example, algorithm.
Files: slides, lecture notes.
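The dynamic programming approach to edit distance mentioned above can be sketched as follows. This version assumes unit cost for insertion, deletion, and substitution (Levenshtein distance); the lecture may use different costs, and the function name is chosen here for illustration.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit cost for insert, delete, and substitute."""
    m, n = len(source), len(target)
    # dist[i][j] holds the distance between source[:i] and target[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[m][n]
```

The table has (m+1) x (n+1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths.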
L4 Tue/Wed Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, resolving conflicts.
Files: lab notes, slides.
  We Oct  4 Last day to drop classes without "W", change audit to credit or vice versa
  Part III: Probabilistic Approach to NLP
10 Th Oct  5 Introduction to Probabilistic NLP
Edit distance (finished). Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory. Bayesian inference, generative models.
Files: slides, lecture notes.
  Mo Oct  9 Thanksgiving Day, University closed
11 Tu Oct 10 P0 Topics Discussion
Projects discussion: P-01, P-03, P-04, P-05, P-06, P-07, P-08, P-09, P-11.
Files: P0 slides, slides, lecture notes.
L5 Tue/Wed Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification.
Files: lab notes, slides.
12 Th Oct 12 Probabilistic Modeling
P0 discussion (cont.): P-02. Probabilistic modeling: random variables, configurations, and models; computational tasks; joint distribution model; fully independent model; problem of efficient computation, computational tasks, spam example. Naive Bayes model: definition, assumption, graphical model.
Files: slides, lecture notes, P0 slides.
A1 due
13 Tu Oct 17 Naive Bayes Model
P0 discussion (cont.): P-13. Naive Bayes model (continued): computational tasks with example; number of parameters, pros and cons, variations and practical issues. N-gram model: language modeling, N-gram model assumption, graphical representation.
Files: slides, lecture notes, P0 slides. Reading: [JM] Ch4 N-Grams
L6 Tue/Wed Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub.
Files: lab notes, slides.
14 Th Oct 19 N-gram Model and Smoothing
N-gram model as Markov Chain; language model evaluation: perplexity; language modeling in classification; N-gram model smoothing: Laplace and Witten-Bell smoothing.
Files: slides, lecture notes.
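To make Laplace smoothing and perplexity concrete, here is a minimal Python sketch of a bigram model with add-one smoothing. The boundary markers `<s>` and `</s>`, the choice to exclude `<s>` from the vocabulary of predicted tokens, and all function names are assumptions of this example; Witten-Bell smoothing is not shown.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context counts, bigram counts, and the vocabulary of predicted tokens."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens[1:])  # "<s>" is never predicted
        for prev, word in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, word)] += 1
    return contexts, bigrams, vocab


def laplace_prob(prev, word, contexts, bigrams, vocab):
    """Add-one estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentence, contexts, bigrams, vocab):
    """Per-token perplexity of one sentence under the smoothed bigram model."""
    tokens = ["<s>"] + sentence + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        log_prob += math.log(laplace_prob(prev, word, contexts, bigrams, vocab))
    return math.exp(-log_prob / (len(tokens) - 1))
```

Because every bigram receives a pseudo-count of one, unseen bigrams get a small non-zero probability, so perplexity stays finite on sentences that contain them.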
15 Tu Oct 24 POS Tags and Hidden Markov Model
POS tags: introduction, open and closed word categories. Reading: [JM] Ch5 Part-of-Speech Tagging. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications.
Files: slides, lecture notes. Reading: [JM] Ch5 Part-of-Speech Tagging. [JM ed3] Ch8 Sequence Labeling for Parts of Speech and Named Entities. P1 requirements.
A2 out
L7 Tue/Wed Lab 7: Fetching Tweets with Python
Note: For information only. Not needed to complete.
Files: lab notes.
16 Th Oct 26 Efficient Inference with HMM
HMM: POS tagging. Overview of HMM computational tasks. POS example for training and tagging. POS tagging as HMM inference: brute force, Viterbi algorithm.
Files: slides, lecture notes. Reading: [JM] Ch. 6 (HMM, first part)
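As a sketch of POS tagging as HMM inference, the following Python code implements the Viterbi algorithm for a tiny two-tag model. The tags, words, and all probabilities are invented for illustration, and the code assumes a non-empty observation sequence; practical implementations usually work with log probabilities to avoid underflow.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its joint probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s].get(observations[t], 0.0)
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model with two tags: N (noun) and V (verb).
STATES = ["N", "V"]
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {
    "N": {"dogs": 0.3, "fish": 0.6, "swim": 0.1},
    "V": {"dogs": 0.1, "fish": 0.3, "swim": 0.6},
}
```

Brute-force inference would score every one of the |states|^T tag sequences, while Viterbi fills a T x |states| table and needs time proportional to T x |states|^2.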
  Mo Oct 30 P1 Project Statement due
17 Tu Oct 31 HMM as Bayesian Network
HMM as a Bayesian Network: BN definition, burglar-earthquake example, computational tasks, brute-force inference in BNs, difficulty of efficient inference in general BNs. Sum-product algorithms: factor graph, Principles of message-passing algorithms, order of message calculation, cases in message calculation (started).
Files: slides, lecture notes.
  Th Nov  2 Last day to drop classes with "W"
18 Th Nov  2 Sum-Product (Message-Passing) Algorithms for BN Inference
Message-passing algorithms: cases of message calculation (continued), draft proof of the message-passing algorithm; inference tasks solved by message-passing algorithms.
Files: slides, lecture notes.
19 Tu Nov  7 Examples of Message-Passing Algorithms
Examples with message-passing algorithms: burglar-earthquake example with message passing (started). Message-passing examples: Example 1: conditioning with one variable in the "burglar-earthquake"; Example 2: completion example with HMM for POS tagging.
Files: slides, lecture notes.
L8 Tue/Wed Lab 8: Python Tutorial with PyTorch
Lab 8 instruction notebook (on Google Colab)
20 Th Nov  9 Neural Networks and NLP
Neural networks and deep learning: applications, some main developments, large deep learning models, growth in size. Foundations of neural networks: biological neuron, perceptron, feed-forward network, activation functions, logistic regression as a simple network, softmax function.
Files: slides, lecture notes.
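The softmax function and its relation to logistic regression can be illustrated with a short plain-Python sketch (no PyTorch required). The function names are chosen here, and subtracting the maximum score is a common numerical-stability trick rather than something stated in the lecture notes.

```python
import math


def softmax(scores):
    """Map a list of real scores to a probability distribution."""
    shift = max(scores)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - shift) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))
```

With two classes and scores [0, z], the second softmax output equals sigmoid(z), which is one way to see logistic regression as a very small network.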
  Fr Nov 10 A2 due
  Mo Nov 13 In lieu of Remembrance Day, University closed
  Mo Nov 13 Fall Study Break Nov 13-17, no classes, University open except Mon
  Part IV: Parsing (Syntactic Processing)
21 Tu Nov 21 Neural Network Models for NLP; Parsing NLP
Model overviews for NLP: neural language model, recurrent neural network, stacking and bidirectional RNN, LSTM, self-attention and transformers. Parsing: A brief introduction to Prolog, unification, and backtracking; variables, lists, structures; examples: factorial, member.
Files: slides, lecture notes.
A3 out
L9 Tue/Wed Lab 9: Prolog Tutorial 1 Files: lab notes, slides. 
22 Th Nov 23 Natural Language Syntax
Parsing (Syntactic Processing): Natural language syntax: phrase structure, clauses, sentences; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations and other concepts, bracket representation. Definite Clause Grammars (DCG): parsing NL in Prolog using difference lists.
Files: slides, lecture notes. Reading: [JM] Ch 12
23 Tu Nov 28 DCG and PCFG
Definite Clause Grammars (DCG): example, building parse tree, handling agreement, embedded code. Probabilistic Context-Free Grammars (PCFG): PCFG as a probabilistic model, computational tasks for PCFG model: evaluation, learning, simulation, proper PCFG, expressing PCFGs in DCGs.
Files: slides, lecture notes.
L10 Tue/Wed Lab 10: Prolog Tutorial 2 Files: lab notes, slides. 
24 Th Nov 30 Typical Phrase Structure of English Files: slides, lecture notes. 
  Part V: Student Presentations
  Mo Dec  4 A4 out
  Tu Dec  5 Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots:
13:00-14:00: PT-29* (Angelo, Callum), PT-30, PT-31, PT-32;
14:30-15:00: PT-33* (Fangzheng, Zhengping), PT-34;
A3 due
  We Dec  6 Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots:
12:15-13:00: PT-41* (Mundhir, Brijesh, Harsahib), PT-42, PT-43;
13:30-14:30: PT-44* (Mansoor), PT-45* (Dorsa), PT-46* (Mohammed), PT-47* (Riasat);
15:00-16:00: PT-48* (Hassaan, Ming, Sourabh), PT-49* (Geoff), PT-50* (Yash), PT-51* (Mason);
16:30-17:30: PT-52* (Shaoqin), PT-53* (Yu, Xinxin), PT-54* (Adja, Jyotishka), PT-55;
  We Dec  6 Classes end, Monday schedule used, Report due
Allowed delayed submissions: A3 by Dec 7. R by Dec 18. A4 by Dec 18.
Report due
  We Dec  6 A4 due
  Final Exam
  Sa Dec 16 Final Exam (3:30-5:30pm)
Final exam, duration 2 hours, starting at 15:30, DALPLEX. Exams schedule URL: http://www.dal.ca/academics/exam_schedule/halifax_campus_exam_schedule.html

Maintained by: Vlado Keselj, last update: 04-Feb-2024