[ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Fall 2023 (Sep 5-Dec 6), Faculty of Computer Science, Dalhousie University |
# | Date | Title | |
---|---|---|---|
Part I: Introduction | |||
1 | Tu Sep 5 | Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule; Introduction to NLP (reading Ch.1 [JM]): natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. NLP methodology overview. Levels of NLP. Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1 | |
2 | Th Sep 7 | Course Project
Why is NLP generally hard; Ambiguities at different levels of NLP. About Course Project: topics and teams, deliverables, P0, P1, P, R; project types, choosing topic, resources, themes and previous topics. Files: slides, lecture notes. | |
Part II: Stream-based Text Processing | |||
3 | Tu Sep 12 | Finite Automata Review
Deterministic and non-deterministic automata. Review of Deterministic Finite Automata (DFA). Review of Non-deterministic Finite Automata (NFA), and their use in NLP. Files: slides, lecture notes. Reading: [JM] Ch.2 | A0 out |
L1 | Tue/Wed | Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language: basic syntax, variables, string literals, subroutines. Files: lab notes, slides. | |
4 | Th Sep 14 | NFA and RegEx Review, Perl
NFA-to-DFA conversion. Review of regular expressions. Introduction to Perl, main Perl features, syntactic elements, program examples. Files: slides, lecture notes. Reading: On the timberlea server, `man perlretut` and `man perlre`, or perlretut and perlre | |
Tu Sep 19 | Last day to add/drop courses | ||
5 | Tu Sep 19 | RegEx and Basic NLP in Perl
Regular expressions in Perl and basic text processing; Text processing examples: tokenization, counting letters. Files: slides, lecture notes. | A1 out |
L2 | Tue/Wed | Lab 2: Perl Tutorial 2 Files: lab notes, slides. | |
6 | Th Sep 21 | Elements of Morphology
Letter frequencies (finished). Elements of Morphology: morphemes, stems, affixes, tokenization, stemming, lemmatization; morphological processes. Characters, Words, and N-grams: counting words, Zipf's law. Files: slides, lecture notes. Reading: Section 3.1 [JM] | A0 due |
7 | Tu Sep 26 | Elements of Information Retrieval and Text Mining
Project ideas discussion, P0; Perl examples with n-gram collection. Elements of information retrieval: typical IR system architecture, vector space model. Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval) | |
L3 | Tue/Wed | Lab 3: Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references. Files: lab notes, slides. | |
8 | Th Sep 28 | Text Mining Review
Guest speaker regarding project ideas. Information Retrieval review (continued). Some interesting links: Lucene, IR book by Manning, Raghavan, and Schütze. IR Evaluation: precision, recall, F-measure, precision-recall curve, interpolated precision-recall curve. Text mining. Text Classification: classifier evaluation, evaluation measures and evaluation methods for text classification; Text clustering. Files: slides, lecture notes. | |
Fr Sep 29 | P0 Project Topic Proposal due | P0 due | |
Mo Oct 2 | National Day for Truth and Reconciliation, University closed | ||
9 | Tu Oct 3 | Similarity-based Text Classification, CNG Classification
Discussion about classifier evaluation (training error, train-and-test, and n-fold cross-validation). Similarity-based text classification: classification using vector space model and cosine similarity, Euclidean distance; CNG classification method for authorship attribution. Edit distance: introduction, dynamic programming approach, example, algorithm. Files: slides, lecture notes. | |
L4 | Tue/Wed | Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, resolving conflicts. Files: lab notes, slides. | |
We Oct 4 | Last day to drop classes without "W", change audit to credit or vice versa. | ||
Part III: Probabilistic Approach to NLP | |||
10 | Th Oct 5 | Introduction to Probabilistic NLP
Edit distance (finished). Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory. Bayesian inference, generative models. Files: slides, lecture notes. | |
Mo Oct 9 | Thanksgiving Day, University closed | ||
11 | Tu Oct 10 | P0 Topics Discussion
Projects discussion: P-01, P-03, P-04, P-05, P-06, P-07, P-08, P-09, P-11. Files: P0 slides, slides, lecture notes. | |
L5 | Tue/Wed | Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification. Files: lab notes, slides. | |
12 | Th Oct 12 | Probabilistic Modeling
P0 discussion (cont.): P-02. Probabilistic modeling: random variables, configurations, and models; computational tasks; joint distribution model; fully independent model; problem of efficient computation, computational tasks, spam example. Naive Bayes model: definition, assumption, graphical model. Files: slides, lecture notes, P0 slides. | A1 due |
13 | Tu Oct 17 | Naive Bayes Model
P0 discussion (cont.): P-13. Naive Bayes model (continued): computational tasks with example; number of parameters, pros and cons, variations and practical issues. N-gram model: language modeling, N-gram model assumption, graphical representation. Files: slides, lecture notes, P0 slides. Reading: [JM] Ch4 N-Grams | |
L6 | Tue/Wed | Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub. Files: lab notes, slides. | |
14 | Th Oct 19 | N-gram Model and Smoothing
N-gram model as Markov Chain; language model evaluation: perplexity; language modeling in classification; N-gram model smoothing: Laplace and Witten-Bell smoothing. Files: slides, lecture notes. | |
15 | Tu Oct 24 | POS Tags and Hidden Markov Model
POS tags: introduction, open and closed word categories. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications. Files: slides, lecture notes. Reading: [JM] Ch5 Part-of-Speech Tagging; [JM ed3] Ch8 Sequence Labeling for Parts of Speech and Named Entities. P1 requirements. | A2 out |
L7 | Tue/Wed | Lab 7: Fetching Tweets with Python
Note: For information only. Not needed to complete. Files: lab notes. | |
16 | Th Oct 26 | Efficient Inference with HMM
HMM: POS tagging. Overview of HMM computational tasks. POS example for training and tagging. POS tagging as HMM inference: brute force, Viterbi algorithm. Files: slides, lecture notes. Reading: [JM] Ch. 6 (HMM, first part) | |
Mo Oct 30 | P1 Project Statement due | P1 due | |
17 | Tu Oct 31 | HMM as Bayesian Network
HMM as a Bayesian Network: BN definition, burglar-earthquake example, computational tasks, brute-force inference in BNs, difficulty of efficient inference in general BNs. Sum-product algorithms: factor graph, Principles of message-passing algorithms, order of message calculation, cases in message calculation (started). Files: slides, lecture notes. | |
Th Nov 2 | Last day to drop classes with "W" | ||
18 | Th Nov 2 | Sum-Product (Message-Passing) Algorithms for BN Inference
Message-passing algorithms: cases of message calculation (continued), draft proof of the message-passing algorithm; inference tasks solved by message-passing algorithms. Files: slides, lecture notes. | |
19 | Tu Nov 7 | Examples of Message-Passing Algorithms
Examples with message-passing algorithms: burglar-earthquake example with message passing (started). Example 1: conditioning with one variable in the "burglar-earthquake" example; Example 2: completion example with HMM for POS tagging. Files: slides, lecture notes. | |
L8 | Tue/Wed | Lab 8: Python Tutorial with PyTorch
Lab 8 instruction notebook (on Google Colab) | |
20 | Th Nov 9 | Neural Networks and NLP
Neural networks and deep learning: applications, some main developments, large deep learning models, growth in size. Foundations of neural networks: biological neuron, perceptron, feed-forward network, activation functions, logistic regression as a simple network, softmax function. Files: slides, lecture notes. | |
Fr Nov 10 | A2 due | A2 due | |
Mo Nov 13 | In lieu of Remembrance Day, University closed | ||
Mo Nov 13 | Fall Study Break Nov 13-17, no classes, University open except Mon | ||
Part IV: Parsing (Syntactic Processing) | |||
21 | Tu Nov 21 | Neural Network Models for NLP; Parsing NLP
Model overviews for NLP: neural language model, recurrent neural network, stacking and bidirectional RNN, LSTM, self-attention and transformers. Parsing: A brief introduction to Prolog, unification, and backtracking; variables, lists, structures; examples: factorial, member. Files: slides, lecture notes. | A3 out |
L9 | Tue/Wed | Lab 9: Prolog Tutorial 1 Files: lab notes, slides. | |
22 | Th Nov 23 | Natural Language Syntax
Parsing (Syntactic Processing): Natural language syntax: phrase structure, clauses, sentences; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations and other concepts, bracket representation. Definite Clause Grammars (DCG): parsing NL in Prolog using difference lists. Files: slides, lecture notes. Reading: [JM] Ch 12 | |
23 | Tu Nov 28 | DCG and PCFG
Definite Clause Grammars (DCG): example, building parse tree, handling agreement, embedded code. Probabilistic Context-Free Grammars (PCFG): PCFG as a probabilistic model, computational tasks for PCFG model: evaluation, learning, simulation, proper PCFG, expressing PCFGs in DCGs. Files: slides, lecture notes. | |
L10 | Tue/Wed | Lab 10: Prolog Tutorial 2 Files: lab notes, slides. | |
24 | Th Nov 30 | Typical Phrase Structure of English Files: slides, lecture notes. | |
Part V: Student Presentations | |||
Mo Dec 4 | A4 out | A4 out | |
Tu Dec 5 | Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots: 13:00-14:00: PT-29* (Angelo, Callum), PT-30, PT-31, PT-32; 14:30-15:00: PT-33* (Fangzheng, Zhengping), PT-34. | A3 due | |
We Dec 6 | Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots: 12:15-13:00: PT-41* (Mundhir, Brijesh, Harsahib), PT-42, PT-43; 13:30-14:30: PT-44* (Mansoor), PT-45* (Dorsa), PT-46* (Mohammed), PT-47* (Riasat); 15:00-16:00: PT-48* (Hassaan, Ming, Sourabh), PT-49* (Geoff), PT-50* (Yash), PT-51* (Mason); 16:30-17:30: PT-52* (Shaoqin), PT-53* (Yu, Xinxin), PT-54* (Adja, Jyotishka), PT-55. | ||
We Dec 6 | Classes end, Monday schedule used, Report due
Allowed delayed submissions: A3 by Dec 7. R by Dec 18. A4 by Dec 18. | Report due | |
We Dec 6 | A4 due | A4 due | |
Final Exam | |||
Sa Dec 16 | Final Exam (3:30-5:30pm)
Final exam, duration 2 hours, starting at 15:30, DALPLEX. Exams schedule URL: http://www.dal.ca/academics/exam_schedule/halifax_campus_exam_schedule.html | F.Exam |