[ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Fall 2024 (Sep 3-Dec 4), Faculty of Computer Science, Dalhousie University |
# | Date | Title | |
---|---|---|---|
Part I: Introduction | |||
1 | We Sep 4 | Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule; Introduction to NLP: natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. NLP methodology overview. Levels of NLP. Why NLP is generally hard. Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1 | |
2 | Mo Sep 9 | Ambiguities in NLP; Course Project
Ambiguities at different levels of NLP. About Course Project: topics and teams, deliverables, P0, P1, P, R; project types, choosing topic, resources, themes and previous topics. Files: slides, lecture notes. | |
Part II: Stream-based Text Processing | |||
3 | We Sep 11 | Finite Automata Review
Deterministic and Non-deterministic Automata. Review of Deterministic Finite Automata (DFA). Review of Non-deterministic Finite Automata (NFA), and their use in NLP. NFA-to-DFA conversion. Files: slides, lecture notes. Reading: [JM] Ch.2 | A0 out |
L1 | Fr Sep 13 | Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language: basic syntax, variables, string literals, subroutines. Files: lab notes, slides. | |
4 | Mo Sep 16 | Regular Expressions and Perl Files: slides, lecture notes. Reading: on the timberlea server, `man perlretut' and `man perlre', or perlretut and perlre | |
Tu Sep 17 | Last day to add/drop courses | ||
5 | We Sep 18 | Basic NLP in Perl
Regular expressions in Perl and basic text processing; Text processing examples: tokenization, counting letters. Elements of Morphology: morphemes, stems, affixes, tokenization, stemming. Files: slides, lecture notes. Reading: Section 3.1 [JM] | A0 due |
L2 | Fr Sep 20 | Lab 2: Perl Tutorial 2
Regular expressions in Perl, Perl: basic I/O. Files: lab notes, slides. | |
6 | Mo Sep 23 | Counting N-grams
Elements of Morphology (continued): lemmatization, morphological processes; Characters, Words, and N-grams: counting words, Zipf's law. Perl examples with n-gram collection. Elements of Information Retrieval: Vector Space Model. Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval) | A1 out |
7 | We Sep 25 | Elements of Information Retrieval and Text Mining
Some interesting links: Lucene, IR book by Manning, Raghavan, and Schütze. IR Evaluation: precision, recall, F-measure, precision-recall curve. Interpolated Precision-Recall curve. Text mining. Text Classification: classifier evaluation with precision, recall, and F-measure. Evaluation methods for classification: training error, train-and-test, and n-fold cross-validation. Similarity-based text classification. Files: slides, lecture notes. | |
L3 | Fr Sep 27 | Lab 3: Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references. Files: lab notes, slides. | |
Fr Sep 27 | P0 Project Topic Proposal due | P0 due | |
Mo Sep 30 | National Day for Truth and Reconciliation, University closed | ||
We Oct 2 | Last day to drop classes without "W", change audit to credit or vice versa. | ||
8 | We Oct 2 | Similarity-based Classification
CNG classification method for authorship attribution. Edit distance: introduction, dynamic programming approach, example, algorithm. Files: slides, lecture notes. | |
L4 | Fr Oct 4 | Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, resolving conflicts. Files: lab notes, slides. | |
Part III: Probabilistic and Machine Learning Approach to NLP | |||
9 | Mo Oct 7 | P0 Topics Discussion; Introduction to Probabilistic Modeling
Projects discussion: P-01, P-03, P-04, P-05, P-06. Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory. Bayesian inference, generative models. Probabilistic modeling: random variables, configurations, and models; computational tasks. Files: P0 slides, slides, lecture notes. | |
10 | We Oct 9 | Basic Probabilistic Models; P0 Topics Discussion (2)
Joint distribution model; simulation task; other tasks in joint distribution model; spam example. Fully independent model. Note on efficient sum-product computation and max-product computation. Project discussion: P-02. Files: slides, lecture notes, P0 slides #12. | |
Th Oct 10 | A1 due | A1 due | |
L5 | Fr Oct 11 | Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification. Files: lab notes, slides. | |
Mo Oct 14 | Thanksgiving Day, University closed | ||
11 | We Oct 16 | Naive Bayes Model; P0 Topics Discussion (3)
Fully-independent model finished. Naive Bayes model: definition, assumption, graphical model, computational tasks with example, number of parameters, pros and cons, variations and practical issues. Project discussions: P-07, P-08, P-09, P-11, P-12, P-13. Files: slides, lecture notes, P0 slides #14. | |
L6 | Fr Oct 18 | Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub. Files: lab notes, slides. | |
12 | Mo Oct 21 | P0 Topics Discussion (4); N-gram Model
Project discussions: P-14, P-15, P-16, P-17, P-18, P-19, P-20, P-21, P-23, P-24. N-gram model: language modeling, N-gram model assumption, graphical representation. Files: P0 slides, slides, lecture notes. Reading: [JM] Ch.4 N-Grams | |
13 | We Oct 23 | N-gram Model Smoothing
P0 discussion (5): P-22, P-25. N-gram model as Markov Chain; language model evaluation: perplexity; language modeling in classification. N-gram model smoothing: Laplace and Witten-Bell smoothing. Files: slides, lecture notes, P0 slides (#32). | |
L7 | Fr Oct 25 | Lab 7: P0 Submission help; TeX/LaTeX Tutorial (not marked) Files: slides. | |
14 | Mo Oct 28 | POS Tagging and Hidden Markov Model
Witten-Bell smoothing (finished). POS tagging. POS tags: introduction, open and closed word categories. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications. Example with HMM for POS Tagging. Files: slides, lecture notes. Reading: [JM] Ch.5 Part-of-Speech Tagging. | P1 due |
15 | We Oct 30 | Inference with HMMs
POS tagging with HMM example (continued): brute-force tagging, Viterbi algorithm. HMM as a Bayesian Network. Bayesian Networks review. Bayesian Network inference: general brute-force method; Sum-product message passing algorithms for efficient inference (started): factor graph, order of message calculation. Files: slides, lecture notes. | A2 out |
Th Oct 31 | Last day to drop classes with "W" | ||
16 | Mo Nov 4 | Efficient Inference for Bayesian Networks and HMMs
Sum-product message-passing algorithms (continued): cases in message calculation, inference computational tasks. BN example with message passing. HMM tagging example with message passing. Files: slides, lecture notes. | |
17 | We Nov 6 | Neural Networks and NLP
Deep Learning Models in NLP: applications, recent history; ideas behind artificial neural networks; elements: perceptron or neuron, activation function, similarity to logistic regression, simple neural language model, RNNs. Files: slides, lecture notes. | |
L8 | Fr Nov 8 | Lab 8: Python Tutorial with PyTorch
Lab 8 PyTorch instruction notebook (on Google Colab) | |
Mo Nov 11 | Remembrance Day, University closed | ||
Mo Nov 11 | Fall Study Break Nov 11-15, no classes, University open except Mon | ||
Part IV: Parsing (Syntactic Processing) | |||
18 | Mo Nov 18 | Deep Learning and NLP; DCG and PCFG Files: slides, lecture notes. | A2 due |
19 | We Nov 20 | DCG and PCFG Grammars Files: slides, lecture notes. | A3 out |
L9 | Fr Nov 22 | Lab 9: Prolog Tutorial 1 | |
20 | Mo Nov 25 | Heads and Dependency, NL Phenomena | |
21 | We Nov 27 | Typical Phrase Structure Rules in English | |
L10 | Fr Nov 29 | Lab 10: Prolog Tutorial 2 | |
Part V: Student Presentations | |||
Mo Dec 2 | Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots: 10:00-11:15: PT-01, PT-02, PT-03, PT-04, PT-05; 11:30-12:45: PT-06, PT-07, PT-08, PT-09, PT-10; 15:00-16:00: PT-11, PT-12, PT-13, PT-14; 16:15-17:00: PT-15, PT-16, PT-17. | ||
Tu Dec 3 | Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots: 10:00-11:00: PT-21, PT-22, PT-23, PT-24; 11:15-12:15: PT-25* (Jenna), PT-26, PT-27* (Bryce), PT-28; 12:45-13:45: PT-29, PT-30, PT-31, PT-32; 14:00-15:00: PT-33, PT-34, PT-35, PT-36; 15:15-16:15: PT-37, PT-38, PT-39, PT-40; 16:30-17:00: PT-41, PT-42. | ||
We Dec 4 | Student Presentations
All presentations are in room 430, Goldberg CS building. Presentation time slots: 12:30-13:30: PT-51, PT-52* (Ridhi), PT-53, PT-54; 13:45-14:45: PT-55, PT-56, PT-57, PT-58; 15:00-16:00: PT-59, PT-60, PT-61, PT-62* (Tasjid); 16:15-16:45: PT-63* (Gobind), PT-64* (Kanav). | A3 due | |
We Dec 4 | Classes end, Report due | Report due | |
Final Exam | |||
Th Dec 12 | Final Exam (8:30-10:30am)
Final exam, duration 2 hours, starting at 8:30am, Dalplex. Exams schedule URL: https://www.dal.ca/exams/halifax-exam-schedule.html | F.Exam |