[ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Fall 2024 (Sep3-Dec4) Faculty of Computer Science Dalhousie University |
# | Date | Title | |
---|---|---|---|
Part I: Introduction | |||
1 | We Sep 4 | Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule; Introduction to NLP (reading Ch.1 [JM]): natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP. NLP methodology overview. Levels of NLP. Why NLP is generally hard. Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1 | |
2 | Mo Sep 9 | Ambiguities in NLP; Course Project
Ambiguities at different levels of NLP. About Course Project: topics and teams, deliverables, P0, P1, P, R; project types, choosing topic, resources, themes and previous topics. Files: slides, lecture notes. | |
Part II: Stream-based Text Processing | |||
3 | We Sep 11 | Finite Automata Review
Part II: Stream-based Text Processing: Deterministic and Non-deterministic Automata. (Reading: Chapter 2 [JM]) Review of Deterministic Finite Automata (DFA). Review of Non-deterministic Finite Automata (NFA), and their use in NLP. NFA-to-DFA conversion. Files: slides, lecture notes. Reading: [JM] Ch.2 | A0 out |
L1 | Fr Sep 13 | Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language: basic syntax, variables, string literals, subroutines. Files: lab notes, slides. | |
4 | Mo Sep 16 | Regular Expressions and Perl Files: slides, lecture notes. Reading: On timberlea server `man perlretut' and `man perlre', or perlretut and perlre | |
Tu Sep 17 | Last day to add/drop courses | ||
5 | We Sep 18 | Basic NLP in Perl
Regular expressions in Perl and basic text processing; Text processing examples: tokenization, counting letters. Elements of Morphology: reading: Section 3.1 [JM]; morphemes, stems, affixes, tokenization, stemming. Files: slides, lecture notes. Reading: Section 3.1 [JM] | A0 due |
L2 | Fr Sep 20 | Lab 2: Perl Tutorial 2
Regular expressions in Perl, Perl: basic I/O. Files: lab notes, slides. | |
6 | Mo Sep 23 | Counting N-grams
Elements of Morphology (continued): lemmatization, morphological processes; Characters, Words, and N-grams: counting words, Zipf's law. Perl examples with n-gram collection. Elements of Information Retrieval: Vector Space Model. Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval) | A1 out |
7 | We Sep 25 | Elements of Information Retrieval and Text Mining
Some interesting links: Lucene, IR book by Manning, Raghavan, and Schütze. IR Evaluation: precision, recall, F-measure, precision-recall curve. Interpolated Precision-Recall curve. Text mining. Text Classification: classifier evaluation; precision, recall, and F-measure in classification. Evaluation methods for classification: training error, train-and-test, and n-fold cross-validation. Similarity-based text classification. Files: slides, lecture notes. | |
L3 | Fr Sep 27 | Lab 3: Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references. Files: lab notes, slides. | |
Fr Sep 27 | P0 Project Topic Proposal due | P0 due | |
Mo Sep 30 | National Day for Truth and Reconciliation, University closed | ||
We Oct 2 | Last day to drop classes without "W", change audit to credit or vice versa. | ||
8 | We Oct 2 | Similarity-based Classification
CNG classification method for authorship attribution. Edit distance: introduction, dynamic programming approach, example, algorithm. Files: slides, lecture notes. | |
L4 | Fr Oct 4 | Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, resolving conflicts. Files: lab notes, slides. | |
Part III: Probabilistic and Machine Learning Approach to NLP | |||
9 | Mo Oct 7 | P0 Topics Discussion; Introduction to Probabilistic Modeling
Projects discussion: P-01, P-03, P-04, P-05, P-06. Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory. Bayesian inference, generative models. Probabilistic modeling: random variables, configurations, and models; computational tasks. Files: P0 slides, slides, lecture notes. | |
10 | We Oct 9 | Basic Probabilistic Models; P0 Topics Discussion (2)
Joint distribution model; simulation task; other tasks in joint distribution model; spam example. Fully independent model. Note on efficient sum-product computation and max-product computation. Project discussion: P-02. Files: slides, lecture notes, P0 slides #12. | |
Th Oct 10 | A1 due | A1 due | |
L5 | Fr Oct 11 | Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification. Files: lab notes, slides. | |
Mo Oct 14 | Thanksgiving Day, University closed | ||
11 | We Oct 16 | Naive Bayes Model; P0 Topics Discussion (3)
Fully-independent model finished. Naive Bayes model: definition, assumption, graphical model, computational tasks with example, number of parameters, pros and cons, variations and practical issues. Project discussions: P-07, P-08, P-09, P-11, P-12, P-13. Files: slides, lecture notes, P0 slides #14. | |
L6 | Fr Oct 18 | Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub. Files: lab notes, slides. | |
12 | Mo Oct 21 | P0 Topics Discussion (4); N-gram Model
Project discussions: P-14, P-15, P-16, P-17, P-18, P-19, P-20, P-21, P-23, P-24. N-gram model: language modeling, N-gram model assumption, graphical representation. Files: P0 slides, slides, lecture notes. Reading: [JM] Ch4 N-Grams | |
13 | We Oct 23 | N-gram Model Smoothing
P0 discussion (5): P-22, P-25. N-gram model as Markov Chain; language model evaluation: perplexity; language modeling in classification. N-gram model smoothing: Laplace and Witten-Bell smoothing. Files: slides, lecture notes, P0 slides (#32). | |
L7 | Fr Oct 25 | Lab 7: P1 Submission help; TeX/LaTeX Tutorial (not marked) Files: slides. | |
14 | Mo Oct 28 | POS Tagging and Hidden Markov Model
Witten-Bell smoothing (finished). POS tagging. POS tags: introduction, open and closed word categories. Reading: [JM] Ch5 Part-of-Speech Tagging. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS), verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications. Example with HMM for POS Tagging. Files: slides, lecture notes. Reading: [JM] Ch5 Part-of-Speech Tagging. | P1 due |
15 | We Oct 30 | Inference with HMMs
POS tagging with HMM example (continued): brute-force tagging, Viterbi algorithm. HMM as a Bayesian Network. Bayesian Networks review. Bayesian Network inference: general brute-force method; Sum-product message passing algorithms for efficient inference (started): factor graph, order of message calculation. Files: slides, lecture notes. | A2 out |
Th Oct 31 | Last day to drop classes with "W" | ||
16 | Mo Nov 4 | Efficient Inference for Bayesian Networks and HMMs
Sum-product message-passing algorithms (continued): cases in message calculation, inference computational tasks. BN example with message passing. HMM tagging example with message passing. Files: slides, lecture notes. | |
17 | We Nov 6 | Neural Networks and NLP
Deep Learning Models in NLP: applications, recent history; ideas behind artificial neural networks; elements: perceptron or neuron, activation function, similarity to logistic regression, simple neural language model, RNNs. Files: slides, lecture notes. | |
L8 | Fr Nov 8 | Lab 8: Python Tutorial with PyTorch
Lab 8 PyTorch instruction notebook (on Google Colab) | |
Mo Nov 11 | Remembrance Day, University closed | ||
Mo Nov 11 | Fall Study Break Nov 11-15, no classes, University open except Mon | ||
Part IV: Parsing (Syntactic Processing) | |||
18 | Mo Nov 18 | Deep Learning and NLP; DCG and PCFG
Stacked and bidirectional RNN, LSTM, self-attention and transformers. Parsing (Syntactic Processing): A brief introduction to Prolog, unification, and backtracking; Natural Language Syntax: phrase structure, clauses, sentences; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations and other concepts. Files: slides, lecture notes. | A2 due |
19 | We Nov 20 | DCG and PCFG Grammars Files: slides, lecture notes. | A3 out |
L9 | Fr Nov 22 | Lab 9: Prolog Tutorial 1
IMPORTANT NOTICE: The morning Lab is CANCELLED due to technical issues. Update at 8:43am: Technical issues resolved. The morning lab is still cancelled but the afternoon lab will be held. Topics: General Prolog tutorial. Files: lab notes, slides. | |
20 | Mo Nov 25 | Syntax of Natural Languages; CKY Algorithm
Typical phrase structure rules in English (continued): NP, VP, PP, ADJP, ADVP. Dependency structure: heads and dependencies, dependency tree. CKY parsing algorithm (started). Files: slides, lecture notes. | |
21 | We Nov 27 | CKY Algorithm and PCFGs
CKY algorithm (continued), CKY for PCFGs, issues with PCFGs, ideas for probabilistic lexicalized CFGs. Files: slides, lecture notes. | |
Th Nov 28 | Assignment 4 out | A4 out | |
L10 | Fr Nov 29 | Lab 10: Prolog Tutorial 2 Files: lab notes, slides. | |
Part V: Student Presentations | |||
Mo Dec 2 | Student Presentations
All presentations are in the room 430, Goldberg CS building. Presentation time slots: 10:00-10:30: PT-01, PT-02; 10:30-11:15: Session 1: PT-03* (Samson), PT-04, PT-05* (Jarrod); 11:30-12:45: PT-06, PT-07, PT-08, PT-09, PT-10; 15:00-15:30: PT-11, PT-12; 15:30-16:00: Session 2: PT-13* (Ye, Xu, Baike, Xuelian), PT-14; 16:15-17:00: Session 3: PT-15* (Usama, Konstantin, Pallavi, Rutvik), PT-16* (Rashik, Ying), PT-17; | ||
Tu Dec 3 | Student Presentations
All presentations are in the room 430, Goldberg CS building (rm429/430 booked 10am-5pm). Presentation time slots: 10:00-11:00: PT-21, PT-22, PT-23, PT-24; 11:15-12:30: Session 4: PT-25* (Jenna), PT-26* (Shrey, Kavya), PT-27* (Jay), PT-28, PT-29* (Sibi); 13:00-13:45: PT-30, PT-31, PT-32; 14:00-15:00: Session 5: PT-33* (Logan), PT-34* (Aryan), PT-35, PT-36; 15:15-16:15: Session 6: PT-37* (Alb), PT-38* (Reginald), PT-39, PT-40* (Surya, Sudhan, Karthik, Allotei); 16:30-17:00: Session 7: PT-41* (Bryce), PT-42* (Shubham, Inderdeep); | ||
We Dec 4 | Student Presentations
All presentations are in the room 430, Goldberg CS building. Presentation time slots: 12:30-13:30: Session 8: PT-51, PT-52* (Ridhi), PT-53* (Aditya), PT-54* (Alrashdi, Alsalmim, Yahya, Omar); 13:45-14:45: Session 9: PT-55, PT-56* (Hinda), PT-57* (Ajaykumar, Manish), PT-58; 15:00-16:00: Session 10: PT-59* (Rifat, Mukarrom, Priyadharshini, Syanthan), PT-60* (Omar SA, Keenan), PT-61* (Arash, Aniq, Sebastian, Udhaya), PT-62* (Tasjid); 16:15-16:45: Session 11: PT-63* (Gobind), PT-64* (Kanav); | A3 due, A4 due | |
We Dec 4 | Classes end, Report due | Report due | |
Final Exam | |||
Th Dec 12 | Final Exam (8:30-10:30am)
Final exam, duration 2 hours, starting at 8:30am, Dalplex. Exams schedule URL: https://www.dal.ca/exams/halifax-exam-schedule.html | F.Exam |