Dalhousie University [ http://web.cs.dal.ca/~vlado/csci6509/coursecalendar.html ]
Fall 2021 (Sep 7-Dec 7)
Faculty of Computer Science
Dalhousie University

CSCI 4152/6509 — Course Calendar (tentative)

  Part I: Introduction
1 Tu Sep  7 Course Introduction
Course introduction: logistics, administrivia, references, evaluation, policies, schedule, A0; Introduction to NLP (reading Ch.1 [JM]): natural language and other languages, NLP applications, NLP as a research area, NLP Research Links and NLP Anthology http://aclweb.org/anthology/. Short history of NLP.
Files: Syllabus (PDF), slides, lecture notes. Reading: [JM] Ch.1
A0 out
2 Th Sep  9 Levels of NLP; Course Project
NLP methodology overview; Levels of NLP; Why is NLP generally hard; Ambiguities at different levels of NLP. About Course Project: topics and teams, deliverables, P0, P1, P.
Files: slides, lecture notes.
  Part II: Stream-based Text Processing
L1 Tu Sep 14 Lab 1: FCS Computing Environment, Perl Tutorial 1
Logging in using CSID, timberlea environment; Introduction to Perl programming language.
Files: lab notes, slides.
3 Tu Sep 14 Finite Automata and Regular Expressions
About course project (continued): R (report), project types, choosing topic, resources, themes and previous topics. Part II: Stream-based Text Processing: Deterministic and Non-deterministic Automata. (Reading: Chapter 2 [JM]) Review of Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA), and their use in NLP; NFA-to-DFA conversion. Review of regular expressions.
Files: slides, lecture notes. Reading: [JM] Ch.2
4 Th Sep 16 Text Processing in Perl
Regular expressions review continued: some regex references, history, examples; Introduction to Perl, main Perl features, program examples, syntactic elements, I/O, regular expressions in Perl.
Files: slides, lecture notes.
  Fr Sep 17 Last day to add/drop courses; A0 due
  Sa Sep 18 A1 out
L2 Tu Sep 21 Lab 2: Perl Tutorial 2
Regular expressions and character n-grams in Perl.
Files: lab notes, slides.
5 Tu Sep 21 Elements of Morphology
More on Perl regular expressions; Text processing examples: tokenization, counting letters. Elements of Morphology: reading: Section 3.1 [JM]; morphemes, stems, affixes, tokenization, stemming, lemmatization; morphological processes. Characters, Words, and N-grams: counting words, Zipf's law, n-grams.
Files: slides, lecture notes.
6 Th Sep 23 Elements of Information Retrieval
Perl examples with n-gram collection. Elements of information retrieval: typical IR system architecture, vector space model. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval). Some interesting links: Lucene, IR book by Manning, Raghavan, and Schutze. IR Evaluation: precision, recall, F-measure, precision-recall curve.
Files: slides, lecture notes. Reading: [JM] 23.1 (Information Retrieval), [MS] Ch.15 (Topics in Information Retrieval)
L3 Tu Sep 28 Lab 3: Perl Tutorial 3
Perl: Arrays or lists; associative arrays or hashes; references.
Files: lab notes, slides.
7 Tu Sep 28 Text Classification
Interpolated Precision-Recall curve. Text mining. Text Classification: classifier evaluation, evaluation measures for text classification, evaluation methods for text classification; Text clustering; Similarity-based text classification: CNG classification method for authorship attribution.
Files: slides, lecture notes.
A1 due
  Th Sep 30 National Day for Truth and Reconciliation, University closed
  Fr Oct  1 Last day to drop classes without "W", or to change audit to credit or vice versa; P0 due
L4 Tu Oct  5 Lab 4: Git and GitLab Tutorial
Introduction to GitLab and Git; adding and modifying files, setting up SSH key, add, commit, and push commands, checkout; creating branches and working collaboratively, pull, merge, rebase, resolving conflicts.
Files: lab notes, slides.
  Part III: Probabilistic Approach to NLP
8 Tu Oct  5 Edit Distance; Probabilistic Modeling
Edit distance: introduction, properties, dynamic programming approach, example, algorithm. Probabilistic approach to NLP: logical vs. plausible reasoning in AI and NLP; Brief review of elements of probability theory.
Files: slides, lecture notes.
9 Th Oct  7 P0 Topics Discussion (1)
Projects discussion: P-01, P-02, P-03, P-04, P-05, P-07, P-08, P-09, P-10, P-11, P-12, P-13, P-14, P-15.
Files: P0 slides, slides, lecture notes.
  Mo Oct 11 Thanksgiving Day, University closed
L5 Tu Oct 12 Lab 5: Python NLTK Tutorial 1
Introduction to Python: basics, lists, tuples, dictionaries; Introduction to NLTK: tokenization, stop-words, stemming, n-grams, frequency distribution, classification.
Files: lab notes, slides.
10 Tu Oct 12 P0 Topics Discussion (2); Probabilistic Modeling
Projects discussion: P-06, P-16, P-17, P-18, P-19, P-21, P-23, P-24, P-26, P-27, P-28, P-29. Probabilistic modeling: probability theory review (continued).
Files: P0 slides, slides, lecture notes.
  We Oct 13 A2 out
11 Th Oct 14 Probabilistic Modeling
Bayesian inference, generative models. Probabilistic modeling: random variables, configurations, and models; computational tasks; joint distribution model; fully independent model.
Files: slides, lecture notes.
L6 Tu Oct 19 Lab 6: Python NLTK Tutorial 2
Part-of-speech taggers in NLTK: HMM and CRF, Brill tagger; Named entity chunking; Jupyter and using JupyterHub.
Files: lab notes, slides.
12 Tu Oct 19 Naive Bayes Model
Fully independent model (continued); efficient product-sum formula. Naive Bayes model: definition, assumption, graphical model, computational tasks, spam example, additional notes. N-gram model: role in language modeling, assumption (to be continued).
Files: slides, lecture notes.
13 Th Oct 21 N-gram Model
N-gram model as Markov Chain; reading: [JM] Ch4 N-Grams; language model evaluation: perplexity; language modeling in classification; N-gram model smoothing: Laplace and Witten-Bell smoothing. POS tags: introduction, open and closed word categories. Reading: [JM] Ch5 Part-of-Speech Tagging. Open word categories: nouns (NN, NNS, NNP, NNPS), adjectives (JJ, JJR, JJS).
Files: slides, lecture notes. Reading: [JM] Ch4 N-Grams. [JM] Ch5 Part-of-Speech Tagging.
L7 Tu Oct 26 Lab 7: Fetching Tweets with Python
Using Twitter API, Tweepy, retrieving user profile, retrieving tweets, saving tweets into a csv file, preprocessing tweets.
Files: lab notes.
14 Tu Oct 26 POS Tags and POS Tagging
POS tags (continued): verbs (VB, VBP, VBZ, VBG, VBD, VBN), adverbs (RB, RBR, RBS); Closed word categories: DT, WDT, PDT, PRP, PRP$, WP, WP$, IN, RP, POS, MD, TO, RB (closed), WRB, CC, UH; Other POS classes: EX, FW, LS, punctuation, SYM. Examples. Hidden Markov Model (HMM): motivation, definition, HMM assumption, applications, POS tagging. Reading: [JM] Ch. 6 (HMM, first part)
Files: slides, lecture notes. Reading: [JM] Ch. 6 (HMM, first part)
A2 due
15 Th Oct 28 HMM as Bayesian Network
P1 requirements. HMM inference: Viterbi algorithm; HMM as a Bayesian Network: BN definition, burglar-earthquake example, computational tasks, brute-force inference in BNs, difficulty of inference in general BNs; Sum-product algorithms: factor graph.
Files: slides, lecture notes.
  Mo Nov  1 Last day to drop classes with "W"
16 Tu Nov  2 Sum-Product (Message Passing) Algorithms
Principles of message-passing algorithms, order of message calculation, cases in message calculation, inference tasks solved by message-passing algorithms, burglar-earthquake example with message passing.
Files: slides, lecture notes.
17 Th Nov  4 Sum-Product HMM Example, Neural Network Models
POS tagging example using message passing algorithm. Neural networks and deep learning: applications, some main developments, large deep learning models, growth in size.
Files: slides, lecture notes.
  Fr Nov  5 P1 due (postponed)
  Mo Nov  8 Fall Study Break Nov 8-12, no classes, University open; A3 out
  Th Nov 11 Remembrance Day, University closed
  Part IV: Parsing (Syntactic Processing)
L8 Tu Nov 16 Lab 8: Prolog Tutorial 1
Files: lab notes, slides.
18 Tu Nov 16 Introduction to Prolog and Unification
Deep learning (continued): biological neuron, perceptron, feed-forward network, activation functions, logistic regression as a simple network, softmax function; model overviews: neural language model, recurrent neural network, stacked and bidirectional RNN, LSTM, self-attention and transformers. Parsing: A brief introduction to Prolog, unification and backtracking.
Files: slides, lecture notes.
19 Th Nov 18 Natural Language Syntax
Prolog (continued): variables, lists, structures; examples: factorial, member. Parsing (Syntactic Processing): Natural language syntax: phrase structure, clauses, sentences; reading: [JM] Ch 12; parsing, parse tree examples. Context-Free Grammars review: definition, parse trees, derivations and other concepts, bracket representation. Using Prolog to parse NL (started).
Files: slides, lecture notes.
  Mo Nov 22 A3 due
L9 Tu Nov 23 Lab 9: Prolog Tutorial 2
Files: lab notes, slides.
20 Tu Nov 23 NL Parsing in Prolog, PCFG
Using Prolog to parse NL (continued): Parsing natural language in Prolog using difference lists; Definite Clause Grammars (DCG): example, building parse tree, handling agreement, embedded code. Probabilistic Context-Free Grammars (PCFG): PCFG as a probabilistic model, computational tasks for PCFG model: evaluation, learning, simulation, proper PCFG, expressing PCFGs in DCGs. Reading: [JM] Chapters 13 and 14 (PCFG). CYK chart parsing algorithm, CNF.
Files: slides, lecture notes. Reading: [JM] Chapters 13 and 14 (PCFG)
21 Th Nov 25 CYK Parsing for CFG and PCFG
CYK algorithm: example (continued), algorithm; CYK for PCFG: marginalization, completion; issues with PCFG. Typical phrase structure rules in English (started).
Files: slides, lecture notes.
A4 out
22 Tu Nov 30 Natural Language Syntax
Phrase structure in English (continued): NP, VP, PP, ADJP, ADVP; heads and dependency, dependency tree; non-context-free phenomena: agreement, movement, subcategorization. Parser evaluation. Elements of semantics (started).
Files: slides, lecture notes.
23 Th Dec  2 Unification in Syntactic and Semantic Processing
Files: slides, lecture notes.
  Part VI: Student Presentations
  Mo Dec  6 Student Presentations (during day)
12:30-13:00: PT-11* (Haorui, Dongyuan, Yilong, Jingwen), PT-12;
13:30-14:30: PT-13* (Leon), PT-14* (Asad), PT-15, PT-16;
14:45-15:45: PT-17* (Keelin), PT-18* (Justin), PT-19* (Zesheng), PT-20* (Arit);
16:00-17:00: PT-21* (Tongqi), PT-22* (Yixiao), PT-23, PT-24* (Ben, Urmzd);
  Tu Dec  7 Student Presentations (during day)
09:30-10:30: PT-25* (Robert, Julia, Noah, Conor), PT-26* (Jemis, Kirtan), PT-27* (Yaoxin), PT-28* (Emily);
10:45-11:45: PT-29* (Frederik), PT-30* (Keshava), PT-31* (Patrick), PT-32* (Gaurav);
12:00-13:00: PT-33* (Adeolu, Aishik, Grant), PT-34* (Sanjana), PT-35* (Yuqing, Borong, Junqiao, Archer), PT-36* (Parvez);
13:30-14:30: PT-37* (Ohiduzzaman), PT-38* (Sigma), PT-39* (Usmi), PT-40* (Janvi, Bansi, Deep, Sanket);
14:45-15:45: PT-41* (Isaac), PT-42* (Will), PT-43* (Akhilesh), PT-44* (Mayank);
16:00-17:00: PT-45* (Kishan, Narendran, Nirmal, Tejeswi), PT-46* (Siddharth), PT-47* (Temi), PT-48* (Pranav, Yunzhong);
A4 due
  Tu Dec  7 Classes end, Monday schedule used; Reports due
  Mo Dec 13: Extended deadline for A4
  We Dec 15: Extended deadline for Project Report (R)
  Final Exam
  Fr Dec 17 Final Exam (8:30-10:30am)
Final exam, duration 2 hours, starting at 8:30am, online. Exam schedule URL: http://www.dal.ca/academics/exam_schedule/halifax_campus_exam_schedule.html

Maintained by: Vlado Keselj, last update: 16-Dec-2021