Dalhousie University    [  http://www.cs.dal.ca/~vlado/hinf6210/coursecalendar.html  ]
Fall 2009 (Sep 10-Dec 7)
Faculty of Computer Science
Dalhousie University


HINF 6210/ECMM 6014 - Course Calendar (tentative)

# Date Title
  Part I: Introduction
1 Th Sep 10 Course Introduction
Course information: logistics and administrivia, textbook and main references, evaluation, policy on academic integrity, H1N1 influenza advisory, course content overview; Introduction to data mining: definition, description of data mining, history, application examples: e-commerce, health informatics, and other application areas; data mining and other disciplines. Handouts: Course overview, tentative schedule, policy on academic integrity, H1N1 advisory, A0, slides
Files: PDF slides. Reading: HK Ch.1
A0 out
2 Tu Sep 15 Data Mining Data Sources
DM as a step in KDD process, data mining and business intelligence, typical data mining system; different types of data: relational data, data warehouses, etc.; Data mining functionalities (or tasks): classification and prediction, data clustering
Files: PDF slides.
 
  Part II: Fundamentals and Data Preprocessing
3 Th Sep 17 Data Mining Functionalities
Data mining functionalities (continued): classification and prediction, cluster analysis, association rule mining, concept description: characterization and discrimination, outlier analysis, trend and evolution analysis, other; measures of interestingness; different views on data mining, major issues in data mining, major sources for research references; Course project: handout, deliverables; Part II: Fundamentals and Data Preprocessing, why is data preprocessing important, major tasks in data preprocessing, some statistical concepts, descriptive data summarization, measuring central tendency; distributive, algebraic, and holistic measures
Files: PDF slides, Project Handout. Reading: HK Ch.2
A0 due
4 Tu Sep 22 Data Cleaning
Descriptive data summarization: measuring data dispersion, graphical summaries, major tasks in data preprocessing; data cleaning: missing values, noisy data, inconsistent data; simple binning: equi-width and equi-depth data; smoothing using binning methods; other types of smoothing; data integration: handling redundancy, correlational analysis, use of chi-square statistics in correlation analysis
Files: PDF slides.
A1 out
5 Th Sep 24 Data Transformation
Data transformation; normalization: min-max, z-score, decimal scaling, out-of-range values, softmax scaling, logistic function; Data reduction: strategies, compression: lossy and lossless compression, Discrete Wavelet Transformation, example; dimensionality reduction
Files: PDF slides.
 
6 Tu Sep 29 Elements of Algorithmics
Brief algorithms review: algorithms, computational models: Turing machine and RAM, algorithm examples: mean, bubble sort, all subsets; algorithm complexity, polynomial vs. exponential algorithms, NP-complete and NP-hard problems; Methods for dimensionality reduction, feature selection (forward, backward, etc.), decision trees, principal component analysis, numerosity reduction, V-optimal and MaxDiff partitioning, sampling
Files: PDF slides.
 
  Tu Sep 29 WEKA Tutorial (12:30-2:30pm, Teaching Lab 2, CS bldg) Files: PPT slides.
7 Th Oct 1 Entropy-based Discretization
Sampling, sampling with and without replacement, stratified sampling; Discretization and concept hierarchies, types of attributes, discretization methods, segmentation by natural partitioning, some notions from information theory, entropy, expected value, discretization using entropy, examples
Files: PDF slides.
 
  Part III: Core Data Mining Methodology
8 Tu Oct 6 WEKA Overview
Concept hierarchies: specifications, automatic generation; Introduction to WEKA, some available tools in data mining, WEKA tool for data mining, WEKA: local resources, ARFF data format, running WEKA; Classification and Prediction; Chapter 6 overview, classification vs. prediction, supervised vs. unsupervised learning, classification process, evaluating classification models, classification with decision trees. Handout: WEKA ARFF format and a WEKA tutorial
Files: PDF slides. Reading: HK Ch.6
A1 due
9 Th Oct 8 Classification using Decision Trees
Classification with decision trees (cont.): example training data set, an algorithm for decision tree induction, ID3 algorithm, example, decision tree representation, avoiding overfitting in decision trees, other improvements; properties of decision tree algorithms; Bayesian Classification: basic notions in probability theory, examples
Files: PDF slides.
P0 due
10 Tu Oct 13 Naive Bayes Classification
Bayes' theorem and its use in classification, Naive Bayes assumption and Naive Bayes classification, example, Naive Bayes classification with Gaussian distribution, handout, advantages and disadvantages of Naive Bayes classification; Bayesian Networks, example
Files: PDF slides.
A2 out
11 Th Oct 15 Belief Bayesian Networks
Belief Bayesian Networks: definition, classification example (handout), learning Bayesian Networks; Classification by backpropagation (i.e., using artificial neural networks): background on artificial neural networks, an artificial neuron, multi-layer perceptron; Support Vector Machines (SVM): motivation, two-dimensional linearly separable case
Files: PDF slides.
 
12 Tu Oct 20 Prediction
Support Vector Machines (cont'd): Multi-dimensional case, linearly inseparable case, introducing new columns, SVM vs. Neural networks, SVM related links; k Nearest Neighbour classification (kNN), Voronoi diagram; Other classification methods; Meta-classifiers (ensemble methods): bagging and boosting; Prediction: linear regression, 2 and more variables (multiple regression), non-linear regression; Distance measures, Minimization of different error distances
Files: PDF slides.
 
13 Th Oct 22 Cluster Analysis
Logistic regression, predictor error measures; Using WEKA for classification: ARFF data format, WEKA classifiers, command-line options, examples of usage, meta-learning schemes; Cluster Analysis (Chapter 7): what is cluster analysis?, general applications of clustering, examples of clustering applications, types of data in cluster analysis, distance measures
Files: PDF slides. Reading: HK Ch. 7
 
14 Tu Oct 27 Clustering Algorithms
Distance examples, cosine similarity and clustering of textual data; Major clustering approaches, partitioning algorithms; k-means clustering method: algorithm, example, strengths and weaknesses; K-Medoids method (PAM), hierarchical clustering, density-based clustering: DBSCAN; Mining association rules (Chapter 5): support, confidence, item-sets, and frequent item-sets
Files: PDF slides. Reading: HK Ch. 5
A2 due
15 Th Oct 29 Apriori Algorithm, Student presentations
Student presentations; Apriori algorithm: motivation and basic idea
Files: PDF slides.
P1 due
  Part IV: Data Warehouses and OLAP
16 Tu Nov 3  Data Warehouses and OLAP
Apriori algorithm (continued): example, self-joining and pruning, challenges; Improvement of Apriori algorithm: partitioning with two database scans, FP-growth algorithm; Variations of association rule mining, multi-level and multi-dimensional rules, quantitative association rules, other variations; Concept description and comparison; Data Warehouses and OLAP: what is a data warehouse
Files: PDF slides. Reading: HK Ch. 3, Ch. 4
 
17 Th Nov 5  Conceptual Modelling of Data Warehouses
Data warehouse vs. operational DBMS; OLTP vs. OLAP; Multidimensional data model: lattice of cuboids; Conceptual modelling of data warehouses: star, snowflake, and fact constellation schemas, examples, DMQL; Multidimensional data with concept hierarchies, OLAP operations
Files: PDF slides.
A3 out
18 Tu Nov 10 Efficient Cube Computation
OLAP operations (cont'd), Design of a data warehouse: a business analysis framework, process of data warehouse design, a three-tier warehouse architecture, OLAP server architectures, efficient data cube computation, cube operation; Multi-way array aggregation for cube computation, an efficient method for multi-way array aggregation for cube computation. Discussion about project presentations and project reports.
Files: PDF slides.
A4 out
  Part V: Course Review and Test
19 Th Nov 12 Course Review
Course review, solutions to some sample problems, A2 solutions handed out
Files: PDF slides.
 
20 Tu Nov 17 Test  
  Part VI: Project Presentations
21 Th Nov 19 Student presentations
A3 due
22 Tu Nov 24 Student presentations  
23 Th Nov 26 Student presentations  
24 Tu Dec 1  Student presentations  
25 Th Dec 3  Course Evaluation, Student presentations  
  Mo Dec 7 End of term, Project report due
Report due, A4 due

© 2004-2011 Vlado Keselj, last update: 07-Jun-2010