CSCI 3151.03 Web Intelligence

Instructor: Evangelos E. Milios

This is a new course that introduces Semantic Web concepts, ontologies, knowledge management, natural language processing, text mining, information extraction, and social networks, capitalizing on the research strength of our Faculty in these areas.

Times & Days:
Monday 0930-1030 + 1400-1530 (as needed)
Office Hours:
Monday, Wednesday, Friday, 1030-1100
and by appointment

Calendar Description: The Web and on-line digital libraries constitute the largest repository of interconnected knowledge in text form that mankind has ever created. Search engines have made this knowledge accessible to the lay person. Social networks further enhance the exchange of knowledge among individual Web users. Mining the Web and associated digital libraries is the next challenge, one that promises to change the nature of scientific discovery and to dramatically impact the way business is conducted. This course will introduce the core Artificial Intelligence concepts and algorithms in the context of Web and text mining: machine learning, natural language processing, semantic web, social networks and web usage mining.

Prerequisites: CSCI 2112, CSCI 2141, STAT 2060, MATH 2030 (each with a grade of C- or better)

Weekly schedule -- lectures

Virtual class attendance


Course Component                              Weight

Assignments (a1, a2, a3 equally weighted;     10% x 3 = 30%
a0 is pass/fail)
Class Participation                           10%
Midterm exam
Final exam                                    40%


Students with disabilities are encouraged to register as quickly as possible at the Student Accessibility Services if they want to receive academic accommodations. To do so please phone 494-2836, e-mail access <at-symbol> , drop in at the Killam, G28 or visit  .

Required textbooks

[BL] Bing Liu: Web Data Mining, Springer, 2nd ed. 2011 (view online or download from Springerlink)

[MRS] Manning, Raghavan and Schuetze: Introduction to Information Retrieval, Cambridge University Press, 2008 (book available online)

[SAOM] Bing Liu: Sentiment Analysis and Opinion Mining, Morgan & Claypool, 2013 (pdf can be downloaded from publisher's website)

[BKL] Steven Bird, Ewan Klein, Edward Loper: Natural Language Processing with Python, O'Reilly Media, Inc., 2009, ISBN: 978-0-596-51649-9 (available on Safari and in html form here)

[MLR] Brett Lantz: Machine Learning with R, ISBN: 9781782162148, Oct. 2013 (available on Safari)

[RDM] Yanchang Zhao: R and Data Mining: Examples and Case Studies, Academic Press, Elsevier, ISBN: 978-0-123-96963-7, December 2012 (pdf and code available online)

[TS] T. Segaran: Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, Aug. 2007, 1st ed., ISBN: 0-596-52932-5 (available on Safari)

[SK] Scott Spangler and Jeffrey Kreulen: Mining the Talk: Unlocking the Business Value in Unstructured Information, 1st ed., Jul. 2007, ISBN-10: 0-13-233953-6, ISBN-13: 978-0-13-233953-7 (readable on Safari)

The R project for statistical computing (free comprehensive statistical programming environment)
---- TM: Text Mining in R
---- R/Weka: an R interface to WEKA
---- Online documents on R and data mining

Recommended Readings

[ML] Matthew Russell: Mining the Social Web, O'Reilly, 2011.

[AL] Akerkar & Lingras: Building an Intelligent Web: Theory and Practice, Jones & Bartlett Learning, 2008, ISBN: 978-0763741372

[MB] Marmanis & Babenko: Algorithms of the Intelligent Web, Manning Publications, 2009, ISBN: 978-1933988665

[AH] G. Antoniou, Frank van Harmelen: A Semantic Web Primer, 2nd edition, MIT Press, 2008, ISBN: 978-0262012423 (pdf)

[JM] P. Jackson, I. Moulinier: Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization, 2nd edition, John Benjamins, 2007, ISBN: 978-9027249937

[WF] Ian H. Witten, Eibe Frank:
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (2nd edition)
Paperback - 525 pages (Jun. 2005)
Morgan Kaufmann Publishers; ISBN: 0-12-088407-0

[MS] C. Manning, H. Schuetze: Foundations of Statistical Natural Language Processing, MIT Press, 1999

[SBLK] Schenker, Bunke, Last and Kandel: Graph-theoretic techniques for Web Content Mining, World Scientific, 2005.

[CA] Charu Aggarwal, Data Mining: The Textbook (Springer), May 2015. (pdf accessible through Dalhousie library)

[LRU] Jure Leskovec, Anand Rajaraman, Jeff Ullman: Mining of Massive Datasets, Cambridge Univ. Press, 2nd ed, 2014 (pdf available from

[SM] Stephen Marsland: "Machine Learning: An Algorithmic Perspective" CRC Press, 2nd ed. 2014

Other relevant readings

[FS] Ronen Feldman and James Sanger
The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data
Cambridge University Press, 2007
(ISBN-13: 9780521836579 | ISBN-10: 0521836573)

Related courses

CSCI 3154: Introduction to AI with Gaming Applications
CSCI 3172: Web-centric computing
CSCI 4141: Information Retrieval
CSCI 4152: Natural Language Processing
CSCI 4155: Machine Learning (and Robotics)

Machine Learning Resources on the Web:

UCI Machine Learning Repository
Reinforcement Learning Repository at UMass

: a free scientific software package
(a free equivalent of Matlab)
Octave (a free GNU project alternative to Matlab)
WEKA: Data Mining Software in Java
RapidMiner (formerly YALE: Yet Another Learning Environment (incorporates WEKA))

Dragon toolkit for Language Modeling, Text Retrieval, and Text Data Mining
LingPipe (NLP applications of ML)
Topic Modelling Toolbox (Matlab) using Latent Dirichlet Allocation (LDA)

Apache Mahout: Scalable machine learning library (recommendation mining, text clustering, text classification) on Hadoop
Python-based ecosystem of open-source software for mathematics, science and engineering