CSCI 3151.03 Web Intelligence
Instructor: Evangelos E. Milios
This new course introduces Semantic Web concepts, ontologies, knowledge management, natural language processing, text mining, information extraction, and social networks, capitalizing on our Faculty's research strength in these areas.
Times & Days: Monday 1330-1430 + 1600-1730, Wednesday 1330-1430, Friday 1330-1430
Office Hours: Monday 1430-1600; Wednesday and Friday 1430-1500; and by appointment
Calendar Description: The Web and on-line digital libraries constitute the largest repository of interconnected knowledge in text form mankind has ever created. Search engines have made this knowledge accessible to the lay person. Social networks further enhance the exchange of knowledge among individual Web users. Mining the Web and associated digital libraries is the next challenge that promises to change the nature of scientific discovery, and to dramatically impact the way business is conducted. This course will introduce the core Artificial Intelligence concepts and algorithms in the context of Web and text mining: machine learning, natural language processing, semantic web, social networks and web usage mining.
Prerequisites: CSCI 2112, CSCI 2141, STAT 2060, MATH 2030 (each with a grade of C- or better)
Evaluation:
| Course Component | Weight |
| Assignments (a1, a2, a3 equally weighted) | 30% |
| Class Participation | 10% |
| Midterm exam | 20% |
| Final exam | 40% |
Notes:
Students with disabilities are encouraged to register with Student Accessibility Services as quickly as possible if they want to receive academic accommodations. To do so, please phone 494-2836, e-mail access@dal.ca, drop in at the Killam, G28, or visit www.studentaccessibility.dal.ca.
Required textbooks
[BL] Bing Liu: Web Data Mining, Springer, 2nd ed. 2011 (view online or download from SpringerLink)
[MRS] Manning, Raghavan and Schuetze: Introduction to Information Retrieval, Cambridge University Press, 2008 (book available online)
[SAOM] Bing Liu: Sentiment Analysis and Opinion Mining, Morgan & Claypool, 2013 (pdf can be downloaded from publisher's website)
[BKL] Steven Bird, Ewan Klein and Edward Loper: Natural Language Processing with Python, O'Reilly Media, Inc., 2009, ISBN: 978-0-596-51649-9 (available on Safari and in html form here)
[TS] T. Segaran: Programming Collective Intelligence: Building Smart Web 2.0 Applications, O'Reilly, Aug. 2007, 1st ed., ISBN: 0-596-52932-5 (available on Safari)
[SK] Scott Spangler and Jeffrey Kreulen: Mining the Talk: Unlocking the Business Value in Unstructured Information, 1st ed., Jul. 19, 2007, ISBN-10: 0-13-233953-6, ISBN-13: 978-0-13-233953-7 (readable on Safari)
Recommended Readings
[AL] Akerkar & Lingras: Building an Intelligent Web: Theory and Practice, Jones & Bartlett Learning, 2008, ISBN: 978-0763741372
[MB] Marmanis & Babenko: Algorithms of the Intelligent Web, Manning Publications, 2009, ISBN: 978-1933988665
[AH] G. Antoniou, Frank van Harmelen: A Semantic Web Primer, 2nd edition, MIT Press, 2008, ISBN: 978-0262012423
[JM] P. Jackson, I. Moulinier: Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization, 2nd edition, John Benjamins, 2007, ISBN: 978-9027249937
[WF] Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd edition, Morgan Kaufmann Publishers, Jun. 2005, paperback, 525 pages, ISBN: 0-12-088407-0
[MS] C. Manning, H. Schuetze: Foundations of Statistical Natural Language Processing, MIT Press, 1999
[SBLK] Schenker, Bunke, Last and Kandel: Graph-theoretic techniques for Web Content Mining, World Scientific, 2005.
Other relevant readings
[FS] Ronen Feldman and James Sanger: The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data, Cambridge University Press, 2007, ISBN-13: 9780521836579, ISBN-10: 0521836573
Related courses
CSCI 3154: Introduction to AI with Gaming Applications
CSCI 3172: Web-centric computing
CSCI 4141: Information Retrieval
CSCI 4152: Natural Language Processing
CSCI 4155: Machine Learning (and Robotics)
Machine Learning Resources on the Web:
UCI Machine Learning Repository
GMD Machine Learning Archive
Reinforcement Learning Repository at UMass
Scilab: a free scientific software package (a free equivalent of Matlab)
Octave (a free GNU project alternative to Matlab)
WEKA: Data Mining Software in Java
RapidMiner (formerly YALE: Yet Another Learning Environment; incorporates WEKA)
The R project for statistical computing (free comprehensive statistical programming environment)
---- TM: Text Mining in R
---- R/Weka: an R interface to WEKA
Dragon toolkit for Language Modeling, Text Retrieval, and Text Data Mining
LingPipe (NLP applications of ML)
Topic Modelling Toolbox (Matlab) using Latent Dirichlet Allocation (LDA)