CSCI 4152/6509 - Course Project
The course project for graduate students (CSCI 6509) should follow the
basic structure of a typical research project, such as the research
work on a thesis, only on a smaller scale.
Undergraduate students (CSCI 4152) can also choose to do a
research project, or they can do an implementation-focused project,
which would follow the structure of a software development project.
You can form project teams of up to four students, or work individually.
The final paper should be in the form of a technical report.
Graduate students will give individual presentations, following the
duration and format of a short conference presentation. The
presentation should last up to 10 minutes, followed by 5 minutes reserved for
questions and speaker switch.
Undergraduate students can present their project as a team, or individually.
Regarding presentation dates, please take a look at the free time slots on
the course calendar and let me know your preference by email.
The presentations will be scheduled on a first-come, first-served basis.
- P0 – project topic proposal, by email, due on Monday, Feb 4, 2013, but
better as soon as possible,
- P1 – a project statement, by email and printed, due on Friday, Feb 22, 2013,
- P – Oral presentation, book time, send slides or slides overview (PDF or PPT) 24h ahead, and
- R – Project Report, by email and printed, due Monday, Apr 8, 2013.
Note about emails: All emails related to the course must have the course number included in the subject,
such as CSCI4152, CSCI6509, or, the best option, CSCI4152/6509.
In more detail, the subject lines for required deliverables must be:
- CSCI4152/6509: Presentation time slot request
- CSCI4152/6509: P0 submission
- CSCI4152/6509: P1 submission
- CSCI4152/6509: Presentation slides
- CSCI4152/6509: Project report
P0 – Project Topic Proposal
Worth: 1% of the final mark.
You will need to choose a topic for your project.
Send me by e-mail in plain text the following information:
- tentative title,
- the list of team members, and
- a one-paragraph description of the topic.
Please do this as soon as you have chosen a topic. If two or more projects have the same or
a very similar topic, the team that sends P0 later may be required to change
its topic. If the topic is not sufficiently relevant to the course,
you may be asked to change it.
P1 – Project Statement
Worth: 3% for CSCI4152, or 5% for CSCI6509, of the final mark.
The project statement should be sent by email and printed as well.
Bring the printed version to class. It should be about 2 pages long.
It must include:
- Project title,
- Names of the member(s) of the group,
- Problem statement,
- List of possible approaches with citations to relevant work,
- Project plan for the rest of the term, and
- List of references.
The statement should identify a feasible project.
P1 will be marked based on its completeness, clarity of presentation,
and research on and analysis of related work.
P – Oral Presentation
Worth: 10% of the final mark, including class participation.
You are required to submit the slides of the presentation by email to the instructor at least 24 hours before your
presentation. The slides can be the original slides or a slide handout (e.g., 6 slides per page),
and they must be in PowerPoint or PDF format. I will assume that you will use your own
computer for the presentation, but if you need a computer you can use the instructor's computer.
If you use the instructor's computer, you can only use PDF or PowerPoint slides, without Internet access or
running any other programs.
The presentations should last 10 minutes, with 5 additional
minutes reserved for questions and speaker switch.
There is significant flexibility in choosing the topic of your presentation, but it
should be related to the project. It could be the work you have done up
to that time, or simply what you plan to do. It is a good idea to include
research or other related work that you have done so far. You could also
present a related method from the textbook or another paper.
Evaluation scheme for presentations:
- content: how interesting and valuable is the presentation,
appropriateness of the topic, appropriateness for the audience and
the time allocated,
- presentation: clarity, eye contact with the audience; it should
be vivid and interesting; it is not a good idea to read from a paper or from
the slides; avoid looking at the slides more than at the
audience; do not present to just one person (e.g., the instructor) -- talk to the whole audience;
the organization and structure of the presentation should be well planned;
the length should be appropriate,
- slides: organization of the presentation; slides content: appropriate
amount of text, use of figures,
- question answering: listening to the questions being asked and
giving appropriate answers that address the actual question to the point,
without going into an overly lengthy additional discussion.
R – Project Report
Worth: 24% for CSCI6509, or 16% for CSCI4152, of the final mark.
The written project report is submitted in printed and electronic
form; i.e., in class and by email.
The reports are kept in the instructor's archive for several years.
A typical structure of a research project report is:
- Title, Author, Course name,
- Abstract – make sure it is an abstract of the whole paper and not
just a part of the introduction. The abstract should be brief,
definitely no longer than half a page.
- Introduction – introduce the problem; get the reader's attention;
explain the motivation and significance of the problem;
define the paper's objectives. It is said
that the title, abstract, introduction, and the whole paper
should in a way express the same story in 10, 100, 1000, and
10,000 words, respectively. (Do not take these numbers literally!)
- Related Work – cover related work. Has this topic been
studied before? Do not just give an annotated list of citations. Give a critical
analysis of the previous work.
- Problem Definition and Methodology – there is no good
research without a research problem. Define it precisely.
- Experiment Design – experiments are not mandatory, but some
form of evaluation of your approach should exist.
- Evaluation, Discussion of evaluation results
This structure is just a guideline and parts may not be relevant to
your project. There are no fixed requirements about the length of the paper, since it
may depend on the type of the project and the number of people in a group.
A project report is expected to contain at least 8 pages, and that length may be sufficient for an
A+ project if some implementation or experimental work has been done.
General Guidelines Regarding Project Topic
A typical research project should be based on the following:
- Choose an NLP-related problem that is important and interesting
in your opinion. You should have some ideas about how it could
be solved, and about what interesting results you could obtain
by the end of term. The discussed problem should be
feasible in this sense, but it should not be trivial.
- The next step is to search through existing published work and
find out about existing solutions to the same problem, or to the
most similar problem. You can start with the textbook.
You may decide to change your focus if you discover that the
problem is already solved in a satisfactory way, or if it is a
special case of a more general problem that is already solved.
Even if there are existing solutions, you may have an idea about
a simpler solution, or an approach with a different advantage,
and may want to test it.
- Design your method(s), implement them, and run experiments.
- Analyze results. Revisit your methodology if needed.
- Finish the report. Keep writing during the term.
While the above guidelines describe a typical research project in NLP,
you can also consider some alternative forms:
Alternative Project Types
- theoretical project: You can focus on establishing a
formal framework and proving theoretical results, usually
regarding algorithm complexity of some solutions. Still, it may
be a good idea to have some experimental results even in this
type of project. This kind of project is still very research-oriented.
- implementation: You can put more emphasis on
the implementation part of the project. This usually means
developing a system prototype with multiple functionalities.
In this case, you can devote more space in your report to the
design, testing, and user documentation.
This kind of project could suit undergraduate students well; they could
choose to implement an algorithm that is well understood and not necessarily
very relevant to the latest research.
- software evaluation: Choose one or more existing software
tools, download them, learn to use them, and use them to solve a
problem. Report on your evaluation of the tool, instructions
about its usage, advantages and limitations of the tool, and your
This kind of project is likely more appropriate for undergraduate students.
- survey: The survey format is a critical review of the current
research in a narrow sub-area of NLP. If you choose this option,
make sure that you do not cover an area that is too wide, or one that is
already well understood and has published surveys on the topic.
I would discourage you from choosing this option, unless it is a
part of your wider research program. It is difficult to write a
good survey paper within one term.
- NLP Research Links on the course web page
- http://acl.ldc.upenn.edu/
- Google scholar and other scientific Internet resources
Topics of Some Previous Course Projects
- Character n-gram based analysis and visualization of the sentiment content of text documents
- A Visual Analytics Approach for User-driven Clustering
- Online Spam Classification with Symbiotic Bid-Based Genetic Programming
- An n-gram approach to multiple mood classification of song lyrics
- Concept Hierarchy Generation from Wikipedia
- An Evaluation of the Efficacy of Restricted Boltzmann Machines for Sentiment Classification
- Hierarchical aspect-based summarization and sentiment analysis of online reviews
- Classification of Horse Tack Reviews via an N-gram Model
- Alternative Methods of Input: Speech Recognition in Games
- Naive Bayes and TF-IDF Hybrid Approach for Spam Filtering
- Sentiment Analysis of Twitter Data
- Research on Sentiment Analysis of Movie Reviews to Help the Blogger
- Sarcasm detection in online reviews
- Behaviour Analysis and the Theoretical Application of Text Processing Techniques
- Sentiment Analysis of Real-Time Events on Twitter
- Morphosyntactic Annotation of Esperanto
- Clustering Large Amounts of Noisy Short Twitter Feeds
- Detecting Spam Comments by Analyzing Posted Comments in Blogs
- Algorithms for Linear Time Unification
- Support Vector Machines for Forum Troll Classification
- A Survey of Financial Information Extraction Research
- A Multi-objective evolutionary algorithm for textual data clustering
- A Study on "Co-Training" -- a Semi Supervised Learning
- FAQ-based Question Answering
- A Template-Based Approach to Paraphrasing
- Question Classification System using Support Vector Machines
- Comparing Term Extraction Techniques for Man Pages
- A Review of Techniques for Context-Dependent Spell Checking
- An Investigation of a Multi-Objective Genetic Algorithm for Document Clustering
- Implementation of "Intelligent Copy" (icp) and an Evaluation of WEKA Classification Methods
- MedOnto: Medical Ontology Learning System
- Automatic Keyphrase Extraction for Marine Sciences Text
- To Be Or Not To Be Shakespeare: Using Genetic Algorithm to Build an Authorship Profile for Use in Text Classification
- An Implementation Oriented Introduction to Automatic Speech Recognition
- Part-of-Speech Tagging with Restricted Boltzmann Machines
- Searching for Relevance: A Study on the Relatedness of Articles on Wikipedia
- Sentiment Classification of Conversational Language
- Morphological Analysis of Afan Oromo
- Financial Forecasting with Annual Reports: An Application of N-grams and Readability Scores
- A Character N-gram and Word N-gram Approach to Classification of Literature by Literary Period
- Context Aware Text Repair
- Rule-based Acronym Extraction and Expansion
- e-English Normalization: Converting SMS-Text to Correct English
- Generating bash Scripts from Natural Language
- Blog Generation using a Dialog Agent
- Text Format Segmentation using HMM
- Automatic Inter-Document Link Generation
- Experiments in Character N-gram Based Information Retrieval
- Character N-gram Based Approach to Classification of Movie Reviews
- Compiling Program Code from Natural Language:
A Learning Tool for Students Learning Object Oriented Programming
- A Comparison of Rule-Based and Data-Driven Algorithms for
Automatic Syllabification of Italian Words
- N-grams and Spam: Using N-gram Analysis to Detect Spam Email
- A Practical Method for Extracting Prefixes and Suffixes of
- Email Authorship Attribution using N-Grams
- Source Text Disambiguation for Improved Machine Translation
- Improving Word Alignments Through Matrix Factorization
- An Unsupervised Approach to Morphological Analysis
- Automatic Composer Recognition -- An N-gram-based Approach
- Improving Automatic Term Extraction using Shallow Parsing
- Using Natural Language Queries for E-mail Retrieval
- Context-Dependent Spelling Correction in Languages with No Word Boundaries
- A Comparative Study of Text Categorization using Naive Bayes
Classifier with Different Feature Space and Dimensionality Reduction Methodologies
- Semantic Annotation of Conference Notifications in Resource
- N-gram Collection using Suffix Arrays
- N-gram-based Classification of Plain-text Privacy Policies
- Comparing Co-Clustering using N-grams and Words
- A Stochastic Method for Software String Translation
- A Simple C++ N-gram Extraction Package
- N-gram-based Hierarchical Text Clustering for PPML Data
- Probable Solutions of Monoalphabetic Substitution Ciphers via Word-Gram Analysis
- Authorship Attribution using Compression and Clustering
- From Natural Language to Java
- Improving Naive Bayes Classification using Natural Language Processing
- Implementing ExtrAns
- A Second-Order Hidden Markov Model for Part-Of-Speech Tagging
- A Study of Connectionist Methods in Natural Language Parsing
- Information Retrieval Performance using Morphology, Part of
Speech Tagging, and Semantic Expansion
- Document Clustering with Automatic Term Extraction
- An Approach to Evaluating the Readability of Texts
- Optimising Naive Bayesian Networks for Spam Detection
- Evolved Transformations in Brill's Transformation-based Tagger
- Automatic Term Extraction in Large Text Corpora
- Proper Noun Detection for Search Engines
© 2002-2013 Vlado Keselj, last update: 15-May-2013