
NLP Research Links

General Links
- Central Resources - Associations - Journals - Institutions and Groups - Conference links - E-mail lists - NLP Courses - General Human Language Resources - Other General Resources
Speech processing
- CMU Sphinx - The CMU Sphinx Group Open Source Speech Recognition Engines
- eSpeak - text to speech open source software
- The Festival Speech Synthesis System - by the Centre for Speech Technology Research, Univ. of Edinburgh
- Festival at CMU
- PRAAT: Phonetics and Speech Tools
- The EMU Speech Database System
- Example of prosodic annotation data
- ToBI Annotation document
- Phonology - Commercial
N-gram analysis
- Text::Ngrams Perl Package - Flexible Ngram analysis (for characters, words, and more); on CPAN; by Vlado Keselj
- Ngram Statistics Package in Perl, by T. Pedersen et al.
- Text::Ngram Perl Package by Simon Cozens
- Perl script ngram.pl by Jarkko Hietaniemi
- Waterloo Statistical N-Gram Language Modeling Toolkit - in C++ by Fuchun Peng
- Suffix Arrays for Ngrams
- Transcription and Encoding Schemes - Sentence Splitters (sentencizers, sentence boundary detectors) - Word segmentation - Stop-word Removal
- Letter-to-Phoneme Pascal Challenge 2006
- Unsupervised segmentation of words into morphemes -- Challenge 2005
- Using eigenvectors of the bigram graph to infer morpheme identity - by Mikhail Belkin and John Goldsmith, 2002
- Stemmers and Lemmatizers
Finite State Methods
- intex - Linguistic Development Environment by Max Silberztein
- UNITEX - Corpus processing system, GPL licence, implemented in C/C++ and Java; spawned from the same project as intex and NooJ
- Nooj - Linguistic development environment - Related to intex, by Max Silberztein.
- JFLAP - Java Formal Languages and Automata Package
POS Tagging
- Software Plaza: Brill's tagger
- AI repository: Brill's tagger
- TnT -- Statistical Part-of-Speech Tagging
- QTag - a probabilistic POS tagger - language independent, implemented in Java, by Oliver Mason (c) 1994-2003
- Ingo's collection of POS taggers
- CLAWS part-of-speech tagger
- AI repository: taggers
- Brazilian Portuguese POS Tagger (based on QTAG)
- Lingua::EN::Tagger - POS tagger for English; Perl module; uses HMM bigram word model; by Maciej Ceglowski and Aaron Coburn
- POS tagger in Perl/Tk by Kristie Seymore
- BNC POS tags
- A Practical Part-of-Speech Tagger (1992) - Cutting et al.; Xerox code, Application to Spanish
- SVMTool - Open source POS tagger based on Support Vector Machines
- Stanford Log-linear Tagger
Document Clustering
- Tutorial on Clustering Large and High-Dimensional Data by Nicholas et al. - at CIKM 2003
- Clustering and Segmentation software on KDnuggets
- CLUTO - Software Package for Clustering High-Dimensional Datasets
- Matlab Clustering Package by Frank Dellaert
Terminology Extraction
- Gensen Web - An automatic domain terminology extraction system
Text Categorization (TC)
- Spam detection and E-mail classification - Encoding identification - Language identification - Sentiment classification - Authorship attribution and Plagiarism detection (AATT) - Topic categorization - Other
Text Summarization
- Text Summarization site - by Dragomir Radev
- "Statistics-Based Summarization --- Step One: Sentence Compression," (K. Knight and D. Marcu), National Conference on Artificial Intelligence (AAAI), 2000.
Dictionaries and Lexicons
- ACL-SIGLEX - ACL Special Interest Group on Lexicon
- Dictionary development - On-line dictionaries
Lexical Semantics
- WordNet - Word Sense Disambiguation (WSD)
- Theory - Practice: Unification-based Systems
Grammar Formalisms
- Unification-based grammars
- Head-driven Phrase Structure Grammar (HPSG) - Lexical Functional Grammar (LFG) - Stochastic Unification-based Grammars
Parsing (Syntactic Analysis)
- ALE unification-based parser - coverage: medium
- LKB unification-based parser - coverage: medium
- PC-PATR unification-based parser - coverage: small
- Stefy unification-based parser - coverage: small
- NLP Software (includes parser list)
- Parser comparison (several parsers referenced)
- Collins parser, coverage: large
- Link Grammar parser, coverage: large
- Apple Pie Parser
- Probabilistic Word Graph Parser: Java Source & Documentation, Bob Carpenter, coverage: small
- MINIPAR parser, coverage: medium
- Evalb - bracket scoring program
Parse TreeBanks
- The Penn Treebank Project (English)
- NEGRA corpus (German)
- Kyoto Text Corpus (Japanese)
Machine Translation
- General - On-line translation - MT Research - A list of Suppliers of Machine Translation Software - by The British Computer Society Natural Language Translation Specialist Group
- Systran - Commercial machine translation; free on-line service available
- Babel Fish Translation - On-line translation, Babel Fish, AltaVista (powered by SYSTRAN)
Information Retrieval
- Open Source Search Engines - WebSPHINX - A Personal, Customizable Web Crawler
- A list of IR systems (ir.dcs.gla.ac.uk)
- System SMART
- The Lemur Toolkit for Language Modeling and Information Retrieval
- Nutch search engine
- Zettair (once called Lucy)
- mg ("Managing Gigabytes")
- DataparkSearch Engine
- Lemur
- Andrew McCallum's Code and Data
- Introduction to Information Retrieval - by Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, 2007, draft available on-line
- Information Retrieval - A book by C. J. van Rijsbergen, 1979, available on-line
- Modern Information Retrieval - A book by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, contents
- Cross-Language IR
Information Extraction
- Balie - A tool for multilingual information extraction
- Chelba, Mahajan: Information Extraction Using the Structured Language Model
- BioCreAtIvE - Critical Assessment of IE Systems in Biology
Semantic Annotation
- Semantic Web - Genomics - Semantic Annotation: Other - Semantic Role Labeling - XML-Related
- SWRC - Semantic Web Research Community Ontology
- Open Cyc
- Cycorp
- SUMO - Suggested Upper Merged Ontology - link suggested by Adam Pease [Link1]
- Protege Ontologies Library
- SUMO translation for Protege frame system
- Dublin Core
Question Answering
- TREC QA - QA Systems - FAQ Collections
Natural Language Generation
- Chatterbots in Google Directory
NLP Tools
- GATE - General Architecture for Text Engineering, used in Information Extraction
- MedLEE
- OpenNLP - Open source NLP, project umbrella
- Natural Language Toolkit (NLTK) in Python
- FreeLing
- The Festival Speech Synthesis System - by the Centre for Speech Technology Research, Univ. of Edinburgh
- Festival at CMU
- PRAAT: Phonetics and Speech Tools
- The EMU Speech Database System
- Commercial tools
NL Corpora and Other NL Resources
- Standards - Word Lists - N-grams - NL Corpora - Free - NL Corpora - Free with Licence agreement - NL Corpora - Commercial
Commercial Links
- NLP Products - NLP Companies
Thanks for Links
- Pythonner

© 2003-2022 Vlado Keselj, last update: 14-May-2021