Address Dataset can be downloaded from here.
Source code of the Data Collector
Address Extraction Demo: Regular Expression Based System, Machine Learning Based System
Presentation at MALNIS, Jun 27, 2003
Presentation: PPT version, PDF version
Presentation on Information Theoretical Co-Clustering - May 20, 2004, LaTeX
Geographic Search
Applications
Datasets
Papers
Information Extraction
n-gram
Software
NLP
Dalhousie NLP Group
NLP Course, NLP Course and NLP Group at Stanford, NLP Code and Data resources, including some examples in Java.
Information Retrieval Resources Link
CRL Computing Research Laboratory
UIUC course 498
Link Analysis
When experts agree: using non-affiliated experts to rank popular topics (2002)
Hilltop: A Search Engine based on Expert Documents (2000)
Who Links to Whom: Mining Linkage between Web Sites (2001), by Krishna Bharat
Improved Algorithms for Topic Distillation in a Hyperlinked Environment (1998)
The connectivity server: Fast access to linkage information on the Web (1998)
Automatic resource compilation by analyzing hyperlink structure and associated text (1998)
The Quest for Correct Information on the Web: Hyper Search Engines
Graph structure in the web
Finding Related Pages in the World Wide Web (1999)
IBM Almaden Webfountain
Parallel Clustering
A Hybrid Parallel Web Document Clustering Algorithm and Its Performance Study (2003)
Parallelizing the Buckshot Algorithm for Efficient Document Clustering (2002)
Clustering and Classification of Large Document Bases in a Parallel Environment (1997)
Clustering
Efficient Clustering of Very Large Document Collections (2001)
Principal Direction Divisive Partitioning (1997) - PDDP project
An Analysis of Recent Work on Clustering Algorithms (1999)
Survey Of Clustering Data Mining Techniques (2002)
Overcoming the Curse of Dimensionality in Clustering by Means of the Wavelet Transform (2000)
Oren Zamir. Web Document Clustering: A Feasibility Demonstration (1998)
Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections (1992)
Feature Selection and Document Clustering
Charu C. Aggarwal, Philip S. Yu, Finding Generalized Projected Clusters in High Dimensional Spaces (2000)
The Challenges of Clustering High Dimensional Data
A matrix density based algorithm to hierarchically co-cluster documents and words (2003)
Hierarchical Document Clustering Using Frequent Itemsets (2002)
Assessment and Pruning of Hierarchical Model Based Clustering (2003)
Why so many clustering algorithms
Projections For Efficient Document Clustering (1997)
Document Clustering with Cluster Refinement and Model Selection Capabilities (2002)
Shenghuo Zhu's Publications
Information Retrieval
iVia tools
Great Site Ranking in Google The Secrets Out
A Survey On Web Information Retrieval Technologies (2000)
Algorithmic Challenges in Web Search Engines, by Monika R. Henzinger
Pivoted Document Length Normalization (1996)
Iterative methods for sparse linear systems
Matrices, Vector Spaces, and Information Retrieval
Singular value decomposition
M.W. Berry, S.T. Dumais, G.W. O'Brien. Using Linear Algebra for Intelligent Information Retrieval (1995)
Scientific Computing: Fundamentals and Applications
Two Algorithms for Nearest-Neighbor Search in High Dimensions (1997)
Semantic Web
Latent Semantic Indexing, An interesting discussion
CIRCA whitepaper
Parallel Computing
Parallel computation of the singular value decomposition
MapReduce: Simplified Data Processing on Large Clusters
Search Engine
HillTop Ranking
Yahoo! Research Lab
Search Engine Watch
Larbin - A Multipurpose Crawler Shell
Papers written by Googlers
Web Search for a Planet: The Google Cluster Architecture
Google File System
Crawler links
Building a Vector Space Search Engine in Perl
Data Mining & Machine Learning
Mining and Knowledge Discovery from the Web
Very Large Data Bases (VLDB) Conference
Journal of AI Research JAIR
SIGKDD Explorations
Data Mining at UTCS
Inderjit S. Dhillon. Publications by Mohammed Javeed Zaki
Research at Microsoft
IBM Almaden Research Center
A Roadmap to Text Mining and Web Mining
Stanford Publication Server
Clustering Large Datasets
gSpan, Source Code
How to Implement SVMs - by J. Platt, IEEE Intelligent Systems Magazine, Trends and Controversies, Marti Hearst, ed., vol 13, no 4, (1998).
AI-Specific Software Resources - Collected by Evangelos E. Milios
People
Tom Mitchell
Jiawei Han
Vipin Kumar
David Skillicorn
Information Theory, Inference, and Learning Algorithms
Misc
Brainstorming, Influence, and Icebergs
Accidental Algorithms
Last Update: September 13, 2004 11:12 AM by Zheyuan Yu