Geographic Entity Extraction from the Web

Thesis: High Accuracy Postal Address Extraction from Web Pages

The address dataset can be downloaded from here.

Source code of the Data Collector

Address Extraction Demo: Regular Expression Based System, Machine Learning Based System

Presentation at MALNIS, Jun 27, 2003

Presentation: PPT version, PDF version

Presentation on Information Theoretical Co-Clustering - May 20, 2004, LaTeX

Geographic Search

Applications

Datasets

Papers

Information Extraction

n-gram

Software

NLP

Dalhousie NLP Group

NLP Course, NLP Course and NLP Group at Stanford, NLP Code and Data resources, including some examples in Java.

Information Retrieval Resources Link

CRL Computing Research Laboratory, UIUC course 498

Link Analysis

When experts agree: using non-affiliated experts to rank popular topics (2002)

Hilltop: A Search Engine based on Expert Documents (2000)

Who Links to Whom: Mining Linkage between Web Sites (2001) by Krishna Bharat

Improved Algorithms for Topic Distillation in a Hyperlinked Environment (1998)

The Connectivity Server: Fast Access to Linkage Information on the Web (1998)

Automatic resource compilation by analyzing hyperlink structure and associated text (1998)

The Quest for Correct Information on the Web: Hyper Search Engines

Graph structure in the web

Finding Related Pages in the World Wide Web (1999)

IBM Almaden Webfountain

Parallel Clustering

A Hybrid Parallel Web Document Clustering Algorithm and Its Performance Study (2003)

Parallelizing the Buckshot Algorithm for Efficient Document Clustering (2002)

Clustering and Classification of Large Document Bases in a Parallel Environment (1997)

Clustering

Efficient Clustering of Very Large Document Collections (2001)

Principal Direction Divisive Partitioning (1997) - PDDP project

An Analysis of Recent Work on Clustering Algorithms (1999)

Survey Of Clustering Data Mining Techniques (2002)

Overcoming the Curse of Dimensionality in Clustering by Means of the Wavelet Transform (2000)

Oren Zamir. Web Document Clustering: A Feasibility Demonstration (1998)

Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections (1992)

Feature Selection and Document Clustering

Charu C. Aggarwal, Philip S. Yu, Finding Generalized Projected Clusters in High Dimensional Spaces (2000)

The Challenges of Clustering High Dimensional Data

A matrix density based algorithm to hierarchically co-cluster documents and words (2003)

Hierarchical Document Clustering Using Frequent Itemsets (2002)

Assessment and Pruning of Hierarchical Model Based Clustering (2003)

Why so many clustering algorithms

Projections for Efficient Document Clustering (1997)

Document Clustering with Cluster Refinement and Model Selection Capabilities (2002)

Shenghuo Zhu's Publications

Information Retrieval

Great Site Ranking in Google: The Secret's Out

A Survey On Web Information Retrieval Technologies (2000)

Algorithmic Challenges in Web Search Engines, by Monika R. Henzinger

Pivoted Document Length Normalization (1996)

Iterative methods for sparse linear systems

Matrices, Vector Spaces, and Information Retrieval

Singular value decomposition: M.W. Berry, S.T. Dumais, G.W. O'Brien. Using Linear Algebra for Intelligent Information Retrieval (1995)

Scientific Computing: Fundamentals and Applications

Two Algorithms for Nearest-Neighbor Search in High Dimensions (1997)

Semantic Web

Latent Semantic Indexing, an interesting discussion

CIRCA whitepaper

Parallel Computing

Parallel computation of the singular value decomposition

MapReduce: Simplified Data Processing on Large Clusters

Search Engine

HillTop Ranking

Yahoo! Research Lab

Search Engine Watch

Larbin - A Multipurpose Crawler Shell

Papers written by Googlers

Web Search for a Planet: The Google Cluster Architecture

Google File System

Building a Vector Space Search Engine in Perl

Data Mining & Machine Learning

Mining and Knowledge Discovery from the Web

Very Large Data Bases (VLDB) Conference

Journal of AI Research (JAIR)

SIGKDD Explorations

Data Mining at UTCS

Inderjit S. Dhillon, Publications by Mohammed Javeed Zaki

Research at Microsoft

IBM Almaden Research Center

A Roadmap to Text Mining and Web Mining

Stanford Publication Server

Clustering Large Dataset

gSpan, Source Code

How to Implement SVMs - by J. Platt, IEEE Intelligent Systems Magazine, Trends and Controversies, Marti Hearst, ed., vol 13, no 4, (1998).

AI-Specific Software Resources - collected by Evangelos E. Milios

People

David Skillicorn

Information Theory, Inference, and Learning Algorithms

Misc

Brainstorming, Influence, and Icebergs

Accidental Algorithms

Last Update: September 13, 2004 11:12 AM by Zheyuan Yu