Hathai Tanta-ngai

Peer-to-peer document retrieval and management

We are designing a peer-to-peer system infrastructure for Internet-scale document retrieval, based on an automatically generated "user profile" and the "role" of each node. A user profile describes the user's interests; the role of a node defines its function in the system. For example, a publishing firm's node acts as a source of documents, while a researcher's node both publishes and retrieves documents. At Internet scale, such a system is composed of hundreds of thousands of nodes holding an enormous collection of documents, only some of which are relevant to a given researcher. Querying every node for interesting documents is too time-consuming, so queries should be forwarded only to nodes that are likely to hold relevant documents. Rating document quality also requires global analysis of documents from different sources, including content analysis, analysis of links between documents, and citation analysis. In addition, the system must be organized and managed so that it supports efficient search and fosters collaboration among nodes according to their interests. This project focuses on organizing a peer-to-peer overlay based on user profiles and node functionality, designing search methods for efficient document retrieval, and providing a common document-management interface for the different stakeholders in the system.
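The abstract does not say how user profiles are represented or how a query is matched to nodes. Below is one minimal sketch of profile-based query routing. It assumes that each node advertises its user profile as a sparse keyword-weight vector and that a node forwards a query only to peers whose profiles are sufficiently similar (cosine similarity), rather than flooding every node. The names `select_targets`, `threshold`, and `max_peers`, as well as the sample profiles, are hypothetical illustrations and are not part of the described system.

```python
import math
from typing import Dict, List

# Assumption: a user profile is a sparse keyword -> weight vector.
Profile = Dict[str, float]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile matches nothing
    return dot / (norm_a * norm_b)


def select_targets(query: Profile, peers: Dict[str, Profile],
                   threshold: float = 0.3, max_peers: int = 3) -> List[str]:
    """Hypothetical routing rule: return the IDs of the peers whose profiles
    best match the query, most similar first. Peers scoring below the
    threshold are skipped, so the query is not sent to every node."""
    scored = [(cosine(query, profile), peer_id) for peer_id, profile in peers.items()]
    relevant = [(s, p) for s, p in scored if s >= threshold]
    relevant.sort(key=lambda sp: (-sp[0], sp[1]))  # highest score first, ties by ID
    return [p for _, p in relevant[:max_peers]]


if __name__ == "__main__":
    # Hypothetical profiles for three nodes with different roles.
    peers = {
        "publisher-A": {"p2p": 0.9, "overlay": 0.7, "routing": 0.5},
        "researcher-B": {"retrieval": 0.8, "ranking": 0.6, "citation": 0.4},
        "researcher-C": {"genomics": 0.9, "sequencing": 0.8},
    }
    query = {"p2p": 1.0, "routing": 1.0}
    print(select_targets(query, peers))  # prints ['publisher-A']
```

A fixed similarity threshold combined with a fan-out cap is only the simplest possible policy. A real overlay would also have to decide how profiles are learned, how they are propagated between nodes, and how queries are forwarded over multiple hops.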