The MALNIS Digital Library Project

The MALNIS digital library project aims to capitalize on recent advances in terminology extraction, document clustering and summarization, and article metadata extraction to maximize the utility of the electronic article collections of an individual researcher or a research group. With currently available hard disk capacities, it is feasible to store such a collection on one's own computer. The tool currently available for accessing the articles is a search engine (for example, Google Desktop or Apache Lucene). A search engine is the right tool when one is looking for a specific article in the collection, or for a list of articles on a topic that is easy to describe with keywords.

A search engine has serious limitations, however. Imagine wanting to use a personal collection to design a graduate course. This would require activities such as browsing through the collection, following references and citations, grouping articles into clusters, and summarizing the clusters. The project will integrate existing algorithms that support the user in these tasks, and will test the resulting system on a collection of about 2000 articles in machine learning, natural language processing, networked information spaces, and information retrieval, which together form the literature on which the research of the MALNIS group is built.

The project is affiliated with CiteSeerX, as its Personal CiteSeer version. The plan is to build on CiteSeerX components as they become available, and to reuse the available CiteSeer metadata. For a detailed breakdown of tasks, see the project planning page.