My research lies primarily at the intersection of Information Retrieval, Machine Learning, Natural Language Processing, and Web Mining, with a particular focus on Web Document Summarization. Text summarization is an important aspect of text organization that improves accessibility to information. Web document summarization, which has gained much attention in recent years, provides a useful tool for more effective navigation of and access to online information in the Web age.
In my master's thesis, I developed a content-based system that summarizes an entire Web site, generating a concise summary consisting of keywords and key sentences. Experimental evaluation demonstrates that the system can automatically generate summaries as informative as human-authored summaries such as those provided by the Open Directory Project (DMOZ). Two papers based on this work have been published: one received the best paper award at the Canadian AI'2003 Conference, and the other appeared in Web Intelligence and Agent Systems: An International Journal, 2(1), pages 39-53, June 2004.
In my Ph.D. thesis, I proposed a framework for effective summarization of Web sites with diverse topics and heterogeneous content. The main objective is to develop a system that takes advantage of both the narrative content and the link information embedded in a given Web site to create a few cluster summaries highlighting the main topics. The system is novel in that it aims to perform automatic summarization via coupled text- and link-based clustering of an entire Web site, which involves the design and application of techniques such as keyword extraction, text classification, text clustering, and hyperlink analysis. These techniques have been topics of renewed interest in the IR community. Furthermore, the proposed approach has the potential to become an effective means of visualizing large Web sites and to lead to enhanced IR systems for Web site search, where, for example, summaries of Web sites are indexed and presented to the user as the text snippets associated with the query results. A paper based on the thesis proposal, which was defended in late 2004, was presented at the SIGIR'05 Doctoral Consortium.
I have also worked on Automatic Term Extraction (from both special text corpora and Web document corpora) and on Web Collection Clustering and Summarization. My advisors are Dr. Milios and Dr. Zincir-Heywood.
First created: 12-Dec-2001. Last updated: 15-Nov-2007.
© 2001-2005 Yongzheng Zhang. All rights reserved.