LITHEL Project

The text resources on the desktop computers of an organization's members, such as email, papers, memos and downloaded web pages, contain a wealth of information that remains largely untapped. This project aims at a next-generation system for mining these text resources, based on an explicit model of the structure and objectives of the organization. The benefits of the approach include easy access to information and expertise, automatic profiling of the members of the organization, mapping of the social network of the organization, and identification of the concepts characterizing the organization. This approach is related to, but goes well beyond, recent trends in Web technology, such as desktop search, web logging and information sharing on peer-to-peer networks.

These recent trends point to the use of the Web for just-in-time, informal, low-overhead, dynamic information sharing and dissemination within organizations or communities that share common interests or goals, in contrast with the more structured, static, higher-maintenance approach associated with Web sites. We assume that information will continue to be represented in the form of natural language text, and that semantic descriptions (in the form of metadata for the semantic web) will impose too much overhead on the authors of the information to be widely adopted for informal information sharing and communication.

Document resources in a P2P network exist in the context of the goals of the organization and of the domain of discourse the network belongs to. Recent research in knowledge management has led to semantic models that provide a framework for capturing the shared knowledge of a group of workers. A semantic model for a corporation, used by business analysts, includes strategic objectives, allies and competitors, and events. A semantic model for a research team would include project objectives, milestones, and the expertise of team members. The proposed approach will include tools to (1) organize the available information into clusters and automatically generate summaries in browsable form, (2) extract the social network of the organization directly from the text data, and (3) quickly locate information relevant to a particular organizational goal.
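
As an illustration of tool (2), the following minimal sketch shows one way a social network could be derived from email data. It is not the project's actual method: it assumes the messages are available as raw RFC 5322 text, it looks only at the From, To and Cc headers rather than the message bodies, and the function name `social_edges` and the sample addresses are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from itertools import product


def social_edges(raw_messages):
    """Count directed sender -> recipient links across raw email messages.

    Hypothetical helper: header-based only, a simplification of
    extracting the network "directly from the text data".
    """
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # getaddresses() returns (display name, address) pairs.
        senders = [a.lower() for _, a in getaddresses(msg.get_all("From", []))]
        recipients = [
            a.lower()
            for _, a in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        ]
        for sender, recipient in product(senders, recipients):
            # Skip empty addresses and messages sent to oneself.
            if sender and recipient and sender != recipient:
                edges[(sender, recipient)] += 1
    return edges


# Invented sample messages, for illustration only.
SAMPLE = [
    "From: Alice <alice@example.org>\nTo: bob@example.org\n"
    "Cc: carol@example.org\nSubject: Milestone\n\nDraft attached.",
    "From: bob@example.org\nTo: Alice <alice@example.org>\n"
    "Subject: Re: Milestone\n\nThanks.",
    "From: alice@example.org\nTo: bob@example.org\n"
    "Subject: Review\n\nPlease review.",
]

edges = social_edges(SAMPLE)
for (sender, recipient), count in sorted(edges.items()):
    print(f"{sender} -> {recipient}: {count}")
```

The resulting edge counts could serve as weights in a graph of the organization; a fuller system would presumably also use names and references found in the text of the documents themselves.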

Faculty - Students