ABSTRACT
Verity Inc. has developed a comprehensive suite of tools for accurately and efficiently organizing enterprise content. The process involves four basic steps: (i) creating taxonomies, (ii) building classification models, (iii) populating taxonomies with documents, and (iv) deploying populated taxonomies in enterprise portals. A taxonomy is a hierarchical representation of categories that provides a navigation structure for exploring and understanding the underlying corpus without sifting through a huge volume of documents. Thematic Mapping automatically discovers a concept tree from a corpus of unstructured documents and assigns meaningful labels to concepts based on a semantic network. Through integration with the user-friendly GUI of Verity Intelligent Classifier, a user can drill down a concept tree for navigation, perform a conceptual search to retrieve documents pertaining to a concept, build a taxonomy from the concept tree, and edit a taxonomy to tailor it into various views (customized taxonomies) of the same corpus. Classification rules can be generated automatically from concepts and then used to populate the taxonomy with documents.
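To make the idea of populating a taxonomy concrete, the following is a minimal sketch, not Verity's implementation. It assumes a taxonomy is a tree of labeled categories and that each classification rule is a plain keyword set that fires when any keyword appears in the document. The names `Category`, `matches`, and `populate`, and the keyword-rule format, are hypothetical and chosen only for illustration; the abstract does not describe the actual rule language.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Category:
    """One taxonomy node: a label, a keyword rule, children, and assigned documents."""
    label: str
    keywords: Set[str] = field(default_factory=set)  # assumed rule: any keyword matches
    children: List["Category"] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)


def matches(category: Category, text: str) -> bool:
    """Return True if the document text contains at least one of the category's keywords."""
    words = set(text.lower().split())
    return bool(category.keywords & words)


def populate(node: Category, doc_id: str, text: str) -> bool:
    """Place a document in the deepest matching categories under `node`.

    Returns True if the document was placed somewhere in this subtree.
    """
    if not matches(node, text):
        return False
    placed_in_child = False
    for child in node.children:
        # Visit every child (no short-circuit) so a document can land in several branches.
        if populate(child, doc_id, text):
            placed_in_child = True
    if not placed_in_child:
        node.documents.append(doc_id)
    return True


# A tiny hand-built taxonomy; in the described system the tree would come from a concept tree.
taxonomy = Category("Technology", {"software", "database", "network"}, [
    Category("Databases", {"database", "sql"}),
    Category("Networking", {"network", "router"}),
])

populate(taxonomy, "doc1", "Tuning a SQL database index")
populate(taxonomy, "doc2", "New software release notes")
```

In this sketch a document that matches a parent category but none of its children stays at the parent, which keeps every matched document reachable while navigating the hierarchy.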
Index Terms
- Thematic mapping - from unstructured documents to taxonomies