DOI: 10.1145/584792.584892 (CIKM conference proceedings)
Article

Thematic mapping - from unstructured documents to taxonomies

Published: 04 November 2002

ABSTRACT

Verity Inc. has developed a comprehensive suite of tools for accurately and efficiently organizing enterprise content, a process that involves four basic steps: (i) creating taxonomies, (ii) building classification models, (iii) populating taxonomies with documents, and (iv) deploying populated taxonomies in enterprise portals. A taxonomy is a hierarchical representation of categories that provides a navigation structure for exploring and understanding the underlying corpus without sifting through a huge volume of documents. Thematic Mapping automatically discovers a concept tree from a corpus of unstructured documents and assigns meaningful labels to concepts based on a semantic network. Through its integration with Verity Intelligent Classifier's user-friendly GUI, a user can drill down into a concept tree for navigation, perform a conceptual search to retrieve documents pertaining to a concept, build a taxonomy from the concept tree, and edit a taxonomy to tailor it into various views (customized taxonomies) of the same corpus. Classification rules can be generated automatically from concepts and then used to populate the taxonomy with documents.
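To make the idea of populating a taxonomy through classification rules concrete, here is a minimal sketch. It is not Verity's implementation, and the abstract does not describe the actual rule format. It assumes a taxonomy is a tree of categories, that each category's rule is simply a set of keywords (a stand-in for rules generated from concepts), and that a document is filed under the deepest categories whose rules it matches. The names `Category`, `matches`, `populate`, and `populate_taxonomy` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One taxonomy node: a label, a keyword rule, child categories, assigned documents."""
    label: str
    terms: set[str] = field(default_factory=set)  # assumed rule format: any-keyword match
    children: list[Category] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def matches(category: Category, text: str) -> bool:
    """Toy classification rule: the text contains at least one of the category's terms."""
    words = set(text.lower().split())
    return bool(category.terms & words)


def populate(category: Category, doc_id: str, text: str) -> bool:
    """File the document under the deepest matching nodes at or below `category`.

    Returns True if the document was placed somewhere in this subtree.
    """
    if not matches(category, text):
        return False
    # Visit every child (no short-circuit) so a document can land in several branches.
    placed_below = [populate(child, doc_id, text) for child in category.children]
    if not any(placed_below):
        category.documents.append(doc_id)
    return True


def populate_taxonomy(roots: list[Category], docs: dict[str, str]) -> list[str]:
    """Populate all top-level categories; return the ids of documents no rule matched."""
    unplaced = []
    for doc_id, text in docs.items():
        placed = [populate(root, doc_id, text) for root in roots]
        if not any(placed):
            unplaced.append(doc_id)
    return unplaced
```

With a top-level "Finance" category whose children are "Banking" and "Markets", a document mentioning only "finance" stays at the parent node, while one mentioning "loan" is filed under "Banking". Documents that match no rule are returned to the caller rather than forced into a category, which leaves room for manual review of the taxonomy.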
