
Thematic mapping - from unstructured documents to taxonomies

Authors:
Christina Yip Chung, Raymond Lieu, Jinhui Liu, Alpha Luk, Jianchang Mao, Prabhakar Raghavan
Verity, Inc., Sunnyvale, CA

CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management, November 2002, pages 608–610. https://doi.org/10.1145/584792.584892

Published: 4 November 2002


ABSTRACT

Verity Inc. has developed a comprehensive suite of tools for organizing enterprise content accurately and efficiently. The process involves four basic steps: (i) creating taxonomies, (ii) building classification models, (iii) populating taxonomies with documents, and (iv) deploying populated taxonomies in enterprise portals. A taxonomy is a hierarchical representation of categories; it provides a navigation structure for exploring and understanding the underlying corpus without sifting through a huge volume of documents. Thematic Mapping automatically discovers a concept tree from a corpus of unstructured documents and assigns meaningful labels to concepts based on a semantic network. Through its integration with the user-friendly GUI of Verity Intelligent Classifier, a user can drill down a concept tree for navigation, perform a conceptual search to retrieve documents pertaining to a concept, build a taxonomy from the concept tree, and edit a taxonomy to tailor it into various views (customized taxonomies) of the same corpus. Classification rules can be generated automatically from concepts and then used to populate the taxonomy with documents.
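The final two steps (generating classification rules from concepts and populating the taxonomy with documents) can be illustrated with a minimal Python sketch. It assumes that each concept in the tree is described by a short list of characteristic terms and that a generated rule fires when a document contains at least `min_hits` distinct terms of the concept. The `Concept` class, `make_rule`, `populate`, the term lists, and the threshold are all hypothetical illustrations; they are not Verity's rule format or the Intelligent Classifier API, and concept discovery and semantic-network labeling are out of scope here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a concept tree / taxonomy (hypothetical structure)."""
    label: str
    terms: list[str]  # assumed characteristic terms of the concept
    children: list[Concept] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)  # populated doc ids


def tokenize(text: str) -> set[str]:
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def make_rule(concept: Concept, min_hits: int = 2):
    """Build a keyword rule from a concept: it fires when a document
    contains at least `min_hits` distinct terms of the concept."""
    terms = {t.lower() for t in concept.terms}

    def rule(text: str) -> bool:
        return len(terms & tokenize(text)) >= min_hits

    return rule


def populate(concept: Concept, doc_id: str, text: str, min_hits: int = 2) -> bool:
    """Assign a document to every node whose rule fires, descending into
    a node's children only if the node itself matched."""
    if not make_rule(concept, min_hits)(text):
        return False
    concept.documents.append(doc_id)
    for child in concept.children:
        populate(child, doc_id, text, min_hits)
    return True


# Usage: a tiny hypothetical two-level taxonomy and three documents.
equities = Concept("Equities", ["stock", "shares", "dividend"])
fixed_income = Concept("Fixed income", ["bond", "yield", "coupon"])
finance = Concept("Finance", ["stock", "market", "bond", "interest"],
                  children=[equities, fixed_income])

docs = {
    "d1": "The stock market rallied as shares and dividend payouts rose.",
    "d2": "Bond yield and interest rates climbed.",
    "d3": "The weather was sunny.",
}
for doc_id, text in docs.items():
    populate(finance, doc_id, text)

print(finance.documents)       # ['d1', 'd2']
print(equities.documents)      # ['d1']
print(fixed_income.documents)  # ['d2']
```

Descending into a child only when its parent matches mirrors drill-down navigation of the concept tree; a production classifier would more plausibly use weighted evidence rather than a plain hit count.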



Published in
CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management
November 2002
704 pages
ISBN:1581134924
DOI:10.1145/584792
General Chair: Charles Nicholas, University of Maryland Baltimore County
Program Chairs: David Grossman, Illinois Institute of Technology; Konstantinos Kalpakis, University of Maryland Baltimore County; Sajda Qureshi, Erasmus University, Rotterdam; Han van Dissel, Erasmus University, Rotterdam; Len Seligman, The MITRE Corporation
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering and labeling
concept discovery
concept tree construction and visualization
conceptual search
thematic mapping