Generating Concept Hierarchies from User Queries

Wall, Bob; Richter, Neal; Angryk, Rafal

doi:10.1007/978-3-540-78488-3_25

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Most information retrieval (IR) systems consist of a focused set of domain-specific documents held in a single logical repository. This chapter develops a mechanism by which user queries against one particular type of IR repository, a frequently asked question (FAQ) system, are used to generate a concept hierarchy pertinent to the domain. First, an algorithm is described that selects a set of user queries submitted to the system, extracts terms from the repository documents matching those queries, and reduces this set of terms to a manageable size. The resulting terms are used to generate a feature vector for each query, and the queries are clustered using a hierarchical agglomerative clustering (HAC) algorithm. The HAC algorithm generates a binary tree of clusters, which is awkward for humans to use and slow to search because of its depth. A subsequent processing step therefore applies min-max partitioning to form a shallower, bushier tree that more naturally represents the hierarchy of concepts inherent in the system. Two alternative versions of the partitioning algorithm are compared to determine which produces a more usable concept hierarchy.
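
The pipeline outlined above can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the chapter: the toy queries and terms, the binary term-presence vectors, the cosine distance, the average-linkage rule, and the function names. In particular, the final step is a plain distance-threshold cut of the binary tree. It stands in for the chapter's min-max partitioning, whose details are not given in this summary, and only shows how a deep binary tree can be collapsed into a shallower, bushier grouping.

```python
from itertools import combinations

# Hypothetical toy data: each query maps to the terms extracted from
# the FAQ documents it matched (after term reduction).
query_terms = {
    "reset password": {"password", "reset", "account"},
    "forgot password": {"password", "reset", "email"},
    "update billing card": {"billing", "card", "payment"},
    "change payment method": {"billing", "payment", "account"},
}

vocabulary = sorted(set().union(*query_terms.values()))

def feature_vector(terms):
    """Binary term-presence vector over the shared vocabulary (an assumption)."""
    return [1.0 if t in terms else 0.0 for t in vocabulary]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return 1.0 - dot / norm if norm else 1.0

def leaves(node):
    """Return the query strings under a tree node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)

def hac(vectors):
    """Average-linkage HAC; returns a binary tree of (height, left, right) tuples."""
    def linkage(a, b):
        pairs = [(x, y) for x in leaves(a) for y in leaves(b)]
        total = sum(cosine_distance(vectors[x], vectors[y]) for x, y in pairs)
        return total / len(pairs)

    clusters = list(vectors)  # start with one singleton cluster per query
    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new binary node.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append((linkage(a, b), a, b))
    return clusters[0]

def cut(node, threshold):
    """Collapse the binary tree into flat groups: keep each maximal subtree
    whose merge height is at most `threshold` (not min-max partitioning)."""
    if isinstance(node, str) or node[0] <= threshold:
        return [sorted(leaves(node))]
    _, left, right = node
    return cut(left, threshold) + cut(right, threshold)

vectors = {q: feature_vector(t) for q, t in query_terms.items()}
tree = hac(vectors)
groups = cut(tree, threshold=0.5)
for group in groups:
    print(group)
```

On this toy input the two password queries end up in one group and the two billing queries in another. The threshold of 0.5 is arbitrary; a real system would need a principled way to choose where to partition, which is the role the partitioning algorithms compared in the chapter play.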

The goal is to generate a concept hierarchy that is built from phrases that users actually enter when searching the repository, which should make the hierarchy more usable for all users. While the algorithm presented here is applied to an FAQ system, the techniques can easily be extended to any IR system that allows users to submit natural language queries and that selects documents from the repository that match those queries.

Author information

Authors and Affiliations

RightNow Technologies, Bozeman, MT, USA
Bob Wall & Neal Richter
Montana State University, Bozeman, MT, USA
Rafal Angryk

Editor information

Editors and Affiliations

Department of Computer Science, San Jose State University, San Jose, CA, 95192, USA
Tsau Young Lin
Department of Computer Science and Information Systems, Kennesaw State University, Building 11, Room 3060 1000 Chastain Road, Kennesaw, GA, 30144, USA
Ying Xie
Department of Computer Science, The University at Stony Brook, Stony Brook, New York, 11794-4400, USA
Anita Wasilewska
Institute of Information Science, Academia Sinica, No 128, Academia Road, Section 2 Nankang, Taipei, 11529, Taiwan
Churn-Jung Liau

About this chapter

Cite this chapter

Wall, B., Richter, N., Angryk, R. (2008). Generating Concept Hierarchies from User Queries. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_25

DOI: https://doi.org/10.1007/978-3-540-78488-3_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78487-6
Online ISBN: 978-3-540-78488-3
eBook Packages: Engineering, Engineering (R0)
