Generating Concept Hierarchies from User Queries

Chapter in: Data Mining: Foundations and Practice

Part of the book series: Studies in Computational Intelligence (SCI, volume 118)

Summary

Most information retrieval (IR) systems consist of a focused set of domain-specific documents held in a single logical repository. A mechanism is developed by which user queries against one particular type of IR repository, a frequently asked question (FAQ) system, are used to generate a concept hierarchy pertinent to the domain. First, an algorithm is described that selects a set of user queries submitted to the system, extracts terms from the repository documents matching those queries, and then reduces this set of terms to a manageable size. The resulting terms are used to generate a feature vector for each query, and the queries are clustered using a hierarchical agglomerative clustering (HAC) algorithm. The HAC algorithm generates a binary tree of clusters, which is not particularly amenable to use by humans and is slow to search because of its depth. A subsequent processing step therefore applies min-max partitioning to form a shallower, bushier tree that more naturally represents the hierarchy of concepts inherent in the system. Two alternative versions of the partitioning algorithm are compared to determine which produces the more usable concept hierarchy.
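
The pipeline summarized above (a feature vector per query, HAC producing a binary tree, then a step that makes the tree shallower) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the binary term vectors, cosine similarity, average linkage, the toy vocabulary and queries, and the function names. In particular, `flatten` is a simple tolerance-based collapse standing in for the chapter's min-max partitioning, whose details the summary does not give.

```python
import math
from itertools import combinations

def feature_vector(query_terms, vocabulary):
    """Binary term vector for one query (assumed representation)."""
    return [1.0 if term in query_terms else 0.0 for term in vocabulary]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hac(vectors):
    """Average-linkage HAC. Leaves are query indices; an internal node is
    a tuple (left, right, similarity_at_merge), so the tree is binary."""
    clusters = [(i, [i]) for i in range(len(vectors))]  # (tree, member indices)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a][1] for j in clusters[b][1]]
            sim = sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
            if best is None or sim > best[0]:
                best = (sim, a, b)
        sim, a, b = best
        merged = ((clusters[a][0], clusters[b][0], sim),
                  clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

def flatten(node, tol):
    """Stand-in for the partitioning step (NOT min-max partitioning):
    splice a child's children into its parent when the two merge
    similarities differ by at most tol. Returns nested lists."""
    if isinstance(node, int):
        return node
    left, right, sim = node
    children = []
    for child in (left, right):
        flat = flatten(child, tol)
        if isinstance(child, tuple) and abs(child[2] - sim) <= tol:
            children.extend(flat)
        else:
            children.append(flat)
    return children

def depth(node):
    """Depth of either tree form; a leaf has depth 0."""
    if isinstance(node, int):
        return 0
    kids = node[:2] if isinstance(node, tuple) else node
    return 1 + max(depth(k) for k in kids)

# Hypothetical vocabulary and queries, for illustration only.
vocabulary = ["password", "reset", "login", "invoice", "billing", "refund"]
queries = [{"password", "reset"}, {"password", "login"},
           {"invoice", "billing"}, {"billing", "refund"}, {"password"}]
vectors = [feature_vector(q, vocabulary) for q in queries]

tree = hac(vectors)
flat = flatten(tree, tol=0.15)
print(depth(tree), depth(flat), flat)
```

On this toy input the three password-related queries end up as siblings under one node instead of being nested two levels deep, which is the kind of shallower, bushier shape the summary describes. The tolerance `tol` is a hypothetical tuning parameter of this sketch.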

The goal is to generate a concept hierarchy built from the phrases that users actually enter when searching the repository, which should make the hierarchy more usable for all users. Although the algorithm presented here is applied to an FAQ system, the techniques can easily be extended to any IR system that allows users to submit natural language queries and that selects documents from the repository matching those queries.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Wall, B., Richter, N., Angryk, R. (2008). Generating Concept Hierarchies from User Queries. In: Lin, T.Y., Xie, Y., Wasilewska, A., Liau, CJ. (eds) Data Mining: Foundations and Practice. Studies in Computational Intelligence, vol 118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78488-3_25

  • DOI: https://doi.org/10.1007/978-3-540-78488-3_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78487-6

  • Online ISBN: 978-3-540-78488-3

  • eBook Packages: Engineering, Engineering (R0)
