Query Classification by Leveraging Explicit Concept Information

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10086)

Abstract

A key task in query understanding is inferring user intent from the few words a user submits to a search engine. Query classification (QC), which assigns queries to a set of target categories representing search intents, has been widely studied for this purpose. QC is both important and difficult in information retrieval because queries are usually short, ambiguous, and noisy. As a result, traditional “bag-of-words” classification methods fail to achieve high accuracy on QC. In this paper, we propose mining explicit “concept” information to address this problem. Specifically, we first leverage existing knowledge bases to enrich the short query at the concept level. We then discuss how to use the mined concept information and propose a novel language-model-based query classification method that takes both words and concepts into account. Experimental results show that the mined concepts are informative and effective in improving query classification.
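The general idea of scoring a query with both word and concept evidence can be illustrated with a small sketch. This is not the authors' model: the abstract does not give the formulation, so the code below assumes a unigram language model per category with Jelinek-Mercer smoothing and a simple weighted sum of word and concept log-likelihoods. The lexicon `TOY_CONCEPTS`, the training queries, the category names, and the parameters `lam` and `alpha` are all invented for illustration; a real system would obtain concepts from a knowledge base.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a knowledge-base lookup (invented).
TOY_CONCEPTS = {
    "car": ["product"], "laptop": ["product"], "phone": ["product"],
    "lion": ["animal"], "snake": ["animal"], "tiger": ["animal"],
    "price": ["commerce"], "habitat": ["nature"],
}


def conceptualize(query):
    """Return the toy-lexicon concepts for every word in the query."""
    return [c for w in query.lower().split() for c in TOY_CONCEPTS.get(w, [])]


class UnigramLM:
    """Unigram model with Jelinek-Mercer smoothing against a background model."""

    def __init__(self, tokens, background, lam=0.5):
        self.counts = Counter(tokens)
        self.total = sum(self.counts.values())
        self.background = background          # Counter over all categories
        self.bg_total = sum(background.values())
        self.lam = lam                        # keep below 1 so unseen tokens get mass

    def log_prob(self, token):
        p_cat = self.counts[token] / self.total if self.total else 0.0
        # Add-one smoothing on the background keeps unseen tokens above zero.
        p_bg = (self.background[token] + 1) / (self.bg_total + len(self.background) + 1)
        return math.log(self.lam * p_cat + (1 - self.lam) * p_bg)


def train(examples, lam=0.5):
    """Build one word model and one concept model per category."""
    word_toks, concept_toks = {}, {}
    for query, cat in examples:
        word_toks.setdefault(cat, []).extend(query.lower().split())
        concept_toks.setdefault(cat, []).extend(conceptualize(query))
    bg_w = Counter(t for toks in word_toks.values() for t in toks)
    bg_c = Counter(t for toks in concept_toks.values() for t in toks)
    return {
        cat: (UnigramLM(word_toks[cat], bg_w, lam), UnigramLM(concept_toks[cat], bg_c, lam))
        for cat in word_toks
    }


def classify(query, models, alpha=0.5):
    """Pick the category maximizing a weighted word + concept log-likelihood."""
    words = query.lower().split()
    concepts = conceptualize(query)

    def score(cat):
        word_lm, concept_lm = models[cat]
        return (alpha * sum(word_lm.log_prob(w) for w in words)
                + (1 - alpha) * sum(concept_lm.log_prob(c) for c in concepts))

    return max(models, key=score)


EXAMPLES = [
    ("car price", "Shopping"), ("laptop price", "Shopping"),
    ("lion habitat", "Nature"), ("snake habitat", "Nature"),
]
models = train(EXAMPLES)
print(classify("tiger", models))  # Nature
```

In this toy setup the word "tiger" never appears in training, so the word models score both categories identically; the decision comes entirely from the concept "animal", which only the Nature category has seen. That is the kind of gain the abstract attributes to concept-level enrichment of short queries.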

Notes

  1. http://www.kdd.org/kddcup/view/kddcup2005/Tasks.

  2. https://www.wikipedia.org/.

  3. http://en.wikipedia.org/wiki/Longest_prefix_match.

  4. Probase data is publicly available at http://probase.msra.cn/dataset.aspx.

  5. https://github.com/dnmilne/wikipediaminer.

  6. https://github.com/yago-naga/aida.

  7. http://cogcomp.cs.illinois.edu/page/download_view/Wikifier.

Acknowledgments

This work was supported by the Beijing Advanced Innovation Center for Imaging Technology (No. BAICIT-2016001), the National Natural Science Foundation of China (Grant Nos. 61370126, 61672081), the National High Technology Research and Development Program of China (No. 2015AA016004), and the Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-16).

Author information

Corresponding author

Correspondence to Fang Wang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Wang, F., Yang, Z., Li, Z., Zhou, J. (2016). Query Classification by Leveraging Explicit Concept Information. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_45

  • DOI: https://doi.org/10.1007/978-3-319-49586-6_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49585-9

  • Online ISBN: 978-3-319-49586-6

  • eBook Packages: Computer Science, Computer Science (R0)
