Abstract
A key task in query understanding is interpreting user intent from the few words a user submits to a search engine. Query classification (QC), which assigns queries to a set of target categories representing user search intents, has been widely studied for this purpose. QC is an important but difficult problem in information retrieval because queries are usually short, ambiguous, and noisy. As a result, traditional “bag-of-words” classification methods fail to achieve high accuracy on this task. In this paper, we propose mining explicit “concept” information to address this problem. Specifically, we first leverage existing knowledge bases to enrich the short query at the concept level. We then discuss how to use the mined concept information and propose a novel language-model-based query classification method that takes both words and concepts into consideration. Experimental results show that the mined concepts are informative and effective in improving query classification.
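The idea of scoring a query with both words and concepts can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in the abstract. It assumes a toy isA dictionary (`TOY_CONCEPTS`, a hypothetical stand-in for a knowledge base such as Probase), add-one smoothed unigram models per category, and a hypothetical interpolation weight `lam` between the word and concept log-likelihoods. All class and function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical isA lookup standing in for a real knowledge base (assumption).
TOY_CONCEPTS = {
    "apple": ["company", "fruit"],
    "iphone": ["smartphone", "product"],
    "galaxy": ["smartphone", "product"],
    "pie": ["dessert", "food"],
    "recipe": ["instruction"],
}


def conceptualize(tokens):
    """Return the concepts of every token found in the toy knowledge base."""
    return [c for t in tokens for c in TOY_CONCEPTS.get(t, [])]


class UnigramLM:
    """Unigram language model with add-one (Laplace) smoothing."""

    def __init__(self, vocab_size):
        self.counts = Counter()
        self.total = 0
        self.vocab_size = vocab_size

    def update(self, items):
        self.counts.update(items)
        self.total += len(items)

    def log_prob(self, item):
        return math.log((self.counts[item] + 1) / (self.total + self.vocab_size))


class ConceptQueryClassifier:
    """Scores a query per category as a weighted sum of the word-level and
    concept-level log-likelihoods (illustrative combination, an assumption)."""

    def __init__(self, lam=0.5):
        self.lam = lam
        self.word_lms = {}
        self.concept_lms = {}

    def fit(self, labeled_queries):
        parsed = [(q.lower().split(), label) for q, label in labeled_queries]
        word_vocab = {t for tokens, _ in parsed for t in tokens}
        concept_vocab = {c for tokens, _ in parsed for c in conceptualize(tokens)}
        for tokens, label in parsed:
            if label not in self.word_lms:
                self.word_lms[label] = UnigramLM(len(word_vocab))
                self.concept_lms[label] = UnigramLM(len(concept_vocab))
            self.word_lms[label].update(tokens)
            self.concept_lms[label].update(conceptualize(tokens))
        return self

    def score(self, query, label):
        tokens = query.lower().split()
        word_ll = sum(self.word_lms[label].log_prob(t) for t in tokens)
        concept_ll = sum(
            self.concept_lms[label].log_prob(c) for c in conceptualize(tokens)
        )
        return self.lam * word_ll + (1 - self.lam) * concept_ll

    def classify(self, query):
        return max(self.word_lms, key=lambda label: self.score(query, label))


training = [
    ("apple iphone price", "Computers"),
    ("iphone case", "Computers"),
    ("apple pie recipe", "Food"),
    ("pie crust recipe", "Food"),
]
clf = ConceptQueryClassifier(lam=0.5).fit(training)
print(clf.classify("galaxy"))
print(clf.classify("pie"))
```

In this toy setup the word "galaxy" never appears in the training queries, so the word model alone carries little evidence; its concepts overlap with those of "iphone", which is the kind of signal concept-level enrichment is meant to supply for short queries.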
Notes
- 4. Probase data is publicly available at http://probase.msra.cn/dataset.aspx.
Acknowledgments
This work was supported by the Beijing Advanced Innovation Center for Imaging Technology (No. BAICIT-2016001), the National Natural Science Foundation of China (Grant Nos. 61370126, 61672081), the National High Technology Research and Development Program of China (No. 2015AA016004), and the Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-16).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Wang, F., Yang, Z., Li, Z., Zhou, J. (2016). Query Classification by Leveraging Explicit Concept Information. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_45
DOI: https://doi.org/10.1007/978-3-319-49586-6_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49585-9
Online ISBN: 978-3-319-49586-6
eBook Packages: Computer Science, Computer Science (R0)