Query Classification by Leveraging Explicit Concept Information

Wang, Fang; Yang, Ze; Li, Zhoujun; Zhou, Jianshe

doi:10.1007/978-3-319-49586-6_45

Conference paper
First Online: 13 November 2016

2444 Accesses
4 Citations

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10086)

Abstract

A key task in query understanding is interpreting a user's intent from the few words submitted to a search engine. Query classification (QC), which assigns queries to a set of target categories representing user search intents, has been widely studied for this purpose. QC is an important yet difficult problem in information retrieval, because queries are usually short, ambiguous, and noisy. As a result, traditional “bag-of-words” classification methods fail to achieve high accuracy on QC. In this paper, we propose mining explicit “concept” information to help address this problem. Specifically, we first leverage existing knowledge bases to enrich short queries at the concept level. We then discuss how the mined concept information can be used and propose a novel language-model-based query classification method that takes both words and concepts into account. Experimental results show that the mined concepts are highly informative and effectively improve query classification.
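The abstract describes two steps: enriching a short query with concepts from a knowledge base, then classifying it with language models over both words and concepts. The Python sketch below illustrates that idea under stated assumptions: `TOY_KB` is a hypothetical hand-written isA dictionary standing in for a real probabilistic taxonomy such as Probase, `conceptualize` uses greedy longest-match lookup, each category gets Dirichlet-smoothed unigram models over words and over concepts, and a hypothetical weight `lam` combines the two log-likelihoods with a uniform class prior. None of these names or parameter values come from the paper, and this is not the authors' exact model.

```python
import math
from collections import Counter

# Hypothetical toy isA knowledge base: phrase -> {concept: P(concept | phrase)}.
# A real system would query a large taxonomy instead of this hand-written dict.
TOY_KB = {
    "python": {"programming language": 0.8, "snake": 0.2},
    "java": {"programming language": 0.9, "island": 0.1},
    "ruby": {"programming language": 0.6, "gemstone": 0.4},
    "paris": {"city": 0.9, "celebrity": 0.1},
    "london": {"city": 1.0},
    "tokyo": {"city": 1.0},
    "new york": {"city": 1.0},
    "iphone": {"smartphone": 1.0},
    "apple": {"company": 0.6, "fruit": 0.4},
    "apple pie": {"dessert": 1.0},
}


def conceptualize(query, kb, max_len=3):
    """Map a query to weighted concepts via greedy longest-match on KB phrases."""
    tokens = query.lower().split()
    concepts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in kb:
                for concept, p in kb[phrase].items():
                    concepts[concept] += p
                i += n
                break
        else:
            i += 1  # no KB phrase starts here; skip this token
    return concepts


class ConceptLMClassifier:
    """Per-category unigram LMs over words and concepts (illustrative sketch)."""

    def __init__(self, kb, lam=0.5, mu=1.0):
        self.kb = kb
        self.lam = lam  # hypothetical weight of the concept-level log-likelihood
        self.mu = mu    # Dirichlet smoothing parameter

    def fit(self, queries, labels):
        self.word_counts, self.concept_counts = {}, {}
        self.bg_words, self.bg_concepts = Counter(), Counter()
        for query, label in zip(queries, labels):
            words = Counter(query.lower().split())
            concepts = conceptualize(query, self.kb)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.concept_counts.setdefault(label, Counter()).update(concepts)
            self.bg_words.update(words)
            self.bg_concepts.update(concepts)
        return self

    def _log_prob(self, counts, background, term):
        # Add-one smoothed background model keeps unseen terms above zero.
        p_bg = (background[term] + 1.0) / (sum(background.values()) + len(background) + 1.0)
        # Dirichlet-smoothed category model.
        return math.log((counts[term] + self.mu * p_bg) / (sum(counts.values()) + self.mu))

    def score(self, query, label):
        word_ll = sum(self._log_prob(self.word_counts[label], self.bg_words, w)
                      for w in query.lower().split())
        concept_ll = sum(weight * self._log_prob(self.concept_counts[label], self.bg_concepts, c)
                         for c, weight in conceptualize(query, self.kb).items())
        return (1 - self.lam) * word_ll + self.lam * concept_ll

    def predict(self, query):
        # Uniform class prior assumed: pick the label with the highest score.
        return max(self.word_counts, key=lambda label: self.score(query, label))


# Hypothetical toy training data.
TRAIN_Q = ["python tutorial", "java programming", "paris hotels",
           "london flights", "iphone case", "cheap iphone"]
TRAIN_Y = ["computers", "computers", "travel", "travel", "shopping", "shopping"]

if __name__ == "__main__":
    clf = ConceptLMClassifier(TOY_KB).fit(TRAIN_Q, TRAIN_Y)
    for q in ["ruby", "tokyo"]:
        print(q, "->", clf.predict(q))
```

In this toy data, "ruby" and "tokyo" never appear as training words, and every category has the same number of training words, so the word-level models score all categories identically. Only the shared concepts "programming language" and "city" connect these queries to labeled examples, which mirrors the motivation stated in the abstract.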

Acknowledgments

This work was supported by the Beijing Advanced Innovation Center for Imaging Technology (No. BAICIT-2016001), the National Natural Science Foundation of China (Grant Nos. 61370126, 61672081), the National High Technology Research and Development Program of China (No. 2015AA016004), and the Fund of the State Key Laboratory of Software Development Environment (No. SKLSDE-2015ZX-16).

Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, People’s Republic of China
Fang Wang, Ze Yang & Zhoujun Li
Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing, 100048, People’s Republic of China
Jianshe Zhou

Corresponding author

Correspondence to Fang Wang.

Editor information

Editors and Affiliations

University of Technology, Sydney, New South Wales, Australia
Jinyan Li
University of Queensland, Brisbane, Australia
Xue Li
Beijing Institute of Technology, Beijing, China
Shuliang Wang
University of Western Australia, Crawley, Western Australia, Australia
Jianxin Li
University of Adelaide, Adelaide, South Australia, Australia
Quan Z. Sheng

About this paper

Cite this paper

Wang, F., Yang, Z., Li, Z., Zhou, J. (2016). Query Classification by Leveraging Explicit Concept Information. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_45

DOI: https://doi.org/10.1007/978-3-319-49586-6_45
Published: 13 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49585-9
Online ISBN: 978-3-319-49586-6
eBook Packages: Computer Science; Computer Science (R0)
