Learning with click graph for query intent classification

Published: 02 July 2010

Abstract

Topical query classification, as one step toward understanding users' search intent, is gaining increasing attention in information retrieval. Previous work on this subject has focused primarily on enriching query features, for example, by augmenting queries with search engine results. In this work, we investigate a completely orthogonal approach: instead of improving the feature representation, we aim to drastically increase the amount of training data. To this end, we propose two semisupervised learning methods that exploit user click-through data. In the first approach, we infer the class memberships of unlabeled queries from those of labeled ones according to their proximity in a click graph, and then use these automatically labeled queries to train classifiers that use query terms as features. In the second approach, click graph learning and query classifier training are conducted jointly under an integrated objective. We evaluate our methods in two applications, product intent and job intent classification. In both cases, we expand the training data by over two orders of magnitude, leading to significant improvements in classification performance. An additional finding is that, with a large amount of training data obtained in this fashion, a classifier based on simple query term features can outperform classifiers that use state-of-the-art augmented features.
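To make the first approach concrete, the following is a minimal sketch of inferring labels for unlabeled queries from labeled ones on a click graph. It is a generic iterative label-propagation scheme with clamped seeds, not the formulation used in the article; the function name `propagate_labels`, the toy queries, the example URLs, and the click counts are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, iterations=20):
    """Spread seed scores over a bipartite query-URL click graph.

    clicks: dict mapping (query, url) -> click count (hypothetical toy data).
    seeds:  dict mapping labeled query -> 1.0 (positive) or 0.0 (negative).
    Returns a dict mapping every query in the graph to a score in [0, 1].
    """
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (q, u), n in clicks.items():
        q_to_u[q][u] = n
        u_to_q[u][q] = n

    # Unlabeled queries start at 0.5, i.e. no evidence either way.
    q_score = {q: seeds.get(q, 0.5) for q in q_to_u}
    for _ in range(iterations):
        # URL score: click-weighted average of the scores of its queries.
        u_score = {
            u: sum(n * q_score[q] for q, n in qs.items()) / sum(qs.values())
            for u, qs in u_to_q.items()
        }
        # Query score: click-weighted average of the scores of its URLs.
        q_score = {
            q: sum(n * u_score[u] for u, n in us.items()) / sum(us.values())
            for q, us in q_to_u.items()
        }
        # Clamp labeled queries back to their known labels.
        for q, label in seeds.items():
            if q in q_score:
                q_score[q] = label
    return q_score


# Toy click log (assumed): two labeled queries, two unlabeled ones.
clicks = {
    ("buy laptop", "shop.example/laptops"): 10,
    ("cheap notebook pc", "shop.example/laptops"): 6,
    ("cheap notebook pc", "reviews.example/pc"): 2,
    ("python tutorial", "docs.example/python"): 8,
    ("learn python", "docs.example/python"): 5,
}
seeds = {"buy laptop": 1.0, "python tutorial": 0.0}
scores = propagate_labels(clicks, seeds)
```

In this toy graph, "cheap notebook pc" shares clicked URLs with the positive seed and ends up with a score near 1, while "learn python" shares a URL with the negative seed and ends up near 0. Queries whose scores pass a chosen threshold could then serve as automatically labeled training examples for a classifier over query terms.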

    Published In

    ACM Transactions on Information Systems, Volume 28, Issue 3
    June 2010
    231 pages
    ISSN:1046-8188
    EISSN:1558-2868
    DOI:10.1145/1777432
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 July 2010
    Accepted: 01 August 2009
    Revised: 01 February 2009
    Received: 01 September 2008
    Published in TOIS Volume 28, Issue 3

    Author Tags

    1. Semisupervised learning
    2. click graph
    3. query classification
    4. user intent

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 12
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 15 Feb 2025

    Citations

    Cited By

    • (2021) Burstiness-Aware Web Search Analysis on Different Levels of Evidences. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2021.3109304. Online publication date: 2021.
    • (2021) Regional Language Code-Switching for Natural Language Understanding and Intelligent Digital Assistants. Innovations in Electrical and Electronic Engineering, 927-948. DOI: 10.1007/978-981-16-0749-3_71. Online publication date: 25-May-2021.
    • (2020) Query Intent Understanding. Query Understanding for Search Engines, 69-101. DOI: 10.1007/978-3-030-58334-7_4. Online publication date: 2-Dec-2020.
    • (2019) A hybrid deep neural network model for query intent classification. Journal of Intelligent & Fuzzy Systems, 1-11. DOI: 10.3233/JIFS-182682. Online publication date: 24-May-2019.
    • (2019) Domain Identification for Commercial Intention-holding Posts on Twitter. 2019 International Conference on Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA), 1-10. DOI: 10.1109/CyberSA.2019.8899491. Online publication date: Jun-2019.
    • (2017) Dynamic Data-Cache Locking for Minimizing the WCET of a Single Task. ACM Transactions on Embedded Computing Systems 16:2, 1-29. DOI: 10.1145/2994602. Online publication date: 2-Jan-2017.
    • (2017) Continuous Learning as a Service for Conversational Virtual Agents. Service-Oriented Computing, 641-656. DOI: 10.1007/978-3-319-69035-3_47. Online publication date: 13-Nov-2017.
    • (2016) User Intent in Multimedia Search. ACM Computing Surveys 49:2, 1-37. DOI: 10.1145/2954930. Online publication date: 13-Aug-2016.
    • (2015) Improving Ranking Consistency for Web Search by Leveraging a Knowledge Base and Search Logs. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 1441-1450. DOI: 10.1145/2806416.2806479. Online publication date: 17-Oct-2015.
    • (2015) Learning Topic-Oriented Word Embedding for Query Classification. Advances in Knowledge Discovery and Data Mining, 188-198. DOI: 10.1007/978-3-319-18038-0_15. Online publication date: 17-Apr-2015.
