Finding similar queries based on query representation analysis

World Wide Web

Abstract

To understand the intent behind user queries, many researchers study the problem of finding similar queries. Recently, the click graph has proved useful for describing the relationship between queries and URLs. Previous approaches mainly either generate related terms or find relevant queries based on co-clicked URLs. However, these approaches may suffer from the complexity of natural language processing and from the sparseness of click-through data. In this paper, we tackle this problem with three models that each represent a query as a probability distribution: the Click Model, the Term Model, and the Semantic Model. The Click Model extracts credible transition probabilities from queries to URLs and describes a query without considering web content. The Term Model represents a query as a term distribution over its main entities and purposes, which better captures the information need behind short and ambiguous keyword queries. The Semantic Model learns a distribution over the potential intents of a query in order to distinguish the user intents behind it. On top of the three models, we apply pairwise similarity metrics and graph-based personalized PageRank to find similar queries. Compared with traditional representation models, our representation models are verified to be effective and efficient, especially for long-tail queries.
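As a rough illustration of the click-based representation and the two ways of finding similar queries named above, the following Python sketch row-normalises click counts into query-to-URL transition probabilities, compares two queries by cosine similarity, and runs a personalized PageRank over the click graph. It is a minimal sketch under stated assumptions, not the paper's actual models: the toy click log, the function names, the choice of cosine as the pairwise metric, and the restart probability `alpha` are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy click log: (query, clicked URL, click count).
CLICKS = [
    ("cheap flights", "kayak.example", 8),
    ("cheap flights", "expedia.example", 2),
    ("low cost airfare", "kayak.example", 3),
    ("low cost airfare", "expedia.example", 1),
    ("python tutorial", "docs.example", 5),
]


def transitions(clicks):
    """Return (P(url | query), P(query | url)), each row-normalised from click counts."""
    q2u = defaultdict(lambda: defaultdict(float))
    u2q = defaultdict(lambda: defaultdict(float))
    for query, url, n in clicks:
        q2u[query][url] += n
        u2q[url][query] += n

    def normalise(table):
        return {a: {b: n / sum(row.values()) for b, n in row.items()}
                for a, row in table.items()}

    return normalise(q2u), normalise(u2q)


def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * q.get(k, 0.0) for k, w in p.items())
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def personalized_pagerank(q2u, u2q, seed, alpha=0.15, iters=50):
    """Power iteration over query -> URL -> query walks, restarting at `seed`
    with probability `alpha` (an assumed value, not taken from the paper)."""
    scores = {q: 0.0 for q in q2u}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {q: 0.0 for q in q2u}
        nxt[seed] = alpha
        for q, mass in scores.items():
            for u, p_u in q2u[q].items():
                for q2, p_q in u2q[u].items():
                    nxt[q2] += (1 - alpha) * mass * p_u * p_q
        scores = nxt
    return scores


q2u, u2q = transitions(CLICKS)
print(cosine(q2u["cheap flights"], q2u["low cost airfare"]))
print(personalized_pagerank(q2u, u2q, "cheap flights"))
```

In this toy log the two travel queries share both clicked URLs, so their click distributions are nearly parallel, while "python tutorial" shares none and receives no score from a walk seeded at "cheap flights". Sparse click data is exactly where such a purely click-based view breaks down, which is the gap the term and semantic representations are meant to address.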

Author information

Corresponding author

Correspondence to Jie Liu.

About this article

Cite this article

Wang, Y., Liu, J., Chen, J. et al. Finding similar queries based on query representation analysis. World Wide Web 17, 1161–1188 (2014). https://doi.org/10.1007/s11280-013-0233-5

  • DOI: https://doi.org/10.1007/s11280-013-0233-5
