Query intent mining with multiple dimensions of web search data

World Wide Web

Abstract

Understanding the latent intents behind users’ search queries is critical for search engines. Hence, there has been increasing attention to how the intents of search queries can be mined effectively by analyzing search engine query logs. However, we observe that the information richness of query logs has not been fully utilized so far, and this underuse heavily limits the performance of existing methods. In this paper, we tackle the problem of query intent mining by taking full advantage of the information richness of query logs from a multi-dimensional perspective. Specifically, we capture the latent relations between search queries via three different dimensions: the URL dimension, the session dimension and the term dimension. We first propose the Result-Oriented Framework (ROF), which is easy to implement and significantly improves both the precision and the recall of query intent mining. We further propose the Topic-Oriented Framework (TOF) in order to significantly reduce the online time and memory consumption of query intent mining. TOF employs the Query Log Topic Model (QLTM), which derives latent topics from the query log to integrate the information of the three dimensions in a principled way. The latent topics are considered low-dimensional descriptions of the query relations and serve as the basis of efficient online query intent mining. We conduct extensive experiments on a query log from a major commercial search engine. Experimental results show that the two frameworks significantly outperform the state-of-the-art methods with respect to a variety of metrics.
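To make the three dimensions concrete, the following is a minimal illustrative sketch and not the ROF or TOF described in the paper. It assumes a toy log of (session, query, clicked URL) records, represents each query by the sets of URLs, sessions and terms associated with it, and scores the relation between two queries as an equally weighted average of Jaccard similarities. The log contents, the function names, the choice of Jaccard similarity and the equal weights are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy query log: (session_id, query, clicked_url). All values are invented.
LOG = [
    ("s1", "apple iphone", "apple.com/iphone"),
    ("s1", "iphone price", "apple.com/iphone"),
    ("s2", "apple pie recipe", "recipes.example/apple-pie"),
    ("s2", "pie crust", "recipes.example/crust"),
    ("s3", "iphone price", "shop.example/iphone"),
]


def build_dimensions(log):
    """Map each query to its clicked URLs, its sessions, and its terms."""
    urls, sessions, terms = defaultdict(set), defaultdict(set), defaultdict(set)
    for session_id, query, url in log:
        urls[query].add(url)              # URL dimension
        sessions[query].add(session_id)   # session dimension
        terms[query].update(query.split())  # term dimension
    return urls, sessions, terms


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relation(q1, q2, dims, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Hypothetical combined score: weighted sum of per-dimension similarities."""
    return sum(w * jaccard(d[q1], d[q2]) for w, d in zip(weights, dims))


dims = build_dimensions(LOG)
same_intent = relation("apple iphone", "iphone price", dims)
diff_intent = relation("apple iphone", "apple pie recipe", dims)
print(same_intent > diff_intent)
```

In this toy log, "apple iphone" and "apple pie recipe" share a term but no clicked URL or session, so the URL and session dimensions separate them even though the term dimension alone would suggest a relation.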

Notes

  1. http://mathworld.wolfram.com/FrobeniusNorm.html

  2. The work was done while the first author was visiting Yahoo Labs.

  3. http://www.dmoz.org/

Author information

Corresponding author

Correspondence to Di Jiang.

About this article

Cite this article

Jiang, D., Leung, K.WT. & Ng, W. Query intent mining with multiple dimensions of web search data. World Wide Web 19, 475–497 (2016). https://doi.org/10.1007/s11280-015-0336-2

  • DOI: https://doi.org/10.1007/s11280-015-0336-2
