Personalized query suggestion diversification in information retrieval

  • Research Article
Frontiers of Computer Science

Abstract

Query suggestions help users refine their queries after they input an initial query. Previous work on query suggestion has mainly concentrated on similarity-based or context-based approaches, developing models that focus either on adapting to a specific user (personalization) or on diversifying query aspects to maximize the probability that the user is satisfied (diversification). We consider the task of generating query suggestions that are both personalized and diversified. We propose a personalized query suggestion diversification (PQSD) model, in which a user’s long-term search behavior is injected into a basic greedy query suggestion diversification model that considers the user’s search context in the current session. Query aspects are identified through clicked documents based on the Open Directory Project (ODP) with a latent Dirichlet allocation (LDA) topic model. We quantify the improvement of our proposed PQSD model over a state-of-the-art baseline using the public America Online (AOL) query log and show that it outperforms the baseline in terms of metrics used in query suggestion ranking and diversification. The experimental results show that PQSD achieves its best performance when only queries with clicked documents, rather than all queries, are taken as search context, especially when more query suggestions are returned in the list.
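
As a rough illustration of the kind of greedy selection the abstract describes, the following Python sketch picks suggestions one at a time, rewarding candidates that cover aspects the user tends to care about and that earlier picks have not yet covered. It is a minimal sketch under stated assumptions, not the PQSD model itself: the function name `greedy_diversify`, the linear trade-off `lam`, the scoring formula, and the toy data are all hypothetical and do not come from the article.

```python
from typing import Dict, List


def greedy_diversify(
    relevance: Dict[str, float],
    aspects: Dict[str, List[float]],
    user_pref: List[float],
    n: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to n suggestions, trading relevance against aspect novelty.

    All inputs are illustrative assumptions:
    relevance[q]  - relevance of candidate q to the current session context
    aspects[q][a] - probability that candidate q covers aspect a
    user_pref[a]  - long-term preference of the user for aspect a
    """
    selected: List[str] = []
    # not_covered[a]: probability that aspect a is still unsatisfied so far.
    not_covered = [1.0] * len(user_pref)
    remaining = set(relevance)

    while remaining and len(selected) < n:

        def score(q: str) -> float:
            # Novelty: preferred aspects that q covers and earlier picks do not.
            novelty = sum(
                w * p * u for w, p, u in zip(user_pref, aspects[q], not_covered)
            )
            return lam * relevance[q] + (1.0 - lam) * novelty

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Discount every aspect that the chosen suggestion covers.
        not_covered = [u * (1.0 - p) for u, p in zip(not_covered, aspects[best])]

    return selected


# Toy data (hypothetical): two aspects, "cars" and "animals".
relevance = {"jaguar car": 0.9, "jaguar price": 0.8, "jaguar animal": 0.5}
aspects = {
    "jaguar car": [1.0, 0.0],
    "jaguar price": [1.0, 0.0],
    "jaguar animal": [0.0, 1.0],
}
user_pref = [0.5, 0.5]

suggestions = greedy_diversify(relevance, aspects, user_pref, n=2, lam=0.5)
by_relevance = greedy_diversify(relevance, aspects, user_pref, n=2, lam=1.0)
```

With `lam=1.0` the novelty term vanishes and the ranking falls back to relevance alone; smaller values increasingly favor suggestions that cover aspects not yet represented in the list.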

References

  1. Chen W Y, Cai F, Chen H H, De Rijke M. Personalized query suggestion diversification. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2017, 817–820

  2. Yang S, Zhou D Y, He L W. Post-ranking query suggestion by diversifying search results. In: Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011, 815–824

  3. Li R R, Kao B, Bi B, Cheng R, Lo E. DQR: a probabilistic approach to diversified query recommendation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management. 2012, 16–25

  4. Ma H, Lyu M R, King I. Diversifying query suggestion results. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence. 2010, 1399–1404

  5. Zhang Z Y, Nasraoui O. Mining search engine query logs for query recommendation. In: Proceedings of the 15th International Conference on World Wide Web. 2006, 1039–1040

  6. Cao H H, Jiang D X, Pei J, He Q, Liao Z, Chen E H, Li H. Context-aware query suggestion by mining click-through and session data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 875–883

  7. Blei D M, Ng A Y, Jordan M I. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003, 3(4): 993–1022

  8. Pass G, Chowdhury A, Torgeson C. A picture of search. In: Proceedings of the 1st International Conference on Scalable Information Systems. 2006, 1–7

  9. Cai F, De Rijke M. A survey of query auto completion in information retrieval. Foundations and Trends in Information Retrieval, 2016, 10(4): 273–363

  10. Cai F, Liang S S, De Rijke M. Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(9): 2452–2466

  11. Cai F, De Rijke M. Learning from homologous queries and semantically related terms for query auto completion. Information Processing and Management, 2016, 52(4): 628–643

  12. Santos R L T, Peng J, Macdonald C, Ounis I. Explicit search result diversification through sub-queries. In: Proceedings of the 32nd European Conference on Information Retrieval. 2010, 87–99

  13. Al-otaibi S, Ykhlef M. Hybrid immunizing solution for job recommender system. Frontiers of Computer Science, 2017, 11(3): 511–527

  14. Kharitonov E, Macdonald C, Serdyukov P, Ounis I. Intent models for contextualising and diversifying query suggestions. In: Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. 2013, 2303–2308

  15. Ziegler C N, McNee S M, Konstan J A, Lausen G. Improving recommendation lists through topic diversification. In: Proceedings of the 14th International Conference on World Wide Web. 2005, 22–32

  16. Li L, Yang Z L, Liu L, Kitsuregawa M. Query-URL bipartite based approach to personalized query recommendation. In: Proceedings of the 22nd AAAI Conference on Artificial Intelligence. 2008, 1189–1194

  17. Sharma S, Mangla N. Obtaining personalized and accurate query suggestion by using agglomerative clustering algorithm and P-QC method. International Journal of Engineering Research and Technology, 2012, 1(5): 28–35

  18. Verberne S, Sappelli M, Järvelin K, Kraaij W. User simulations for interactive search: evaluating personalized query suggestion. In: Proceedings of the 2015 European Conference on Information Retrieval. 2015, 678–690

  19. Vallet D, Castells P. Personalized diversification of search results. In: Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2012, 841–850

  20. Craswell N, Szummer M. Random walks on the click graph. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2007, 239–246

  21. Cui J W, Liu H Y, Yan J, Ji L, Jin R M, He J, Guo Y Q, Chen Z, Du X Y. Multi-view random walk framework for search task discovery from click-through log. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011, 135–140

  22. Ma H, Yang H X, King I, Lyu M R. Learning latent semantic relations from clickthrough data for query suggestion. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. 2008, 709–718

  23. Mei Q Z, Zhou D, Church K. Query suggestion using hitting time. In: Proceedings of the 17th ACM International Conference on Information and Knowledge Management. 2008, 469–478

  24. Liang S S, Cai F, Ren Z C, De Rijke M. Efficient structured learning for personalized diversification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(11): 2958–2973

  25. Huang C K, Chien L F, Oyang Y J. Relevant term suggestion in interactive web search based on contextual information in query session logs. Journal of the American Society for Information Science and Technology, 2003, 54(7): 638–649

  26. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. In: Proceedings of Workshop at International Conference on Learning Representations. 2013, 1–13

  27. Cai F, Reinanda R, De Rijke M. Diversifying query auto-completion. ACM Transactions on Information Systems, 2016, 34(4): 1–33

  28. Joachims T. Optimizing search engines using clickthrough data. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2002, 133–142

  29. Bollegala D, Matsuo Y, Ishizuka M. Measuring semantic similarity between words using Web search engines. In: Proceedings of the 16th International Conference on World Wide Web. 2007, 757–766

  30. Carbonell J, Goldstein J. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 1998, 335–336

  31. Guo J F, Cheng X Q, Xu G, Zhu X F. Intent-aware query similarity. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011, 259–268

  32. Shah C, Croft W B. Evaluating high accuracy retrieval techniques. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2004, 2–9

  33. Clarke C L A, Kolla M, Cormack G V, Vechtomova O, Ashkan A, Büttcher S, MacKinnon I. Novelty and diversity in information retrieval evaluation. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2008, 659–666

  34. Järvelin K, Kekäläinen J. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 2002, 20(4): 422–446

  35. Chapelle O, Metzler D, Zhang Y, Grinspan P. Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM International Conference on Information and Knowledge Management. 2009, 621–630

  36. Asuncion A, Welling M, Smyth P, Teh Y W. On smoothing and inference for topic models. In: Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence. 2009, 27–34

  37. Agrawal R, Gollapudi S, Halverson A, Ieong S. Diversifying search results. In: Proceedings of the 2009 International Conference on Web Search and Data Mining. 2009, 5–14

  38. Cai F, Wang S Q, De Rijke M. Behavior-based personalization in Web search. Journal of the Association for Information Science and Technology, 2017, 68(4): 855–868

  39. Sepliarskaia A, Radlinski F, De Rijke M. Simple personalized search based on long-term behavioral signals. In: Proceedings of the 39th European Conference on Information Retrieval. 2017, 95–107

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61702526), the National Advanced Research Project (6141B0801010b), Ahold Delhaize, Amsterdam Data Science, the Bloomberg Research Grant program, the Criteo Faculty Research Award program, Elsevier, the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement nr 312827 (VOX-Pol), the Microsoft Research PhD program, the Netherlands Institute for Sound and Vision, the Netherlands Organisation for Scientific Research (NWO) under project nrs (612.001.116, HOR-11-10, CI-14-25, 652.002.001, 612.001.551, 652.001.003), and Yandex. All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors.

Author information

Corresponding author

Correspondence to Fei Cai.

Additional information

A preliminary version of this paper was published in the proceedings of SIGIR 2017 [1]. In this extension, we (1) examine the impact on model performance of the trade-off parameter λ2, which controls the contribution of personalization and diversification in our PQSD model, by manually varying it from 0 to 1 in steps of 0.1; (2) investigate the sensitivity of our PQSD model to the number of query suggestions N, as a larger N simply increases the probability of including the ground truth in the query suggestion list; and (3) include more related work and provide more detailed analyses of the approach and experimental results.
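
The parameter sweep in point (1) and the cutoff study in point (2) can be organized as a simple grid evaluation. The Python sketch below is illustrative only: `lambda_grid`, `sweep`, and the `evaluate` callback are hypothetical names, and the article's actual evaluation code and metrics are not shown on this page.

```python
from typing import Callable, Dict, List, Tuple


def lambda_grid(step_count: int = 10) -> List[float]:
    """Return evenly spaced values from 0 to 1 inclusive (0.0, 0.1, ..., 1.0)."""
    # Dividing an integer index avoids the drift of repeatedly adding 0.1.
    return [i / step_count for i in range(step_count + 1)]


def sweep(
    evaluate: Callable[[float, int], float],
    cutoffs: List[int],
) -> Dict[Tuple[float, int], float]:
    """Score every (lambda, N) pair with a caller-supplied evaluation function.

    `evaluate(lam, n)` is assumed to run the model with trade-off `lam`,
    return the top-n suggestions, and report a single metric value.
    """
    return {(lam, n): evaluate(lam, n) for lam in lambda_grid() for n in cutoffs}
```

A caller would pass its own metric function, for example `sweep(my_metric, cutoffs=[5, 10])`, and then inspect how the scores change along each axis of the grid.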

Wanyu Chen is a master's student at the National University of Defense Technology, China. Her research interests include query suggestion and information retrieval. She received her bachelor's degree in System Engineering from the National University of Defense Technology, China in 2015. She published a SIGIR paper in 2017.

Fei Cai is an assistant professor at the National University of Defense Technology, China. He received his doctoral degree in Computer Science from the University of Amsterdam, The Netherlands, under the supervision of Prof. Maarten de Rijke. His research interests include information retrieval and query formulation. He has published several papers in SIGIR, CIKM, FnTIR, TOIS, TKDE, etc. In addition, he serves as a PC member for CIKM and WSDM and as a reviewer for SIGIR, WWW, WSDM, CIKM, TKDE, IPM, JASIST, etc.

Honghui Chen is a professor at the National University of Defense Technology, China. He received his doctoral degree in Operational Research from the National University of Defense Technology, China in 2007. His research interests include information systems and information retrieval. He has published several papers in SIGIR, IPM and other top venues.

Maarten de Rijke is a professor of computer science in the Informatics Institute at the University of Amsterdam, The Netherlands. He is a member of the Royal Netherlands Academy of Arts and Sciences. His research focus is on intelligent information access, with projects on self-learning search engines and semantic search. He is the Editor-in-Chief of ACM Transactions on Information Systems and of Foundations and Trends in Information Retrieval. De Rijke has published over 700 papers.

About this article

Cite this article

Chen, W., Cai, F., Chen, H. et al. Personalized query suggestion diversification in information retrieval. Front. Comput. Sci. 14, 143602 (2020). https://doi.org/10.1007/s11704-018-7283-x

  • DOI: https://doi.org/10.1007/s11704-018-7283-x
