
Enhancing Sequence Representation for Personalized Search

Conference paper in Chinese Computational Linguistics (CCL 2024)

Abstract

The critical step in personalized search is reordering the candidate documents of the current query based on the user's historical behavior sequence. A user's history contains many types of information, such as queries, documents, and clicks. Most existing personalized search approaches concatenate these types of information to obtain an overall user representation, but they ignore the associations among them. We believe these associations are significant for personalized search. Using a hierarchical transformer as the base architecture, we design three auxiliary tasks to capture the associations among the different types of information in the user behavior sequence. Guided by mutual information, we adjust the training loss, enabling our PSMIM model to better enhance information representation in personalized search. Experimental results demonstrate that our proposed method outperforms several existing personalized search methods.


References

  1. Ahmad, W.U., Chang, K., Wang, H.: Multi-task learning for document ranking and query suggestion. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=SJ1nzBeA-

  2. Ahmad, W.U., Chang, K.W., Wang, H.: Context attentive document ranking and query suggestion. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 385–394. SIGIR ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331246

  3. Bennett, P.N., et al.: Modeling the impact of short- and long-term behavior on search personalization. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 185–194. SIGIR ’12, Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2348283.2348312

  4. Cai, F., Liang, S., de Rijke, M.: Personalized document re-ranking based on Bayesian probabilistic matrix factorization. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 835–838. SIGIR ’14, Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/2600428.2609453

  5. Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-matching n-grams in ad-hoc search. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 126–134. WSDM ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3159652.3159659

  6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423

  7. Dou, Z., Song, R., Wen, J.R.: A large-scale evaluation and analysis of personalized search strategies. In: Proceedings of the 16th International Conference on World Wide Web, pp. 581–590. WWW ’07, Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1242572.1242651

  8. Ge, S., Dou, Z., Jiang, Z., Nie, J.Y., Wen, J.R.: Personalizing search results using hierarchical RNN with query-aware attention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 347–356. CIKM ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3269206.3271728

  9. Gutmann, M.U., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13, 307–361 (2012)


  10. Harvey, M., Crestani, F., Carman, M.J.: Building user profiles from topic models for personalised search. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2309–2314. CIKM ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2505515.2505642

  11. Hjelm, R.D., et al.: Learning deep representations by mutual information estimation and maximization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019). https://openreview.net/forum?id=Bklr3j0cKX

  12. Huang, J., Zhang, W., Sun, Y., Wang, H., Liu, T.: Improving entity recommendation with search log and multi-task learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 4107–4114. International Joint Conferences on Artificial Intelligence Organization (2018). https://doi.org/10.24963/ijcai.2018/571

  13. Kong, L., de Masson d’Autume, C., Yu, L., Ling, W., Dai, Z., Yogatama, D.: A mutual information maximization perspective of language representation learning. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net (2020). https://openreview.net/forum?id=Syx79eBKwr

  14. Logeswaran, L., Lee, H.: An efficient framework for learning sentence representations. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=rJvJXZb0W

  15. Lu, S., Dou, Z., Xu, J., Nie, J.Y., Wen, J.R.: PSGAN: a minimax game for personalized search with limited and noisy click data. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 555–564. SIGIR ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331218

  16. Lu, S., Dou, Z., Xiong, C., Wang, X., Wen, J.R.: Knowledge enhanced personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 709–718. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401089

  17. Ma, Z., Dou, Z., Bian, G., Wen, J.R.: PSTIE: time information enhanced personalized search. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1075–1084. CIKM ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3340531.3411877

  18. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. CoRR abs/1807.03748 (2018). http://arxiv.org/abs/1807.03748

  19. Paninski, L.: Estimation of entropy and mutual information. Neural Comput. 15(6), 1191–1253 (2003)


  20. Pass, G., Chowdhury, A., Torgeson, C.: A picture of search. In: Proceedings of the 1st International Conference on Scalable Information Systems, p. 1-es. InfoScale ’06, Association for Computing Machinery, New York, NY, USA (2006). https://doi.org/10.1145/1146847.1146848

  21. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019

  22. Sieg, A., Mobasher, B., Burke, R.: Web search personalization with ontological user profiles. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 525–534. CIKM ’07, Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1321440.1321515

  23. Teevan, J., Liebling, D.J., Ravichandran Geetha, G.: Understanding and predicting personal navigation. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 85–94. WSDM ’11, Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/1935826.1935848

  24. Vu, T., Nguyen, D.Q., Johnson, M., Song, D., Willis, A.: Search personalization with embeddings. In: Jose, J.M., et al. (eds.) Advances in Information Retrieval, pp. 598–604. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-56608-5_54


  25. Vu, T., Willis, A., Tran, S.N., Song, D.: Temporal latent topic user profiles for search personalisation. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) Advances in Information Retrieval, pp. 605–616. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-16354-3_67


  26. White, R.W., Chu, W., Hassan, A., He, X., Song, Y., Wang, H.: Enhancing personalized search by mining and modeling task behavior. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1411–1420. WWW ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2488388.2488511

  27. Xiong, C., Dai, Z., Callan, J., Liu, Z., Power, R.: End-to-end neural ad-hoc ranking with kernel pooling. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 55–64. SIGIR ’17, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3077136.3080809


  28. Yao, J., Dou, Z., Wen, J.R.: Employing personal word embeddings for personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1359–1368. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401153

  29. Yao, J., Dou, Z., Xu, J., Wen, J.R.: RLPER: a reinforcement learning model for personalized search. In: Proceedings of The Web Conference 2020, pp. 2298–2308. WWW ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380294

  30. Zhou, K., et al.: S3-Rec: self-supervised learning for sequential recommendation with mutual information maximization. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1893–1902. CIKM ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3340531.3411954

  31. Zhou, Y., Dou, Z., Wen, J.R.: Encoding history with context-aware representation learning for personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1111–1120. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401175


Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper. The study was supported by the China Postdoctoral Fellowship Program of CPSF (GZC20230287) and the Fundamental Research Funds for the Central Universities (2024QY004).

Author information

Corresponding author

Correspondence to Zhe Yuan.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

Appendix 1 Mutual Information Maximization

Mutual information maximization is a pivotal strategy for integrating diverse forms of historical information. Rooted in information theory, mutual information (MI) is a valuable tool for quantifying the dependence between random variables. Its mathematical definition is expressed as:

$$\begin{aligned} I(A,B) = H(A) - H(A|B) = H(B) - H(B|A), \end{aligned}$$
(17)

where \(H(\cdot )\) denotes Shannon entropy and \(H(\cdot \mid \cdot )\) conditional entropy.

Suppose A and B are different views of the input data, such as a word and its context in NLP tasks, or a document and its historical context sequence in personalized search. Let f be a function receiving \(A=a\) and \(B=b\) as inputs. The primary aim of maximizing MI is to tune the parameters of f to maximize the mutual information \(I(A,B)\), thereby extracting the most discriminative and salient attributes of the samples.
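To make the definition above concrete, here is a minimal NumPy sketch (ours, not part of the paper) that computes \(I(A,B) = H(A) - H(A|B)\) for two discrete random variables from their joint distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability entries are skipped."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """I(A, B) = H(A) - H(A|B) for a discrete joint distribution P(A, B)."""
    p_b = joint.sum(axis=0)           # marginal P(B), columns index B
    h_a = entropy(joint.sum(axis=1))  # H(A) from the marginal P(A)
    # H(A|B) = sum_b P(B=b) * H(A | B=b)
    h_a_given_b = sum(
        p_b[j] * entropy(joint[:, j] / p_b[j])
        for j in range(joint.shape[1]) if p_b[j] > 0
    )
    return h_a - h_a_given_b

# Two correlated binary variables; independence would give I(A, B) = 0.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.193 nats
```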

The essence of effective feature extraction is distinguishing a sample from the rest of the dataset by capturing its distinctive information. By maximizing mutual information, one can isolate and harness such unique characteristics. However, when f is a neural network or another complex encoder, directly optimizing MI is usually intractable [19]. A common workaround is therefore to optimize a tractable lower bound of \(I(A,B)\) that closely approximates the target. One lower bound that has proved effective in practice is InfoNCE [14, 18], which is based on noise-contrastive estimation [9]. InfoNCE is defined as follows:

$$\begin{aligned} \textrm{InfoNCE} = \mathbb {E}_{p(A,B)} \left[ f_\theta (a,b) - \mathbb {E}_{q(\tilde{\mathcal {B}})} \left[ \log \sum _{\tilde{b} \in \tilde{\mathcal {B}}} \exp f_\theta (a,\tilde{b}) \right] \right] + \log |\tilde{\mathcal {B}}|, \end{aligned}$$
(18)

where a and b are different views of the input data, and \(f_{\theta }\in {\mathbb {R}}\) is a function with parameters \(\theta \) (for example, the dot product of word and context representations, or their cosine similarity). \(\tilde{\mathcal {B}}\) is a set of samples drawn from the distribution \(q(\tilde{\mathcal {B}})\); it contains one positive sample b and \(|\tilde{\mathcal {B}}| - 1\) negative samples. Learning representations with this objective is also known as contrastive learning.
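As an illustration, the following minimal NumPy sketch (ours; not the authors' code) implements the InfoNCE estimate above, assuming \(f_\theta \) is a plain dot product between paired views and that, for each positive pair, the remaining pairs in the batch act as the \(|\tilde{\mathcal {B}}| - 1\) negatives:

```python
import numpy as np

def info_nce(anchors, positives):
    """Monte-Carlo estimate of the InfoNCE bound in Eq. (18).

    anchors:   (n, d) array of views `a` (e.g., document representations).
    positives: (n, d) array of matching views `b` (e.g., context representations).
    Row i of each array forms a positive pair; the other n - 1 rows serve
    as in-batch negatives, so the candidate set size |B~| equals n.
    """
    scores = anchors @ positives.T      # f_theta(a_i, b_j) as dot products
    pos = np.diag(scores)               # f_theta(a, b) for the positive pairs
    # Numerically stable log-sum-exp over each row's candidate set.
    m = scores.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return np.mean(pos - lse) + np.log(len(scores))  # + log|B~| term

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
b = a + 0.1 * rng.normal(size=(8, 16))  # correlated positive views
print(info_nce(a, b))                   # larger = tighter MI lower bound
```

Maximizing this quantity with respect to the encoder parameters tightens the lower bound on \(I(A,B)\).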

InfoNCE is closely related to cross-entropy. When \(\tilde{\mathcal {B}}\) covers all possible values of B (i.e., \(\tilde{\mathcal {B}}=\mathcal {B}\)) and these values are uniformly distributed, maximizing InfoNCE is equivalent to maximizing:

$$\begin{aligned} \mathbb {E}_{p(A,B)} \left[ f_\theta (a,b) - \log \sum _{\tilde{b} \in {\mathcal {B}}} \exp f_\theta (a,\tilde{b}) \right] . \end{aligned}$$
(19)
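To see this equivalence in code (under the same assumptions as the sketch above), the bracketed term in Eq. (19) is the negated softmax cross-entropy of identifying each positive b among all candidates:

```python
import numpy as np

def softmax_cross_entropy(anchors, positives):
    """Cross-entropy of picking each positive b_i among all candidates,
    with the correct labels on the diagonal of the score matrix."""
    scores = anchors @ positives.T
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# With info_nce from the previous sketch:
#   info_nce(a, b) == np.log(len(a)) - softmax_cross_entropy(a, b)
# so maximizing InfoNCE over the full candidate set is the same as
# minimizing this cross-entropy, up to the constant log|B|.
```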

Appendix 2 Implementation Details

The parameters of our model PSMIM are set as follows. The word embedding size is 100, the hidden size of the transformer layers in our base model is 512, the number of heads in multi-head attention is 8, and the size of the MLP hidden layer is 256. The hyperparameters \(z_1\), \(z_2\) and \(z_3\) of the three auxiliary tasks in Formula 15 are all set to 1.0, as is \(\alpha \) in Formula 16. We use the Adam optimizer with a learning rate of \(10^{-3}\) to minimize the final loss \(L_\textrm{total}\). In addition, the number of matching kernels for the KNRM baseline [27] is set to 11.
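For convenience, the settings above can be collected in a single configuration object. The sketch below merely restates the reported values; the key names are ours, not identifiers from the authors' code:

```python
# Hypothetical configuration mirroring the reported PSMIM hyperparameters.
PSMIM_CONFIG = {
    "word_embedding_size": 100,
    "transformer_hidden_size": 512,
    "num_attention_heads": 8,
    "mlp_hidden_size": 256,
    "aux_task_weights": {"z1": 1.0, "z2": 1.0, "z3": 1.0},  # Formula 15
    "alpha": 1.0,                                           # Formula 16
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "knrm_num_kernels": 11,                                 # KNRM baseline [27]
}
```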


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, S., Zhang, H., Yuan, Z. (2025). Enhancing Sequence Representation for Personalized Search. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2024. Lecture Notes in Computer Science, vol. 14761. Springer, Singapore. https://doi.org/10.1007/978-981-97-8367-0_2


  • DOI: https://doi.org/10.1007/978-981-97-8367-0_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8366-3

  • Online ISBN: 978-981-97-8367-0

  • eBook Packages: Computer Science; Computer Science (R0)
