
Enhancing Sequence Representation for Personalized Search

Conference paper in Chinese Computational Linguistics (CCL 2024)

Abstract

The critical step in personalized search is reordering the candidate documents of the current query based on the user's historical behavior sequence. A user's history contains many types of information, such as queries, documents, and clicks. Most existing personalized search approaches concatenate these types of information to obtain an overall user representation, but they ignore the associations among them. We believe these associations are significant for personalized search. Using a hierarchical transformer as the base architecture, we design three auxiliary tasks to capture the associations among the different types of information in the user behavior sequence. Guided by mutual information, we adjust the training loss, enabling our PSMIM model to better enhance information representation in personalized search. Experimental results demonstrate that our proposed method outperforms several existing personalized search methods.


References

  1. Ahmad, W.U., Chang, K., Wang, H.: Multi-task learning for document ranking and query suggestion. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=SJ1nzBeA-

  2. Ahmad, W.U., Chang, K.W., Wang, H.: Context attentive document ranking and query suggestion. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 385–394. SIGIR ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331246

  3. Bennett, P.N., et al.: Modeling the impact of short- and long-term behavior on search personalization. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 185–194. SIGIR ’12, Association for Computing Machinery, New York, NY, USA (2012). https://doi.org/10.1145/2348283.2348312

  4. Cai, F., Liang, S., de Rijke, M.: Personalized document re-ranking based on Bayesian probabilistic matrix factorization. In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 835–838. SIGIR ’14, Association for Computing Machinery, New York, NY, USA (2014). https://doi.org/10.1145/2600428.2609453

  5. Dai, Z., Xiong, C., Callan, J., Liu, Z.: Convolutional neural networks for soft-matching n-grams in ad-hoc search. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 126–134. WSDM ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3159652.3159659

  6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423, https://www.aclweb.org/anthology/N19-1423

  7. Dou, Z., Song, R., Wen, J.R.: A large-scale evaluation and analysis of personalized search strategies. In: Proceedings of the 16th International Conference on World Wide Web, pp. 581–590. WWW ’07, Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1242572.1242651

  8. Ge, S., Dou, Z., Jiang, Z., Nie, J.Y., Wen, J.R.: Personalizing search results using hierarchical RNN with query-aware attention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 347–356. CIKM ’18, Association for Computing Machinery, New York, NY, USA (2018). https://doi.org/10.1145/3269206.3271728

  9. Gutmann, M.U., Hyvärinen, A.: Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. J. Mach. Learn. Res. 13, 307–361 (2012)


  10. Harvey, M., Crestani, F., Carman, M.J.: Building user profiles from topic models for personalised search. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 2309–2314. CIKM ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2505515.2505642

  11. Hjelm, R.D., et al.: Learning deep representations by mutual information estimation and maximization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019). https://openreview.net/forum?id=Bklr3j0cKX

  12. Huang, J., Zhang, W., Sun, Y., Wang, H., Liu, T.: Improving entity recommendation with search log and multi-task learning. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pp. 4107–4114. International Joint Conferences on Artificial Intelligence Organization (2018). https://doi.org/10.24963/ijcai.2018/571

  13. Kong, L., de Masson d’Autume, C., Yu, L., Ling, W., Dai, Z., Yogatama, D.: A mutual information maximization perspective of language representation learning. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net (2020). https://openreview.net/forum?id=Syx79eBKwr

  14. Logeswaran, L., Lee, H.: An efficient framework for learning sentence representations. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net (2018). https://openreview.net/forum?id=rJvJXZb0W

  15. Lu, S., Dou, Z., Xu, J., Nie, J.Y., Wen, J.R.: PSGAN: a minimax game for personalized search with limited and noisy click data. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 555–564. SIGIR ’19, Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3331184.3331218

  16. Lu, S., Dou, Z., Xiong, C., Wang, X., Wen, J.R.: Knowledge enhanced personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 709–718. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401089

  17. Ma, Z., Dou, Z., Bian, G., Wen, J.R.: PSTIE: time information enhanced personalized search. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1075–1084. CIKM ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3340531.3411877

  18. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. CoRR abs/1807.03748 (2018). http://arxiv.org/abs/1807.03748

  19. Paninski, L.: Estimation of entropy and mutual information. Neural Comput. 15(6), 1191–1253 (2003)


  20. Pass, G., Chowdhury, A., Torgeson, C.: A picture of search. In: Proceedings of the 1st International Conference on Scalable Information Systems, p. 1-es. InfoScale ’06, Association for Computing Machinery, New York, NY, USA (2006). https://doi.org/10.1145/1146847.1146848

  21. Robertson, S., Zaragoza, H.: The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr. 3(4), 333–389 (2009). https://doi.org/10.1561/1500000019

  22. Sieg, A., Mobasher, B., Burke, R.: Web search personalization with ontological user profiles. In: Proceedings of the Sixteenth ACM Conference on Conference on Information and Knowledge Management, pp. 525–534. CIKM ’07, Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1321440.1321515

  23. Teevan, J., Liebling, D.J., Ravichandran Geetha, G.: Understanding and predicting personal navigation. In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pp. 85–94. WSDM ’11, Association for Computing Machinery, New York, NY, USA (2011). https://doi.org/10.1145/1935826.1935848

  24. Vu, T., Nguyen, D.Q., Johnson, M., Song, D., Willis, A.: Search personalization with embeddings. In: Jose, J.M., et al. (eds.) Advances in Information Retrieval, pp. 598–604. Springer International Publishing, Cham (2017). https://doi.org/10.1007/978-3-319-56608-5_54


  25. Vu, T., Willis, A., Tran, S.N., Song, D.: Temporal latent topic user profiles for search personalisation. In: Hanbury, A., Kazai, G., Rauber, A., Fuhr, N. (eds.) Advances in Information Retrieval, pp. 605–616. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-16354-3_67


  26. White, R.W., Chu, W., Hassan, A., He, X., Song, Y., Wang, H.: Enhancing personalized search by mining and modeling task behavior. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1411–1420. WWW ’13, Association for Computing Machinery, New York, NY, USA (2013). https://doi.org/10.1145/2488388.2488511

  27. Xiong, C., Dai, Z., Callan, J., Liu, Z., Power, R.: End-to-end neural ad-hoc ranking with kernel pooling. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 55–64. SIGIR ’17, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3077136.3080809


  28. Yao, J., Dou, Z., Wen, J.R.: Employing personal word embeddings for personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1359–1368. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401153

  29. Yao, J., Dou, Z., Xu, J., Wen, J.R.: RLPER: a reinforcement learning model for personalized search. In: Proceedings of The Web Conference 2020, pp. 2298–2308. WWW ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3366423.3380294

  30. Zhou, K., et al.: S3-Rec: self-supervised learning for sequential recommendation with mutual information maximization. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 1893–1902. CIKM ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3340531.3411954

  31. Zhou, Y., Dou, Z., Wen, J.R.: Encoding history with context-aware representation learning for personalized search. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1111–1120. SIGIR ’20, Association for Computing Machinery, New York, NY, USA (2020). https://doi.org/10.1145/3397271.3401175


Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve this paper. The study was supported by the China Postdoctoral Fellowship Program of CPSF (GZC20230287) and the Fundamental Research Funds for the Central Universities (2024QY004).

Author information

Corresponding author

Correspondence to Zhe Yuan.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Appendices

Appendix 1 Mutual Information Maximization

Mutual information maximization is a pivotal strategy for integrating diverse forms of historical information. Rooted in information theory, mutual information (MI) is a valuable tool for quantifying the dependence between random variables. Its mathematical definition is expressed as:

$$\begin{aligned} I(A,B) = H(A) - H(A|B) = H(B) - H(B|A), \end{aligned}$$
(17)

where \(H(\cdot )\) denotes Shannon entropy and \(H(\cdot \mid \cdot )\) conditional entropy.

Suppose A and B are different views of the input data, such as a word and its context in NLP tasks, or a document and its historical context sequence in personalized search. Let f be a function receiving \(A=a\) and \(B=b\) as inputs. The primary aim of maximizing MI is to tune the parameters of f to maximize the mutual information \(I(A,B)\), thereby extracting the most discriminative and salient attributes of the samples.
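To make the definition above concrete, here is a minimal NumPy sketch (ours, not part of the paper) that computes \(I(A,B) = H(A) - H(A|B)\) for two discrete random variables from their joint distribution:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; zero-probability entries are skipped."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """I(A, B) = H(A) - H(A|B) for a discrete joint distribution P(A, B)."""
    p_b = joint.sum(axis=0)           # marginal P(B), columns index B
    h_a = entropy(joint.sum(axis=1))  # H(A) from the marginal P(A)
    # H(A|B) = sum_b P(B=b) * H(A | B=b)
    h_a_given_b = sum(
        p_b[j] * entropy(joint[:, j] / p_b[j])
        for j in range(joint.shape[1]) if p_b[j] > 0
    )
    return h_a - h_a_given_b

# Two correlated binary variables; independence would give I(A, B) = 0.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.193 nats
```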

The essence of effective feature extraction is distinguishing a sample from the rest of the dataset by capturing its distinctive information. By maximizing mutual information, one can isolate and harness such unique characteristics. However, when f is a neural network or another complex encoder, directly optimizing MI is usually intractable [19]. A common workaround is therefore to optimize a tractable lower bound of \(I(A,B)\) that closely approximates the target. One lower bound that has proved effective in practice is InfoNCE [14, 18], which is based on noise-contrastive estimation [9]. InfoNCE is defined as follows:

$$\begin{aligned} \textrm{InfoNCE} = \mathbb {E}_{p(A,B)} \left[ f_\theta (a,b) - \mathbb {E}_{q(\tilde{\mathcal {B}})} \left[ \log \sum _{\tilde{b} \in \tilde{\mathcal {B}}} \exp f_\theta (a,\tilde{b}) \right] \right] + \log |\tilde{\mathcal {B}}|, \end{aligned}$$
(18)

where a and b are different views of the input data, and \(f_{\theta }\in {\mathbb {R}}\) is a function with parameters \(\theta \) (for example, the dot product of word and context representations, or their cosine similarity). \(\tilde{\mathcal {B}}\) is a set of samples drawn from the distribution \(q(\tilde{\mathcal {B}})\); it contains one positive sample b and \(|\tilde{\mathcal {B}}| - 1\) negative samples. Learning representations with this objective is also known as contrastive learning.
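As an illustration, the following minimal NumPy sketch (ours; not the authors' code) implements the InfoNCE estimate above, assuming \(f_\theta \) is a plain dot product between paired views and that, for each positive pair, the remaining pairs in the batch act as the \(|\tilde{\mathcal {B}}| - 1\) negatives:

```python
import numpy as np

def info_nce(anchors, positives):
    """Monte-Carlo estimate of the InfoNCE bound in Eq. (18).

    anchors:   (n, d) array of views `a` (e.g., document representations).
    positives: (n, d) array of matching views `b` (e.g., context representations).
    Row i of each array forms a positive pair; the other n - 1 rows serve
    as in-batch negatives, so the candidate set size |B~| equals n.
    """
    scores = anchors @ positives.T      # f_theta(a_i, b_j) as dot products
    pos = np.diag(scores)               # f_theta(a, b) for the positive pairs
    # Numerically stable log-sum-exp over each row's candidate set.
    m = scores.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(scores - m).sum(axis=1))
    return np.mean(pos - lse) + np.log(len(scores))  # + log|B~| term

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
b = a + 0.1 * rng.normal(size=(8, 16))  # correlated positive views
print(info_nce(a, b))                   # larger = tighter MI lower bound
```

Maximizing this quantity with respect to the encoder parameters tightens the lower bound on \(I(A,B)\).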

InfoNCE is closely related to cross-entropy. When \(\tilde{\mathcal {B}}\) covers all possible values of B (i.e., \(\tilde{\mathcal {B}}=\mathcal {B}\)) and these values are uniformly distributed, maximizing InfoNCE is equivalent to maximizing:

$$\begin{aligned} \mathbb {E}_{p(A,B)} \left[ f_\theta (a,b) - \log \sum _{\tilde{b} \in {\mathcal {B}}} \exp f_\theta (a,\tilde{b}) \right] . \end{aligned}$$
(19)
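To see this equivalence in code (under the same assumptions as the sketch above), the bracketed term in Eq. (19) is the negated softmax cross-entropy of identifying each positive b among all candidates:

```python
import numpy as np

def softmax_cross_entropy(anchors, positives):
    """Cross-entropy of picking each positive b_i among all candidates,
    with the correct labels on the diagonal of the score matrix."""
    scores = anchors @ positives.T
    m = scores.max(axis=1, keepdims=True)
    log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# With info_nce from the previous sketch:
#   info_nce(a, b) == np.log(len(a)) - softmax_cross_entropy(a, b)
# so maximizing InfoNCE over the full candidate set is the same as
# minimizing this cross-entropy, up to the constant log|B|.
```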

Appendix 2 Implementation Details

The parameters of our model PSMIM are set as follows. The word embedding size is 100, the hidden size of the transformer layers in our base model is 512, the number of heads in multi-head attention is 8, and the size of the MLP hidden layer is 256. The hyperparameters \(z_1\), \(z_2\) and \(z_3\) of the three auxiliary tasks in Formula 15 are all set to 1.0, as is \(\alpha \) in Formula 16. We use the Adam optimizer with a learning rate of \(10^{-3}\) to minimize the final loss \(L_\textrm{total}\). In addition, the number of matching kernels for the KNRM baseline [27] is set to 11.
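For convenience, the settings above can be collected in a single configuration object. The sketch below merely restates the reported values; the key names are ours, not identifiers from the authors' code:

```python
# Hypothetical configuration mirroring the reported PSMIM hyperparameters.
PSMIM_CONFIG = {
    "word_embedding_size": 100,
    "transformer_hidden_size": 512,
    "num_attention_heads": 8,
    "mlp_hidden_size": 256,
    "aux_task_weights": {"z1": 1.0, "z2": 1.0, "z3": 1.0},  # Formula 15
    "alpha": 1.0,                                           # Formula 16
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "knrm_num_kernels": 11,                                 # KNRM baseline [27]
}
```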


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, S., Zhang, H., Yuan, Z. (2025). Enhancing Sequence Representation for Personalized Search. In: Sun, M., et al. (eds.) Chinese Computational Linguistics. CCL 2024. Lecture Notes in Computer Science, vol. 14761. Springer, Singapore. https://doi.org/10.1007/978-981-97-8367-0_2


  • DOI: https://doi.org/10.1007/978-981-97-8367-0_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8366-3

  • Online ISBN: 978-981-97-8367-0

  • eBook Packages: Computer Science; Computer Science (R0)
