DOI: 10.1145/3640457.3688164

Pay Attention to Attention for Sequential Recommendation

Published: 08 October 2024

Abstract

Transformer-based approaches have demonstrated remarkable success in various sequence-based tasks. However, traditional self-attention models may not sufficiently capture the intricate dependencies among items in sequential recommendation scenarios, because they place no explicit emphasis on the attention weights themselves, which play a critical role in allocating attention and modeling item-to-item correlations. To better exploit the potential of attention weights and improve the ability of sequential recommendation (SR) models to learn high-order dependencies, we propose a novel SR approach called attention weight refinement for sequential recommendation (AWRSR). AWRSR enhances the effectiveness of self-attention by additionally paying attention to the attention weights themselves, allowing for more refined attention distributions over the correlations among items. We conduct comprehensive experiments on multiple real-world datasets, demonstrating that our approach consistently outperforms state-of-the-art SR models. Moreover, we provide a thorough analysis of AWRSR's effectiveness in capturing higher-level dependencies. These findings suggest that AWRSR offers a promising new direction for enhancing self-attention architectures in SR tasks, with potential applications to other sequence-based problems as well.
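
The page does not reproduce the paper's formulation, but the abstract's core idea — applying a second attention step to the attention weight matrix itself — can be sketched. The following minimal NumPy sketch is one hypothetical reading of that idea, not the authors' method: the function names, the Uq/Uk projections, and the choice to treat the rows of the first attention matrix as the inputs to a second attention pass are all assumptions introduced here for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Standard scaled dot-product self-attention over item embeddings X.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))    # (n, n) attention weights
    return A, A @ V

def refined_attention(X, Wq, Wk, Wv, Uq, Uk):
    # Hypothetical refinement step: treat each row of the first attention
    # matrix A as a feature vector and run a second scaled dot-product
    # attention over those rows. Uq/Uk are illustrative projections (their
    # shapes are tied to the sequence length only for this sketch; a real
    # model would parameterize this differently).
    A, _ = self_attention(X, Wq, Wk, Wv)
    Q2, K2 = A @ Uq, A @ Uk
    R = softmax(Q2 @ K2.T / np.sqrt(K2.shape[-1]))
    # R and A are both row-stochastic, so R @ A is again a valid attention
    # distribution; it mixes two-hop (higher-order) item-to-item paths.
    return (R @ A) @ (X @ Wv)

rng = np.random.default_rng(0)
n, d = 6, 8                                        # 6 items in a session, dim 8
X = rng.normal(size=(n, d))                        # toy item embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Uq, Uk = (rng.normal(size=(n, d)) for _ in range(2))
print(refined_attention(X, Wq, Wk, Wv, Uq, Uk).shape)   # -> (6, 8)
```

The key design question the abstract raises — how to parameterize the refinement of the attention matrix — admits many choices; this sketch simply reuses scaled dot-product attention at the second level, which is one plausible mechanism for the higher-order dependencies the abstract describes.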

Published In

RecSys '24: Proceedings of the 18th ACM Conference on Recommender Systems, October 2024, 1438 pages

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Attention Refinement
2. Self-Attention
3. Sequential Recommendation

Qualifiers

• Short-paper
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 254 of 1,295 submissions, 20%
