
Leveraging mixed distribution of multi-head attention for sequential recommendation


Abstract

The attention mechanism has proven to be a useful model for sequential recommendation. A traditional multi-head self-attention architecture can exploit the entire user sequence and adaptively weight the consumed items when recommending the next item. However, the trade-off between the number of heads and the size of each head in multi-head attention gives rise to a low-rank bottleneck, which limits the model's expressive power and hurts the performance of the recommendation model. In this paper, we propose a variant of self-attention called mixed distribution of multi-head attention for sequential recommendation (MMSRec), which constructs a mixture model by taking a weighted average of multiple simple attention distributions, rather than following the currently dominant approach of increasing the embedding size to address the low-rank bottleneck. Extensive experiments on four real-world datasets show that MMSRec achieves significant improvements over state-of-the-art algorithms. Empirical evidence shows that the performance of the recommendation model can be effectively improved by stacking multiple low-rank distributions.
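To make the core idea concrete, the following is a minimal PyTorch sketch of one way to realize such a mixture: each head still produces its own low-rank attention distribution, and a learnable, softmax-normalized weight vector averages those distributions into a single mixed distribution that is then applied to the values. The module name, tensor shapes, and the placement of the mixture weights are illustrative assumptions, not the authors' exact MMSRec architecture.

```python
# Minimal sketch (assumed design, not the official MMSRec code): mix several
# low-rank per-head attention distributions by a learnable weighted average
# instead of enlarging the per-head dimension.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfHeadsAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        # Learnable mixture weights over the heads' attention distributions.
        self.mix_logits = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        # x: (batch, seq_len, d_model) sequence of item embeddings
        b, n, _ = x.shape

        def split(t):  # -> (batch, heads, seq_len, d_head)
            return t.view(b, n, self.num_heads, self.d_head).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))

        # Per-head (low-rank) attention distributions.
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_head ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        head_dists = F.softmax(scores, dim=-1)              # (b, h, n, n)

        # Mixed distribution: convex combination of the simple per-head distributions.
        w = F.softmax(self.mix_logits, dim=0).view(1, -1, 1, 1)
        mixed = (w * head_dists).sum(dim=1, keepdim=True)   # (b, 1, n, n)

        # Apply the single mixed distribution to every head's values, then recombine.
        out = torch.matmul(mixed, v)                         # broadcasts over heads
        out = out.transpose(1, 2).reshape(b, n, -1)
        return self.out_proj(out)
```

As a quick usage check, `MixtureOfHeadsAttention(d_model=64, num_heads=4)` applied to a `(batch, seq_len, 64)` tensor of item embeddings returns a tensor of the same shape, so the layer can stand in wherever a standard multi-head self-attention block would be used.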


Notes

  1. http://jmcauley.ucsd.edu/data/amazon/links.html

  2. https://grouplens.org/datasets/movielens/1m/

  3. https://www.yelp.com/dataset/challenge

  4. https://github.com/haomiaocqut/ReSys_MMSRec
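For readers who want to reproduce the sequential setting on one of these datasets, the sketch below shows a plausible way to turn the MovieLens 1M ratings file (note 2 above) into per-user, time-ordered item sequences. The column names and the leave-one-out split are assumptions about typical preprocessing for SASRec-style models, not the paper's exact pipeline.

```python
# Hypothetical preprocessing sketch: build time-ordered user sequences from the
# MovieLens 1M ratings file (ml-1m/ratings.dat, fields separated by '::').
import pandas as pd

ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    engine="python",  # needed for the multi-character separator
    names=["user_id", "item_id", "rating", "timestamp"],
)

# Sort each user's interactions by time and keep only the item ids.
ratings = ratings.sort_values(["user_id", "timestamp"])
sequences = ratings.groupby("user_id")["item_id"].apply(list).to_dict()

# Assumed leave-one-out protocol: the last item per user is the test target,
# the remaining prefix is the training sequence.
train_seqs = {u: seq[:-1] for u, seq in sequences.items() if len(seq) > 1}
test_items = {u: seq[-1] for u, seq in sequences.items() if len(seq) > 1}
```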


Acknowledgements

This work is supported by the Natural Science Foundation of Chongqing (No. cstc2019jcyj-msxmX0544), the Science and Technology Research Program of the Chongqing Municipal Education Commission (No. KJZD-K202101105, KJQN202001136), and the National Natural Science Foundation of China (No. 61702063).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yihao Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Cite this article

Zhang, Y., Liu, X. Leveraging mixed distribution of multi-head attention for sequential recommendation. Appl Intell 53, 454–469 (2023). https://doi.org/10.1007/s10489-022-03520-5
