skip to main content
10.1145/3539618.3591716acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article
Open access

LightGT: A Light Graph Transformer for Multimedia Recommendation

Published: 18 July 2023 Publication History

Abstract

Multimedia recommendation methods aim to discover the user preference on the multi-modal information to enhance the collaborative filtering (CF) based recommender system. Nevertheless, they seldom consider the impact of feature extraction on the user preference modeling and prediction of the user-item interaction, as the extracted features contain excessive information irrelevant to the recommendation.
To capture the informative features from the extracted ones, we resort to Transformer model to establish the correlation between the items historically interacted by the same user. Considering its challenges in effectiveness and efficiency, we propose a novel Transformer-based recommendation model, termed as Light Graph Transformer model (LightGT). Therein, we develop a modal-specific embedding and a layer-wise position encoder for the effective similarity measurement, and present a light self-attention block to improve the efficiency of self-attention scoring. Based on these designs, we can effectively and efficiently learn the user preference from the off-the-shelf items' features to predict the user-item interactions. Conducting extensive experiments on Movielens, Tiktok and Kwai datasets, we demonstrate that LigthGT significantly outperforms the state-of-the-art baselines with less time. Our code is publicly available at: https://github.com/Liuwq-bit/LightGT.

References

[1]
Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A simple but tough-to-beat baseline for sentence embeddings. In International Conference on Learning Representations. 1--16.
[2]
Feiyu Chen, Junjie Wang, Yinwei Wei, Hai-Tao Zheng, and Jie Shao. 2022. Breaking Isolation: Multimodal Graph Fusion for Multimedia Recommendation by Edge-wise Modulation. In Proceedings of the 30th ACM International Conference on Multimedia. 385--394.
[3]
Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive collaborative filtering: Multimedia recommendation with item-and component-level attention. In Proceedings of International ACM SIGIR conference on Research and Development in Information Retrieval. 335--344.
[4]
Qiwei Chen, Huan Zhao, Wei Li, Pipei Huang, and Wenwu Ou. 2019. Behavior sequence transformer for e-commerce recommendation in alibaba. In Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse Data. 1--4.
[5]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020).
[6]
Ziwei Fan, Zhiwei Liu, Jiawei Zhang, Yun Xiong, Lei Zheng, and Philip S Yu. 2021. Continuous-time sequential recommendation with temporal graph collaborative transformer. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 433--442.
[7]
Hao Fei, Shengqiong Wu, Jingye Li, Bobo Li, Fei Li, Libo Qin, Meishan Zhang, Min Zhang, and Tat-Seng Chua. 2022. LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model. In Proceedings of the Advances in Neural Information Processing Systems, NeurIPS 2022. 15460--15475.
[8]
Hao Fei, Meishan Zhang, and Donghong Ji. 2020. Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 7014--7026.
[9]
Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and statistics. 249--256.
[10]
Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in neural information processing systems, Vol. 30 (2017).
[11]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770--778.
[12]
Ruining He and Julian McAuley. 2016. VBPR: Visual bayesian personalized ranking from implicit feedback. In Proceedings of the AAAI Conference on Artificial Intelligence. 144--150.
[13]
Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. 639--648.
[14]
Shawn Hershey, Sourish Chaudhuri, Daniel PW Ellis, Jort F Gemmeke, Aren Jansen, R Channing Moore, Manoj Plakal, Devin Platt, Rif A Saurous, Bryan Seybold, et al. 2017. CNN architectures for large-scale audio classification. In IEEE International Conference on Acoustics, Speech and Signal Processing. 131--135.
[15]
Wei Ji, Long Chen, Yinwei Wei, Yiming Wu, and Tat-Seng Chua. 2022. MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding. arXiv preprint arXiv:2212.13163 (2022).
[16]
Wei Ji, Renjie Liang, Zhedong Zheng, Wenqiao Zhang, Shengyu Zhang, Juncheng Li, Mengze Li, and Tat-seng Chua. 2023. Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning. (2023).
[17]
Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of International Conference on Learning Representations. 1--16.
[18]
Svea Klaus, Ria Van Hecke, Kaweh Djafari Naini, Ismail Sengor Altingovde, Juan Bernabé-Moreno, and Enrique Herrera-Viedma. 2022. Summarizing Legal Regulatory Documents using Transformers. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2426--2430.
[19]
Yi Li, Jieming Zhu, Weiwen Liu, Liangcai Su, Guohao Cai, Qi Zhang, Ruiming Tang, Xi Xiao, and Xiuqiang He. 2022. PEAR: Personalized Re-ranking with Contextualized Transformer for Recommendation. In Proceedings of the International Conference on World Wide Web.
[20]
Fan Liu, Huilin Chen, Zhiyong Cheng, Anan Liu, Liqiang Nie, and Mohan Kankanhalli. 2022. Disentangled Multimodal Representation Learning for Recommendation. IEEE Transactions on Multimedia (2022), 1--11.
[21]
Fan Liu, Zhiyong Cheng, Lei Zhu, Zan Gao, and Liqiang Nie. 2021a. Interest-Aware Message-Passing GCN for Recommendation. In Proceedings of the Web Conference 2021. Association for Computing Machinery, 1296--1305.
[22]
Shang Liu, Zhenzhong Chen, Hongyi Liu, and Xinghai Hu. 2019. User-video co-attention network for personalized micro-video recommendation. In The World Wide Web Conference. 3020--3026.
[23]
Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong Zhang, Aixin Sun, and Chunyan Miao. 2021b. Pre-training graph transformer with multimodal side information for recommendation. In Proceedings of the 29th ACM International Conference on Multimedia. 2853--2861.
[24]
Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In Proceedings of the international conference on machine learning. 3--9.
[25]
Yanjun Qin, Yuchen Fang, Haiyong Luo, Fang Zhao, and Chenxing Wang. 2022. Next Point-of-Interest Recommendation with Auto-Correlation Enhanced Multi-Modal Transformer Network. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2612--2616.
[26]
Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of Conference on Uncertainty in Artificial Intelligence. 452--461.
[27]
Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang. 2019. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management. 1441--1450.
[28]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems, Vol. 30 (2017).
[29]
Petar Velivc ković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In International Conference on Learning Representations. 1--12.
[30]
Wenjie Wang, Fuli Feng, Xiangnan He, Hanwang Zhang, and Tat-Seng Chua. 2021. Clicks can be cheating: Counterfactual recommendation for mitigating clickbait issue. In SIGIR. 1288--1297.
[31]
Wenjie Wang, Xinyu Lin, Fuli Feng, Xiangnan He, Min Lin, and Tat-Seng Chua. 2022. Causal Representation Learning for Out-of-Distribution Recommendation. In WWW. 3562--3571.
[32]
Xiang Wang, Xiangnan He, Meng Wang, Fuli Feng, and Tat-Seng Chua. 2019. Neural Graph Collaborative Filtering. In Proceedings of the International ACM SIGIR conference on Research and Development in Information Retrieval. 165--174.
[33]
Yinwei Wei, Xiang Wang, Qi Li, Liqiang Nie, Yan Li, Xuanping Li, and Tat-Seng Chua. 2021. Contrastive learning for cold-start recommendation. In Proceedings of the 29th ACM International Conference on Multimedia. 5382--5390.
[34]
Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, and Tat-Seng Chua. 2020. Graph-refined convolutional network for multimedia recommendation with implicit feedback. In Proceedings of the 28th ACM international conference on multimedia. 3541--3549.
[35]
Yinwei Wei, Xiang Wang, Liqiang Nie, Xiangnan He, Richang Hong, and Tat-Seng Chua. 2019. MMGCN: Multi-modal graph convolution network for personalized recommendation of micro-video. In Proceedings of the 27th ACM International Conference on Multimedia. 1437--1445.
[36]
Liwei Wu, Shuqing Li, Cho-Jui Hsieh, and James Sharpnack. 2020. SSE-PT: Sequential recommendation via personalized transformer. In Fourteenth ACM Conference on Recommender Systems. 328--337.
[37]
Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Bo Zhang, and Liefeng Bo. 2020. Multiplex behavioral relation learning for recommendation via memory augmented transformer network. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 2397--2406.
[38]
Lianghao Xia, Chao Huang, Yong Xu, Peng Dai, Xiyue Zhang, Hongsheng Yang, Jian Pei, and Liefeng Bo. 2021. Knowledge-enhanced hierarchical graph transformer network for multi-behavior recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 4486--4493.
[39]
Enming Yuan, Wei Guo, Zhicheng He, Huifeng Guo, Chengkai Liu, and Ruiming Tang. 2022. Multi-Behavior Sequential Transformer Recommender. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1642--1652.
[40]
Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, and Liang Wang. 2021. Mining Latent Structures for Multimedia Recommendation. In Proceedings of ACM International Conference on Multimedia. 3872--3880.
[41]
Jie Zou, Evangelos Kanoulas, Pengjie Ren, Zhaochun Ren, Aixin Sun, and Cheng Long. 2022. Improving conversational recommender systems via transformer-based sequential modelling. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2319--2324.

Cited By

View all
  • (2025)Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language ModelsACM Transactions on Information Systems10.1145/370499943:2(1-26)Online publication date: 21-Jan-2025
  • (2025)Rethinking information fusion: Achieving adaptive information throughput and interaction pattern in graph convolutional networks for collaborative filteringInformation Fusion10.1016/j.inffus.2025.103050120(103050)Online publication date: Aug-2025
  • (2025)Self-supervised graph transformer networks for social recommendationComputers and Electrical Engineering10.1016/j.compeleceng.2025.110121123(110121)Online publication date: Apr-2025
  • Show More Cited By

Index Terms

  1. LightGT: A Light Graph Transformer for Multimedia Recommendation

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2023
    3567 pages
    ISBN:9781450394086
    DOI:10.1145/3539618
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 18 July 2023

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. graph convolutional network
    2. multimedia recommendation
    3. recommender system
    4. transformer

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    SIGIR '23
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)2,049
    • Downloads (Last 6 weeks)145
    Reflects downloads up to 05 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language ModelsACM Transactions on Information Systems10.1145/370499943:2(1-26)Online publication date: 21-Jan-2025
    • (2025)Rethinking information fusion: Achieving adaptive information throughput and interaction pattern in graph convolutional networks for collaborative filteringInformation Fusion10.1016/j.inffus.2025.103050120(103050)Online publication date: Aug-2025
    • (2025)Self-supervised graph transformer networks for social recommendationComputers and Electrical Engineering10.1016/j.compeleceng.2025.110121123(110121)Online publication date: Apr-2025
    • (2025)Knowledge-aware recommendation based on hypergraph representation learning and transformer model optimizationApplied Intelligence10.1007/s10489-025-06257-z55:5Online publication date: 16-Jan-2025
    • (2024)MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for RecommendationsFuture Internet10.3390/fi1608027016:8(270)Online publication date: 29-Jul-2024
    • (2024)Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware DialoguesACM Transactions on Multimedia Computing, Communications, and Applications10.1145/3697838Online publication date: 4-Oct-2024
    • (2024)Cluster-Based Graph Collaborative FilteringACM Transactions on Information Systems10.1145/368748142:6(1-24)Online publication date: 22-Oct-2024
    • (2024)Modality-Balanced Learning for Multimedia RecommendationProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3680626(7551-7560)Online publication date: 28-Oct-2024
    • (2024)A Unified Graph Transformer for Overcoming Isolations in Multi-modal RecommendationProceedings of the 18th ACM Conference on Recommender Systems10.1145/3640457.3688096(518-527)Online publication date: 8-Oct-2024
    • (2024)Multimodal Pretraining, Adaptation, and Generation for Recommendation: A SurveyProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671473(6566-6576)Online publication date: 25-Aug-2024
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media