research-article

Open access

LightGT: A Light Graph Transformer for Multimedia Recommendation

Authors:

Tat-Seng Chua

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1508 - 1517

https://doi.org/10.1145/3539618.3591716

Published: 18 July 2023

Abstract

Multimedia recommendation methods aim to discover user preferences over multi-modal information to enhance collaborative filtering (CF) based recommender systems. Nevertheless, they seldom consider how feature extraction affects user preference modeling and user-item interaction prediction, even though the extracted features contain excessive information that is irrelevant to recommendation.
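As a rough illustration of how a multimedia recommender can enhance CF with extracted features, the sketch below scores a user-item pair in the spirit of VBPR: a latent-factor CF term plus a term that projects the item's extracted feature vector into a user preference space. This is not LightGT's method; all names (`hybrid_score`, `gamma_u`, `theta_u`, the projection `E`) are hypothetical, and bias terms are omitted for brevity.

```python
def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    # Multiply a k x F matrix (list of rows) by a length-F vector.
    return [dot(row, vec) for row in matrix]


def hybrid_score(gamma_u, gamma_i, theta_u, E, f_i):
    """Hypothetical VBPR-style preference score (biases omitted).

    gamma_u, gamma_i: latent CF factors of the user and the item.
    theta_u: the user's preference vector in the projected content space.
    E: k x F projection applied to the item's extracted features f_i.
    """
    cf_term = dot(gamma_u, gamma_i)                  # collaborative signal
    content_term = dot(theta_u, matvec(E, f_i))      # multi-modal signal
    return cf_term + content_term


# Toy usage: CF term 0.5, content term 1*3 + 2*4 = 11, total 11.5.
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(hybrid_score([1.0, 0.0], [0.5, 0.5], [1.0, 2.0], E, [3.0, 4.0, 5.0]))
```

Every dimension of the raw feature vector `f_i` reaches the score through the projection `E` unless training learns to down-weight it, which is why irrelevant information in off-the-shelf features can leak into preference modeling.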

To capture the informative features from the extracted ones, we resort to the Transformer model to establish correlations between the items historically interacted with by the same user. Considering the Transformer's challenges in effectiveness and efficiency, we propose a novel Transformer-based recommendation model, termed Light Graph Transformer (LightGT). Therein, we develop a modal-specific embedding and a layer-wise position encoder for effective similarity measurement, and present a light self-attention block to improve the efficiency of self-attention scoring. Based on these designs, we can effectively and efficiently learn user preferences from off-the-shelf item features to predict user-item interactions. Conducting extensive experiments on the Movielens, Tiktok and Kwai datasets, we demonstrate that LightGT significantly outperforms state-of-the-art baselines while requiring less time. Our code is publicly available at: https://github.com/Liuwq-bit/LightGT.
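A minimal sketch of the generic mechanism the abstract builds on: single-head scaled dot-product self-attention over the feature vectors of the items one user has interacted with, followed by a mean-pool readout into a user vector and inner-product scoring. It assumes identity query/key/value projections and a mean-pool readout, and it does not reproduce LightGT's modal-specific embedding, layer-wise position encoder, or light self-attention block; the helper names are hypothetical.

```python
import math


def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(items):
    """Single-head scaled dot-product self-attention, identity projections.

    `items` holds one d-dimensional feature vector per item the user has
    interacted with. Each output row is an attention-weighted mix of all
    rows, so an item's new representation reflects its correlation with
    the user's other items.
    """
    d = len(items[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in items:
        weights = softmax([dot(q, k) * scale for k in items])
        out.append([sum(w * v[j] for w, v in zip(weights, items)) for j in range(d)])
    return out


def user_preference(items):
    # Hypothetical readout: mean-pool the attended item vectors.
    attended = self_attention(items)
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(len(attended[0]))]


def score(user_vec, item_vec):
    # Predicted interaction strength as an inner product.
    return dot(user_vec, item_vec)


history = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
u = user_preference(history)
print(score(u, [1.0, 0.0]), score(u, [0.0, 1.0]))
```

This naive version computes all n² pairwise scores for a history of n items (O(n²·d) work), which is the self-attention scoring cost that the light self-attention block is described as making more efficient.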

Index Terms

LightGT: A Light Graph Transformer for Multimedia Recommendation
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems

Recommendations

MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation
WSDM '24: Proceedings of the 17th ACM International Conference on Web Search and Data Mining

In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in ...
Contrastive Learning for Cold-Start Recommendation
MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Recommending purely cold-start items is a long-standing and fundamental challenge in the recommender systems. Without any historical interaction on cold-start items, the collaborative filtering (CF) scheme fails to leverage collaborative signals to infer ...
Multi-View Graph Convolutional Network for Multimedia Recommendation
MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Multimedia recommendation has received much attention in recent years. It models user preferences based on both behavior information and item multimodal information. Though current GCN-based methods achieve notable success, they suffer from two ...

Information & Contributors

Information

Published In

ACM Conferences

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen, National Taiwan University
Wei-Jou (Edward) Duh, National Taiwan University
Hen-Hsen Huang, Academia Sinica

Program Chairs:
Makoto P. Kato, Spotify
Josiane Mothe, Université de Toulouse
Barbara Poblete, University of Chile and Amazon Visiting Academic

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Shenzhen College Stability Support Plan
University Synergy Innovation Program of Anhui Province

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 24
Total Downloads: 3,281
Downloads (last 12 months): 2,049
Downloads (last 6 weeks): 145

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Liu F, Liu Y, Chen H, Cheng Z, Nie L, Kankanhalli M (2025). Understanding Before Recommendation: Semantic Aspect-Aware Review Exploitation via Large Language Models. ACM Transactions on Information Systems 43:2, 1-26. DOI: 10.1145/3704999. Online publication date: 21-Jan-2025.
https://dl.acm.org/doi/10.1145/3704999

Wu J, Pang C, Chen G, Wan J, Ouyang X, Zhao J (2025). Rethinking information fusion: Achieving adaptive information throughput and interaction pattern in graph convolutional networks for collaborative filtering. Information Fusion 120, 103050. DOI: 10.1016/j.inffus.2025.103050. Online publication date: Aug-2025.
https://doi.org/10.1016/j.inffus.2025.103050

Li Q, Yang Q, Tian S, Yu L (2025). Self-supervised graph transformer networks for social recommendation. Computers and Electrical Engineering 123, 110121. DOI: 10.1016/j.compeleceng.2025.110121. Online publication date: Apr-2025.
https://doi.org/10.1016/j.compeleceng.2025.110121

Zuo Y, Zhang Y, Zhang Q, Zhang W (2025). Knowledge-aware recommendation based on hypergraph representation learning and transformer model optimization. Applied Intelligence 55:5. DOI: 10.1007/s10489-025-06257-z. Online publication date: 16-Jan-2025.
https://doi.org/10.1007/s10489-025-06257-z

Li Y, Yan S, Zhao F, Jiang Y, Chen S, Wang L, Ma L (2024). MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for Recommendations. Future Internet 16:8, 270. DOI: 10.3390/fi16080270. Online publication date: 29-Jul-2024.
https://doi.org/10.3390/fi16080270

Qu S, Wang W, Zhou X, Zhan H, Li Z, Qu L, Luo L, Li Y, Haffari G (2024). Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3697838. Online publication date: 4-Oct-2024.
https://dl.acm.org/doi/10.1145/3697838

Liu F, Zhao S, Cheng Z, Nie L, Kankanhalli M (2024). Cluster-Based Graph Collaborative Filtering. ACM Transactions on Information Systems 42:6, 1-24. DOI: 10.1145/3687481. Online publication date: 22-Oct-2024.
https://dl.acm.org/doi/10.1145/3687481

Zhang J, Liu G, Liu Q, Wu S, Wang L, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Modality-Balanced Learning for Multimedia Recommendation. Proceedings of the 32nd ACM International Conference on Multimedia, 7551-7560. DOI: 10.1145/3664647.3680626. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3680626

Yi Z, Ounis I (2024). A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation. Proceedings of the 18th ACM Conference on Recommender Systems, 518-527. DOI: 10.1145/3640457.3688096. Online publication date: 8-Oct-2024.
https://dl.acm.org/doi/10.1145/3640457.3688096

Liu Q, Zhu J, Yang Y, Dai Q, Du Z, Wu X, Zhao Z, Zhang R, Dong Z, Baeza-Yates R, Bonchi F (2024). Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 6566-6576. DOI: 10.1145/3637528.3671473. Online publication date: 25-Aug-2024.
https://dl.acm.org/doi/10.1145/3637528.3671473
