DOI: 10.1145/3477495.3531810
Short paper | Open access

UserBERT: Pre-training User Model with Contrastive Self-supervision

Published: 07 July 2022

Abstract

User modeling is critical for personalization. Existing methods usually train user models on task-specific labeled data, which may be insufficient. In fact, there are usually abundant unlabeled user behavior data that encode rich universal user information, and pre-training user models on such data can empower user modeling in many downstream tasks. In this paper, we propose a user model pre-training method named UserBERT that learns universal user models from unlabeled user behavior data through two contrastive self-supervision tasks. The first is masked behavior prediction and discrimination, which aims to model the contexts of user behaviors. The second is behavior sequence matching, which aims to capture user interests that remain stable across different time periods. In addition, we propose a medium-hard negative sampling framework that selects informative negative samples for better contrastive pre-training. Extensive experiments validate the effectiveness of UserBERT for user model pre-training.
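A minimal PyTorch-style sketch of how the two contrastive tasks and the medium-hard sampler could fit together is given below. It is an illustration under stated assumptions, not the authors' implementation: all function names and hyperparameters (`info_nce`, `medium_hard_negatives`, `temperature`, the band fractions) are hypothetical, and "medium-hard" is read here as discarding both the hardest candidates (likely false negatives) and the easiest ones (uninformative), which is one plausible interpretation of the abstract.

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, temperature=0.1):
    """Generic InfoNCE contrastive loss (query/positive: [B, D]; negatives: [B, K, D]).

    Both pre-training tasks described in the abstract can be cast in this form:
    - masked behavior prediction: query = encoder output at a masked position,
      positive = embedding of the true masked behavior;
    - behavior sequence matching: query and positive = user representations of
      two disjoint behavior sub-sequences from the same user.
    """
    pos = F.cosine_similarity(query, positive, dim=-1) / temperature                 # [B]
    neg = F.cosine_similarity(query.unsqueeze(1), negatives, dim=-1) / temperature   # [B, K]
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)                               # [B, 1+K]
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)       # positive = class 0
    return F.cross_entropy(logits, labels)

def medium_hard_negatives(query, candidates, k, hard_frac=0.05, easy_frac=0.5):
    """Draw k medium-difficulty negatives from candidates [B, N, D] (hypothetical scheme).

    Candidates are ranked by similarity to the query; the hardest slice and the
    easy tail are discarded, and k negatives are sampled from the band between.
    Assumes k <= (easy_frac - hard_frac) * N.
    """
    sims = F.cosine_similarity(query.unsqueeze(1), candidates, dim=-1)   # [B, N]
    order = sims.argsort(dim=1, descending=True)                         # hardest first
    n = candidates.size(1)
    band = order[:, int(n * hard_frac):int(n * easy_frac)]               # medium band
    pick = torch.randperm(band.size(1), device=query.device)[:k]         # k random band slots
    idx = band[:, pick]                                                  # [B, k]
    rows = torch.arange(query.size(0), device=query.device).unsqueeze(1)
    return candidates[rows, idx]                                         # [B, k, D]
```

A training step would then sum one `info_nce` term per task, each fed with negatives from `medium_hard_negatives`; how the two task losses are weighted is not specified in the abstract.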

Supplementary Material

MP4 File (userbert.mp4)
UserBERT: Pre-training User Model with Contrastive Self-supervision.





Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2022
3569 pages
ISBN: 9781450387323
DOI: 10.1145/3477495


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. contrastive learning
  2. pre-training
  3. user model

Qualifiers

  • Short-paper


Conference

SIGIR '22

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%


Article Metrics

  • Downloads (Last 12 months): 361
  • Downloads (Last 6 weeks): 32
Reflects downloads up to 28 Feb 2025

Cited By
  • (2025) LLMCDSR: Enhancing Cross-Domain Sequential Recommendation with Large Language Models. ACM Transactions on Information Systems. DOI: 10.1145/3715099. Online publication date: 28-Jan-2025.
  • (2024) Robust Sequence-Based Self-Supervised Representation Learning for Anti-Money Laundering. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4571-4578. DOI: 10.1145/3627673.3680078. Online publication date: 21-Oct-2024.
  • (2024) Discrete Semantic Tokenization for Deep CTR Prediction. Companion Proceedings of the ACM Web Conference 2024, 919-922. DOI: 10.1145/3589335.3651558. Online publication date: 13-May-2024.
  • (2024) Multimodal Pretraining and Generation for Recommendation: A Tutorial. Companion Proceedings of the ACM on Web Conference 2024, 1272-1275. DOI: 10.1145/3589335.3641248. Online publication date: 13-May-2024.
  • (2024) Contrastive Learning for Adapting Language Model to Sequential Recommendation. 2024 IEEE International Conference on Data Mining (ICDM), 251-260. DOI: 10.1109/ICDM59182.2024.00032. Online publication date: 9-Dec-2024.
  • (2024) A survey on large language models for recommendation. World Wide Web 27:5. DOI: 10.1007/s11280-024-01291-2. Online publication date: 22-Aug-2024.
  • (2023) Robust User Behavioral Sequence Representation via Multi-scale Stochastic Distribution Prediction. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4567-4573. DOI: 10.1145/3583780.3614714. Online publication date: 21-Oct-2023.
  • (2023) Multi Datasource LTV User Representation (MDLUR). Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5500-5508. DOI: 10.1145/3580305.3599871. Online publication date: 6-Aug-2023.
  • (2023) Empowering General-purpose User Representation with Full-life Cycle Behavior Modeling. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2908-2917. DOI: 10.1145/3580305.3599331. Online publication date: 6-Aug-2023.
