
Parameters Efficient Fine-Tuning for Long-Tailed Sequential Recommendation

  • Conference paper
Artificial Intelligence (CICAI 2023)

Abstract

In an era of information explosion, recommendation systems play an important role in people's daily lives by facilitating content exploration. It is known that user activeness, i.e., the number of behaviors, tends to follow a long-tail distribution in which the majority of users have low activeness. In practice, we observe that tail users receive significantly lower-quality recommendations than head users after joint training. We further find that a model trained separately on tail users still achieves inferior results due to limited data. Although long-tail distributions are ubiquitous in recommendation systems, improving recommendation performance for tail users remains challenging in both research and industry. Directly applying existing long-tail methods risks hurting the experience of head users, which is hardly affordable since the small portion of highly active head users contributes a considerable portion of platform revenue. In this paper, we propose a novel approach that significantly improves recommendation performance for tail users while achieving at least comparable performance for head users relative to the base model. The essence of the approach is a novel Gradient Aggregation technique that learns the common knowledge shared by all users into a backbone model, followed by separate plugin prediction networks for head-user and tail-user personalization. For common knowledge learning, we leverage the backdoor adjustment from causality theory to deconfound the gradient estimation, thereby shielding the backbone training from the confounder, i.e., user activeness. We conduct extensive experiments on two public recommendation benchmark datasets and a large-scale industrial dataset collected from the Alipay platform. Empirical studies validate the rationality and effectiveness of our approach.

Z. Lv and F. Wang contributed equally to this study.

This work was supported in part by National Natural Science Foundation of China (62006207, 62037001, U20A20387), Young Elite Scientists Sponsorship Program by CAST (2021QNRC001), Zhejiang Province Natural Science Foundation (LQ21F020020), Project by Shanghai AI Laboratory (P22KS00111), Program of Zhejiang Province Science and Technology (2022C01044), the StarryNight Science Fund of Zhejiang University Shanghai Institute for Advanced Study (SN-ZJU-SIAS-0010), and the Fundamental Research Funds for the Central Universities (226-2022-00142, 226-2022-00051).


Notes

  1. http://grouplens.org/datasets/movielens/.
  2. http://jmcauley.ucsd.edu/data/amazon/.
  3. https://www.alipay.com/.

References

  1. Chen, Z., Badrinarayanan, V., Lee, C.Y., Rabinovich, A.: Gradnorm: gradient normalization for adaptive loss balancing in deep multitask networks. In: International Conference on Machine Learning, pp. 794–803. PMLR (2018)

  2. Dong, M., Yuan, F., Yao, L., Xu, X., Zhu, L.: Mamo: memory-augmented meta-optimization for cold-start recommendation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 688–697 (2020)

  3. Glymour, M., Pearl, J., Jewell, N.P.: Causal Inference in Statistics: A Primer. Wiley, Hoboken (2016)

  4. Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. In: International Conference on Learning Representations 2016 (2016)

  5. Huang, C., Li, Y., Loy, C.C., Tang, X.: Learning deep representation for imbalanced classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5375–5384 (2016)

  6. Huang, R., et al.: Audiogpt: understanding and generating speech, music, sound, and talking head. arXiv preprint arXiv:2304.12995 (2023)

  7. Kang, W.C., McAuley, J.: Self-attentive sequential recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 197–206. IEEE (2018)

  8. Krichene, W., Rendle, S.: On sampled metrics for item recommendation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1748–1757 (2020)

  9. Lee, H., Im, J., Jang, S., Cho, H., Chung, S.: MeLU: meta-learned user preference estimator for cold-start recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1073–1082 (2019)

  10. Li, M., et al.: Winner: weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 23090–23099 (2023)

  11. Li, M., et al.: End-to-end modeling via information tree for one-shot natural language spatial video grounding. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 8707–8717 (2022)

  12. Lv, Z., et al.: Ideal: toward high-efficiency device-cloud collaborative and dynamic recommendation system. arXiv preprint arXiv:2302.07335 (2023)

  13. Lv, Z., et al.: Duet: a tuning-free device-cloud collaborative parameters generation framework for efficient device model generalization. In: Proceedings of the ACM Web Conference 2023 (2023)

  14. Mansilla, L., Echeveste, R., Milone, D.H., Ferrante, E.: Domain generalization via gradient surgery. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6630–6638 (2021)

  15. McAuley, J.J., Targett, C., Shi, Q., Hengel, A.V.D.: Image-based recommendations on styles and substitutes. In: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015 (2015)

  16. Neuberg, L.G.: Causality: models, reasoning, and inference, by Judea Pearl, Cambridge University Press, 2000. Econom. Theory 19(4), 675–685 (2003)

  17. Ouyang, W., Wang, X., Zhang, C., Yang, X.: Factors in finetuning deep model for object detection with long-tail distribution. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 864–873 (2016)

  18. Pan, F., Li, S., Ao, X., Tang, P., He, Q.: Warm up cold-start advertisements: improving CTR predictions via learning to learn id embeddings. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 695–704 (2019)

  19. Pearl, J.: Causal diagrams for empirical research. Biometrika 82(4), 669–688 (1995)

  20. Pearl, J.: Causality. Cambridge University Press, Cambridge (2009)

  21. Tong, Y., et al.: Quantitatively measuring and contrastively exploring heterogeneity for domain generalization. In: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2023)

  22. Wang, Y.X., Ramanan, D., Hebert, M.: Learning to model the tail. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 7032–7042 (2017)

  23. Wang, Z., Tsvetkov, Y., Firat, O., Cao, Y.: Gradient vaccine: investigating and improving multi-task optimization in massively multilingual models. arXiv preprint arXiv:2010.05874 (2020)

  24. Yin, J., Liu, C., Wang, W., Sun, J., Hoi, S.C.: Learning transferrable parameters for long-tailed sequential user behavior modeling. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 359–367 (2020)

  25. Yu, T., Kumar, S., Gupta, A., Levine, S., Hausman, K., Finn, C.: Gradient surgery for multi-task learning. arXiv preprint arXiv:2001.06782 (2020)

  26. Zhang, S., Yao, D., Zhao, Z., Chua, T., Wu, F.: Causerec: counterfactual user sequence synthesis for sequential recommendation. In: SIGIR 2021: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 11–15 July 2021, pp. 367–377. ACM (2021)

  27. Zhang, Y., Kang, B., Hooi, B., Yan, S., Feng, J.: Deep long-tailed learning: a survey. arXiv preprint arXiv:2110.04596 (2021)

  28. Zhang, Y., et al.: Online adaptive asymmetric active learning for budgeted imbalanced data. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2768–2777 (2018)

  29. Zhang, Z., Pfister, T.: Learning fast sample re-weighting without reward data. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 725–734 (2021)

  30. Zhou, G., et al.: Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1059–1068 (2018)

  31. Zhu, D., et al.: Bridging the gap: neural collapse inspired prompt tuning for generalization under class imbalance. arXiv preprint arXiv:2306.15955 (2023)


Author information

Correspondence to Shengyu Zhang, Kun Kuang or Fei Wu.

Appendices

A Pseudo Code

The pseudo code of our proposed method is summarized in Algorithm 1.

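Algorithm 1 is rendered as an image in the published version and does not survive text extraction. As a rough substitute, below is a minimal PyTorch sketch of the two-stage procedure the paper describes, assuming a GRU backbone and linear prediction heads; the names (Backbone, backbone_step, plugin_step) and the fixed prior weights are illustrative choices of ours, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Backbone(nn.Module):
    """Shared sequence encoder; a stand-in for the GRU4Rec/SASRec-style
    backbones used in the paper."""
    def __init__(self, n_items=1000, dim=32):
        super().__init__()
        self.emb = nn.Embedding(n_items, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, seq):                  # seq: (batch, seq_len) item ids
        out, _ = self.gru(self.emb(seq))
        return out[:, -1]                    # last hidden state as the user state

def backbone_step(backbone, head, group_batches, group_priors, opt):
    """Stage 1 (gradient aggregation): compute each activeness group's
    gradient separately, then combine the gradients under fixed prior
    weights rather than empirical group frequencies, so batches from head
    users cannot dominate the shared backbone."""
    params = list(backbone.parameters()) + list(head.parameters())
    agg = [torch.zeros_like(p) for p in params]
    for (seq, target), prior in zip(group_batches, group_priors):
        loss = F.cross_entropy(head(backbone(seq)), target)
        for a, g in zip(agg, torch.autograd.grad(loss, params)):
            a.add_(prior * g)
    opt.zero_grad()
    for p, a in zip(params, agg):
        p.grad = a                           # install the aggregated gradient
    opt.step()

def plugin_step(backbone, plugin, batch, opt):
    """Stage 2: the backbone is frozen; only the per-group plugin head
    learns group-specific personalization."""
    seq, target = batch
    with torch.no_grad():
        state = backbone(seq)                # shared knowledge, not updated
    loss = F.cross_entropy(plugin(state), target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Learning rates follow Appendix B.1 (1e-3 for stage 1, 1e-4 for stage 2).
backbone, head, plugin = Backbone(), nn.Linear(32, 1000), nn.Linear(32, 1000)
opt1 = torch.optim.Adam(list(backbone.parameters()) + list(head.parameters()), lr=1e-3)
opt2 = torch.optim.Adam(plugin.parameters(), lr=1e-4)
batches = [(torch.randint(0, 1000, (8, 5)), torch.randint(0, 1000, (8,)))
           for _ in range(3)]                # one synthetic batch per user group
backbone_step(backbone, head, batches, group_priors=[0.2, 0.3, 0.5], opt=opt1)
plugin_step(backbone, plugin, batches[0], opt=opt2)
```

Weighting per-group gradients by fixed priors instead of letting empirical group frequencies weight them implicitly is the gradient-level effect of the backdoor adjustment described in the paper; the paper's exact weighting scheme may differ from this sketch.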

B Experiments

B.1 Experiment Settings

Datasets

Movielens (Note 1). MovieLens is a widely used public benchmark of movie ratings. In our experiments, we use MovieLens-1M, which contains one million samples.

Amazon (Note 2). The Amazon Review dataset [15] is a widely known recommendation benchmark. We use the Amazon-Books subset for evaluation.

Alipay. We collect a large-scale industrial dataset for online evaluation from the Alipay platform (Note 3). Applets, such as the mobile recharge service, are treated as items. For each user, clicked applets are treated as positives and the other applets exposed to the user as negatives.

The detailed statistics of these datasets are summarized in Table 3.

Table 3. Statistics of the evaluation datasets.

Evaluation Metrics

$$\begin{aligned} \mathrm{AUC} &= \frac{\sum_{x_0\in \mathcal{D}_T} \sum_{x_1 \in \mathcal{D}_F}\mathbb{1}[f(x_1)<f(x_0)]}{|\mathcal{D}_T|\,|\mathcal{D}_F|}, \\ \mathrm{HitRate}@K &= \frac{1}{|\mathcal{U}|}\sum_{u\in \mathcal{U}} \mathbb{1}(R_{u,g_u}\le K), \end{aligned}$$
(14)

where \(\mathbb{1}(\cdot)\) is the indicator function, f is the model being evaluated, \(R_{u,g_u}\) is the rank the model assigns to the ground-truth item \(g_u\) of user u, and \(\mathcal{D}_T\) and \(\mathcal{D}_F\) are the positive and negative test sample sets, respectively.
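For concreteness, the two metrics can be transcribed directly into numpy; the helper names and input conventions below (score arrays, 1-based ranks) are our assumptions.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Pairwise AUC from Eq. (14): the fraction of (positive, negative)
    pairs that the model scores in the correct order."""
    pos = np.asarray(pos_scores)[:, None]   # shape (|D_T|, 1)
    neg = np.asarray(neg_scores)[None, :]   # shape (1, |D_F|)
    return (neg < pos).mean()

def hit_rate_at_k(ranks, k):
    """HitRate@K from Eq. (14): the share of users whose ground-truth item
    is ranked within the top K; `ranks` holds the 1-based ranks R_{u,g_u}."""
    return (np.asarray(ranks) <= k).mean()

# Three users whose ground-truth items were ranked 1, 4 and 12:
print(hit_rate_at_k([1, 4, 12], k=5))       # 2 of 3 users hit -> 0.667
print(auc([0.9, 0.8], [0.7, 0.85]))         # 3 of 4 pairs ordered correctly -> 0.75
```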

Baselines

GRU4Rec [4] is one of the earliest works to apply recurrent neural networks to user behavior sequence modeling in recommendation.

DIN [30] introduces a target-attention mechanism that aggregates historically interacted items for click-through-rate prediction.

SASRec [7] is a representative sequential modeling method based on self-attention. By masking backward connections in the attention map, it predicts the next item at every position of the sequence simultaneously.

To evaluate effectiveness on tail-user modeling, the following competing methods are included for comparison.

Agr-Rand [14] applies a gradient surgery strategy to domain generalization: inter-domain gradients are coordinated so that network weights are updated only along directions the domains agree on, yielding a more robust image classifier.

PCGrad [25] is a classic gradient surgery method for mitigating gradient conflicts: whenever the gradients of two tasks have negative cosine similarity, it projects one task's gradient onto the normal plane of the other's, removing the conflicting component (a minimal sketch is given after the baseline descriptions).

Grad-Transfer [24] addresses the long-tail problem by reweighting users during training through resampling and gradient alignment, and uses adversarial learning to prevent the model from exploiting sensitive user-activeness group information in its predictions.
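For reference, the projection rule used by the gradient surgery baselines can be sketched in a few lines of numpy; this is an illustrative rendering of PCGrad on flattened per-task gradient vectors, not the authors' implementation.

```python
import random
import numpy as np

def pcgrad(task_grads):
    """PCGrad-style surgery: for each task, visit the other tasks in random
    order and, whenever the gradients conflict (negative dot product),
    project out the conflicting component; finally sum the adjusted grads."""
    adjusted = []
    for i, g in enumerate(task_grads):
        g = g.astype(float)
        others = [j for j in range(len(task_grads)) if j != i]
        random.shuffle(others)               # PCGrad visits other tasks randomly
        for j in others:
            dot = g @ task_grads[j]
            if dot < 0:                      # conflicting directions
                g -= dot / (task_grads[j] @ task_grads[j]) * task_grads[j]
        adjusted.append(g)
    return np.sum(adjusted, axis=0)

# Two conflicting gradients: the x components cancel either way, but after
# surgery the agreed-upon y direction is amplified (sum = [0.0, 1.6] vs [0.0, 1.0]).
g1, g2 = np.array([1.0, 0.5]), np.array([-1.0, 0.5])
print(pcgrad([g1, g2]))
```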

Implementation Details

Preprocessing. On the Alipay dataset, all samples fall between 2021-05-19 and 2021-07-10. To simulate a real A/B testing environment, we split the data by date: samples before 0:00 AM on 2021-07-01 form the training set and the remaining samples form the test set. On the Movielens and Amazon datasets, we label all observed user-item pairs as 1 and unobserved pairs as 0, and hold out each user's last sample for testing. On Movielens, training negatives are sampled at a positive-to-negative ratio of 1:4; for the test set, following [8], we rank against all of a user's negative samples. On Amazon, the training set uses a positive-to-negative ratio of 1:4 and the test set a ratio of 1:99; we also filter out users and items with fewer than 15 clicks to reduce the dataset size. On the Alipay and Amazon datasets, users are grouped by their number of samples; on the Movielens dataset, users are grouped by the length of their click sequences.
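The split and sampling rules above amount to a short pipeline; in the sketch below, the timestamp column name `ts`, the helper names, and the integer catalogue encoding are assumptions, not the paper's actual preprocessing code.

```python
import numpy as np
import pandas as pd

def temporal_split(logs, cutoff="2021-07-01"):
    """Alipay-style split: samples before the cutoff date train, the rest
    test. The timestamp column name `ts` is an assumed schema."""
    cut = pd.Timestamp(cutoff)
    return logs[logs["ts"] < cut], logs[logs["ts"] >= cut]

def sample_negatives(pos_items, n_items, ratio=4, rng=None):
    """Movielens/Amazon-style training negatives: for each positive, draw
    `ratio` items the user never interacted with (these are labelled 0)."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = np.setdiff1d(np.arange(n_items), pos_items)
    n = min(len(candidates), ratio * len(pos_items))
    return rng.choice(candidates, size=n, replace=False)

# A user who clicked items 3 and 7 in a 20-item catalogue receives
# 8 sampled training negatives (the 1:4 ratio).
print(sample_negatives(np.array([3, 7]), n_items=20))
```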

Implementation. Our models are trained on workstations equipped with NVIDIA Tesla V100 GPUs. For all datasets and models, the batch size is set to 512. The loss is optimized with the Adam optimizer, using a learning rate of 0.001 for the gradient aggregation stage and 0.0001 for the plugin model learning stage. Training stops when the loss converges on the validation set.

Fig. 4. Performance of Hit@1 on the validation set across training epochs.

B.2 Results

In Fig. 4, a larger group number indicates more active users: group 1 is the least active group and group 5 the most active. For the relatively inactive groups (groups 1 and 2), the plugin network reaches its optimum within only a few epochs, whereas for the more active groups (groups 3, 4 and 5) the training curve rises more steadily as epochs increase. This is mainly due to the differing amounts of personalized information across activeness groups: a group with more data carries more personalized information and needs more epochs to learn it, while the others require only a few epochs.

Toy Example. In Fig. 5, we give a toy example of the long-tail effect drawn from a real case on the Alipay platform and show the improvement brought by our proposed method. There are two groups of women: young women with high activeness and middle-aged women with low activeness. They share common preferences, such as clothes and shoes, but also differ in others. Owing to the long-tail effect, the preferences of the low-activeness group are difficult to capture, so the model falls back to recommending popular products to them. To address this, we extract generalizable knowledge via the gradient aggregation module, enabling the model to recommend the common preferences, such as clothes and shoes, to low-activeness women, although sometimes not in their favorite styles. Since the recommendations for the two groups become more similar, performance on the high-activeness group decreases. We then train a plugin network for each group; it captures group-specific personalization such as the preferred styles of clothes and shoes and other less popular preferences.

Fig. 5. A toy example of the long-tail effect and the improvement brought by our method.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Lv, Z., Wang, F., Zhang, S., Zhang, W., Kuang, K., Wu, F. (2024). Parameters Efficient Fine-Tuning for Long-Tailed Sequential Recommendation. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science(), vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_36


  • DOI: https://doi.org/10.1007/978-981-99-8850-1_36


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8849-5

  • Online ISBN: 978-981-99-8850-1

  • eBook Packages: Computer Science, Computer Science (R0)
