Abstract
Rating data collected by recommendation systems contain noise caused by human uncertainty and malicious attacks. Existing outlier removal approaches usually aim to detect noise inserted into ground-truth ratings. However, in real applications, the ground truth of the training data is unavailable, or even unimportant for the prediction task. In this paper, we propose an efficient and effective outlier removal algorithm to improve the quality of the training data. The noise is modeled by a mixture of Gaussian distributions, which can approximate any continuous distribution. First, we employ the expectation-maximization algorithm to compute the low-rank matrices whose product forms the recovered ratings. Second, we compare the original and recovered ratings to identify suspected outliers. This process is repeated a number of times, and ratings that are suspected sufficiently often are treated as outliers. To validate the effectiveness of our algorithm, we compared the prediction quality of four popular recommendation algorithms. Results showed that several measures of these algorithms improved with the new training data.
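The recover, compare, repeat, and vote loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of the paper: a plain rank-1 alternating least squares fit stands in for the expectation-maximization step with mixture-of-Gaussian noise, and the function names (`fit_rank1`, `vote_outliers`) as well as the parameters `rounds`, `min_votes`, `threshold`, and `sample_frac` are hypothetical. The abstract does not say how the repetitions differ, so the sketch assumes each round refits on a random subsample with a fresh random initialization.

```python
import random
from collections import Counter


def fit_rank1(train, n_users, n_items, rng, sweeps=30):
    """Fit train[(u, i)] ~ p[u] * q[i] by alternating least squares.

    Stand-in for the EM-based low-rank recovery (an assumption of this sketch).
    """
    p = [1.0] * n_users
    q = [rng.uniform(0.5, 1.5) for _ in range(n_items)]
    for _ in range(sweeps):
        # Closed-form update of each user factor with item factors fixed.
        for u in range(n_users):
            obs = [(r, q[i]) for (uu, i), r in train.items() if uu == u]
            den = sum(x * x for _, x in obs)
            if den > 0:
                p[u] = sum(r * x for r, x in obs) / den
        # Closed-form update of each item factor with user factors fixed.
        for i in range(n_items):
            obs = [(r, p[u]) for (u, ii), r in train.items() if ii == i]
            den = sum(x * x for _, x in obs)
            if den > 0:
                q[i] = sum(r * x for r, x in obs) / den
    return p, q


def vote_outliers(ratings, n_users, n_items, rounds=10, min_votes=8,
                  threshold=1.0, sample_frac=0.9, seed=0):
    """Return ratings suspected in at least min_votes of the rounds."""
    rng = random.Random(seed)
    keys = sorted(ratings)
    votes = Counter()
    for _ in range(rounds):
        # Assumption: each round refits on a random subsample of the ratings.
        sample = rng.sample(keys, int(sample_frac * len(keys)))
        train = {k: ratings[k] for k in sample}
        p, q = fit_rank1(train, n_users, n_items, rng)
        # Compare every original rating with its recovered value.
        for (u, i), r in ratings.items():
            if abs(r - p[u] * q[i]) > threshold:
                votes[(u, i)] += 1
    return sorted(k for k, v in votes.items() if v >= min_votes)


# Toy data: an exactly rank-1 rating matrix with one corrupted entry.
a = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]
b = [1.0, 1.2, 1.5, 1.8, 2.0, 2.5]
ratings = {(u, i): a[u] * b[i] for u in range(6) for i in range(6)}
ratings[(0, 0)] = 5.0  # injected outlier; the clean value would be 1.0
outliers = vote_outliers(ratings, n_users=6, n_items=6)
print(outliers)
```

On this toy data the corrupted rating at `(0, 0)` should collect a vote in every round, while clean ratings are suspected at most occasionally. Requiring `min_votes` out of `rounds` is what keeps a single unlucky fit from discarding a legitimate rating.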
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (61976194, 41631179), the Open Project of Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBMA202005), the Zhejiang Provincial Natural Science Foundation of China (LY18F030017), and the Natural Science Foundation of Sichuan Province (2019YJ0314).
Cite this article
Xu, YY., Gu, SM. & Min, F. Improving recommendation quality through outlier removal. Int. J. Mach. Learn. & Cyber. 13, 1819–1832 (2022). https://doi.org/10.1007/s13042-021-01490-7
DOI: https://doi.org/10.1007/s13042-021-01490-7