Improving recommendation quality through outlier removal

  • Original Article
  • International Journal of Machine Learning and Cybernetics

Abstract

Rating data collected by recommendation systems contain noise caused by human uncertainty and malicious attacks. Existing outlier removal approaches usually aim to detect noise inserted into ground-truth ratings. In real applications, however, the ground truth of the training data is unavailable, or even unimportant for the prediction task. In this paper, we propose an efficient and effective outlier removal algorithm to improve the quality of the training data. The noise is modeled by a mixture of Gaussians, which can approximate any continuous distribution. First, we employ the expectation-maximization algorithm to compute the low-rank matrices whose product forms the recovered ratings. Second, we compare the original and recovered ratings to identify suspected outliers. This process is repeated a number of times, and ratings that are suspected often enough are treated as outliers. To validate the effectiveness of our algorithm, we compared the prediction quality of four popular recommendation algorithms. The results showed that several measures of these algorithms improved with the new training data.
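The recover, compare, and vote loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain regularized alternating least squares as a stand-in for the expectation-maximization step under mixture-of-Gaussians noise, and the names `factorize`, `vote_outliers`, `threshold`, and `min_votes` are hypothetical.

```python
import numpy as np

def factorize(R, mask, rank, rng, n_iters=20, reg=0.1):
    """Fit low-rank factors to the observed ratings and return their product.

    Assumption: regularized alternating least squares replaces the
    EM-based factorization named in the abstract.
    """
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iters):
        # Update each user's factor from the items that user rated.
        for i in range(n_users):
            obs = mask[i]
            if obs.any():
                Vo = V[obs]
                U[i] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[i, obs])
        # Update each item's factor from the users who rated it.
        for j in range(n_items):
            obs = mask[:, j]
            if obs.any():
                Uo = U[obs]
                V[j] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[obs, j])
    return U @ V.T  # recovered ratings

def vote_outliers(R, mask, rank=2, n_rounds=5, threshold=1.5,
                  min_votes=3, seed=0):
    """Flag observed ratings that are suspected in at least min_votes rounds.

    threshold and min_votes are hypothetical tuning parameters.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(R.shape, dtype=int)
    for _ in range(n_rounds):
        # Each round starts from a fresh random initialization.
        recovered = factorize(R, mask, rank, rng)
        # A rating is suspected when it lies far from its recovered value.
        suspected = mask & (np.abs(R - recovered) > threshold)
        votes += suspected
    return votes >= min_votes

# Usage: R is a user-by-item array, mask is a boolean array marking
# which entries were actually rated.
# outliers = vote_outliers(R, mask, rank=2)
# cleaned_mask = mask & ~outliers
```

Voting over several randomly initialized rounds means that a rating is dropped only when it disagrees with the recovered value repeatedly, not because of one unlucky fit. The rank should stay well below what would let the factors reproduce isolated outliers exactly.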

Notes

  1. http://grouplens.org/datasets/movielens/

  2. http://snap.stanford.edu/data/web-Amazon-links.html

  3. https://www.yelp.com/dataset

  4. https://www.librec.net/datasets.html

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (61976194, 41631179), the Open Project of the Key Laboratory of Oceanographic Big Data Mining and Application of Zhejiang Province (OBMA202005), the Zhejiang Provincial Natural Science Foundation of China (LY18F030017), and the Natural Science Foundation of Sichuan Province (2019YJ0314).

Author information

Corresponding author

Correspondence to Fan Min.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Xu, YY., Gu, SM. & Min, F. Improving recommendation quality through outlier removal. Int. J. Mach. Learn. & Cyber. 13, 1819–1832 (2022). https://doi.org/10.1007/s13042-021-01490-7

  • DOI: https://doi.org/10.1007/s13042-021-01490-7
