Abstract
Collaborative filtering (CF) techniques can generate personalized recommendations. However, recommender systems that use CF as their core algorithm are vulnerable to shilling attacks, which inject malicious user profiles into the system to push (promote) or nuke (demote) the reputations of targeted items. In most practical recommender systems only a small number of users are labeled, while a large number remain unlabeled because determining their identities is expensive. This paper proposes Semi-SAD, a new shilling attack detection algorithm based on semi-supervised learning that takes advantage of both types of data. It first trains a naïve Bayes classifier on a small set of labeled users and then incorporates the unlabeled users with EM-λ to improve the initial naïve Bayes classifier. Experiments on MovieLens datasets compare the detection performance of Semi-SAD with that of a supervised-learning-based detector and an unsupervised-learning-based detector. The results indicate that Semi-SAD detects various kinds of shilling attacks better than the other detectors, especially obfuscated and hybrid shilling attacks.
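The abstract names only the two stages of Semi-SAD: a naïve Bayes classifier trained on labeled users, then refined with EM-λ over unlabeled users. The following is a minimal sketch of that general EM-λ pattern, not the authors' implementation. It makes several assumptions. Each user profile is taken to be already reduced to a real-valued feature vector, and the features themselves are hypothetical. A Gaussian naïve Bayes is swapped in, since EM-λ was originally described for a multinomial text model. The weight `lam` (λ in [0, 1]) scales the unlabeled users' soft labels in every M-step. A fixed iteration count stands in for a convergence test. Class 0 means genuine and class 1 means attacker.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # hypothetical per-user detection features


class WeightedGaussianNB:
    """Two-class Gaussian naive Bayes fit from soft (weighted) class labels."""

    def __init__(self, var_floor: float = 1e-3):
        self.var_floor = var_floor  # guards against zero variance on tiny label sets
        self.priors = [0.5, 0.5]
        self.means: List[List[float]] = []
        self.vars: List[List[float]] = []

    def fit(self, X: List[Vector], W: List[Tuple[float, float]]) -> "WeightedGaussianNB":
        # W[i][c] is the (possibly fractional) weight of sample i in class c.
        d = len(X[0])
        totals = [sum(w[c] for w in W) for c in (0, 1)]
        grand = totals[0] + totals[1]
        # Laplace-style smoothing keeps both class priors strictly positive.
        self.priors = [(totals[c] + 1.0) / (grand + 2.0) for c in (0, 1)]
        self.means, self.vars = [], []
        for c in (0, 1):
            tc = max(totals[c], 1e-12)
            mu = [sum(w[c] * x[j] for x, w in zip(X, W)) / tc for j in range(d)]
            var = [
                max(sum(w[c] * (x[j] - mu[j]) ** 2 for x, w in zip(X, W)) / tc, self.var_floor)
                for j in range(d)
            ]
            self.means.append(mu)
            self.vars.append(var)
        return self

    def posterior(self, x: Vector) -> Tuple[float, float]:
        # Log-space joint likelihood per class, normalized with the max trick.
        logs = []
        for c in (0, 1):
            lp = math.log(self.priors[c])
            for j, xj in enumerate(x):
                v = self.vars[c][j]
                lp += -0.5 * math.log(2 * math.pi * v) - (xj - self.means[c][j]) ** 2 / (2 * v)
            logs.append(lp)
        m = max(logs)
        e = [math.exp(l - m) for l in logs]
        s = e[0] + e[1]
        return (e[0] / s, e[1] / s)


def em_lambda(X_lab: List[Vector], y_lab: List[int], X_unl: List[Vector],
              lam: float = 0.1, n_iter: int = 20) -> WeightedGaussianNB:
    # Labeled users contribute hard one-hot weights with full weight 1.
    W_lab = [(1.0, 0.0) if y == 0 else (0.0, 1.0) for y in y_lab]
    clf = WeightedGaussianNB().fit(list(X_lab), W_lab)  # initial supervised classifier
    for _ in range(n_iter):
        # E-step: soft labels for unlabeled users, scaled down by lam.
        W_unl = [tuple(lam * p for p in clf.posterior(x)) for x in X_unl]
        # M-step: refit on labeled users plus weighted unlabeled users.
        clf = WeightedGaussianNB().fit(list(X_lab) + list(X_unl), W_lab + W_unl)
    return clf
```

Setting `lam = 0` leaves the unlabeled users with zero weight and so reproduces the purely supervised classifier, while `lam = 1` lets unlabeled users count as much as labeled ones. Intermediate values limit how far a poorly matched generative model can drift away from the labeled evidence.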
Cite this article
Cao, J., Wu, Z., Mao, B. et al. Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system. World Wide Web 16, 729–748 (2013). https://doi.org/10.1007/s11280-012-0164-6