A novel item anomaly detection approach against shilling attacks in collaborative recommendation systems using the dynamic time interval segmentation technique
Introduction
The explosive growth of online resources (including information and products) has left people facing an excessive number of irrelevant or unnecessary options [21], [40], even when these resources are processed by access and retrieval techniques. Personalized recommendation, especially the collaborative filtering (CF)-based mechanism, has been successfully introduced to filter out irrelevant resources [1], [18], [22], [32], [34], [40], [46], [48], [54], [56] and has been widely accepted in many different domains, such as auxiliary teaching [14], online learning [15], [19], [37], movie and TV programs [6], [36], tourism [9], [24], online social networks (including communities) [3], [30], [55], [58], digital libraries [49], [52], and technology transfer offices [45].
The majority of CF-based recommendation systems rely on users' opinions of items, expressed in the form of ratings [1], [22], [32], [40], [54], and are highly vulnerable to shilling attacks [39], [42], [43], which are designed to increase or reduce the probability of a target item being recommended by injecting a certain number of fake rating profiles so that the attackers can benefit [25], [39]. Typically, some profit-driven raters (i.e., item providers) may inject a large number of positive ratings to promote the reputation of their own items and negative ratings to undermine their competitors. Shilling attacks are emerging as a great threat to recommendation systems because they can generate large volumes of useless information, mislead review comments, and ultimately change recommendation results.
There are various types of solutions against shilling attacks for the CF algorithm, and the most common is to detect fake user profiles, that is, to identify malicious users directly through the features of attack types [8], [10], [11], [12], [13], [17], [28], [33], [38], [39], [61]. However, many models are limited to certain attack types whose features have been explicitly extracted or scrutinized by researchers [8], [25], [39]. In addition, the majority of approaches perform “anomaly user detection” rather than “attacker detection”, because a user flagged as anomalous could be genuine. For example, a captious but authentic user may be classified as an anomaly user by the detection method if he/she usually gives low scores to items that dissatisfy him/her, even though those items are actually of high quality and could satisfy other people. Many studies neglect the difference between “anomaly user” and “attacker”, although it can raise the misclassification rate or false alarm rate, as some genuine users are misclassified as attackers [38].
To solve these problems, we propose to detect anomaly items directly, which is equivalent to identifying the items attacked by fake profiles. The basic assumption is that the intrinsic quality of an item follows the uniform distribution [27], so the resulting rating distribution of the item remains stable in the absence of attack ratings. If the distribution changes greatly, the item is considered to be under attack. In addition, this approach is generally effective for nearly all attack types, as any effective attack must change the statistical characteristics of the target item in line with the underlying intention of the attackers. For instance, to raise the probability that an item is recommended, large numbers of extremely high ratings must be injected for that item, and the subsequent mean and mode rating values of that item necessarily increase. Hence, we can detect an attack regardless of the specific attack type through changes in the rating distribution.
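As a minimal sketch of this argument, the following Python snippet shows how a batch of injected maximum ratings shifts the mean and mode of an item's ratings. The rating values and the size of the attack are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
from statistics import mean, mode


def rating_summary(ratings):
    """Return the (mean, mode) of a list of integer ratings."""
    return mean(ratings), mode(ratings)


# Hypothetical genuine ratings for one item on a 1-5 scale.
genuine = [3, 2, 3, 4, 3, 2, 3, 3, 4, 2]
# Hypothetical push attack: a batch of maximum ratings.
attack = [5] * 8

before = rating_summary(genuine)
after = rating_summary(genuine + attack)

print("before attack: mean=%.2f mode=%d" % before)
print("after attack:  mean=%.2f mode=%d" % after)
```

In this toy case both statistics move toward the injected value, which is the kind of distribution change the item-level detection relies on.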
An additional point is that many attacks are carried out over short periods so that the attackers can maximize their profits, which means that the attack ratings in nearly all time-ordered rating sequences of target items must be close to each other or even adjacent. We therefore propose a dynamic time segmentation technique that divides the whole rating series into several time intervals and groups as many attack ratings as possible into the same interval, which lowers the computational cost and allows the method to be applied online effectively.
The key point of any detection method is its performance, and the most commonly evaluated aspect is accuracy [7], [10], [11], [12], [17], [23], [28], [38], [39], [62]. However, we believe that other important aspects of detection algorithm performance have been largely overlooked in the current literature. In particular, inspired by [2], [41], we introduce a new stability metric as a complementary assessment of the robustness of the detection algorithm, because robustness should cover two aspects: accuracy and stability.
The rest of this paper is organized as follows. Section 2 briefly discusses the related work on shilling attacks and commonly used detection methods. In Section 3, we elaborate on the intrinsic features and the categories of shilling attacks from the perspective of the item profile, and we also offer a comprehensive description of the stability metric. Section 4 presents our anomaly detection method. Next, in Section 5, we evaluate the performance of the proposed algorithm experimentally in three aspects: effectiveness, robustness and timeliness. Finally, we present our conclusions and note directions for future work in Section 6.
Section snippets
Shilling attack types
There are two categories in shilling attacks concerning attack intention: push attacks and nuke attacks [25], [26], [39], [64]. Attacks that intend to increase the reputation of some targeted items are referred to as push attacks, while others aiming to decrease the popularity of the targeted items are known as nuke attacks. Gunes et al. [25] indicated several widely used shilling attack types based on the research of Mobasher et al. [39], as displayed in Table 1. We make a light modification
Preliminaries
In this part, some basic notations are first presented for clarity. Then, several common features and three types of shilling attacks are analyzed. Finally, the definition and analysis of the new stability metric are introduced.
Dynamic time interval segmentation and hypothesis test detection-based framework (SDF)
In this paper, we quantify the characteristics of the time-ordered rating sequence of an item with a skewness metric, which describes the asymmetry of the probability distribution of a real-valued random variable about its mean [20]. The change in skewness between neighboring ratings then reflects the influence of each newly arriving rating on the whole rating distribution. Moreover, the rate of the change or the first order difference of skewness at each rating represents the
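The idea of tracking skewness along a time-ordered rating sequence can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the excerpt does not show which skewness estimator or segmentation rule is used, so the sketch assumes the moment-based estimator (third central moment divided by the 1.5 power of the second central moment), defines skewness as 0 when all ratings are equal, and stops at the first-order difference. The function names and the rating values are hypothetical.

```python
def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (an assumed estimator).

    Returns 0.0 when the variance is zero, a convention chosen for this sketch.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


def running_skewness(ratings):
    """Skewness of every prefix of a time-ordered rating sequence."""
    return [skewness(ratings[:k]) for k in range(1, len(ratings) + 1)]


def first_difference(series):
    """Change between consecutive values of a series."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical time-ordered ratings: ten genuine ratings, then a burst of 5s.
ratings = [3, 2, 3, 4, 3, 2, 3, 3, 4, 2] + [5] * 6

skew_series = running_skewness(ratings)
skew_diff = first_difference(skew_series)

for position, delta in enumerate(skew_diff, start=2):
    print(f"rating #{position}: skewness change {delta:+.3f}")
```

Recomputing the skewness of every prefix is quadratic in the sequence length; it keeps the sketch short, and an online variant would update the running moments incrementally instead.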
Experiment evaluation
We selected a MovieLens dataset [65], which has 1682 items with 100,000 ratings from 942 users. Ratings are discrete values between 1 and 5. We sorted the ratings for each item by their time stamp. Attack events (here, we focus only on the push attack) are generated according to the three types of strategies indicated in Section 3.3.
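The data preparation described here can be sketched as follows. This is only an illustration: the record layout `(user, item, rating, timestamp)`, the helper names, and the simple burst of maximum ratings are assumptions made for the example, and the burst does not reproduce the three attack strategies of Section 3.3, which are not shown in this excerpt.

```python
from collections import defaultdict


def item_sequences(records):
    """Group (user, item, rating, timestamp) records by item, sorted by time.

    Each item maps to a list of (timestamp, user, rating) tuples.
    """
    by_item = defaultdict(list)
    for user, item, rating, timestamp in records:
        by_item[item].append((timestamp, user, rating))
    return {
        item: sorted(rows, key=lambda row: row[0])
        for item, rows in by_item.items()
    }


def inject_push_attack(sequence, start_ts, count, rating=5, step=1):
    """Return a new time-ordered sequence with `count` fake ratings added.

    The fake ratings start at `start_ts` and are `step` time units apart
    (a hypothetical burst pattern, not one of the paper's strategies).
    """
    fake = [
        (start_ts + i * step, f"attacker_{i}", rating)
        for i in range(count)
    ]
    return sorted(sequence + fake, key=lambda row: row[0])


# Hypothetical records: (user, item, rating, timestamp).
records = [(1, 10, 3, 100), (2, 10, 4, 50), (3, 11, 2, 70), (4, 10, 2, 75)]

sequences = item_sequences(records)
attacked = inject_push_attack(sequences[10], start_ts=60, count=3)
print([rating for _, _, rating in attacked])
```

Sorting on the timestamp alone keeps genuine and fake rows comparable even though their user identifiers have different types in this sketch.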
As opposed to the categorization in [7], we only considered low average rating for the push attack (the experiment can easily be changed and applied to a nuke
Conclusions
The proposed detection framework offers an effective solution to the very real problem of detecting the attacked item with its malicious rating intervals and, further, the malicious users, regardless of the attack type. The detection algorithm utilizes the rate of change of skewness quantities for time interval segmentation. This dynamic interval segmentation technique is, to the best of our knowledge, the only one that can successfully cluster the consecutive attack ratings of the same type
Acknowledgments
This work was supported, in part, by the National Key Basic Research Program of China (973 Program 2013CB329103 of 2013CB329100), the National Natural Science Foundation of China (NSFC-61173129, 71102065, 61103116, 91420102, 61472053), the Specialized Research Fund for the Doctoral Program of Higher Education of China (20120191110026), and the Fundamental Research Funds for the Central Universities under Grant (106112014 CDJZR 095502).
References (65)
- et al. A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition. Inform. Sci. (2010)
- et al. Intelligent tourism recommender systems: a survey. Expert Syst. Appl. (2014)
- et al. Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Syst. Appl. (2004)
- et al. The problem of information overload in business organisations: a review of the literature. Int. J. Inform. Manage. (2000)
- et al. Mobile recommender systems in tourism. J. Network Comput. Appl. (2014)
- et al. A survey of trust and reputation systems for online service provision. Decis. Support Syst. (2007)
- et al. A group recommendation system for online communities. Int. J. Inform. Manage. (2010)
- et al. A trust prediction framework in rating-based experience sharing social networks without a web of trust. Inform. Sci. (2012)
- et al. A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce. Expert Syst. Appl. (2005)
- et al. A hybrid recommender system for the selective dissemination of research resources in a technology transfer office. Inform. Sci. (2012)
- A Google wave-based fuzzy recommender system to disseminate information in university digital libraries 2.0. Inform. Sci.
- An efficient and versatile approach to trust and reputation using hierarchical Bayesian modelling. Artif. Intell.
- A quality based recommender system to disseminate information in a university digital library. Inform. Sci.
- Using SVD and demographic data for the enhancement of generalized collaborative filtering. Inform. Sci.
- Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng.
- Stability of recommendation algorithms. ACM Trans. Inform. Syst. (TOIS)
- A collaborative filtering framework for friends recommendation in social networks based on interaction intensity and adaptive user similarity. Social Network Anal. Min.
- Application of belief propagation to trust and reputation management
- Iterative trust and reputation management using belief propagation. IEEE Trans. Dependable Secure Comput.
- Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system. World Wide Web
- A hybrid system of pedagogical pattern recommendations based on singular value decomposition and variable data attributes. Inform. Process. Manage.
- A hybrid recommendation algorithm adapted in e-learning environments. World Wide Web
- Evaluating collaborative filtering recommendations inside large learning object repositories. Inform. Process. Manage.
- Probability and Statistics for Engineering and the Sciences
- Personalisation in web computing and informatics: theories, techniques, applications, and future research. Inform. Syst. Front.