A novel item anomaly detection approach against shilling attacks in collaborative recommendation systems using the dynamic time interval segmentation technique

doi:10.1016/j.ins.2015.02.019

Information Sciences

Volume 306, 10 June 2015, Pages 150-165

https://doi.org/10.1016/j.ins.2015.02.019

Abstract

Various types of web applications have gained both higher customer satisfaction and greater benefits since being equipped with personalized recommendation. However, increasingly rampant shilling attackers inject biased rating profiles into systems to manipulate item recommendations, which not only lowers recommendation precision and user satisfaction but also damages the trustworthiness of intermediary transaction platforms and their participants. Many studies have offered methods against shilling attacks, especially user-profile-based detection. However, this kind of detection struggles to extract universal features of attackers, which directly results in poor performance when facing improved shilling attack types. This paper presents a novel item anomaly detection approach, based on a dynamic time interval segmentation technique, to address these problems. In particular, this study is inspired by the common attack features seen from the standpoint of the item profile, and can detect attacks regardless of the specific attack type. The proposed segmentation technique determines the size of each time interval dynamically so as to group as many consecutive attack ratings together as possible. In addition, apart from effectiveness metrics, little attention has been paid to the robustness of detection methods, which covers both the accuracy and the stability of results. Hence, we introduce a stability metric as a complement for estimating robustness. Thorough experiments on the MovieLens dataset illustrate the performance of the proposed approach and justify its value for online applications.

Introduction

The explosive growth of online resources (including information and products) has left people facing an excessive number of irrelevant or unnecessary options [21], [40], even after these resources have been processed by access and retrieval techniques. Personalized recommendation, especially the collaborative filtering (CF)-based mechanism, has been successfully introduced to filter out irrelevant resources [1], [18], [22], [32], [34], [40], [46], [48], [54], [56] and has been widely accepted in many different domains, such as auxiliary teaching [14], online learning [15], [19], [37], movie and TV programs [6], [36], tourism [9], [24], online social networks (including communities) [3], [30], [55], [58], digital libraries [49], [52], and technology transfer offices [45].

The majority of CF-based recommendation systems rely on users' opinions of items, expressed in the form of ratings [1], [22], [32], [40], [54], and are highly vulnerable to shilling attacks [39], [42], [43]. Such attacks are designed to increase or reduce the probability of a target item being recommended by injecting a certain number of fake rating profiles, so that the attackers can benefit [25], [39]. Typically, some profit-driven raters (i.e., item providers) may inject a large number of positive ratings to promote the reputation of their own items and negative ratings to undermine their competitors. Shilling attacks are thus emerging as a great threat to recommendation systems because they can generate large volumes of useless information, distort review comments, and ultimately change recommendation results.

There are various types of solutions against shilling attacks for the CF algorithm, and the most common is to detect fake user profiles, that is, to find malicious users directly through the features of attack types [8], [10], [11], [12], [13], [17], [28], [33], [38], [39], [61]. However, many models are limited to certain attack types whose features have been extracted explicitly or scrutinized by researchers [8], [25], [39]. In addition, the majority of approaches perform “anomaly user detection” rather than “attacker detection”, because a detected anomaly user could be genuine. For example, a captious but authentic user may be classified as an anomaly user if he/she usually gives low scores to items he/she finds unsatisfactory, even though those items are actually of high quality and could satisfy other people. Many studies neglect the difference between “anomaly user” and “attacker”, although it may raise the misclassification rate or false alarm rate, as some genuine users are misclassified as attackers [38].

To solve these problems, we propose to detect anomaly items directly, which is equivalent to directly finding the items attacked by fake profiles. The basic assumption is that the intrinsic quality of an item follows the uniform distribution [27], so the resulting rating distribution of the item remains stable in the absence of attack ratings. Once the distribution changes greatly, the item is considered to be under attack. In addition, this approach is generally effective for nearly all attack types, as every effective attack must change the statistical characteristics of the target item in line with the underlying intention of the attackers. For instance, to increase the likelihood that an item is recommended, large numbers of extremely high ratings must be injected for that item, and the mean and mode of its ratings consequently increase. Hence, we can detect any attack, regardless of the specific attack type, through the changes it causes in the rating distribution.
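The mean-and-mode argument above can be illustrated with a minimal Python sketch. The rating values are hypothetical (not taken from the paper's data), and the helper name `rating_summary` is an assumption introduced only for this illustration; it is not part of the proposed framework.

```python
from statistics import mean, mode

def rating_summary(ratings):
    """Return (mean, mode) of a list of integer ratings on a 1-5 scale."""
    return mean(ratings), mode(ratings)

# Hypothetical genuine ratings for one item, in time order (illustrative only).
genuine = [3, 2, 3, 4, 3, 2, 3, 3, 4, 2]

# Hypothetical push attack: a burst of maximum ratings injected afterwards.
attacked = genuine + [5] * 12

before = rating_summary(genuine)
after = rating_summary(attacked)

# Both statistics move toward the injected value.
print("before attack: mean=%.2f mode=%d" % before)
print("after attack:  mean=%.2f mode=%d" % after)
```

A nuke attack would be the mirror image under the same assumptions: a burst of minimum ratings pulls both statistics down.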

An additional point is that many attacks last only a short period so that the attackers can maximize their profits, which means that the attack ratings in nearly all time-ordered rating sequences of target items must be close to each other or even adjacent. We therefore propose a dynamic time segmentation technique that divides the whole rating series into several time intervals and gathers as many attack ratings together as possible, which lowers the computational cost and can be applied online effectively.

The key concern for any detection method is its performance, and the most commonly studied aspect is accuracy [7], [10], [11], [12], [17], [23], [28], [38], [39], [62]. However, we believe that other important aspects of detection algorithm performance have been largely overlooked in the current literature. In particular, inspired by [2], [41], we introduce a new stability metric as a complementary assessment of the robustness of the detection algorithm, because robustness should include two aspects: the first is accuracy, and the second is stability.

The rest of this paper is organized as follows. Section 2 briefly discusses the related work on shilling attacks and commonly used detection methods. In Section 3, we elaborate on the intrinsic features and the categories of shilling attacks from the perspective of the item profile, and we also offer a comprehensive description of the stability metric. Section 4 presents our anomaly detection method. Next, in Section 5, we evaluate the performance of the proposed algorithm experimentally in three aspects: effectiveness, robustness, and timeliness. Finally, we present our conclusions and note directions for future work in Section 6.

Section snippets

Shilling attack types

Shilling attacks fall into two categories according to attack intention: push attacks and nuke attacks [25], [26], [39], [64]. Attacks that intend to increase the reputation of some targeted items are referred to as push attacks, while those aiming to decrease the popularity of the targeted items are known as nuke attacks. Gunes et al. [25] indicated several widely used shilling attack types based on the research of Mobasher et al. [39], as displayed in Table 1. We make a light modification

Preliminaries

In this part, some basic notations are first presented for clarity. Then, several common features and three types of shilling attacks are analyzed. Finally, the definition and analysis of the new stability metric are introduced.

Dynamic time interval segmentation and hypothesis test detection-based framework (SDF)

In this paper, we quantify the characteristics of the time-ordered rating sequence of an item with a skewness metric, which describes the asymmetry of the probability distribution of a real-valued random variable about its mean [20]. The change in skewness between neighboring ratings then indicates the influence of the newly arriving rating on the whole rating distribution. Moreover, the rate of the change or the first order difference of skewness at each rating represents the
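The skewness of a growing rating sequence and its first-order difference can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it assumes the population form of the Fisher-Pearson coefficient, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments; it treats constant sequences as having zero skewness; and the function names and sample ratings are hypothetical.

```python
def skewness(values):
    """Population Fisher-Pearson skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    if m2 == 0:
        # Convention chosen for this sketch: a constant sequence has skewness 0.
        return 0.0
    return m3 / m2 ** 1.5

def running_skewness(ratings):
    """Skewness of each prefix ratings[:k+1] of a time-ordered sequence."""
    return [skewness(ratings[: k + 1]) for k in range(len(ratings))]

def first_difference(series):
    """Change between each pair of neighboring values in the series."""
    return [b - a for a, b in zip(series, series[1:])]

# Hypothetical time-ordered ratings: genuine ratings, then a burst of 5s.
ratings = [3, 4, 3, 2, 4, 3, 5, 5, 5, 5]
skews = running_skewness(ratings)
diffs = first_difference(skews)

for k, d in enumerate(diffs, start=1):
    print("rating #%d changes skewness by %+.3f" % (k + 1, d))
```

Recomputing the skewness for every prefix costs quadratic time in the sequence length; an online variant would maintain running moments instead, which this sketch omits for brevity.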

Experiment evaluation

We selected a MovieLens dataset [65], which has 1682 items with 100,000 ratings from 943 users. Ratings are discrete values between 1 and 5. We sorted the ratings for each item by their time stamp. Attack events (here, we focus only on the push attack) are generated according to the three types of strategies indicated in Section 3.3.
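The per-item time ordering described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each record is a `(user_id, item_id, rating, timestamp)` tuple; the function name `ratings_by_item` and the sample records are hypothetical and do not come from the dataset.

```python
from collections import defaultdict

def ratings_by_item(records):
    """Group (user_id, item_id, rating, timestamp) records by item.

    Each item's list holds (timestamp, rating, user_id) tuples sorted by
    timestamp; ties are broken by rating and then user_id, which is simply
    the natural tuple ordering.
    """
    grouped = defaultdict(list)
    for user_id, item_id, rating, timestamp in records:
        grouped[item_id].append((timestamp, rating, user_id))
    return {item: sorted(rows) for item, rows in grouped.items()}

# Hypothetical records, deliberately out of time order.
records = [
    (1, 10, 4, 300),
    (2, 10, 5, 100),
    (3, 20, 2, 200),
    (4, 10, 3, 200),
]

sequences = ratings_by_item(records)
for item, rows in sorted(sequences.items()):
    print(item, [rating for _, rating, _ in rows])
```

Keeping the user identifier alongside each rating makes it possible to trace a suspicious interval back to the users who contributed to it.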

As opposed to the categorization in [7], we only considered low average rating for the push attack (the experiment can easily be changed and applied to a nuke

Conclusions

The proposed detection framework offers an effective solution to the very real problem of detecting the attacked item with its malicious rating intervals and, further, the malicious users, regardless of the attack type. The detection algorithm utilizes the rate of change of skewness quantities for time interval segmentation. This dynamic interval segmentation technique is, to the best of our knowledge, the only one that can successfully cluster the consecutive attack ratings of the same type

Acknowledgments

This work was supported, in part, by the National Key Basic Research Program of China (973 Program 2013CB329103 of 2013CB329100), the National Natural Science Foundation of China (NSFC-61173129, 71102065, 61103116, 91420102, 61472053), the Specialized Research Fund for the Doctoral Program of Higher Education of China (20120191110026), and the Fundamental Research Funds for the Central Universities under Grant (106112014 CDJZR 095502).

References (65)

A.B. Barragáns Martínez et al., A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition, Inform. Sci. (2010)

J. Borràs et al., Intelligent tourism recommender systems: a survey, Expert Syst. Appl. (2014)

Y.H. Cho et al., Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce, Expert Syst. Appl. (2004)

A. Edmunds et al., The problem of information overload in business organisations: a review of the literature, Int. J. Inform. Manage. (2000)

D. Gavalas et al., Mobile recommender systems in tourism, J. Network Comput. Appl. (2014)

A. Jøsang et al., A survey of trust and reputation systems for online service provision, Decis. Support Syst. (2007)

J.K. Kim et al., A group recommendation system for online communities, Int. J. Inform. Manage. (2010)

Y. Kim et al., A trust prediction framework in rating-based experience sharing social networks without a web of trust, Inform. Sci. (2012)

Y. Li et al., A hybrid collaborative filtering method for multiple-interests and multiple-content recommendation in E-Commerce, Expert Syst. Appl. (2005)

C. Porcel et al., A hybrid recommender system for the selective dissemination of research resources in a technology transfer office, Inform. Sci. (2012)

J. Serrano-Guerrero et al., A Google wave-based fuzzy recommender system to disseminate information in university digital libraries 2.0, Inform. Sci. (2011)

W.T. Teacy et al., An efficient and versatile approach to trust and reputation using hierarchical Bayesian modelling, Artif. Intell. (2012)

A. Tejeda-Lorente et al., A quality based recommender system to disseminate information in a university digital library, Inform. Sci. (2014)

M.G. Vozalis et al., Using SVD and demographic data for the enhancement of generalized collaborative filtering, Inform. Sci. (2007)

G. Adomavicius et al., Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, IEEE Trans. Knowl. Data Eng. (2005)

G. Adomavicius et al., Stability of recommendation algorithms, ACM Trans. Inform. Syst. (TOIS) (2012)

V. Agarwal et al., A collaborative filtering framework for friends recommendation in social networks based on interaction intensity and adaptive user similarity, Social Network Anal. Min. (2013)

E. Ayday et al., Application of belief propagation to trust and reputation management

E. Ayday et al., Iterative trust and reputation management using belief propagation, IEEE Trans. Dependable Secure Comput. (2012)

R. Bhaumik, C. Williams, B. Mobasher, R. Burke, Securing collaborative filtering against malicious attacks through...

R. Bhaumik, B. Mobasher, R. Burke, A clustering approach to unsupervised attack detection in collaborative recommender...

K. Bryan, M. O’Mahony, P. Cunningham, Unsupervised retrieval of attack profiles in collaborative recommender systems,...

R. Burke, B. Mobasher, C. Williams, R. Bhaumik, Classification features for attack detection in collaborative...

R. Burke, B. Mobasher, C. Williams, R. Bhaumik, Detecting profile injection attacks in collaborative recommender...

J. Cao et al., Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system, World Wide Web (2013)

C. Carlos et al., A hybrid system of pedagogical pattern recommendations based on singular value decomposition and variable data attributes, Inform. Process. Manage. (2013)

W. Chen et al., A hybrid recommendation algorithm adapted in e-learning environments, World Wide Web (2014)

Z.P. Cheng, N. Hurley, Effective diverse and obfuscated attacks on model-based recommender systems, in: Proc. 3rd ACM...

P.A. Chirita, W. Nejdl, C. Zamfir, Preventing shilling attacks in online recommender systems, in: Proc. 7th Annual ACM...

C. Cristian et al., Evaluating collaborative filtering recommendations inside large learning object repositories, Inform. Process. Manage. (2013)

J.L. Devore, Probability and Statistics for Engineering and the Sciences (2011)

M. Gao et al., Personalisation in web computing and informatics: theories, techniques, applications, and future research, Inform. Syst. Front. (2010)

