1 Introduction

With the increasing popularity of e-commerce, customers are faced with a large number of products and advertisements, and navigating web pages to find a desired product consumes valuable time. It is therefore advantageous to develop an intelligent mechanism that can guide a customer to a desired product without wasting time on irrelevant web pages. One tool that supports this goal is a recommender system, which draws on past user activity and relies on decisions made by like-minded people (Teevan et al. 2010).

Recommender systems use a variety of algorithms to deduce customer preferences. Filtering algorithms (collaborative, content-based and simple) have gained more attention than others. Simple filtering classifies customers into categories based on their static data, and a new product is then recommended based on the customer's category. Content-based filtering recommends products (items) to a customer based on the set of products in which she has expressed interest in the past; for instance, the contents of a customer's shopping basket are analyzed to recommend similar or complementary products. Collaborative filtering is one of the most successful recommendation algorithms and is currently used in a range of applications, especially for recommending movies, music, and books (Sarwar et al. 2001; Kim et al. 2001). There are two main types of collaborative filtering techniques: user-based and item-based. The user-based technique compares users' similarities based on their patterns of rating items, while the item-based technique compares items' similarities based on their patterns of ratings across users. Since the former technique is more popular, we focus specifically on it, and "collaborative filtering" (CF) refers to the user-based technique throughout this study.

Collaborative filtering uses a database of customer ratings to identify items that a user is likely to prefer. The likelihood of preferring an item is estimated from the recorded preferences of other customers for that item and the degree of similarity between those customers and the target customer (Ahn et al. 2010; Herlocker et al. 2004). Under this technique, some users will have their ratings used by the recommender system more frequently than others; these users have more influence on recommender system performance and are called influential users (Rashid et al. 2008).

However, recommender systems are vulnerable to shilling, or profile injection, attacks. A "shill" is a customer who tries to encourage others to purchase a product by giving the impression of being an enthusiastic independent customer. A similar scenario occurs when an attacker attempts to make the recommender system suggest a product to all users in order to promote sales of that product. To do so, profile injection attackers deliberately insert attack profiles among the genuine user profiles to change the recommendation results. There are several reports of such attacks on online recommender systems such as Amazon and eBay; examples are provided in (Lam and Riedl 2004). As a result, enhancing the robustness of recommender systems and devising defenses against such attacks has become a serious concern in both academic and industrial communities (Williams et al. 2007).

This paper introduces an attack detection method based on user influence in recommender systems. The goal of an attacker is to bias recommender system output toward a specific product or item, which implies a relationship between attacking power and user influence. In this paper we show that applying established attack detection methods to influential users instead of the whole user set can improve detection performance.

2 Social networks in recommender systems

In CF recommender systems, relationships among users are formed by which items they rate in common and how they rate them. A relationship is established when a group of users rates a common pool of items, which reflects their similarity of taste. A social network between the users of a CF recommender system can thus be inferred: nodes in the social network represent users, and an edge between two nodes expresses an implicit connection between the corresponding users (Rashid 2007). This connection is formed when their similarity, measured by the Pearson correlation coefficient, is non-zero.

Figure 1 shows a scenario where a recommendation is computed for a target (user, item) pair \((u_t, m)\) using a CF algorithm. All users who have rated item m are inspected, and the Pearson correlation coefficient between \(u_t\) and each of them is computed, so that the k users most similar to \(u_t\) (i.e., those with the highest correlation) can be selected as neighbors. Since only the opinions of these k neighbors contribute to the recommendation, a directed link can be envisioned from \(u_t\) to each neighbor: the originating node of the link is \(u_t\) and the destination node is the selected neighbor (Rashid and Karypis 2005). Expanding this scenario to all (user, item) pairs constructs the full social network.
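As an illustration, the following Python sketch computes Pearson similarities and selects the k raters of an item most similar to \(u_t\) as neighbors, yielding the directed edges of the implicit social network. It assumes a dense user-item matrix with 0 marking missing ratings; all function and variable names are illustrative, not from the cited work.

```python
import numpy as np

def pearson_sim(R, u, v):
    """Pearson correlation between users u and v over their co-rated items.

    R is a (num_users x num_items) matrix with 0 for missing ratings.
    """
    common = (R[u] > 0) & (R[v] > 0)
    if common.sum() < 2:
        return 0.0
    ru, rv = R[u, common], R[v, common]
    ru_c, rv_c = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((ru_c ** 2).sum() * (rv_c ** 2).sum())
    return float(ru_c @ rv_c / denom) if denom > 0 else 0.0

def neighbor_edges(R, u_t, item, k=20):
    """Directed edges u_t -> v for the k raters of `item` most similar to u_t.

    Only users with non-zero (positive) Pearson similarity are linked,
    mirroring how the implicit social network is formed in the text.
    """
    raters = [v for v in range(R.shape[0]) if v != u_t and R[v, item] > 0]
    sims = [(v, pearson_sim(R, u_t, v)) for v in raters]
    sims = [(v, s) for v, s in sims if s > 0]
    sims.sort(key=lambda p: p[1], reverse=True)
    return [(u_t, v) for v, _ in sims[:k]]
```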

Fig. 1 Social network of user \(u_t\) with her neighbours

2.1 Influential users in a social network

In the introduced social network, there exists a set of users who help a large number of other users receive recommendations. These influential users are the destination nodes of many directed links in the social network graph, such as the one shown in Fig. 1.

A method is needed that identifies the influence value of every user and selects the users with the highest values. One such method is the loo-based influence measure introduced by Rashid and Karypis (2005). It models the influence of a user in a recommender system by observing what happens when that user is absent from the system: the more people affected by the absence of the target user, the more influential that user is. To do this, the "leave one out" (loo) strategy is employed: the target user's rating profile is removed from the recommender system dataset and its effect on the rest of the users is analyzed. The influence of a user \(u_i\), as described by Rashid (2007), is computed as:

$$Influ_{u_{i}} = \sum\limits_{j = 1}^{NumOfItems} w_{a_{j}} \Pr \left( C_{a_{j}} = 1 \mid \widetilde{u}_{i} \right) $$
(1)

where \(w_{a_{j}}\) is the probability of item \(a_{j}\) being rated, computed as the fraction of users who have rated this item, and \(C_{a_{j}}\) is a binary random variable indicating a shift in the recommendation for \(a_{j}\) after \(u_{i}\) is removed. If recommendations for \(a_{j}\) are computed for a total of \(n_{a_{j}}\) users and \(n_{a_{j},\widetilde{u}_{i}}\) of them experience changes in the recommendation, \(\Pr \left( C_{a_{j}} = 1 \mid \widetilde{u}_{i} \right)\) is computed as:

$$\Pr \left( C_{a_{j}} = 1 \mid \widetilde{u}_{i} \right) = \frac{n_{a_{j},\widetilde{u}_{i}}}{n_{a_{j}}} $$
(2)

Details for this equation can be found in (Rashid 2007).
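For concreteness, the following Python sketch implements Eqs. (1) and (2) under the leave-one-out strategy. It assumes some CF prediction routine `predict(ratings, u, a)` is available; that function name and signature are our assumption, not part of the cited work.

```python
import numpy as np

def influence(predict, ratings, u_i, items, users):
    """Leave-one-out influence of user u_i (a sketch of Eqs. 1 and 2).

    predict(ratings, u, a) is any CF prediction routine (assumed given);
    it returns the recommendation for user u on item a, or None.
    """
    influ = 0.0
    ratings_wo = ratings.copy()
    ratings_wo[u_i, :] = 0                       # remove u_i's rating profile
    for a in items:
        w_a = (ratings[:, a] > 0).mean()         # fraction of users who rated a
        n_a = n_changed = 0
        for u in users:
            if u == u_i:
                continue
            before = predict(ratings, u, a)
            after = predict(ratings_wo, u, a)
            if before is None or after is None:
                continue
            n_a += 1
            if before != after:                  # recommendation shifted
                n_changed += 1
        if n_a > 0:
            influ += w_a * n_changed / n_a       # Eq. 2 plugged into Eq. 1
    return influ
```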

Morid et al. (2011) analyzed and modeled the behavior of influential users in recommender systems and investigated the kinds of behavior that make a user influential. For instance, rating items in close agreement with other users' ratings is one such behavior, as is rating items that are rarely seen. More details are available in (Morid et al. 2011).

3 Attacks in recommender systems

As discussed before, collaborative recommender systems generate recommendations for a given item based on the opinions of other users about that item. Users often express their opinions as ratings on a numerical scale and the similarity between two users is calculated based on their previous ratings (Liu and Lee 2010). It is assumed that users who had similar ratings in the past will have similar ratings in the future (Liu et al. 2009; Li et al. 2010).

A profile injection (shilling) attack is a set of fake profiles containing fake ratings for items. Each attack profile contains a set of randomly selected filler items that receive ratings as specified by the attack strategy, a set of selected items with particular characteristics determined by the attacker, and the target item (O'Mahony et al. 2004). The filler size is the number of filler items in each profile, and the attack size is the number of attack profiles inserted into the entire system.

There are two main types of shilling attacks: push and nuke. A push attack forces the system to recommend the target product to customers more often. In a nuke attack, attackers make a competing product less likely to be recommended by the system (it is not easy to prevent a product from being recommended at all). In other words, in push attacks the maximum rating is given to the target item, while in nuke attacks the minimum rating is assigned to it.

Shilling attacks on a recommender system employ different strategies, with "random" and "average" being the two main ones (Lam and Riedl 2004; Chirita et al. 2005). In a random attack, ratings are assigned to filler items randomly within the distribution of user ratings. In an average attack, each filler item is assigned the average of the ratings given to that item by all users. The "bandwagon" attack is similar to a random attack but exploits the credibility of a few of the most popular items in a particular domain (blockbuster movies, for example) (Burke et al. 2005). The profile of an attacker using this model has a good probability of being similar to a large number of users, since the most popular items are those that many users have rated; these popular items, which are given high ratings in the attack profiles, are the selected items of the bandwagon attack. This attack is only effective at producing recommender bias when used as a push attack. Finally, the "love/hate" attack strategy assigns a low rating to the target item and high ratings to the filler items (Burke et al. 2006). While the love/hate attack can also be used as a push attack by giving a high rating to the target item, it has been shown to be ineffective in that role (Mobasher et al. 2007).
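As a rough illustration of these strategies, the sketch below generates a single push-attack profile for the random, average, and bandwagon models. The parameter names and the use of a normal distribution around the global rating mean (with an assumed standard deviation) are our assumptions, not prescriptions from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def push_attack_profile(model, num_items, target, item_means,
                        popular_items=(), filler_size=0.05, r_min=1, r_max=5):
    """Generate one push-attack profile; `model` is 'random', 'average'
    or 'bandwagon'. Returns a rating vector with 0 for unrated items."""
    profile = np.zeros(num_items)
    candidates = [i for i in range(num_items)
                  if i != target and i not in set(popular_items)]
    fillers = rng.choice(candidates, size=int(filler_size * num_items),
                         replace=False)
    if model == 'average':
        # average attack: each filler gets that item's mean rating
        profile[fillers] = item_means[fillers].round()
    else:
        # random / bandwagon: fillers drawn around the global rating
        # distribution (mean and s.d. here are assumed, not prescribed)
        profile[fillers] = np.clip(
            rng.normal(item_means.mean(), 1.1, len(fillers)).round(),
            r_min, r_max)
    if model == 'bandwagon':
        # selected items: a few popular items, rated at the maximum
        profile[list(popular_items)] = r_max
    profile[target] = r_max  # push: maximum rating on the target item
    return profile
```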

4 Established attack detection research

Published studies of attack detection in recommender systems have proposed various detection techniques. Sandvig et al. (2007) showed that model-based algorithms such as probabilistic latent semantic analysis (PLSA) are more robust than memory-based algorithms against profile injection attacks. Chirita et al. (2005) used an algorithm based on statistical analysis to detect attack profiles, employing rating deviation from mean agreement (RDMA) to analyze user rating patterns. Bryan et al. (2008) used an anomaly detection approach that focuses on differences in network structure around false and genuine users; they proposed UnRAP, an unsupervised algorithm, and used the Hv-score metric to detect attackers.

Su et al. (2010) targeted group shilling attackers by examining similarities between group attackers and using a spreading similarity algorithm to find them. Massa and Avesani (2007) used trust and rating matrices as inputs to build a recommender system based on local and global trust metrics. Williams et al. (2007) employed C4.5 and KNN approaches to detect profile injection attacks and showed that these algorithms, when combined with detection attributes, can effectively improve the robustness of a recommender system.

Burke et al. (2006) used the KNN classifier to classify genuine and attack profiles. They computed generic and model-specific attributes from user profiles to train the KNN classifier. Mobasher et al. (2007) used the same attributes to classify user profiles using KNN, C4.5 and SVM learning algorithms. They demonstrated that the KNN classifier is more efficient in identifying the attack profiles.

A number of studies have taken different approaches to improving recommender system robustness against attacks, changing the recommender system process and its basic algorithms to enhance robustness. Trust-based recommender systems, such as those introduced by Weiwei et al. (2010), Jamali and Ester (2009) and Avesani et al. (2005), are examples of such enhanced systems. Other research analyzed the robustness of trust-based recommender systems against attacks such as the average or love/hate attacks (Zhang 2009, 2010), and a number of studies targeted detection of recommender attackers in online social networks (Jamali and Ester 2010; Li and Lui 2011; Lang et al. 2010). However, these trust-based approaches have their own disadvantages; for example, they require users to explicitly express their trust in other users, which is not usually feasible. Therefore, in this paper we attempt to improve the performance of attack detection methods that are based on the conventional algorithms of recommender systems. The approach proposed by Mobasher et al. (2007) is one of the best attack detection methods introduced so far, and the present research selects it as the base method for evaluating the proposed attack detection method.

5 Methods and materials

5.1 Data

In this research, the public dataset of the movie recommendation website MovieLens.org was used. This dataset consists of 100,000 ratings of 1,682 movies by 943 users. All ratings are integer values between one and five, where one is the lowest rating (disliked) and five is the highest (most liked). The dataset also contains additional information about users (occupation, age, marital status, etc.) and movies (genre, production date, director, etc.). This dataset has been used in most research on recommender system attack detection (Adomavicius and Zhang 2010).
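For reference, the MovieLens 100K ratings file (`u.data`) is tab-separated with columns user id, item id, rating, and timestamp, and can be loaded into a user-item matrix as follows (a minimal sketch using pandas; the local path is illustrative):

```python
import pandas as pd

# MovieLens 100K: u.data is tab-separated (user id, item id, rating, timestamp)
ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=['user_id', 'item_id', 'rating', 'timestamp'])

# Pivot into the 943 x 1682 user-item matrix used throughout (0 = unrated)
R = ratings.pivot(index='user_id', columns='item_id',
                  values='rating').fillna(0).to_numpy()
print(R.shape)  # (943, 1682)
```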

5.2 K-nearest neighbor

The KNN method is an example of an instance-based, or lazy, learner. An instance-based learner stores training samples (instances) and, when a test sample is presented, classifies it based on similarities between the test sample and the previously stored instances. An eager learner, such as backpropagation or C4.5, instead builds a classification model from training samples as soon as they are received (Yu et al. 2011; Herlocker et al. 1999). When a test sample is presented to the KNN classifier, it searches its storage space for the most similar samples and designates these as the test sample's neighbors (Kolbe et al. 2010; Weinberger and Saul 2009). Similarity is calculated using the Euclidean distance metric. The value of k is determined experimentally: beginning with k = 1, the classifier error is estimated, k is incremented to include one more neighbor, and the process is repeated to find the value of k with minimum error.
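A minimal sketch of this k-selection procedure, using scikit-learn's KNN classifier with the Euclidean metric and cross-validated error; the candidate range of k values is our assumption:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def pick_k(X, y, k_values=range(1, 30)):
    """Choose k as described above: start small, grow the neighborhood,
    and keep the k with the lowest cross-validated error."""
    best_k, best_err = None, np.inf
    for k in k_values:
        clf = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
        err = 1.0 - cross_val_score(clf, X, y, cv=5).mean()
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```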

5.3 Classification performance criteria

Precision and recall are two of the most widely used criteria for measuring classifier performance. These measures enable the comparison of any two classifiers having the same target output. They are defined as follows:

$$precision = \frac{true\,positives}{true\,positives + false\,positives} $$
(3)
$$recall = \frac{true\,positives}{true\,positives + false\,negatives} $$
(4)

where true positives is the number of attack profiles classified correctly, false positives is the number of genuine profiles incorrectly classified as attacks, and false negatives is the number of attack profiles incorrectly classified as genuine.
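A small helper that computes both measures directly from these counts (the label convention, 1 for attack, is illustrative):

```python
def precision_recall(y_true, y_pred, attack=1):
    """Precision and recall for the attack class (Eqs. 3 and 4)."""
    tp = sum(t == attack and p == attack for t, p in zip(y_true, y_pred))
    fp = sum(t != attack and p == attack for t, p in zip(y_true, y_pred))
    fn = sum(t == attack and p != attack for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```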

6 Proposed method

As discussed, influential users in recommender systems are those whose absence changes recommender system performance. The goal of an attacker is to alter recommender system performance to produce a desired result; the profile of a successful attacker will therefore be used many times by the recommender system, making the attacker an influential user.

There is a relationship between the attacking power and the influence of a user in recommender systems (Morid et al. 2011). More specifically, we believe that a successful attacker is an influential user, while the reverse is not necessarily true. Applying established attack detection methods to influential users instead of the whole user set can improve their detection performance.

To do this, the influence value of every user is computed according to the formulas described in subsection 2.1. Users whose influence values exceed a specific threshold are considered influential and are extracted from the dataset. Finally, since the KNN-based attack detection method introduced by Mobasher et al. (2007) has been shown to be the most effective approach so far, we apply their method only to the influential users.
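Putting the pieces together, the following sketch outlines the proposed pipeline: filter the user set by an influence threshold, then run a trained detector only on the influential users. The function names, the attribute extractor, and the scikit-learn-style `predict` interface are our assumptions.

```python
def detect_attackers(profiles, influence, extract_attributes, detector,
                     iv_threshold=10):
    """Sketch of the proposed pipeline.

    profiles: dict user_id -> rating profile
    influence: dict user_id -> loo-based influence value (subsection 2.1)
    extract_attributes: maps a profile to its 14 detection attributes
    detector: trained classifier with a scikit-learn-style predict()
    """
    influential = [u for u, iv in influence.items() if iv > iv_threshold]
    suspects = []
    for u in influential:
        attrs = extract_attributes(profiles[u])
        if detector.predict([attrs])[0] == 1:  # 1 = attack profile
            suspects.append(u)
    return suspects
```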

The Mobasher et al. (2007) approach uses 14 attributes of each user as the inputs to a KNN algorithm that classifies the user as an attacker or an authentic user. These attributes are either generic or attack type-specific. Generic attributes are basic statistical features that capture characteristics making an attacker profile look different from that of a genuine user. Attack type-specific attributes detect profile characteristics associated with a known attack type. Mobasher et al. (2007) compared their method with other model-based detection methods in several papers and concluded that it is the most effective.

6.1 Generic attributes

It is expected that the overall statistical signature of an attack profile will differ significantly from that of an authentic profile. This difference comes from two sources: the rating given to the target item and the distribution of ratings among the filler items. Research by Chirita et al. (2005), Lam and Riedl (2004) and Mobasher et al. (2006) asserts that generated attack profiles deviate from the rating patterns seen for authentic users. This deviation can manifest in many ways, including an abnormal deviation from the system average rating or an unusual number of ratings in a profile. Therefore, attributes that capture these anomalies are likely to be informative in detecting attack profiles. A number of generic attributes are used to capture such deviations in the KNN detection classifier dataset. These attributes are:

6.1.1 Rating deviation from mean agreement (RDMA)

This identifies attackers by examining the profile’s average deviation per item weighted by the inverse of the number of ratings for that item.

6.1.2 Weighted degree of agreement (WDA)

This is the sum of the absolute differences between the profile's ratings and each item's average rating, divided by the item's rating frequency; unlike RDMA, it is not weighted by the number of ratings made by the user. This attribute is calculated as:

$$WDA_{u} = \sum\limits_{i = 0}^{n_{u}} \frac{\left| r_{u,i} - \overline{r_{i}} \right|}{R_{i}} $$
(5)

where \(r_{u,i}\) is the rating of user u for item i, and \(n_{u}\) is the total number of ratings provided by this user. Also, \(R_{i}\) is the number of ratings provided for item i by all users, and \(\overline{r_{i}}\) is the average of these ratings.

6.1.3 Weighted deviation from mean agreement (WDMA)

This identifies anomalies by placing a high weight on rating deviations for sparse items. This attribute is computed as:

$$WDMA_{u} = \frac{\sum\limits_{i = 0}^{n_{u}} \frac{\left| r_{u,i} - \overline{r_{i}} \right|}{R_{i}^{2}}}{n_{u}} $$
(6)

6.1.4 Degree of similarity with top neighbors (DegSim)

This captures the average similarity of a profile’s top k nearest neighbors and is calculated as:

$$DegSim_{u} = \frac{\sum\nolimits_{v \in neighbors(u)} W_{u,v}}{k} $$
(7)

where \(W_{u,v}\) is the similarity between user u and user v.

6.1.5 Length variance (LengthVar)

This shows how much the length of a given profile varies from the average length of all profiles in the database.

Details of these attributes and the justification for using them can be found in Mobasher et al. (2007).
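To make the attribute definitions concrete, the sketch below computes RDMA, WDA (Eq. 5), WDMA (Eq. 6) and LengthVar from a user-item rating matrix. It follows the descriptions above; the exact LengthVar normalization follows our reading of Mobasher et al. (2007) and should be checked against the original.

```python
import numpy as np

def generic_attributes(R, u):
    """RDMA, WDA (Eq. 5), WDMA (Eq. 6) and LengthVar for user u.

    R is a (num_users x num_items) matrix with 0 for missing ratings.
    """
    rated = R[u] > 0
    n_u = rated.sum()
    R_i = (R > 0).sum(axis=0)                      # ratings per item
    mean_i = R.sum(axis=0) / np.maximum(R_i, 1)    # item average rating
    dev = np.abs(R[u, rated] - mean_i[rated])      # |r_{u,i} - mean_i|
    rdma = (dev / R_i[rated]).sum() / n_u
    wda = (dev / R_i[rated]).sum()
    wdma = (dev / R_i[rated] ** 2).sum() / n_u
    lengths = (R > 0).sum(axis=1)                  # profile lengths
    ss = ((lengths - lengths.mean()) ** 2).sum()
    length_var = abs(n_u - lengths.mean()) / ss if ss else 0.0
    return rdma, wda, wdma, length_var
```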

6.2 Type-specific attributes

Generic attributes are insufficient to distinguish a true attack profile from an eccentric but authentic one (Mobasher et al. 2006), especially when the profiles are small and contain few filler items. Since such attacks can still be successful, the generic attributes should be used in conjunction with attributes designed specifically to match the characteristics of known attack types (Williams et al. 2007).

As discussed, attacks can be characterized by the way their partitions \(i_t\) (the target item), \(I_S\) (a small group of items selected because of their association with the target item or a targeted segment of users), and \(I_F\) (the filler items) are constructed. Type-specific attributes recognize the unique signature of a certain attack type. These attributes are based on partitioning each profile so as to maximize the profile's similarity to one generated by a known attack type.

To model this partitioning, each profile is split into two sets: \(P_{u,T}\), which contains all items in the profile hypothesized to be targets of the attack, and \(P_{u,F}\), which contains all other ratings in the profile. The intention is for \(P_{u,T}\) to approximate \(\{i_t\} \cup I_S\) and \(P_{u,F}\) to approximate \(I_F\).

According to the method introduced by Mobasher et al. (2007), several measures are useful for detecting the specific signatures of attacks. These type-specific attributes examine the characteristics of the filler partition to indicate whether or not the profile was created by an authentic user. These measures are:

6.2.1 Filler mean variance (FMV)

This measures the variance of a user’s ratings in the hypothesized filler partition from the average rating for each of the items and can be calculated as:

$$FMV_{u,m} = \sum\limits_{i \in P_{u,F_{m}}} \frac{\left( r_{u,i} - \overline{r_{i}} \right)^{2}}{\left| P_{u,F_{m}} \right|} $$
(8)

where \(P_{u,F_{m}}\) is the partition of user u's profile hypothesized to be the set of filler items F by model m; \(r_{u,i}\) is the rating given by user u to item i; \(\overline{r_{i}}\) is the average rating of item i across all users; and \(|P_{u,F_{m}}|\) is the number of ratings in the hypothesized filler partition of profile \(P_u\) under model m.

6.2.2 Filler mean difference (FMD)

This computes the average of the absolute value of the difference between the user rating and the average rating for the hypothesized filler items (unlike FMV, which uses the squared value).

6.2.3 Filler average correlation (FAC)

This calculates the correlation between the filler ratings in the profile and the average rating for each item.

6.2.4 Average attack model

An average-type attacker divides the profile into two partitions: the target item, which is given an extreme rating, and the filler items, which are given other ratings. To detect such an attacker, the model selects one item as the hypothesized target, and all other rated items become fillers. For this attack type, the partitioning is chosen such that the ratings in the filler partition minimize the FMV.

6.2.5 Random attack model

This model divides the ratings into partitions similar to average attack model partitions with the target partition being a single rating. The partitioning is done by choosing the filler items such that the ratings placed in the filler partition minimize the FAC.

6.2.6 Bandwagon attack model

In this model, all ratings in the profile that equal the profile's maximum rating are placed in the target partition, and all other ratings become the filler items. Then the filler mean target difference (FMTD) attribute, defined as the difference between the average rating in the target partition and the average rating in the filler partition, is calculated as:

$$FMTD_{u} = \left| \left( \frac{\sum\nolimits_{i \in P_{u,T}} r_{u,i}}{\left| P_{u,T} \right|} \right) - \left( \frac{\sum\nolimits_{k \in P_{u,F}} r_{u,k}}{\left| P_{u,F} \right|} \right) \right| $$
(9)

where \(r_{u,i}\) is the rating given by user u to item i. The overall average FMTD among all users is then subtracted from \(FMTD_u\) as a normalizing factor.

6.2.7 Target focus model

A single profile cannot meaningfully influence the recommender system on its own, so the density of target items across profiles should also be examined. For the target model focus (TMF) attribute, the degree to which the partitioning of a given profile focuses on items common to other attack partitions is calculated, measuring a consensus of suspicion for each user profile. To calculate TMF for a profile, \(F_i\) is first defined as the degree of focus on a given item i; then the item with the highest focus value is selected from among the profile's target partition.

Details about type-specific attributes and the motivation behind them can be found in Mobasher et al. (2007).
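As an illustration of the partitioning idea, the sketch below hypothesizes the average-attack partition of a profile by minimizing FMV (Eq. 8) over candidate targets, and computes FMTD (Eq. 9) under the bandwagon partitioning. Both functions are our reconstructions from the descriptions above, not the authors' code.

```python
import numpy as np

def average_model_partition(profile, item_means):
    """Try each rated item as the hypothesized target; keep the split
    whose filler partition minimizes FMV (Eq. 8)."""
    rated = np.flatnonzero(profile > 0)
    best_target, best_fmv = None, np.inf
    for t in rated:
        fillers = rated[rated != t]
        fmv = ((profile[fillers] - item_means[fillers]) ** 2).mean()
        if fmv < best_fmv:
            best_target, best_fmv = t, fmv
    return best_target, best_fmv

def fmtd(profile, r_max=5):
    """FMTD (Eq. 9) under the bandwagon partitioning: max-rated items
    form the target partition, all other rated items are fillers."""
    rated = np.flatnonzero(profile > 0)
    targets = rated[profile[rated] == r_max]
    fillers = rated[profile[rated] != r_max]
    if len(targets) == 0 or len(fillers) == 0:
        return 0.0
    return abs(profile[targets].mean() - profile[fillers].mean())
```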

7 Experimental studies

In our experiments, the proposed method applied KNN supervised classification to influential user profiles. The experiments showed that the proposed detection method is effective at detecting, and reducing the impact of, different attack models.

7.1 Experimental setup

Training and test datasets were generated according to the methodology described by Mobasher et al. (2007). As mentioned before, we used the MovieLens 100K dataset described in subsection 5.1. To minimize over-training, the dataset was split into two equal-sized partitions, one for training and the other for testing. A mix of average, random and bandwagon push attacks and average, random and love/hate nuke attacks, with filler sizes ranging from 3 to 100 % and attack sizes between 0.5 and 1 %, was inserted into the training data. Since there are approximately 1,000 users in the MovieLens dataset, an attack size of 1 % corresponds to 10 attack profiles added to the dataset. The rationale for this training strategy is that it is unknown how an attacker will attack the recommender system (e.g., with what filler size or attack size); inserting attack profiles over diverse ranges of filler and attack sizes prepares the system for any sort of attack. Moreover, it has been argued that attacking in real-world situations with an attack size of more than 1 % is almost impossible (e.g., on a website like Amazon with millions of users), and even if it occurs, it cannot be effective: such attack profiles will not have the chance of being similar to most users, who cannot watch and rate such a large number of movies (Burke et al. 2005).

Like Mobasher et al. (2007), the proposed method used a total of 14 detection attributes: 5 generic (WDMA, RDMA, WDA, LengthVar and DegSim with k = 450); 6 average attack model attributes (filler mean variance, filler mean difference and profile variance, each computed for both push and nuke); 2 bandwagon attack model attributes (FMTD, computed for both push and nuke); and 1 target detection model attribute (TMF). Class labels and detection attributes were produced for the whole dataset.

As the classifier, KNN with k = 9 was used, the same value as in Mobasher et al. (2007), which makes the results of the proposed method comparable to theirs. All experiments in this paper were conducted using both the proposed method and the method introduced by Mobasher et al. (2007).

An influence value (IV) threshold of 10 was used to indicate influential users. Setting this value is dataset dependent and requires some engineering judgment. Table 1 helps explain how we arrived at IV = 10: it shows the number of authentic users in each influence range in our dataset. In addition, the lowest influence value of any attacker in the training data (under the different attack scenarios) was 15.

Table 1 Number of authentic users in each influence range

As seen, more than half of the users have a zero influence value, and about 70 percent have a value lower than 5. The number of users with an influence value higher than 10 is much smaller than in the other ranges, as the concept of influential users becomes more pronounced. Therefore, after several experiments, 10 was chosen as the influence threshold. Choosing a threshold greater than 10 (e.g., 15, the lowest IV of any attacker in the training dataset) could improve the performance of the proposed method, since the dataset would then be limited to highly influential authentic users and attackers. However, attackers in the testing dataset may not behave as they did in training and may have smaller influence values, so a threshold greater than 10 would not have been a reasonable choice.

Finally, for each experiment, attack data was inserted into the second half of the data, which was then classified using the classifier trained on the augmented first half. The inserted attack data was created with the same strategy as in training, and a single training dataset was used in all detection experiments. Moreover, 50 movies were selected as target items, following the same approach as Mobasher et al. (2007): they were selected randomly and represent a wide range of average ratings and rating counts. Table 2 shows the statistics of the 50 target movies; for instance, 7 of them have an average rating between 3 and 4 and have been rated by between 51 and 150 users. Each target movie was attacked individually, and the reported results represent averages over all of these runs.

Table 2 Statistics of the 50 target movies

To measure classification performance, the standard measures of precision and recall were used (subsection 5.3). In all experimental results, filler size is reported as the proportion of the number of filler items in an attack profile to the total number of items in the whole dataset.

7.2 Results

This section presents the experimental results of the proposed detection method in comparison with the method introduced by Mobasher et al. (2007), across diverse filler sizes, for a 1 % attack size and under different attack scenarios.

Figure 2 shows the performance of the proposed attack detection method in comparison with Mobasher et al. (2007) for an average attack.

Fig. 2 (a) Precision and (b) recall detection performance for the average attack

The proposed method performed better than that of Mobasher et al. (2007), especially in precision for filler sizes below 20 %. The precision difference decreased above the 20 % filler size; at a filler size of 100 %, the difference fell to about 15 %. Both methods performed well on recall, and all attack profiles were detected successfully for filler sizes over 10 %; for filler sizes below 10 %, the proposed method performed better. Since both the proposed method and Mobasher et al. (2007) showed similar performance for the push and nuke models, the results are not broken out by model.

Evaluating our method against the method of Mobasher et al. (2007) for the random attack yielded results similar to those for the average attack, as shown in Fig. 3, although the difference between the two methods is not as pronounced as for the average attack.

Fig. 3 (a) Precision and (b) recall detection performance for the random attack

The proposed method was also compared to Mobasher et al. (2007) for the bandwagon and love/hate attacks. As mentioned before, the bandwagon attack is effective only as a push attack, while the love/hate attack is effective only as a nuke attack. Figures 4 and 5 show that the proposed method performed better than Mobasher et al.

Fig. 4 (a) Precision and (b) recall detection performance for the bandwagon attack

Fig. 5 (a) Precision and (b) recall detection performance for the love/hate attack

As seen, the proposed method performed better than Mobasher et al. (2007), and significantly better in precision for filler sizes less than 40 %. For recall, both methods performed well for filler sizes above 10 %; for filler sizes below 10 %, the proposed method performed better. The proposed method also performed better on the love/hate and bandwagon attacks than on the average and random attacks.

8 Discussion

The better results of the proposed method originate from limiting the suspicious user set to users who are also influential, which is the assumption on which the method is based. The data mining analysis was thus conducted on a smaller dataset with less diversity in user behavior patterns, allowing a simpler model with better performance.

The advantage of the proposed method is more evident for precision than for recall. In other words, the method of Mobasher et al. (2007) detected attack profiles accurately but was weaker at distinguishing authentic profiles from attackers. This stems from the diversity of behavior patterns among authentic users, which makes them hard to distinguish from attackers. By limiting the population of authentic users, the proposed method reduced this diversity and narrowed the task of the data mining model to distinguishing authentic influential users from attackers, which produced better performance.

For lower filler sizes, the advantage of the proposed method was evident. Attack detection is simpler at higher filler sizes, where the attacker must rate many items and is therefore easier to detect, so other detection methods also perform well there. In a real-world context, rating large numbers of items is difficult for an attacker; the focus of all detection methods is therefore on attacks with lower filler sizes, which are more feasible to implement but harder to detect. This makes the approach of the proposed method particularly important for attack detection.

9 Conclusion and future works

Profile injection attacks are a serious threat to the robustness and trustworthiness of collaborative recommender systems. An essential component of a robust recommender system is a mechanism that detects profiles originating from attacks and allows them to be quarantined and their impact reduced.

This paper demonstrates an attack detection method based on user influence in recommender systems. Influential users from a recommender system user set were identified, and detection features were extracted from each of their profiles. A KNN data mining model using the extracted features as input was applied to this selected user set to identify attacker profiles.

The results showed that the proposed method detects attacks accurately and can improve the stability of a recommender system under most attack scenarios. Precision for attacks with small filler sizes (<20 %) was significantly better than for established detection methods, although the precision difference between the proposed and established methods shrinks for filler sizes above 40 %. Both methods performed well on recall for filler sizes over 10 % for all attack profiles; for filler sizes below 10 %, the proposed method performed better.

While the proposed method showed high accuracy in detecting all attack types, it performed better for the love/hate and bandwagon attacks in comparison with other attack scenarios. Also, the proposed method performed similarly for both nuke and push models.

Although this paper evaluated the proposed attack detection method on user-based collaborative filtering systems, it appears applicable to item-based collaborative filtering systems as well. However, more study is needed before the proposed method can be applied to other detection approaches, such as trust-based recommender systems, in which the definition of influential users differs.

Enhancing the proposed model is also a topic for future study. In this study, the inputs to the KNN data mining model were the features used in established research; we believe that extracting and adding other features related to the concept of influence will improve the results.