1 Introduction

Recommendation systems based on online social networks (OSNs) are an effective way to obtain and share insights through explicit or implicit relationships in OSNs. This raises two questions: (1) how should peers’ opinions be treated: taken as they are, ignored, or, if weighted, how? and (2) how should a recommendation request be handled? To answer the first question, users’ rating behaviour needs to be analysed and classified. The second question concerns the architecture of a recommendation system. Consider the following example.

Fig. 1. Peer-based relay scheme

Figure 1(a) shows a social rating network (SRN). A pair of users with a direct connection are peers. User \(U_0\) has three peers [\(U_1\), \(U_2\), \(U_3\)], and he owns a list of items [A, D, E] with associated ratings [2, 3, 3.5]. \(U_0\) has a common item [E] with \(U_1\) and \(U_3\). An item like [E] is referred to as a Common Interested Item (CII). Peers with CII(s) are referred to as co-peers. When \(U_0\) wants to expand his item collection, it is natural for him to ask the peers who share common interests with him for recommendations. This request can be passed on through the network. When aggregating ratings of the same item from different peers, we have to deal with the issue of how different peers’ ratings should be treated. In previous work [1], all peers’ ratings are taken as they are, so inaccurate and inappropriate ratings are unavoidably included in the results. For example, \(U_4\) gives 5.0 to all items, which is not worth taking seriously. This work provides a solution for calculating rating credibility that takes the number of ratings and the rating distribution into consideration.

Now we address the second issue: the recommendation architecture. We adopt the peer relay scheme of [1] as a potential solution. As the request of an active user is relayed through the SRN, a Co-Peer Rating Graph (CPRG) is formed. Figure 1(b) presents the CPRG of \(U_0\) derived from Fig. 1(a).

In this relay process, two challenging issues arise: (a) how should the credibility of each user’s ratings be evaluated and aggregated along the relay path? and (b) how should rating credibility be incorporated into the peer-based relay recommendation? The contributions of this work are summarized as follows:

  • We propose a weighting method to evaluate the rating credibility of users with different rating behaviour. Users are classified into three groups based on the number of ratings they have given, and different rules are used to evaluate the credibility of users in each group.

  • The credibility values are incorporated to reduce the impact of recommended ratings from peers with low credibility.

  • We conduct experiments on the Flixster dataset [2] to evaluate the effectiveness of the proposed method. The results show that our method significantly improves robustness.

2 Users’ Rating Behaviour

2.1 Dataset

For this work, we use the Flixster dataset from [2], presented at RecSys 2010. Flixster allows users to rate movies with 10 available scores, from 0.5 to 5.0 in steps of 0.5. The social relations in Flixster are undirected. Users with more than 757 ratings are excluded from the analysis, on the belief that users who rate that many movies in a fixed period of time are very likely fake. Users who have only friend connections or only ratings are also filtered out. The resulting dataset includes 134,907 users, 31,973 items, 5,277,346 ratings, and 1,179,295 social relations.

2.2 Credibility of Users’ Ratings

75% of the users have rated no more than 19 items and 52.5% provide fewer than 5 ratings. Users with limited experience, i.e. a very small number of ratings, form the majority of the population. Four types of rating behaviour are shown in Fig. 2. Case a is a user with only one rating. Case b shows a user with seven ratings. Cases c and d are users with a significant number of ratings. Case c represents rating almost all items with the same score, while Case d stands for a user whose ratings are widely distributed. The credibility value is determined by two factors: (a) the total number of ratings a user has given, denoted n, and (b) the variation of a user’s ratings, represented by the number of distinct rating scores in a user’s ratings, denoted m. We divide the users into three groups and weight them with different rules.

Fig. 2. Users’ rating behaviour examples

Group 1: Users with a large number of ratings. “A large number” means that the frequency distribution of ratings has a clear statistical meaning. The distribution of all raw ratings is obtained and fitted to a Weibull curve (see Fig. 3(a)) with \(A=0.9764, k=5.0246, \lambda =3.9004\). An Absolute Error (AE) measures the absolute difference between a user’s ratings and the fitted curve; a user with a smaller AE has higher credibility. Credibility values in this group are used as a reference for adjusting the values of users in the other groups. Users with at least 20 ratings (\(n\ge 20\); 33,192 users) are in this group.

Fig. 3. Distribution and adjustment of ratings

Because different users have different rating standards, we adjust the ratings with the method described later in Sect. 3.2. Figure 3(b) gives examples of two users’ original and adjusted rating scores.

For a specific user, we denote the frequency of an adjusted rating score \(r'_u(t_i)\) as \(g_{r'_u(t_i)}\) and the total number of available rating scores as \(N_0\); the AE is then:

$$\begin{aligned} AE_u = \sum _{i=1}^{N_0}|g_{r'_u(t_i)}-f(r'_u(t_i))| \end{aligned}$$
(1)

With \(AE_{min}\) the minimum AE in this group and \(\alpha \) a parameter, the credibility of a user is:

$$\begin{aligned} W_{u} = e^{-\alpha (AE_u - AE_{min})} \end{aligned}$$
(2)
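
As a minimal sketch, Eqs. (1) and (2) can be implemented as below. The precise functional form of the fitted curve is not fully specified above, so the Weibull-style density `f` (using the paper’s parameters) and the use of relative frequencies are our assumptions:

```python
import math

# Fitted-curve parameters reported in the text (the Weibull-density form
# below is an assumption; the paper does not spell out the exact formula).
A, K, LAM = 0.9764, 5.0246, 3.9004

def f(r):
    """Weibull-style curve evaluated at rating score r (assumed form)."""
    return A * (K / LAM) * (r / LAM) ** (K - 1) * math.exp(-(r / LAM) ** K)

SCORES = [0.5 * i for i in range(1, 11)]  # the 10 available scores 0.5 .. 5.0

def absolute_error(ratings):
    """Eq. (1): sum over all N_0 scores of |frequency - fitted curve|."""
    n = len(ratings)
    freq = {s: 0.0 for s in SCORES}
    for r in ratings:
        freq[r] += 1.0 / n            # relative frequency of each score
    return sum(abs(freq[s] - f(s)) for s in SCORES)

def credibility(ae, ae_min, alpha=3.0):
    """Eq. (2): W_u = exp(-alpha * (AE_u - AE_min))."""
    return math.exp(-alpha * (ae - ae_min))
```

For instance, a user who gives every item the score 5.0 (Case c in Fig. 2) deviates strongly from the fitted curve and therefore receives a large AE and a low credibility.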

Group 2: Users with a medium number of ratings. A user in this group has enough ratings to consider their diversity, but not enough to analyse the distribution. The number of distinct rating scores m represents the rating diversity; a user with more distinct rating values has higher credibility. Users with \(5\le n <20\) (30,901 users) are in this group. The credibility is:

$$\begin{aligned} W_{u} = k \cdot e^{-\beta \frac{n}{m} (10-m)} \end{aligned}$$
(3)

Here \(\beta \) is a parameter and k scales the values to be consistent with those of Group 1.
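
Eq. (3) can be sketched in one line (the function name and defaults are illustrative; the value of k is taken from the experiment setup in Sect. 4.1):

```python
import math

def group2_credibility(n, m, beta=0.1, k=0.8343):
    """Eq. (3): W_u = k * exp(-beta * (n/m) * (10 - m)) for 5 <= n < 20.

    n: total number of ratings; m: number of distinct rating scores used
    (at most 10 in Flixster). k scales the values to match Group 1.
    """
    assert 5 <= n < 20
    return k * math.exp(-beta * (n / m) * (10 - m))
```

With all 10 scores used (m = 10) the exponent vanishes and the weight equals k; fewer distinct scores yield a lower weight.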

Group 3: Users with a small number of ratings. The users in this group are given a uniform credibility. Here, this uniform weight is taken to be the bottom-20% (20th-percentile) weight of the group with a large number of ratings.

Note that the classification boundaries should be set according to the specific characteristics of the dataset. The proposed method can be applied to any recommendation algorithm.

3 Peer-Based Relay Recommendation

3.1 Recommendation Relay

Neighbourhood for Recommendation: A social rating network is specified as an undirected graph \(G=(V,E,I,R)\), where V is a set of nodes; E is a set of undirected edges over V, representing social (friend) relations; I is a set of items on which the users have expressed ratings; and R is the associated ratings. Each user \(v \in V\) possesses three sets: items T(v), ratings R(v) and friends F(v). A CPRG of an active user \(v \in V\) is defined as a directed graph \(CPRG_{v}=(V_{v},E_{v},I_{v})\). The set of CIIs between a pair of peers, \(T(v) \cap T(u)\), is denoted as \(C^{u}_{v}\). If \(C^{u}_{v}\ne \emptyset \), a directed edge \(e(v,u)\in E_{v}\) from v to u is added to the CPRG. A common-interests-based breadth-first search of G from the active user v constructs the CPRG as a tree. A relay depth measures the distance between a peer and the active user; co-peers of the active user are at depth 1. By continuing the relay to k-depth co-peers, a k-depth CPRG is obtained. The relay stops when a constraint is met, such as a certain relay depth being reached.
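
The common-interests-based BFS construction can be sketched as follows (a minimal sketch; the dict-based graph representation and function name are ours, not from the paper):

```python
from collections import deque

def build_cprg(friends, items, v0, max_depth):
    """Common-interests-based BFS from active user v0.

    friends: dict user -> set of friends (undirected edges E)
    items:   dict user -> set of rated items T(u)
    Returns the CPRG as a tree: dict parent -> list of (child, CII set),
    relaying only along edges whose CII set is non-empty.
    """
    tree = {v0: []}
    visited = {v0}
    queue = deque([(v0, 0)])
    while queue:
        v, depth = queue.popleft()
        if depth == max_depth:              # relay-depth stopping constraint
            continue
        for u in friends.get(v, set()):
            if u in visited:
                continue
            cii = items[v] & items[u]       # C^u_v = T(v) ∩ T(u)
            if cii:                         # add edge e(v,u) only if CII ≠ ∅
                visited.add(u)
                tree.setdefault(v, []).append((u, cii))
                tree.setdefault(u, [])
                queue.append((u, depth + 1))
    return tree
```

On the Fig. 1 example, \(U_2\) shares no item with \(U_0\), so no edge to \(U_2\) appears in the CPRG of \(U_0\).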

Inbound/Outbound Co-Peers: For \(u\in CPRG\), the co-peer that sends a request to u is called an inbound co-peer of u; the co-peers u passes the request on to are its outbound co-peers. The active user has no inbound co-peers, and the leaf nodes have no outbound co-peers.

Recommendation Process: After constructing a CPRG for an active user, the recommendation is sent back to him through multiple depths in a relay style. At each relay depth, Rating Adjustment and Rating Aggregation are performed.

3.2 Rating Adjustment

For a pair of co-peers v and u with u making recommendations for v, we denote the set of “potential items” recommended to v by u as \(P^{u}_{v}\) (\(P^{u}_{v} = T(u)\setminus T(v)\)). \(\tilde{r}^{u}_{v}(t_{i})\) denotes the estimated rating that v will give to \(t_{i}\).

Let the mean rating of \(C^{u}_{v}\) given by v be \( \bar{r}^{u}_{v}\) and that given by u be \(\bar{r}^{v}_{u}\). MIN and MAX are the minimum and maximum rating values. If \(\bar{r}^{u}_{v} \le \bar{r}^{v}_{u}\),

$$\begin{aligned} \tilde{r}^{u}_{v}(t_{i})= MIN + [r_{u}(t_{i})-MIN] \cdot \frac{\bar{r}^{u}_{v}-MIN+\gamma }{\bar{r}^{v}_{u}-MIN+\gamma } \quad \quad (\bar{r}^{u}_{v} \le \bar{r}^{v}_{u}; t_{i} \in P^{u}_{v}; \gamma >0) \end{aligned}$$
(4)

Here \(\gamma \) is set to be 0.5 to smooth the function. If \(\bar{r}^{u}_{v} > \bar{r}^{v}_{u}\),

$$\begin{aligned} \tilde{r}^{u}_{v}(t_{i})= MAX - [MAX-r_{u}(t_{i})] \cdot \frac{MAX-\bar{r}^{u}_{v}+\gamma }{MAX-\bar{r}^{v}_{u}+\gamma } \quad (\bar{r}^{u}_{v}> \bar{r}^{v}_{u}; t_{i} \in P^{u}_{v}; \gamma >0) \end{aligned}$$
(5)
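
Eqs. (4) and (5) can be sketched as a single function (the constants follow the Flixster setup above; the function name is ours):

```python
MIN, MAX, GAMMA = 0.5, 5.0, 0.5   # Flixster rating range; gamma = 0.5

def adjust_rating(r_u, mean_v, mean_u):
    """Eqs. (4)-(5): map u's rating r_u(t_i) onto v's rating standard.

    mean_v: mean rating v gave to the CIIs; mean_u: mean rating u gave
    to the same CIIs.
    """
    if mean_v <= mean_u:           # Eq. (4): rescale towards MIN
        return MIN + (r_u - MIN) * (mean_v - MIN + GAMMA) / (mean_u - MIN + GAMMA)
    # Eq. (5): rescale towards MAX
    return MAX - (MAX - r_u) * (MAX - mean_v + GAMMA) / (MAX - mean_u + GAMMA)
```

When the two means coincide, both ratios equal 1 and the rating passes through unchanged.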

3.3 Rating Aggregation

Denote u as an outbound co-peer of v and \(n_{u}(t_{i})\) as the recommended frequency of item \(t_{i}\) when the relay reaches u. If v is the active user, or has outbound co-peers but no rating for item \(t_{i}\), the aggregated rating is:

$$\begin{aligned} \tilde{r}_{v}(t_{i}) = \frac{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i}) \cdot \tilde{r}^{u}_{v}(t_{i})}{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i})} \quad \quad (t_{i} \in P^{u}_{v}) \end{aligned}$$
(6)

If v has outbound co-peers and has a rating \(r_{v}(t_{i})\) with a weight \(W_{v}(t_{i})\), the aggregated rating including \(r_{v}(t_{i})\) is:

$$\begin{aligned} \tilde{r}_{v}(t_{i}) = \frac{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i}) \cdot \tilde{r}^{u}_{v}(t_{i}) + W_{v}(t_{i}) \cdot r_{v}(t_{i})}{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i})+ W_{v}(t_{i})} \quad (t_{i} \in P^{u}_{v}) \end{aligned}$$
(7)

\(n_{u}(t_{i})\) is applied to update the weight of \(t_{i}\). If v has no rating for \(t_{i}\), we have:

$$\begin{aligned} \tilde{W}_{v}(t_{i}) = \frac{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i}) \cdot W_{u}(t_{i})}{\sum _{u \in Ocp_{i}(v)}n_{u}(t_{i}) \cdot W_{u}(t_{i})} \end{aligned}$$
(8)

If v has a rating \(r_{v}(t_{i})\) for \(t_{i}\) with weight \(W_{v}(t_{i})\), \(W_{v}(t_{i})\) is included in Eq. 8 analogously to Eq. 7. As the relay continues, the frequency \(n(t_i)\) of an item \(t_{i}\) is updated as the total number of recommended ratings for it. Assigning all the \(W_{u}(t_{i})\)s in Eqs. 6 and 7 the same value reduces the method to a non-weighted rating aggregation approach.
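
Eqs. (6)-(8) amount to two weighted averages over the incoming triples; a sketch, where the treatment of v’s own weight in Eq. (8) follows our reading of the text:

```python
def aggregate(recs, own=None):
    """Credibility-weighted aggregation at user v (Eqs. 6-8).

    recs: list of (n_u, W_u, r_u) triples from outbound co-peers, where
          n_u is the recommended frequency, W_u the credibility, and
          r_u the adjusted rating ~r^u_v(t_i).
    own:  optional (W_v, r_v) pair if v has rated t_i itself (Eq. 7).
    Returns (aggregated rating, updated weight W~_v(t_i)).
    """
    num_r = sum(n * w * r for n, w, r in recs)   # numerator of Eq. (6)
    num_w = sum(n * w * w for n, w, r in recs)   # numerator of Eq. (8)
    den = sum(n * w for n, w, r in recs)         # shared denominator
    if own is not None:                          # Eq. (7): include v's own rating
        w_v, r_v = own
        num_r += w_v * r_v
        num_w += w_v * w_v
        den += w_v
    return num_r / den, num_w / den
```

A triple with a low credibility W_u contributes little to either average, which is exactly how low-credibility recommendations are damped along the relay.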

4 Experiments and Analysis

4.1 Experiment Setup

A set of experiments is conducted on the dataset described in Sect. 2.1. The \(N_0\) in Eq. 1 is set to 10 (10 rating scores). The \(\alpha \) in Eq. 2 is set to 3. The credibility of a user in Group 1 ranges from 0.0016 to 1. In Eq. 3, \(\beta \) is set to 0.1, and k is calculated accordingly as 0.8343. The weight of a user in Group 2 ranges from 0.0001 to 0.8343. A user in Group 3 has the credibility value 0.1. Table 1 shows the percentage of users in these groups. Our proposed weighting method thus covers all users, whatever their rating behaviour, in a consistent manner.

Table 1. Results of users’ weights

We conduct experiments with 671 users. For the non-weighted method, a recommended frequency of at least 3 is used as a constraint when selecting the top N recommendations. For the proposed weighted method, we add the constraint that the product of the credibility and the recommended frequency is no less than a threshold of 1.2. After applying the constraints, items are ranked according to their recommended rating values.
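
The selection step for the weighted method can be sketched as one filter-and-rank pass (the function name, tuple layout, and defaults are illustrative):

```python
def top_n(candidates, n=10, threshold=1.2):
    """Select TOP-N items under the weighted-method constraint.

    candidates: list of (item, aggregated_rating, weight, frequency).
    Keeps items whose credibility x frequency >= threshold, then ranks
    the survivors by their aggregated recommended rating.
    """
    kept = [c for c in candidates if c[2] * c[3] >= threshold]
    kept.sort(key=lambda c: c[1], reverse=True)
    return [c[0] for c in kept[:n]]
```

An item with a very high rating but a tiny aggregated weight is filtered out before ranking, which is the intended effect of the constraint.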

4.2 Results and Discussion

Robustness of the Proposed Recommendation Method. Table 2 gives the TOP 10 recommendation lists for user 119526 at relay depth 3; the rating information for items appearing on only one list is provided on the right side. With the weighted method, ratings for items recommended by users with low credibility contribute less when the rating scores are integrated; correspondingly, the aggregated rating scores have low credibility. For example, item 3132 in Table 2 has peers [39476, 48833, 50891, 66751] with rating scores [3.5, 5.0, 5.0, 5.0] and credibility values [0.26, 0, 0.01, 0.03]. With the non-weighted method, item 3132 has a score of 4.76 and is listed in the TOP 10. With the weighted method, it has a score of 4.3 and is excluded from the top list, because the rating 3.5 has a much higher credibility value than the other ratings. Item 15639 is at the top of the list with the weighted method but absent from the TOP 10 of the non-weighted method. It has ratings [4.0, 5.0, 4.5] from users [17626, 44248, 47110] with credibility values [0.1, 0.38, 0.75]. With the weighted method, its recommended rating is 4.8474: the ratings 4.5 and 5.0 contribute more than the rating 4.0 because of their higher credibility values. The first example shows how the weighted method filters out items whose high ratings come from users with low credibility; the second shows how it pushes up items whose low ratings come from low-credibility users and whose high ratings come from high-credibility users.

Table 2. TOP 10 items recommended to user 119526

\(S(r_{nw})\) and \(S(r_w)\) denote the sets of ratings contributing to the TOP 10 recommendations with the non-weighted and the weighted method, respectively. On average, the ratio between the number of ratings in \(S(r_{nw})-S(r_w)\) and the total number of ratings in \(S(r_{nw})\) is 57%, 73%, and 82%, and the average credibility value of the ratings in \(S(r_w)-S(r_{nw})\) is 14%, 22% and 17% higher than that in \(S(r_{nw}) - S(r_w)\), for relay depths 1, 2, and 3, respectively. Incorporating the credibility of ratings into the peer-based recommendation thus helps improve the robustness of the recommendation results.

5 Related Work

Most user-behaviour investigations focus on detecting known types of spammers or attackers [3, 4]. [5] evaluates users’ knowledge according to their ratings and followers. Both the rating and the rating confidence are calculated in [6]; the “rating confidence” is a combination of the trust value, rating similarity and social similarity between two users. These methods do not take users’ rating behaviour into account. The concept of trust has been adopted to form trusted neighbourhoods that provide recommendations. The method proposed in [7] creates a category-specific social trust neighbourhood, and [8] differentiates the influential power of different recommending friends. The trust value is usually calculated from evidence that is unavailable in most social rating datasets. Different from existing work, we consider the credibility of the whole set of users, according to their rating behaviour, in the recommendation calculation.

6 Conclusion

In this work, we develop a peer-based relay recommendation approach that incorporates the credibility of users’ ratings. The credibility of a user’s ratings is evaluated based on the user’s rating behaviour. Recommendations are calculated by integrating the credibility values of the ratings of social peers in a relay style. Experimental results show that incorporating users’ credibility helps improve the robustness of peer-based recommendation.