Keywords

1 Introduction

The rapid development of online social network makes it an important platform for services, which leads to service-oriented social networks like Epinion.com. Users expect to obtain high-quality service through such platforms, and they usually make their choices according to the credits of service providers. Thus, trustworthiness has been considered as one of the most important factors for recommendations on high-quality service providers.

Some trust information might be gathered explicitly through targeted user survey feedback or implicitly through user behavioral analytics. However, a user can get in touch with only a few other users, so the trust information that can be collected is insufficient, which leads to a challenging problem of trust prediction to infer the unknown trust between any two users.

More specifically, in a service-oriented social network trust model, each user is represented by a node in a graph, while the trust relationship and the amount of trust is indicated by an weighted edge between two users. Since most of the pairwise trust ratings and reliability factors of the users are unknown, if we consider a sparse \(n \times n\) trust rating matrix T for users, the objective of trust prediction problem is to predict the unknown values of matrix using some evidence derived from user behaviors or other metrics.

Trust prediction problem can be studied by several strategies, such as trust propagation [2, 8, 11, 1517], matrix factorization [23, 27, 28], and K-NN search [5, 25], etc. However, the existing methods have many drawbacks: (1) Majority of them ignore the possible complementarity among mainstream strategies, yet it is helpful to solve trust prediction problem. (2) They suffer from low query efficiency in terms of time. Although some approaches speed up by preprocessing, they bear the consequences of high time cost in preprocessing. (3) Their prediction accuracy still have room for improvement.

To resolve the above drawbacks, we propose a novel efficient method named iSim: an integrated similarity based collaborative filtering approach for trust prediction with the help of user similarity. iSim integrates three similarity factors, say, vector space similarity, matrix factorization, and propagated trust, among which matrix factorization and propagated trust are first leveraged in the study of similarity. Next, we involve several methods like adding inverted index to reduce the time complexity of iSim, and compute its theoretical time bound precisely. Finally, we compare iSim with two state-of-the-art trust inference approaches: PROP [7] and SCMF [28] by extensive experiments with real-world dataset. Experiments show that iSim improves the predicting accuracy by \(11.9\,\%\)\(30.5\,\%\), and its execution time is only \(6.8\,\%\) of SCMF and \(3.5\,\%\) of PROP.

In all, our contribution is summarized as follows: (1) We propose a novel integrating method based on collaborative filtering for trust prediction. (2) We define a hybrid similarity involving matrix factorization and propagated trust, which might be generalized to other related fields. (3) We have conducted extensive experiments on a trace dataset, which demonstrate that our approach achieves much higher accuracy and greatly faster speed than state-of-the-art approaches.

2 Related Works

Existing works for trust prediction in online social networks can be categorized into several classes: propagation based trust prediction, matrix factorization and matrix recovery trust, K-NN based methods, as well as other ad hoc methods.

Propagation Based Trust Prediction. Considering all paths from the source user to the target user, Hang et al. [8] predicted trust by calculating propagated trust through concatenation, aggregation, and selection operations. The multiplication propagating strategy has been exploited in [15, 24], which predicts trust between two users based on the product of trust values on all links of a path. Moveover, Li et al. [15] discussed the feasibility of this strategy based on Baysian Inference. On the other hand, the average propagating strategy has been utilized in [17] for trust propagation, which predicts trustworthiness based on a weighted average of trust values on all links of a path between two users. Furthermore, min/max propagating strategy predicts trust between two users based on the minimal or maximal trust values on all links of a path, which is leveraged in [2, 22], and  [22] improves the efficiency of trust propagation by preprocessing. Instead of considering all possible paths, the trust prediction problem is transformed into the restricted k-optimal path problem [11, 16, 17], by establishing constraint on propagated trust path. Besides considering propagated trust on paths, path variance and path length are introduced to adjust the propagated trust [7].

Matrix Factorization and Matrix Recovery Trust. Matrix factorization can explain the trust ratings by featuring both trustors and trustees on several latent factors inferred from rating values. A trust prediction model was presented based on rating matrix factorization, which integrates both propagated trust and other related information to improve the result [23]. Another work proposed similarity regularization term to constrain matrix factorization [28]. Zheng et al. utilized matrix factorization in trust prediction in different context [27]. Huang et al. introduced joint factorization with the integration of user-item rating matrix and trust rating matrix [10]. Additionally, Matrix recovery was also exploited to predict unknown value in trust matrix [9].

K -NN Based Methods. According to the similarity of user ratings to films or commodities, the trustworthiness of a user to others can be represented by the weighted average of trust propensity of other K users who are the most similar to this user [5]. Another work integrated the authority similarity, interest similarity, and rating similarity to calculate the K-NN for trust prediction [25]. The main difference between iSim and the existing K-NN approaches is that iSim does not need additional information other than the trust ratings for trust prediction, while the others require much extra information.

Other Approaches. Trust sub-network extraction can improve the accuracy of trust prediction, and it has been studied with binary ant colony algorithm [26]. Trust balance theory can also be employed for trust prediction and adjustment [18]. The network flow from the source to the target can be transformed into the trust between them if we convert trust network to flow network [12]. Orman developed a Bayesian formulation, and a Bayesian Network analysis was employed to compute the unknown trust values [20].

3 Problem Statement

In most cases, trust rating value is defined as the belief one user has in the other. These values can be organized to a square matrix T with size n, which we call an n-user Trust Rating Matrix. We use \(T_{i,j}\) to indicate the trust rating value a user i gives to user j. We call the user who gives trust ratings to others as trustor, and the user who receives trust ratings as trustee. To simplify our calculation, we assume that one user gives trust rating value to at least one other user. We denote our predicted trust rating value matrix as \(\widetilde{T}\), whose value is generated by our prediction approach iSim. Definition 1 presents the basic problem statement.

Definition 1

(Trust Prediction Problem). Given a trust rating matrix T, whose non-zero elements denote the trust rating values from the row users to the column users. Predict the most proper values for zero elements and fill T as \(\widetilde{T}\).

Fig. 1.
figure 1

An example scenario in social network

4 iSim: A Novel Trust Prediction Approach

Now we formally introduce our integrated similarity based trust prediction approach iSim, including its motivation, configuration, and detailed description.

Motivation: In a large-scale social network, some users may share, to some degree, similar kind of standards or preference when giving trust rating values. For example, as showed in Fig. 1, suppose there are 4 trusters \(i_1, i_2, i_3, i_4\) and many trustees \(j_k\)(\(k=1,...,n\)) in the network. Truster \(i_1\), \(i_2\) and \(i_3\) have similar behaviors on putting trust ratings on trustee \(j_k\) while \(i_4\) does not, that is, \(i_1\), \(i_2\), \(i_3\) share high behavior similarity with each other and low with \(i_4\). Different from \(i_3\), \(i_2\) and \(i_4\) is given a trust rating directly by \(i_1\), which suggests that \(i_1\) is closely connected with \(i_2\) and \(i_4\). Comprehensively considering both factors of behavior similarity and propagation, we can finally speculate that \(i_1\) and \(i_2\) share higher iSim similarity when rating others, which will greatly assist the prediction of the unknown values from \(i_1\) to \(j_2\) in T.

Work Flow: Figure 2 shows the flow path of iSim, which contains three main parts: iSim-VSS, iSim-LFS and iSim-PT. We first calculate the three parts independently based on the input trust rating matrix T, and then integrate them appropriately to get the similarity values between users. Finally, we generate an output predicted trust rating matrix \(\widetilde{T}\) from these similarities based on K-NN collaborating filtering method.

In detail, for iSim-VSS, we use row vectors in T to describe a user’s preference and find similarity between these preferences. Because this similarity is extracted directly from T, we call it explicit similarity. For iSim-LFS, we describe users’ trust rating styles by the row vectors of specific matrix factorized from T, which reflect a promising similarity. Since this similarity is lurked in T, we call it implicit similarity. For iSim-PT, we depict the trust relationship between users via the weight-sums along the paths linking the two users in the social network generated by T. We then integrate the above three parts and adopt K-NN method to pick up top-K users who share the highest similarities with a particular user i. Finally, we will comprehensively consider the rating values these K users give to user j (if any) to predict the trust value user i will give to user j, which leads to our final predicted trust rating matrix \(\widetilde{T}\).

Fig. 2.
figure 2

The framework of iSim

iSim-VSS: The similarity between two users could be described by the rating values they give to their common trustees [16]. This kind of similarity, known as vector space similarity (VSS), shows how much the two users are in common on their rating preferences. Correspondingly, \(VSS_{i,j}\), which is the VSS between user i and user j, can be defined as Eq. (1).

$$\begin{aligned} VSS_{i,j}=\frac{\sum \limits _{k \in From_i \cap From_j}T_{i,k} \cdot T_{j,k}}{\sqrt{\sum \limits _{k \in From_i \cap From_j}T_{i,k}^2}\cdot {\sqrt{\sum \limits _{k \in From_i \cap From_j}T_{j,k}^2}}} \end{aligned}$$
(1)

In Eq. (1), \(From_i\) is the set of users that i trusts, and k is the user trusted by both i and j. Note that whenever we use terms like \(VSS_{i,j}\) to represent the corresponding value related to users i and j, we will use VSS to represent a matrix storing such type of values by default. However, computing Eq. (1) is extremely time consuming, so we introduce inverted index to reduce the time complexity [14]. Algorithm 1 gives out the details. Here \(To_k\) is the set of users who give a trust rating value to user k, and we also call it the inverted index of i. Through inverted index, when calculating the similarity between user i and user j, we can avoid enumerating the users to whom user i or j does not give trust rating value, which can bring significant improvement on time efficiency.

figure a

iSim-LFS: Sparsity problem has a fatal influence on the calculation of VSS. As a result, VSS alone cannot satisfy our demand when determining the similarity between two users. Therefore, we propose latent factor similarity (LFS) to overcome the sparsity problem, which describes a user’s rating style. Actually, when users give trust rating values, they consider several factors. For example if i is familiar with j, it is possible that i puts a higher trust rating value on j. This is called familiarity factor. Another example is that some group may have a tendency to give out average higher trust values while other groups may not, which is named as rating tendency factor. Such factors are called “latent factors” and we denote \(\ell \) as the number of latent factors. Thus, user i’s rating style can be represented as a vector of size \(\ell \), named as trustor-specific vector \(\varvec{u_i}\), and these \(\varvec{u_i}\)’s could form an \(n \times \ell \) trustor-specific matrix U. Similar to iSim-VSS, we define latent factor similarity between users as the cosine value of their trust-specific vectors.

We can factorize our trust rating matrix T into trustor-specific matrix U and trustee-specific matrix V [28] according to matrix factorization method. To cooperate with \(\varvec{u_i}\), we denote the \(j^{th}\) row vector in V as \(\varvec{v_j}\), which makes the inner product of \(\varvec{u_i}\) and \(\varvec{v_j}\) an approximation of trust rating value \(T_{i,j}\). In order to get U and V, we need to minimize the sum-of-squared-error objective function with quadratic regularization terms [19]. In this case, we introduce an indicator matrix I to indicate whether there is a value in the corresponding position in T. Additionally, in order to avoid overfitting, two regularization terms from zero-mean spherical Gaussian priors are taken into consideration. Thus, we formulate our factorization function as Eq. (2):

$$\begin{aligned} \mathcal {L} (T,U,V)=\min _{U,V} \frac{1}{2} \sum \limits _{i=1}^n \sum \limits _{j=1}^n I_{i,j}\cdot ||T_{i,j}-\varvec{u_i}\cdot \varvec{v_j'}||_F^2 +\frac{\lambda _1}{2}||U||_F^2+\frac{\lambda _2}{2}||V||_F^2 \end{aligned}$$
(2)

where \(||.||_F\) refers to Frobenius norm while \(\lambda _1\) and \(\lambda _2\) are coefficients controlling the extent of Gaussian priors. \(I_{i,j}\) equals to 1 iff. \(T_{i,j}\) exists, otherwise it is 0. We use gradient descent method to minimize the value of \(\mathcal {L}(T,U,V)\) [19] by Eqs. (3) and (4). Algorithm 2 demonstrates the details to compute iSim-LFS.

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial U} = -\sum \limits _{i=1}^n \sum \limits _{j=1}^n I_{i,j}\cdot ||T_{i,j}-\varvec{u_i}\cdot \varvec{v_j'}||_F\cdot \varvec{v_j} +\sum \limits _{i=1}^n \lambda _1 \varvec{u_i} \end{aligned}$$
(3)
$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial V} = -\sum \limits _{i=1}^n \sum \limits _{j=1}^n I_{i,j}\cdot ||T_{i,j}-\varvec{u_i}\cdot \varvec{v_j'}||_F\cdot \varvec{u_i} +\sum \limits _{j=1}^n \lambda _2 \varvec{v_j} \end{aligned}$$
(4)
figure b

iSim-PT: iSim-VSS and iSim-LFS are inspired by collaborative filtering. To make iSim specifically crafted for trust predicting problem, we consider trust propagation. It has been proved that in a real social network, users tend to trust a stranger if this person is a friend’s friend [6], so propagated trust strategy predicts the unknown trust by calculating the propagated trust [2, 11, 22]. iSim-PT sets the base value of similarity of users based on propagated trust, because there is a correlation between preference similarity and trust [1, 4].

Here, we use the min-max strategy [22] to calculate propagated trust matrix PT, which consists of propagated trust between any two users. Assume there are m paths from user i to user j. In one particular path, we pick up the maximal weight of all links in the path as the propagated trust of the path,and among these m paths, we pick up the path with minimal propagated trust, and that will be the value of \(PT_{i,j}\).

However, there is a problem that only a few paths exist between two users in the directed graph. Thus, we transform the directed graph into an undirected one. For two users i and j, we set the weight of the undirected edge as the average of two weights of directed edge between them. It can be proved by the Cut Property [3] that \(PT_{i,j}\) has the same value in the original undirected graph as that in the minimal spanning tree of the graph. Therefore, we reform Kruskal algorithm [13] in order to find out the matrix PT. Algorithm 3 gives out the detailed description of our method.

figure c

Trust Prediction: Comprehensively considering the effects of these factors, \(VSS_{i,j}\) and \(LFS_{i,j}\) respectively describes the explicit and implicit similarity between two users, while \(PT_{i,j}\) pays attention to their trust relationship between them. Combing the three factors together, we define hybrid similarity for the first time by Eq. (5). Equation (5) multiplies the linear combination of explicit similarity and implicit similarity by similarity baseline derived from propagated trust. To measure similarity baseline, we explore the relationship among similarity, intimacy and trust. There is a linear relationship between similarity and intimacy among people [1], and the more intimate the relationship is, the harder people can strengthen trust by making the relationship more intimate (marginal effect) [4]. Therefore, trust can be treated as a logarithmic function on similarity, which leads to the similarity baseline in Eq. (5).

$$\begin{aligned} Sim_{i,j} = (\alpha _1\cdot VSS_{i,j} + \alpha _2\cdot LFS_{i,j}) \cdot e^{PT_{i,j}-1} \end{aligned}$$
(5)

\(\alpha _1\) and \(\alpha _2\) are two coefficients used to balance the explicit similarity and implicit similarity, and \(\alpha _1 + \alpha _2 = 1\).

So far, with the aforementioned definitions and concepts, we can predict the trust from the trustor to the trustee based on the set \(\varvec{P}\) generated by iSim. Let \(\varvec{P_i}\) be the set of K-NN of user i. Equation (6) is our formula of trust prediction.

$$\begin{aligned} \widetilde{T}_{i,j} =\left( \sum \limits _{k \in \varvec{P_i}}Sim_{i,j}\cdot T_{k, j}\right) / \sum \limits _{k \in \varvec{P_i}} Sim_{i,j} \end{aligned}$$
(6)

5 Theoretical Analysis of iSim

In this section, we will analyze the time complexity of iSim. Our approach can be divided into two parts: the pre-training process and the predicting process. In the pre-training process, iSim calculates the similarities and finds the top-K similar users. In the predicting process, iSim predicts the trust rating values based on the similarity.

We first introduce three lemmas. Note that n is the number of users, t is the number of non-zero values in T, m is the number of iteration of matrix factorization, \(\ell \) is the number of latent factors, K is the numbers of the similar users selected, and \(\mathcal {A}^{-1}\) is the inverse of Ackermann’s function [21].

Lemma 1

The time complexity to compute matrix VSS is \(\mathcal {O}(n^2+\frac{t^2}{n})\).

Proof

The time complexity is \(\mathcal {O}(t)\) in Step 1, and it is \(\mathcal {O}(n^2)\) in Step 2. iSim-VSS runs from Step 4 to Step 4 for k times, where . By the assumption that the data is randomly distributed, the time complexity is \(\mathcal {O}(\frac{t^2}{n})\). Lastly, the time complexity from Step 8 to Step 8 is \(\mathcal {O}(n^2)\). Thus, the total time complexity is \(\mathcal {O}(n^2+\frac{t^2}{n})\).    \(\square \)

Lemma 2

The time complexity to compute matrix LFS is \(\mathcal {O}(n^2{\ell }+m{\ell }(n\,+\,t))\).

Proof

The time complexity is \(\mathcal {O}(n\ell )\) in Step 2, where \(\ell \) is the number of latent factors. The algorithm runs from Step 3 to Step 3 with the complexity of \(\mathcal {O}(m\ell (n+t))\), where m is the number of iterations. The complexity from Step 6 to Step 6 is \(\mathcal {O}(n^2{\ell })\). Thus, the total time complexity is \(\mathcal {O}({\ell }n^2+m\ell (n+t))\).    \(\square \)

Lemma 3

The complexity to compute matrix PT is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+n^2)\).

Proof

The complexity is \(\mathcal {O}(n^2)\) in Step 1, \(\mathcal {O}(n)\) in Step 2, and \(\mathcal {O}(t)\) in Step 3. Step 4 enumerates every record in ascending order. It determines the ascending order in \(\mathcal {O}(t\log {t})\), and enumerates t times. The operators in Step 5 and Step 10 can be executed by Disjoint Sets [21] in \(\mathcal {O}(\mathcal {A}^{-1}(n, t))\), where \(\mathcal {A}^{-1}\) is the inverse of Ackermann’s function. It is hard to determine directly how many times the code in Step 9 would execute, but obviously every \(PT_{i,j}\) is only calculated once. Thus, the time complexity from Step 4 to Step 4 is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+n^2)\). Thus, the total time complexity is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+t+n^2)\).    \(\square \)

Theorem 1

The time complexity of iSim is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+n^2{\ell }+m{\ell }(n+t)+n^2K)\).

Proof

Based on the the Lemmas 1, 2, and 3, the time complexity of pre-training process of iSim is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+{\ell }n^2+m{\ell }(n+t))\). The time complexity to predict the trust between two users is \(\mathcal {O}(K)\). Thus, the time complexity of iSim to predict the trust of all pairs of users is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+{\ell }n^2+m{\ell }(n+t)+n^2K)\).    \(\square \)

6 Experimental Evaluation

In this section, we present the experimental results of iSim, and analyze the effectiveness and efficiency. All the experiments are based on a real-world dataset. All of the experiments are implemented using Matlab R2014b running on an PC with Intel Core i5, 2.6 GHz CPU, 4 GB RAM and Ubuntu 14.04 operating system.

6.1 Experiment Setup

Dataset Description. The dataset used in our experiment is advotago Footnote 1. advotago is a trust-based social network for open source software development. To allow users to certify others, The social network provides 4 levels of trust ratings between users, which are “Observer”, “Apprentice”, “Journeyer”, and “Master”. The “Observer” is the lowest trust level, and the “Master” is the highest trust level. To quantify the dataset for calculation, we convert “Observer”, “Apprentice”, “Journeyer”, and “Master” to 0.1, 0.4, 0.7, and 1.

We use the statistics that contains 7, 425 users and 56, 550 trust rating records. Each record contains the trust rating values that a user gives to another user. To support repeated experiments, we conduct random sampling for 50 times. In each sampling, \(80\,\%\) of records are selected as training set, and \(20\,\%\) are selected as testing set. iSim may fail to predict trust values on rare occasions because of data sparsity of advotago. In that case, we use the average trust value of dataset as the default predicting value.

Metric Measures. In our experiments, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are used to measure the predicting effectiveness, which are common metrics in the area of trust predicting [23, 28]. The formulas of them are

$$\begin{aligned} MAE = \tfrac{\sum \limits _{i=1}^n\sum \limits _{j=1}^n{I_{i,j}\vert T_{i,j}-\widetilde{T}_{i,j} \vert }}{\sum \limits _{i=1}^n\sum \limits _{j=1}^n{I_{i,j}}}, RMSE = \sqrt{\tfrac{\sum \limits _{i=1}^n\sum \limits _{j=1}^n {I_{i,j}(T_{i,j}-\widetilde{T}_{i,j})^2}}{\sum \limits _{i=1}^n\sum \limits _{j=1}^n{I_{i,j}}}} \end{aligned}$$

In the formula, \(T_{i,j}\) denotes the known trust rating from user i to user j, \(\widetilde{T}_{i,j}\) denotes the predicted trust value from user i to user j. MAE is the average value of each absolute error. RMSE is the variance of each absolute error. Compared with MAE, RMSE is sensitive to the large absolute error. The predicting accuracy is higher if MAE and RMSE is lower.

To measure the efficiency of iSim, we consider the execution time of the program. The unit of execution time is the second.

Competitive Methods. In order to evaluate the effectiveness and efficiency of iSim, we select two approaches (SCMF) [28] and (PROP) [7] for comparison. The SCMF is the most efficient trust predicting algorithm based on matrix factorization, which considers the trust rating value similarity and trust rating distribution similarity between users to constrain the process of matrix factorization and result in high predicting accuracy. The PROP predicts trust rating from user i to user j by finding a path with high propagated trust. The propagated trust consists of both mean and variance of trust of the edges on the path and the length of path.

6.2 Experiments for Parameter Settings

The selection of K in the K-NN query is very important. Figure 3(a) shows the influence of K in iSim. In this figure, we set \(\alpha _1=\alpha _2=0.5\), and try to set K from 10 to 50 in ascending order with the step of 5. The result shows MAE and RMSE are almost constant, so the accuracy of our approach is not sensitive to K. Thus, we set \(K = 20\).

Fig. 3.
figure 3

Parameter setting experiments for iSim

The \(\alpha _1\) and \(\alpha _2\) determine the weight of two similarities, which influence the predicting accuracy. We try to test the influence of \(\alpha _1\) and \(\alpha _2\) for iSim, and the results are presented in Fig. 3(b). In this figure, \(K = 20\), \(\alpha _1\) is set from 0 to 1 with the step of 0.1, and \(\alpha _2=1-\alpha _1\), respectively. It shows when \(\alpha _1\) increases from 0.2 to 1 or decreases from 0.2 to 0, the accuracy decreases. Thus, we select \(\alpha _1=0.2\) and \(\alpha _2=0.8\). Obviously, in the dataset, since the dataset is sparse, latent factor similarity is more important. However, it is hard to know the weight in the reality, so we also set \(\alpha _1=0.5\) and \(\alpha _2=0.5\) in the experiment.

The \(\ell \), \(\lambda _1\) and \(\lambda _2\) are the parameters in the matrix factorization. \(\ell \) is the number of latent factors. A large \(\ell \) can increase the accuracy of predicting but slow down the speed of prediction. \(\lambda _1\) and \(\lambda _2\) control the extent of Gaussian priors. We set the same values with SCMF: \(\ell =10\), \(\lambda _1 = \lambda _2 = 0.01\). The parameter setting of SCMF and PROP are the same as the corresponding paper.

6.3 Evaluation Analysis

Effectiveness Analysis. We compare the predicting accuracy with SCMF and PROP. The results are shown in Fig. 4. In the figure, \(\alpha _1\) is set as 0.2. We can see that iSim outperforms other two approaches. Compared with PROP, the MAE decreases by \(29.6\,\%\) and the RMSE decreases by \(30.5\,\%\). Compared with SCMF, the MAE decreases by \(18.8\,\%\) and the RMSE deceases by \(11.9\,\%\). These results show that iSim achieves significant improvement, since SCMF is the state-of-art literature with excellent performance.

Fig. 4.
figure 4

Accuracy comparisons of PROP, SCMF and iSim

Specially, we analyze the performance gain from different components. Table 1 shows the results of predicting accuracy of the approaches that adds iSim-VSS, iSim-PT, and iSim-LFS progressively. In the table, “VSS” only uses iSim-VSS, and “VSS+PT” uses iSim-VSS and iSim-PT, (set \(\alpha _1\) as 1). “VSS+PT+LFS” denotes iSim with \(\alpha _1 = 0.2\). “norm VSS+PT+LFS” denotes iSim with \(\alpha _1 = 0.5\). We compare all of them with SCMF, which is the best competitor.

Table 1. Accuracy gain analysis

Now we analyze the results in Table 1. Vector space similarity is a common metric of similarity, so we treat it as a basic form. In the result of the basic form, the accuracy is improved by \(10.2\,\%\) in terms of MAE and improved by \(5.4\,\%\) in terms of RMSE, if compared with the best competitor. If utilizing iSim-VSS and iSim-PT, the accuracy is improved by \(13.8\,\%\) in terms of MAE and improved by \(7.8\,\%\) in terms of RMSE. If utilizing iSim-VSS, iSim-PT, and iSim-LFS with \(\alpha _1=0.5\), the accuracy is improved by \(18.1\,\%\) in terms of MAE and improved by \(11.3\,\%\) in terms of RMSE. If utilizing iSim-VSS, iSim-PT, and iSim-LFS with \(\alpha _1=0.2\), the accuracy is improved by \(18.8\,\%\) in terms of MAE and improved by \(11.9\,\%\) in terms of RMSE. The results confirm that the three methods of measuring similarity are efficient.

Efficiency Analysis. In fact, some of the trust predicting approaches can be divided into pre-training process and predicting process, e.g., the SCMF. The pre-training process mines the information of datasets which can improve the effectiveness or efficiency of predicting, and then saves it in the small storage space. Thus, we compare the approaches in terms of the two process.

We first give the time complexity of the three approaches. In the pre-training process, the time complexity of iSim is \(\mathcal {O}(t\log {t}+t\mathcal {A}^{-1}(n,t)+{\ell }n^2+m{\ell }(n+t))\), while that of SCMF is \(\mathcal {O}(mn^2\ell )\), which is higher than that of iSim. The PROP does not have pre-training process. In the predicting process, the time complexity of iSim to predict the trust of one pair of users is \(\mathcal {O}(K)\), and that of SCMF is also p\(\mathcal {O}(\ell )\). However, that of PROP is \(\mathcal {O}((\frac{t}{n})^p)\), which is much higher. Note that p is the maximum length of paths from the two corresponding users. In all, combining the two process, iSim has the best time bound.

Now we consider the execution time of pre-training process. The PROP does not have pre-training process, so we compared the execution time between SCMF with our approach with different components. Figure 5(a) shows the comparisons of execution time. The labels in the figure are same as the previous subsection. Note that the setting of \(\alpha _1\) and \(\alpha _2\) does not affect the execution time. We can see that the pre-training time of iSim is only approximately \(6.7\,\%\) of SCMF. If we do not use iSim-LFS, the execution time ratio of “VSS+PT” to SCMF is only approximately \(2.8\,\%\). If we do not consider iSim-PT neither, the ratio is only about \(1.8\,\%\). The pre-training executing efficiency is far superior to SCMF, because the constrain component of SCMF makes its time complexity much higher than simple matrix factorization.

Fig. 5.
figure 5

Comparison of execution time

Then we consider the execution time of predicting process. Note that the predicting execution time is the same regardless of the component of iSim. Figure 5(b) shows the comparison of execution time. in which log scale is used for the Y axis. We can see the predicting time of iSim is approximately same as that of SCMF, but it is merely about \(0.012\,\%\) of that of PROP.

Moreover, we compare the total execution time of three approaches to predict the trust of the whole testing datasets. Figure 5(c) shows the results. We can see that the total execution time of iSim is only about \(6.8\,\%\) of SCMF, and only about \(3.5\,\%\) of PROP. Specially, if we only use VSS and PT to measure similarity, the total execution time of “VSS+PT” is merely about \(2.8\,\%\) of SCMF and about \(1.4\,\%\) of PROP. If we exclusively use iSim-VSS to measure similarity, the total execution time is merely \(1.8\,\%\) of SCMF and \(0.9\,\%\) of PROP. Obviously, iSim is superior to other two approaches in terms of execution time.

7 Conclusion

In this paper, we have proposed a trust predicting approach named iSim: an integrated similarity based collaborative filtering approach for trust prediction in service-oriented social networks. iSim measures the similarity from three respects: vector space similarity, latent factor similarity and propagated trust. It novelly integrates three strategies and makes use of their complementarity. We provide both theoretical analysis for time bound and numerical experiments with real-world dataset to validate the priority of iSim. Moveover, we innovatively leverage latent factor vector in the study of similarity, which might be an inspiration in other fields.