
1 Introduction

Trust is a personal and subjective phenomenon that relies on various factors or evidences [1]. An individual’s trust can be derived from a combination of referrals and personal experience. In interactional networks, trust mirrors different levels of reliability and further implies a potential willingness to behave honestly. However, the existence of strategically malicious participants makes it difficult for traditional trust metrics to infer rational trust. Here we refer to participants that offer authentic services on the one hand and give dishonest/exaggerated feedback-ratings to their malicious partners on the other, such as colluding and disguised participants, as strategically malicious participants. Traditional metrics cannot degrade the trust scores of such participants; on the contrary, the authentic services these participants provide even promote their trust scores [2]. Hence, we ought to address this sophisticated misbehavior problem from a new angle, i.e. explore an effective mechanism to identify strategically malicious participants rather than depending only on a single trust value.

The trust mechanism can validate internet/web service delivery or service recommendation in diverse collaborative/interactional networks, such as peer-to-peer networks, BitTorrent, cloud platforms, eBay, Amazon, etc. Nevertheless, the inherent properties of anonymity, autonomy and openness can easily bring security issues to these trust-enabled interactional networks [2,3,4], such as independently malicious participants, malicious collectives, malicious collectives with disguise and malicious spies. Confronted with various sophisticated attacks, it is hard to devise a comprehensive trust metric to handle them all, and the present-day challenges mainly lie in: (i) how to identify unreliable participants, especially strategically malicious ones, while promoting reliable participants; (ii) how to resist colluding and disguise behaviors rather than depending only on a single trust value; (iii) what policy should be implemented to guarantee that the interacted targets are reliable. To address these challenges, we propose a misbehavior identification mechanism, TrustId, using an information entropy-controlled cluster algorithm. TrustId clusters good participants and different categories of malicious participants into appropriate groups with respect to three facets of transactional attributes, and further infers a cluster-trust for each cluster so that reputable participants with high cluster-trust values have high priority to be selected as service providers. In this way, honest service-based transactional behaviors can be conducted effectively while dishonest ones are strictly constrained, and the trust mechanism can guard against various simple and strategically malicious behaviors. Concretely, our main contributions are summarized as follows.

  (i) We propose an information entropy-controlled cluster algorithm and extract three facets of attributes for clustering in trust-enabled interactional networks, by which we can cluster different categories of participants into appropriate communities. Furthermore, we calculate the cluster-trust for each cluster to distinguish their priorities to provide services for resource-requesting participants, aiming at isolating and restricting malicious participants even when they possess high trust.

  (ii) We employ information entropy to automatically determine the appropriate number of clusters, by which different categories of participants can be assigned into a suitable number of communities, and participants with similar interactional characteristics are clustered together.

  (iii) We conduct extensive experiments to verify the effectiveness of our proposed TrustId and compare it with EigenTrust and PathTrust. The results show that TrustId can appropriately cluster various participants and achieves much better performance than EigenTrust and PathTrust against representative colluding and disguise misbehaviors.

The remainder of this paper is organized as follows. Section 2 introduces the related work. Section 3 interprets our misbehavior identification mechanism in detail. We perform extensive experiments to evaluate its effectiveness in Sect. 4, and conclude the paper in Sect. 5.

2 Related Work

In the eBay system [5], a pairwise feedback-rating can be positive (1), negative (−1) or neutral (0), and an individual’s trust is generated simply by summarizing these feedback-ratings. Although this way of computing trust is easy, it may allow strategically malicious participants to gain high trust scores through strategic manipulations. Kerschbaum et al. propose PathTrust [6], which evaluates the maximum-weight path between the service consumer and the service provider. This metric only regards the pairwise feedback-rating as the guidance on whether to make a transaction, so its trust inference kernel depends on limited feedback information. Differently, in our work we utilize complete feedback-ratings to produce trust via direct trust and indirect trust.

Kamvar et al. [2] propose the pioneering trust metric EigenTrust in terms of trust propagation. The system previously assigns pre-trusted participants as authority centers, and each participant’s trust is calculated by aggregating the feedback-ratings the other participants place on it. The trust scores of all participants are computed iteratively and ultimately converge to the left principal eigenvector of the normalized feedback-rating matrix. However, the entry of strategically malicious participants makes it hard to produce rational trust. On the contrary, EigenTrust even promotes the trust scores of strategically malicious participants, as analyzed in our previous work [3, 4]. In this paper, our misbehavior identification approach clusters different types of participants into different communities appropriately, by which strategically malicious participants can be confined effectively even when they possess high trust.

In SORT [7], participants create their own trust networks in their proximity by referring to local information rather than learning global trust information. In our work, we first cluster diverse participants and then compute a trust score for each participant in each cluster, so that the requester selects the candidate with the highest trust when more than one responding participant exists in the same cluster.

Gai et al. [8] studied channel-related attacks, node-based attacks and infrastructure vulnerabilities emerging in intelligent transportation systems (ITS). For the first, attackers mainly launch misbehaviors by sending duplicate messages, adjusting data packages and inserting harmful messages. For the second, attackers pretend to be one of the nodes in the communication and then spread harmful information. For the last, attackers adopt three intrusion methods to disable roadside units’ functions, increase harm in conjunction with node-based attacks and cause dramatic unexpected abuse. This reference extended the scope of malicious behaviors to some extent; however, it primarily focused on data transmission in ITS, which differentiates it from our concentration on service-based misbehavior in trust-enabled interactional networks.

To date, some scholars have developed more trust metrics on the basis of the popular EigenTrust [2] and PathTrust [6], such as EigenTrust\(^{++}\) [3], ServiceTrust\(^{++}\) [9] and GroupTrust [4]. These trust metrics aim at inferring reasonable trust scores for different categories of participants; in our work, however, we target clustering different categories of participants into appropriate groups considering their differential transactional behaviors.

3 TrustId: Misbehavior Identification Mechanism

3.1 Information Entropy Controlled Clustering

We adopt the typical \(k\)-means algorithm [10] to cluster participants with differential behaviors. This algorithm divides \(n\) observations into \(K\) clusters, with each observation assigned to exactly one cluster. We assume \(X = \{ x_i \} ,i = 1,...,n\) is the set of \(n\) \(d\)-dimensional observations to be clustered into \(K\) clusters \(C = \{ c_k \} ,k = 1,...,K\). The goal of the \(k\)-means algorithm is to find a partition that minimizes the squared error between the empirical mean of a cluster and the observations in that cluster. We use \(\mu _k\) to denote the mean of cluster \(c_k\). Accordingly, the squared error between \(\mu _k\) and the observations within this cluster can be defined as:

$$\begin{aligned} J(c_k ) = \sum \limits _{x_i \in c_k } {||x_i - \mu _k ||^2 }. \end{aligned}$$
(1)
The cluster center \(c_k\) is then updated as the mean of the observations assigned to it:

$$\begin{aligned} c_k = \frac{\sum \nolimits _{x_i \in S_k } x_i }{|S_k |}, \end{aligned}$$
(2)

where \(|S_k |\) denotes the number of observations in the \(k\)th cluster. To adequately partition all observations, the \(k\)-means algorithm must minimize the summation of the squared error over all clusters:

$$\begin{aligned} J(C) = \sum \limits _{k = 1}^K \sum \limits _{x_i \in c_k } ||x_i - \mu _k ||^2 . \end{aligned}$$
(3)

It has been proved that the \(k\)-means problem is NP-hard even when the total number of clusters is \(K = 2\) [11]. This means the algorithm in general only converges to a local minimum; however, if the clusters are well separated, it may converge to the global optimum with high probability [12]. Currently, the diverse solutions for validating a data partition can be roughly classified into two approaches: geometrical properties (compactness, isolation, dispersion, etc.) and stability. In our work, we utilize the stability approach to study the partition fashion. In statistical mechanics, entropy is usually used to measure the disorder of a system, where larger entropy means higher disorder [13]. We here adopt information entropy to automatically determine the number of clusters:

$$\begin{aligned} H(x) = - \sum \limits _{i = 1}^n p(x_i )\log p(x_i ) , \end{aligned}$$
(4)

where \(p(x_i )\) is the probability of being in state \(i\). In conformity with the purpose of identifying different categories of participants, we redefine it as:

$$\begin{aligned} H(K) = - \sum \limits _{j = 1}^K \sum \limits _{i \in c_j } p_{ic_j } \ln p_{ic_j } , \qquad p_{ic_j } = \frac{d_{ic_j } }{\sum \nolimits _{j = 1}^K d_{ic_j } } , \end{aligned}$$
(5)

where \(c_j\) denotes the \(j\)th cluster, and \(p_{ic_j}\) is the deviation degree expressed by the Euclidean distance \(d_{ic_j}\) from each participant \(i\) to the center of cluster \(c_j\). This formula indicates that the information entropy \(H(K)\) is subject to the current deviation degrees from the participants to the \(K\) cluster centers. The smaller the information entropy, the more stable the cluster status.
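To make the criterion concrete, the following C++ sketch computes \(H(K)\) of Formula (5) for a given clustering. The `Point` record and the routine names are our own illustrative choices, and we read the inner sum of Formula (5) as ranging over the members of each cluster, so each participant contributes the normalized distance to its own cluster center.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One participant's attribute vector (see Sect. 3.2): service trust,
// recommendation trust and transaction quantity.
struct Point { double s, r, tran; };

// Euclidean distance, as in Formula (10).
double dist(const Point& a, const Point& b) {
  return std::sqrt((a.s - b.s) * (a.s - b.s) +
                   (a.r - b.r) * (a.r - b.r) +
                   (a.tran - b.tran) * (a.tran - b.tran));
}

// Information entropy H(K) of Formula (5). assign[i] gives the cluster
// index of participant i; p_{i c_j} normalizes the distance to i's own
// center over i's distances to all K centers.
double clusterEntropy(const std::vector<Point>& pts,
                      const std::vector<Point>& centers,
                      const std::vector<int>& assign) {
  double h = 0.0;
  for (std::size_t i = 0; i < pts.size(); ++i) {
    double total = 0.0;
    for (const Point& c : centers) total += dist(pts[i], c);
    if (total == 0.0) continue;         // degenerate: zero distance to every center
    double p = dist(pts[i], centers[assign[i]]) / total;
    if (p > 0.0) h -= p * std::log(p);  // natural logarithm, as in Formula (5)
  }
  return h;
}
```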

3.2 Attribute Extraction

Attribute extraction is an important step toward high-quality clustering, so we ought to capture reasonable attributes for trust-enabled interactional networks. As stated in our previous work [3, 4, 14], each participant can get resource services (such as merchandise/files) from other participants, as well as upload resource services for other requesting participants; thus each individual plays two roles: service provider (server) and service consumer (client). Accordingly, an individual can obtain recommended feedback-ratings from other participants by providing services; on the other hand, it can also issue self-recommending feedback-ratings while consuming services from other participants. Our TrustId generates two kinds of personal trust measures by aggregating recommended and self-recommending feedback-ratings, namely service trust and recommendation trust. The service trust represents the capability of providing authentic services for other participants, and the recommendation trust reflects the ability of giving authentic feedback-ratings to other participants.

In trust-enabled interactional networks [8], the interactional/transactional relationships among participants can be represented by a directed weighted graph. The vertex set represents participants, and the edge set \(E = \{ (i,j) \mid j \in Trans(i) \}\) with weights \(l_{ij}\) denotes pairwise feedback-ratings, where \(Trans(i)\) is the set of participants with which individual \(i\) has interacted, and \(l_{ij}\) expresses the direct feedback-rating participant \(i\) places on participant \(j\). After a number of transactions are accomplished, the transactional relationships form a trust graph. Once participant \(i\) receives satisfactory services from \(j\), it gives a positive feedback-rating; otherwise it gives a negative one. This expresses the personal confidence (opinion) that participant \(i\) places on participant \(j\). We define this direct opinion one participant has in another as the pairwise feedback-rating \(l_{ij}\):

$$\begin{aligned} l_{ij} = \left\{ \begin{array}{ll} \frac{\max (p_{ij} ,0)}{\sum \nolimits _m \max (p_{im} ,0)} &{} \text {if } \sum \nolimits _m \max (p_{im} ,0) \ne 0 \\ 0 &{} \text {otherwise} \end{array} \right. , \qquad p_{ij} = succ(i,j) - unsu(i,j), \end{aligned}$$
(6)

where \(succ(i,j)\) is the number of satisfied transactions between participant \(i\) and participant \(j\), \(unsu(i,j)\) is the number of unsatisfied transactions, and \(max(p_{ij} ,0)\) takes the larger of \(p_{ij}\) and 0.
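As a concrete reading of Formula (6), the sketch below derives the full pairwise feedback-rating matrix from the satisfied/unsatisfied transaction counts; the container layout and function name are illustrative assumptions, not the paper's prototype code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Normalized pairwise feedback-ratings l_{ij} of Formula (6).
// succ[i][j] / unsu[i][j] count i's satisfied / unsatisfied transactions with j.
std::vector<std::vector<double>>
pairwiseRatings(const std::vector<std::vector<int>>& succ,
                const std::vector<std::vector<int>>& unsu) {
  const std::size_t n = succ.size();
  std::vector<std::vector<double>> l(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    double denom = 0.0;                                 // sum_m max(p_im, 0)
    for (std::size_t m = 0; m < n; ++m)
      denom += std::max(succ[i][m] - unsu[i][m], 0);
    if (denom == 0.0) continue;                         // then l_ij = 0 for all j
    for (std::size_t j = 0; j < n; ++j)
      l[i][j] = std::max(succ[i][j] - unsu[i][j], 0) / denom;
  }
  return l;
}
```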

The pairwise feedback-rating expresses the confidence one participant places on another, so we can infer service trust and recommendation trust by aggregating the pairwise feedback-ratings. An individual’s service trust represents the others’ common view of belief in it, while its recommendation trust denotes its self-recommending belief. We define service trust and recommendation trust as:

$$\begin{aligned} r_s (i) = \left\{ \begin{array}{ll} \sum \nolimits _{j \in I(i)} l_{ji} / |I(i)| &{} \text {if } |I(i)| \ne 0 \\ 0 &{} \text {otherwise} \end{array} \right. , \end{aligned}$$
(7)
$$\begin{aligned} r_r (i) = \left\{ \begin{array}{ll} \sum \nolimits _{j \in O(i)} l_{ij} / |O(i)| &{} \text {if } |O(i)| \ne 0 \\ 0 &{} \text {otherwise} \end{array} \right. , \end{aligned}$$
(8)

where \(I(i)\) is the set of participants which have interacted with \(i\), and \(O(i)\) is the set of participants with which \(i\) has interacted.

We can see that the results of Formulas (7) and (8) are decimal values. They cannot felicitously reflect a participant’s truthful behavior; e.g. some individuals are mediocre, but their trust may be high because they have few transactional partners, that is to say, the denominator is small. Additionally, malicious participants can also obtain high trust through exaggerated feedback-ratings from a few malicious partners. Hence we introduce the transaction quantity as the third attribute to reflect an individual’s behavior comprehensively. In this paper, the transaction quantity \(tr(i)\) is defined directly as the total number of transactions, no matter which role the participant plays:

$$\begin{aligned} tr(i) = pros(i) + dwnr(i), \end{aligned}$$
(9)

where \(pros(i)\) is the number of transactions in which \(i\) provides services as a server, and \(dwnr(i)\) is the number of transactions in which \(i\) receives services as a client.
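Putting Formulas (7)–(9) together, a hedged sketch of the attribute extraction step might look as follows; `interacted`, `pros` and `dwnr` are hypothetical bookkeeping structures standing in for the transaction history.

```cpp
#include <cstddef>
#include <vector>

// The three clustering attributes of one participant (Sect. 3.2).
struct Attributes { double serviceTrust, recomTrust, tranQuantity; };

// l is the rating matrix of Formula (6); interacted[i][j] is true when i
// has rated j at least once; pros/dwnr are the per-role transaction counts.
std::vector<Attributes>
extractAttributes(const std::vector<std::vector<double>>& l,
                  const std::vector<std::vector<bool>>& interacted,
                  const std::vector<int>& pros,
                  const std::vector<int>& dwnr) {
  const std::size_t n = l.size();
  std::vector<Attributes> attr(n);
  for (std::size_t i = 0; i < n; ++i) {
    double in = 0.0, out = 0.0;
    int inDeg = 0, outDeg = 0;
    for (std::size_t j = 0; j < n; ++j) {
      if (interacted[j][i]) { in  += l[j][i]; ++inDeg;  }  // j in I(i): j rated i
      if (interacted[i][j]) { out += l[i][j]; ++outDeg; }  // j in O(i): i rated j
    }
    attr[i].serviceTrust = inDeg  ? in  / inDeg  : 0.0;    // Formula (7)
    attr[i].recomTrust   = outDeg ? out / outDeg : 0.0;    // Formula (8)
    attr[i].tranQuantity = pros[i] + dwnr[i];              // Formula (9)
  }
  return attr;
}
```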

We adopt the Euclidean distance \(d_{ic_j}\) from each participant to the center of the corresponding cluster to express the aforementioned deviation degree over the three attributes: service trust, recommendation trust and transaction quantity:

$$\begin{aligned} d_{ic_j } = \left( (r_s (i) - c_j (s))^2 + (r_r (i) - c_j (r))^2 + (tr(i) - c_j (tran))^2 \right) ^{\frac{1}{2}}, \end{aligned}$$
(10)

where \(c_j(s)\), \(c_j(r)\) and \(c_j(tran)\) denote the cluster center’s service trust, recommendation trust and transaction quantity inferred by Formula (2). The deviation degree serves as the clustering criterion guiding how to cluster each individual: the smaller the deviation degree, the higher the probability that participant \(i\) belongs to cluster \(c_j\). To distinguish the reputation of clusters, we further define the cluster-trust \(r(c_k )\) of each cluster through its members’ service trust and recommendation trust:

$$\begin{aligned} r(c_k ) = \frac{\sum \nolimits _{j = 1}^{|c_k |} (r_s (j) + r_r (j)) }{2 \cdot |c_k |}, \end{aligned}$$
(11)

where \(|c_k|\) is the number of participants in cluster \(c_k\). The larger the cluster-trust, the higher the priority with which a responding participant in this cluster is selected as the interactional target. In this way, although some strategically malicious participants can gain high trust, they may not be selected as transaction targets because of their low cluster-trust.
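The cluster-trust of Formula (11) then reduces to an average over a cluster’s members, e.g. (reusing the `Attributes` record from the sketch above):

```cpp
#include <vector>

struct Attributes { double serviceTrust, recomTrust, tranQuantity; };  // as above

// Cluster-trust r(c_k) of Formula (11): the mean of the members' service
// and recommendation trust. 'members' lists the participant indices of c_k.
double clusterTrust(const std::vector<Attributes>& attr,
                    const std::vector<int>& members) {
  if (members.empty()) return 0.0;  // convention for an empty cluster
  double sum = 0.0;
  for (int j : members)
    sum += attr[j].serviceTrust + attr[j].recomTrust;
  return sum / (2.0 * members.size());
}
```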

3.3 Algorithm Framework and Complexity Analysis

The entire trust-conducted interactional framework can be illustrated in three steps: (i) collect feedback-ratings by asking transacted partners; (ii) aggregate the feedback-rating information, calculate service trust and recommendation trust, and count the transaction quantity; (iii) perform Algorithm 1 to obtain an information entropy-controlled appropriate number of clusters. Obviously, the minimum number of clusters \(c_{min}\) can be set to two, but the question is how to fix the maximum number. According to a rule of thumb many investigators use [15], the maximum number of clusters \(c_{max}\) is usually set to \(\sqrt{n}\).

Algorithm 1. Information entropy-controlled clustering.
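Since Algorithm 1 is given as a pseudocode figure, the following is our minimal C++ reading of its steps: sweep \(K\) from \(c_{min} = 2\) to \(c_{max} = \sqrt{n}\), run \(k\)-means at each \(K\), and keep the \(K\) whose entropy transition increment (Sect. 4.2) is smallest. `kMeans` is only declared here, standing in for any standard implementation, and the stopping criterion is our interpretation of the text.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct Point { double s, r, tran; };            // attribute vector, as in Sect. 3.2
struct Clustering { std::vector<Point> centers; std::vector<int> assign; };

// Standard k-means on the 3-d attribute vectors (implementation omitted).
Clustering kMeans(const std::vector<Point>& pts, int K);
// H(K) of Formula (5), as sketched in Sect. 3.1.
double clusterEntropy(const std::vector<Point>& pts,
                      const std::vector<Point>& centers,
                      const std::vector<int>& assign);

// Entropy-controlled choice of the cluster number (our reading of Algorithm 1).
int chooseClusterCount(const std::vector<Point>& pts) {
  const int cMax = static_cast<int>(std::sqrt(pts.size()));  // rule of thumb [15]
  int bestK = 2;
  double prevH = 0.0;
  double bestDelta = std::numeric_limits<double>::max();
  for (int K = 2; K <= cMax; ++K) {
    Clustering c = kMeans(pts, K);
    double h = clusterEntropy(pts, c.centers, c.assign);
    if (K > 2) {
      double delta = std::fabs(h - prevH);  // transition increment of H
      if (delta < bestDelta) { bestDelta = delta; bestK = K; }
    }
    prevH = h;
  }
  return bestK;
}
```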

From Algorithm 1, we observe that the computational complexity mainly depends on performing the \(k\)-means algorithm and calculating the information entropy. The complexity of performing \(k\)-means is \(\sum \nolimits _{K = 2}^{\sqrt{n}} (n - K) \cdot K \cdot t \sim O(nKt)\), where \(K\) is the appropriately formed cluster number, \(n\) is the number of system participants, and \(t\) is the number of iteration rounds. On the other hand, the complexity of calculating the information entropy relies almost entirely on the deviation degrees; its complexity is \(\sum \nolimits _{K = 2}^{\sqrt{n}} (n - K) \cdot K \sim O(nK)\). Additionally, we need to store the system participants and the information entropy of each clustering, so the space complexity is \(O(n + c_{max})\).

4 Experiment Evaluation

4.1 Experiment Configuration

Strategically malicious participants can gain high trust scores through cunning manipulations; we define them as follows.

Definition 1

Colluding malicious participants (CMP): malicious participants are organized collectively. On the one hand, they serve authentic resources to gain high trust scores; on the other hand, each malicious participant in return provides exaggerated feedback-ratings to promote the other malicious participants once transactions take place.

Definition 2

Disguised malicious participants (DMP): two types of malicious participants co-exist: pure malicious participants and disguised participants. The pure malicious participants always provide inauthentic resources, while the disguised participants occasionally serve authentic resources to gain high trust and in return boost the trust scores of the pure malicious participants via exaggerated feedback-ratings.

We assume the number of pure malicious participants equals that of disguised ones in DMP. Since the feedback-rating network accords with a power-law distribution [3, 4], we set up our experiments in terms of a power-law distribution too. We develop a prototype in C++ and launch a set of experiments with the following configuration: Microsoft Windows, Genuine Intel(R) CPU, 1.61 GHz, 1 GB memory. We utilize the Gnutella manner [3] to handle the procedure of service querying and answering. Table 1 lists the primary parameters.

Table 1. Parameter configuration.

Our experimental platform includes 500 participants with 500 distinct file resources, where the distinct file resources possess different numbers of copies in line with a Zipf distribution. In a one-time experiment, we set 100 time series, in each of which 100 participants are randomly selected to issue service-request queries, and each time series is performed for 2 rounds. We run the experiment 10 times, average the values, and define the failed transaction ratio (FTR) to evaluate effectiveness:

$$\begin{aligned} \varphi = \sum \limits _i \sum \limits _j unsu(i,j) \Big / \sum \limits _i \sum \limits _j (succ(i,j) + unsu(i,j)). \end{aligned}$$
(12)
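For completeness, the FTR of Formula (12) is just a ratio of accumulated counters; a one-function sketch (names assumed) is:

```cpp
// Failed transaction ratio (FTR) of Formula (12), from the totals of
// unsatisfied and satisfied transactions accumulated over all pairs (i, j).
double failedTransactionRatio(long long succTotal, long long unsuTotal) {
  const long long all = succTotal + unsuTotal;
  return all == 0 ? 0.0 : static_cast<double>(unsuTotal) / all;
}
```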

4.2 Cluster Generation

A rational number of clusters can effectively identify and isolate malicious participants. We compute the information entropy in different statuses and list the results in Tables 2 and 3, where the time series are both 100. NoC denotes the number of clusters, and PoMP represents the percentage of malicious participants.

Table 2. The information entropy with different percentages of CMP.
Table 3. The information entropy with different percentages of DMP.
Fig. 1. Transition increment of information entropy of CMP.
Fig. 2. Transition increment of information entropy of DMP.
Fig. 3. The FTRs with different percentages of CMP.
Fig. 4. The FTRs with different percentages of DMP.

We observe that the information entropy may be large in the first statuses, but it gradually becomes steady as the number of clusters increases. We also draw the 20%, 40%, 50% and 70% cases in Figs. 1 and 2. This information entropy can effectively reflect the stability of the clustering: the larger the information entropy, the more disordered the participant assignment. Therefore, we compute the minimum of the transition increment mirroring different statuses as the criterion determining the appropriate number of clusters. At the beginning, the increment range is large, but it becomes small step by step as the number of clusters grows, which indicates that our TrustId can validly produce a smooth clustering for different categories of participants.

4.3 Performance Evaluation

To verify the performance of our TrustId, we utilize the FTR to illustrate the dynamic transactional process with the time series changing from 1 to 100. The experimental results under CMP and DMP are depicted in Figs. 3 and 4.

Each figure has two curves: one is the FTR from the viewpoint of all time series accomplished so far; the other is from the viewpoint of a single time series, in which 2 (rounds) \(\times \) 100 (requests) transactions take place. From Fig. 3, we observe that the FTRs decrease gradually, with the decrease slowing as the number of malicious participants increases. However, the FTRs change obviously; e.g. the FTR drops to 0 as the time series grows when the percentage of malicious participants is 20%. Fig. 4 shows a similar trend.

Fig. 5. Participant distribution with varied percentages of CMP.
Fig. 6. Participant distribution with varied percentages of DMP.

Furthermore, in order to display the efficiency of our proposed information entropy-controlled cluster algorithm, we exhibit the distribution status of each participant together with the cluster-trust in Figs. 5 and 6. The malicious and good participants are divided into different clusters corresponding to different levels of trust scores as the time series increases. At first, the separation may be ambiguous, but continuous transactions ultimately assign different categories of participants into adequate clusters, forming a clear separation. From the data, we observe that the larger the percentage of malicious participants, the fuzzier the separation degree to some extent, which indicates that the attack becomes more and more rigorous as the number of malicious participants increases. However, our TrustId can effectively drive malicious participants into cluster(s) with low cluster-trust and good participants into cluster(s) with high cluster-trust by the end of the time series.

Additionally, we compare our TrustId with EigenTrust and PathTrust, and depict the experimental results in Fig. 7. In the CMP experiment, our TrustId significantly outperforms EigenTrust and PathTrust. The strategically malicious participants in EigenTrust can gain high trust by collaborating mutually, and in return they boost their partners dramatically. Nevertheless, in our TrustId the good participants are divided into reliable cluster(s) with high cluster-trust, by which the strategically malicious participants are isolated from the good participant cluster(s). In PathTrust, only the feedback-ratings between the initiator and the candidates are computed, without referring to the collusive feedback-ratings; this means it can more or less prevent exaggerated trust propagation among malicious participants, so it performs a little better than EigenTrust.

Fig. 7. FTRs with different percentages of CMP and DMP.

The “cunning” disguised malicious participants in DMP, on the one hand, provide popular and authentic file resources for good participants to gain high feedback-ratings when selected as transactional targets; on the other hand, they boost the pure malicious ones through exaggerated ratings. Since the existence of pre-trusted participants reduces the probability of selecting those disguised participants, EigenTrust [2] performs well at the beginning; however, its FTR goes up as the number of malicious participants increases. Our TrustId divides different types of participants into appropriate clusters, and only a few disguised participants may be moved into the good participant cluster. Our algorithm also makes a further check on the average of service trust and recommendation trust to confine the disguised participants’ transactional behaviors accordingly. PathTrust [6] only selects the maximum-weight paths as the trust between the initiator and the candidates, which avoids trust transitivity between the disguised participants and the candidates to some extent, so its experimental results are a little better than those of EigenTrust [2].

5 Conclusion

We have presented our misbehavior identification mechanism TrustId, based on an information entropy-controlled cluster algorithm. We also define the cluster-trust of each cluster using service trust and recommendation trust: the higher the cluster-trust, the larger the probability that a responding participant in the cluster is selected as the transaction target. In this way, even though colluding and disguised participants may possess high trust scores, they cannot be selected as transaction targets because their cluster-trust is low. Extensive experiments also show that our proposed TrustId appropriately clusters different categories of participants into communities and dramatically outperforms EigenTrust and PathTrust against representative strategically colluding and disguise misbehaviors.