1 Introduction

Measuring trust in social networks is an extensively studied topic. Several factors drive the research interest in understanding and measuring trust, including social, political, commercial and economic factors, as well as security and safety. Moreover, advances in social network analysis have facilitated quantitative research on trust. Nevertheless, the ceaseless expansion of social media content continues to present new challenges for providing scalable solutions.

In recent times, the Arab world has gone through political, social and cultural shifts. It is a historically difficult and interesting period where trust ties and alliances have been strained, broken and reformed. Arabs form a significant portion of social media users on Twitter, and it is estimated that by March 2017 there will be over 11 million Arab users [1].

Most of the research about trust on Twitter has not considered the content of the tweets; it was mostly based on the structure of the social network and the users’ interactions. Sherchan et al. have emphasized the importance of context in modelling trust [2], meaning that trusting a person in one context does not mean they will be trusted in another. We take this notion a step further and hypothesize that not only is the context important, but that having similar judgments about a context also affects trust. In this paper, we introduce a framework to investigate the relationship between trust and agreement on sentiment among Twitter users. Sentiment analysis uses natural language processing to identify a user’s attitude towards a given topic.

The paper is structured as follows. Sect. 2 provides a literature review of trust metrics on Twitter, and Sect. 3 introduces our proposed framework. In Sect. 4, a use case demonstrates how we intend the framework to be used. Sect. 5 concludes the paper and discusses future work.

2 Related Work

Our proposed methodology requires measuring trust and sentiment agreement. In this section, we first review approaches for modelling trust on Twitter. We then provide a brief overview of sentiment analysis in Arabic and cite several surveys that explain the current research efforts.

2.1 Measuring Trust on Twitter

There are several existing approaches to modelling trust on Twitter. In this section, we review seven of them. These approaches differ in their aims and in the factors they include to represent trust. There are three types of factors: structural, behavioral (interaction-based), and content-based (semantic).

Adali et al. [7] developed statistical measures based on the timing and sequence of communications on Twitter (interaction information); they did not base the measures on the textual content (semantic information) or on the existing social network structure. Two trust metrics were introduced: conversational trust and propagation trust. Conversational trust is computed from the length, frequency and balance of conversations between two users. Propagation trust, on the other hand, measures the frequency of message propagation between two users. It rests on the assumption that a user A will propagate messages from a user B if A trusts B. Since the approach is based solely on interaction information, they measured potential propagations instead of actual propagations: a message was considered a propagation of another message if the difference between their times was less than a threshold. The Twitter dataset used for testing covered more than 2 million users and contained 15,563,120 public directed messages and 34,178,314 broadcast messages. They also assumed that retweets are indicators of trust, meaning that if user A retweets a tweet from user B, then A trusts B, and they built a retweet graph accordingly. They then validated the proposed metrics by measuring their coverage of the retweet graph. The coverage was 11.6% for conversational trust and 14.4% for propagation trust, both significantly higher than the random coverage.
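As a rough illustration of the potential-propagation idea described above, the following sketch counts how many messages that A received from B were followed, within a time threshold, by a message sent by A. It is only a simplified reading of the idea, not the statistical measure defined in [7]; the function name, the timestamp lists and the 600-second threshold are assumptions made for this example.

```python
from bisect import bisect_right


def count_potential_propagations(received_from_b, sent_by_a, threshold):
    """Count messages from B that A may have propagated.

    received_from_b: timestamps (seconds) of messages B sent to A
    sent_by_a: timestamps (seconds) of messages A sent to anyone
    threshold: maximum delay (seconds) for a message to count as a propagation
    """
    sent = sorted(sent_by_a)
    count = 0
    for t in received_from_b:
        # Index of the first message A sent strictly after time t.
        i = bisect_right(sent, t)
        if i < len(sent) and sent[i] - t < threshold:
            count += 1
    return count


# Hypothetical timestamps, for illustration only.
incoming = [100, 1000, 5000]   # B -> A
outgoing = [160, 2500, 5200]   # A -> others
print(count_potential_propagations(incoming, outgoing, threshold=600))
```

In this simplified version a single outgoing message may be matched to more than one incoming message; a stricter variant would pair each outgoing message at most once.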

MarkovTrust [8] is an adaptation of the EigenTrust algorithm [9] for modelling trust and trust propagation on Twitter; the authors’ aim was to use their approach to improve tweet recommendations. MarkovTrust is calculated from users’ interactions through retweets and mentions, and it also calculates trust propagation, or transitive trust. The authors developed a system composed of a crawler and a recommender. The crawler calculates trust values for a node’s neighbors and updates a ranked list of trusted users using MarkovTrust while retrieving their tweets. Evaluating the model was a challenge: since no trust scores are assigned explicitly on Twitter, there is no straightforward way to validate it. Therefore, they evaluated its effectiveness by calculating the difference between a node’s ranking of trusted nodes and its neighbors’ ranking of the same nodes. A dataset was collected from 20 users, which after preprocessing yielded 314 balanced tweets, 50% of which were retweets and 50% non-retweets. If the model were valid, the differences in ranking would increase as the rank decreased, meaning the users would agree on the ranking of common neighbors. However, this was not the case in the experiment, meaning that either the model was wrong or the dataset was too small to verify it. The recommender part of the system periodically recommends tweets to the user using machine learning. To create the dataset, the authors regarded retweeting as an endorsement and checked the origin of each retweet: if it came from one of the top trusted neighbors, they labelled that tweet as positive and the previous tweet from the same origin as negative. This is the same dataset used for evaluating the crawler above. The features they used were the trust score, the words contained in the tweet, and whether the tweet contained a URL. The trust score was calculated by the crawler, and the words were labelled negative or positive based on a dictionary. The dictionary was created from the words in the labelled tweets in the dataset, where a word was labelled positive or negative based on its frequency of occurrence in the labelled tweets. They trained two classifiers, an SVM and a Naïve Bayes classifier. To evaluate the recommender, they used 75% of the dataset as a training set and 25% as a test set, and ran the experiment with and without the trust features. Although the performance in both settings was low, a slight improvement was observed when trust was included as a feature; however, this result is weakened by the small size of the dataset.
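To make the idea of propagating trust over interactions more concrete, the following is a minimal sketch of basic EigenTrust-style power iteration, on which adaptations such as MarkovTrust build. It is not the MarkovTrust implementation from [8]: the function name, the interaction counts (standing in for retweets plus mentions), the uniform starting vector and the uniform fallback for users without outgoing interactions are all assumptions made for illustration, and the pre-trusted peers of the full EigenTrust algorithm are omitted.

```python
def global_trust(interactions, iterations=50):
    """Basic EigenTrust-style power iteration.

    interactions: {user: {other_user: interaction count}}
    Returns {user: global trust share}; the shares sum to 1.
    """
    users = sorted(interactions)
    n = len(users)

    # Row-normalise local trust so each user's outgoing trust sums to 1.
    local = {}
    for i in users:
        row = {j: interactions[i].get(j, 0) for j in users if j != i}
        total = sum(row.values())
        if total > 0:
            local[i] = {j: row.get(j, 0) / total for j in users}
        else:
            # Assumption: a user with no outgoing interactions trusts everyone equally.
            local[i] = {j: 1.0 / n for j in users}

    # Start from uniform trust and repeatedly apply t <- C^T t.
    trust = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        trust = {j: sum(local[i][j] * trust[i] for i in users) for j in users}
    return trust


# Hypothetical interaction counts, for illustration only.
counts = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 1, "C": 1},
    "C": {"A": 1, "B": 3},
}
scores = global_trust(counts)
print(max(scores, key=scores.get))
```

In this toy example B accumulates the largest share of trust, because both A and C direct most of their interactions at B.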

Unlike the approaches above, the authors’ aim in [10] was to develop a model of consumers’ trust in new technologies, not of trust between users. It is still related to our research for two reasons: first, their trust model uses Twitter as a medium, and second, it utilizes tweet sentiment in modelling trust. Their model takes into account the authority of the users tweeting about the technology and their sentiment towards it. It incorporates what they call situational trust and behavioral trust, and it is calculated at a certain point in time. Situational trust is based on the tweets’ sentiment and the degree of their senders’ authority, which is measured by the number of followers. Behavioral trust is based on the sentiment of tweets regardless of the senders’ authority. The model also includes several weighting parameters that regulate, among other things, the degree of past trust to include and the balance between situational trust and behavioral trust. The sentiment of tweets and their topic were determined using SVM classifiers. The parameters of the model itself were learnt by sampling data from targeted users through questionnaires and tuning the parameters to the results using regression. They experimented by modelling users’ trust in Apple’s “iCloud”; the resulting mean squared error (MSE) was 0.0439. Moreover, they used the model to build a dynamic visualization tool for monitoring trust in ICT services.

In [11], the author introduces a tool for analyzing opinion, influence and trust on Twitter. The author introduced three sentiment analysis algorithms, named surface, shallow and deep, for coloring tweets according to their sentiment. For the surface analysis algorithm, lexica of negative and positive words were created. To create them, the author used clustering to select tweets in the political domain; the sense-bearing words were then labelled as positive or negative, and supervised classification was applied to the tweets using the lexica as a bag of words. In the shallow analysis, instead of the bag of words, a feature vector was used that contained not only words but also phrases. The deep analysis involves relating words to concepts in WordNet to find related concepts, and building what the author called AffectiveSpace. To measure the propagation of opinion, influence and trust, three edge-coloring algorithms were introduced that utilize the tweet-coloring algorithms mentioned above and traditional maximum-flow minimum-cut graph algorithms. These algorithms were demonstrated by implementing them on top of NodeXL [12] as MS Excel macros, which produced several graph illustrations and a dashboard.

The authors’ objectives in [13] were to study the usage of Twitter after the Chilean earthquake in 2010 and to examine the differences between the propagation of false rumors and confirmed news on Twitter. They collected 4,727,424 tweets from 716,344 users and calculated several metrics to analyze the network. They then identified seven confirmed news items and seven false rumors, collected tweets about them, and categorized each tweet as a retweet, an affirmation, a denial or a question about the topic. Their results show that the propagation patterns differ between false rumors and confirmed news, as false rumors tend to be denied and questioned more.

In [14], the authors focus on inferring trust among Twitter users; the inferred trust network is based on interactions and relationships among users. Their model includes two stages: inferring (filtering) and propagation (discovery). In the first stage, the authors used two Twitter indicators, retweets and favorite lists, to decide whether user A trusted user B. This resulted in two graphs. To propagate the calculated trust, they used four methods: simple transitivity, weighted transitivity, Golbeck transitivity [15], and structural similarity. Structural similarity, unlike the three transitivity propagation methods, is based on whether two users share similar structures in whom they trust and by whom they are trusted. Thus, they obtained two graphs, retweet and favorite, each with four propagation methods. To evaluate the models, they collected data from more than 20,000 users. Because there are no existing trust values for comparison, they assumed that the methods that correctly predict the existing relationships in the retweet and favorite graphs will be able to predict relationships in the rest of the network. They then measured the coverage of each method in the two graphs and the mean absolute error (MAE). They found that the structural similarity method gave the best results.

The aim of the work in [16] is to rate the topic-focused trustworthiness of Twitter users and tweets. Their trust model is assumed to be context-dependent. It has two main components: topic-focused similarity-based trust evaluation, and trust propagation. The topic-focused similarity-based trust evaluation focuses on the tweets’ credibility, which is rated according to their similarity to authentic news articles. The similarity is calculated by integrating three metrics: textual similarity, spatial similarity, and temporal similarity. Trust propagation is calculated by iteratively propagating the trust evaluation based on four rules. The first and second rules relate to similarity in semantic and conversational features; the third and fourth propagate trust from a trusted tweet to its author and the author’s other tweets, and from a user’s friends to that user. They evaluated the model by first evaluating the topic-focused similarity-based trust measure. They collected the tweets with the highest scores on a specific topic, trained an SVM classifier on them, and then applied the classifier to new data. They compared the results with two baseline classifiers, one trained on manually ranked tweets and the other on tweets collected by keyword matching. Their classifier achieved better performance than the baselines. User trustworthiness was assumed to depend on the trustworthiness of the user’s tweets. To validate this, they used several Twitter measures as trust indicators: account age, favorite count, follower count, friends count, list count, and whether the account was verified. They then checked whether the correlation between their measure of trustworthiness and these indicators was positive and statistically significant. The results show a significant positive correlation for most of the indicators.

As many of the proposed trust models for Twitter are adapted from P2P network trust models, Sherchan et al. have stressed the importance of developing trust models that utilize social network indicators, because the notion of trust and the purpose of the model differ between the two types of networks [2].

2.2 Arabic Sentiment Analysis

Sentiment analysis is concerned with automatically identifying the sentiment of a certain expression or text. The sentiment polarity can be either positive or negative (a binary score), or it can be expressed as scores representing different degrees of sentiment. Liu [3] provides an extensive background and survey of recent research. Approaches to sentiment analysis can be supervised, unsupervised, or a combination of both. Sentiment analysis in Arabic introduces several challenges, such as the use of dialectal Arabic, lack of resources, spam detection, co-reference resolution and several others [4]. Alhumoud et al. [5] reviewed several Arabic sentiment analysis approaches. Existing resources for Arabic sentiment analysis, such as preprocessing tools, sentiment lexica and datasets, have also been surveyed in [6].

3 Framework

In this section, we introduce a framework that illustrates how the relationship between trust and sentiment analysis on social networks can be conceptualized. Since research in this field is still in its early stages, a general framework would be beneficial in designing a proper research methodology. The framework can be adapted to the intended research purposes and to the data available to researchers. It consists of five stages:

Stage One: Selecting the Basis of Trust

As we have seen in Sect. 2, there are many trust models, and researchers derive trust measures by utilizing the different functionalities available on social network platforms. In this stage, we define the meaning of trust and hence decide which measures we are going to use to measure it. This selection affects how we calculate trust scores in stage three and what information we need to gather during the data collection phase.

Stage Two: Selecting a User Group to Study

In this stage, we select the users whose mutual trust we want to measure according to our definition of trust. We suggest one of two approaches: either manually selecting a group of users who we assume have elements of trust between them, or randomly selecting a group of users without any prior assumption. In the first approach, we would select users because they collaborate with each other, explicitly express that they belong to the same movement, or care about similar issues. The assumption that these users trust each other is our hypothesis, and we want to measure to what extent they statistically or empirically trust each other. In the second approach, we drop this assumption and randomly pick users who, for example, participate in the same hashtags or follow each other. This stage and the one before it determine how the data collection will be carried out.

Stage Three: Calculating Trust Scores

In stage one, we define and select the metrics we will use to measure trust between two users. In this stage, one or more of these trust metrics are calculated: for each pair of users selected in the second stage, we compute the trust between them using all of the selected trust metrics. Each metric yields a score, and the scores from all the metrics are summed so that there is one trust score for each pair of users. Figure 1 shows a matrix of pairwise trust scores among all users in the dataset.

Fig. 1. Trust metric matrix

The symbols A, B, C and D represent the users in the dataset, and each value t_xy represents the trust score of user X trusting user Y, which naturally differs from t_yx, the trust score of user Y trusting user X. As we have seen in Sect. 2, trust metrics differ in the measures they obtain from the social network, which determines the data collection approach.
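The construction of the pairwise trust matrix can be sketched as follows. This is an illustrative sketch only: the two metric functions and their lookup values are hypothetical placeholders for whichever trust metrics are selected in stage one, and the nested-dictionary layout is an assumption.

```python
def trust_matrix(users, metrics):
    """Return {x: {y: t_xy}}, where t_xy sums every metric for the ordered pair (x, y)."""
    return {
        x: {y: sum(metric(x, y) for metric in metrics) for y in users if y != x}
        for x in users
    }


# Hypothetical per-pair scores standing in for two selected trust metrics.
retweet_scores = {("A", "B"): 0.6, ("B", "A"): 0.1}
mention_scores = {("A", "B"): 0.2, ("B", "A"): 0.3}


def retweet_metric(x, y):
    return retweet_scores.get((x, y), 0.0)


def mention_metric(x, y):
    return mention_scores.get((x, y), 0.0)


matrix = trust_matrix(["A", "B", "C"], [retweet_metric, mention_metric])
for x, row in matrix.items():
    print(x, row)
```

Because the pairs are ordered, the entry for A trusting B is stored separately from the entry for B trusting A, which matches the asymmetry between t_xy and t_yx.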

Stage Four: Calculating Sentiment Agreement

Since our approach aims to study the relationship between the trust metric and the sentiment agreement among users, it is important to state how the sentiment agreement is calculated. To do so, we select a topic that the group of users is interested in and measure their sentiment towards it. The ideal type of topic would be a hashtag, but it can also be any keyword that the users are posting or tweeting about. A user’s sentiment towards a topic would be calculated as a score SA ∈ [-1, 1], where -1 means negative sentiment, 1 means positive sentiment, and 0 means that the sentiment of the user’s tweets towards that topic is neutral. The sentiment agreement calculation takes into account all the tweets that contain that particular topic. Two users’ agreement on a sentiment would be calculated as the Euclidean distance between their sentiment scores; for scalar scores this is the absolute difference, so a smaller value indicates closer agreement. For example, the sentiment agreement between user A and user B on topic t1 would be S_ab^t1. Users’ agreement scores will also be stored in a matrix, as shown in Fig. 2. As users might be interested in several topics, we can have several matrices, each representing the sentiment agreement of users towards one topic, as illustrated in Fig. 3.

Fig. 2. Sentiment agreement matrix

Fig. 3. Sentiment agreement matrices for several topics
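The sentiment agreement calculation of stage four can be sketched as follows. The framework does not prescribe how tweet-level sentiment is aggregated into a user score, so averaging the tweet polarities, treating a user with no tweets as neutral, the function names and the example values are all assumptions made for illustration.

```python
from statistics import mean


def user_sentiment(tweet_scores):
    """Aggregate tweet polarities in [-1, 1] into one user score (assumption: the mean)."""
    return mean(tweet_scores) if tweet_scores else 0.0


def agreement_matrices(scores_by_topic):
    """Build one pairwise distance matrix per topic.

    scores_by_topic: {topic: {user: [tweet polarity, ...]}}
    Returns {topic: {a: {b: |SA_a - SA_b|}}}; smaller values mean closer agreement.
    """
    matrices = {}
    for topic, by_user in scores_by_topic.items():
        sa = {user: user_sentiment(scores) for user, scores in by_user.items()}
        matrices[topic] = {
            a: {b: abs(sa[a] - sa[b]) for b in sa if b != a} for a in sa
        }
    return matrices


# Hypothetical tweet polarities for one topic, for illustration only.
example = {"t1": {"A": [1.0, 0.5], "B": [0.5, 0.0], "C": [-1.0]}}
result = agreement_matrices(example)
print(result["t1"]["A"]["B"])  # 0.5
```

Unlike the trust matrix, each of these matrices is symmetric, because the distance between two sentiment scores does not depend on their order.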

Stage Five: Comparing Sentiment Agreement and Trust

After computing both the trust and the sentiment agreement between pairs of users, we obtain two or more matrices: one for the users’ trust and one sentiment agreement matrix per topic. In this stage, we compare each pair’s trust score with its sentiment agreement scores. This goes beyond measuring numerical trust between users to assess whether they actually agree on the topics they are interested in. It also allows us to understand whether, and to what extent, trust entails sentiment agreement.
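The framework does not fix a particular comparison method. One possible way to compare the two kinds of matrices, shown here purely as an assumption, is to compute the Pearson correlation between the trust scores and the sentiment distances over all ordered pairs of users. The function names and the example values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def compare(trust, distance):
    """Correlate trust scores with sentiment distances over the shared ordered pairs."""
    pairs = [(x, y) for x in trust for y in trust[x] if y in distance.get(x, {})]
    return pearson(
        [trust[x][y] for x, y in pairs],
        [distance[x][y] for x, y in pairs],
    )


# Hypothetical matrices, for illustration only.
trust = {
    "A": {"B": 0.8, "C": 0.1},
    "B": {"A": 0.4, "C": 0.2},
    "C": {"A": 0.1, "B": 0.3},
}
distance = {
    "A": {"B": 0.5, "C": 1.75},
    "B": {"A": 0.5, "C": 1.25},
    "C": {"A": 1.75, "B": 1.25},
}
print(compare(trust, distance))
```

Because the sentiment values are distances, a negative coefficient would suggest that higher trust tends to accompany closer agreement. With several topics, the same comparison could be repeated for each topic’s matrix.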

4 Use Case

In this section, we explain with a practical example how this framework can be applied. During the last quarter of 2016, a large Saudi feminist movement became very popular among the Saudi community on Twitter. It raised controversial issues about women’s rights in Saudi Arabia and attracted attention both locally and globally. Many users became involved in this movement and participated by tweeting heavily using the chosen hashtags, among them #StopEnslavingSaudiWomen. To apply the framework to this use case, we explain stage by stage how the methodology will be carried out.

Stage One

We assume that conversational trust [7] is selected for measuring trust. We will then need to collect the following for each user: mentions and tweets, and in particular the tweets’ timings.

Stage Two

We will select users from Twitter who support this movement, after manually checking that they support it and are heavily involved in it by tweeting using the hashtags.

Stage Three

Using the metrics chosen in stage one, we will compute the trust metrics and generate the matrix of pairwise trust scores.

Stage Four

Since we are concerned with only one topic (regardless of the several hashtags used for it), we will measure the users’ sentiment agreement and compute a single sentiment agreement matrix.

Stage Five

Using the matrices computed in stages three and four, we will compare the trust and the sentiment agreement between the selected users. We will first assess the extent to which users who are involved in this topic trust each other, then the extent to which these users actually agree in their sentiment, and finally compare the two scores.

5 Conclusion

In this paper, we have presented a research methodology framework for investigating the relationship between trust and sentiment agreement among Arab Twitter users. The framework was demonstrated by means of a use case from Saudi Arabia. In future work, we will implement a system based on the framework to measure trust and compare it to sentiment agreement. Our intention is that this work will provide useful insight into the effect of social and political changes on social capital as signified by trust. However, we must bear in mind that although social network analysis is a powerful lens for understanding societies, it can amplify and distort reality, and extra measures need to be taken to ensure a reliable representation.