Abstract
In computational social science, two parallel research directions – exploring news consumption patterns and linguistic regularities – have made significant inroads into understanding the complex political polarization of the era of the ubiquitous internet. However, little or no literature presents a unified treatment combining both research directions. Moreover, when studying social events in countries where English is not the first language, the availability of computational linguistic resources is often a barrier to sophisticated linguistic analyses. In this work, we analyze an important sociopolitical event, the 2019 South American protests, and demonstrate that (1) a combined treatment offers a more comprehensive understanding of the event; and (2) these cross-cutting methods can be applied in a synergistic way. The insights gained by combining these methods include that polarization in users’ news-sharing patterns was consistent with their stances towards the government, and that polarization in their language manifested mainly along ideological, political, or protest-related lines. In addition, we release a massive dataset of 15 million tweets relevant to this crisis.
Notes
- 1.
- 2. Publicly available at: https://doi.org/10.5281/zenodo.6213032.
- 3. This refers to decree 883, which proposed austerity measures and started the protests in the country.
- 4. An outlet can be retweeted either directly, via a tweet originating from its account, or indirectly, via a third-party tweet containing a URL with its domain.
- 5. Note to reviewers: to uphold the anonymization policies for submission, we will make the link publicly available before publication.
- 6.
References
Alhazmi, K., Alsumari, W., Seppo, I., Podkuiko, L., Simon, M.: Effects of annotation quality on model performance. In: 2021 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), pp. 063–067 (2021)
Babcock, M., Cox, R.V.C., Kumar, S.: Diffusion of pro-and anti-false information tweets: the black panther movie case. Comput. Math. Organ. Theory 25(1), 72–84 (2019)
Babcock, M., Villa-Cox, R., Carley, K.M.: Pretending positive, pushing false: comparing captain marvel misinformation campaigns. In: Shu, K., Wang, S., Lee, D., Liu, H. (eds.) Disinformation, Misinformation, and Fake News in Social Media. LNSN, pp. 83–94. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-42699-6_5
Baldwin, M., Lammers, J.: Past-focused environmental comparisons promote proenvironmental outcomes for conservatives. Proc. Natl. Acad. Sci. 113(52), 14953–14957 (2016)
Barberá, P., et al.: The critical periphery in the growth of social protests. PLoS ONE 10(11), e0143611 (2015)
Beguerisse-Díaz, M., Garduno-Hernández, G., Vangelov, B., Yaliraki, S.N., Barahona, M.: Interest communities and flow roles in directed networks: the twitter network of the UK riots. J. R. Soc. Interface 11(101), 20140940 (2014)
Darwish, K.: Quantifying polarization on twitter: the Kavanaugh nomination. arXiv abs/2001.02125 (2020)
Del Vicario, M., et al.: The spreading of misinformation online. Proc. Natl. Acad. Sci. 113(3), 554–559 (2016)
Demszky, D., et al.: Analyzing polarization in social media: method and application to tweets on 21 mass shootings. In: NAACL-HLT 2019, pp. 2970–3005. Association for Computational Linguistics (2019)
Evans, A.: Stance and identity in twitter hashtags. Lang. Internet 13(1) (2016)
Fisher, D.R., Waggle, J., Leifeld, P.: Where does political polarization come from? Locating polarization within the us climate change debate. Am. Behav. Sci. 57(1), 70–92 (2013)
Garrett, R.K.: The “echo chamber” distraction: disinformation campaigns are the problem, not audience fragmentation. J. Appl. Res. Mem. Cogn. 6(4), 370–376 (2017). https://www.sciencedirect.com/science/article/pii/S2211368117301936
Golbeck, J., Hansen, D.: Computing political preference among twitter followers. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1105–1108 (2011)
González-Bailón, S., Wang, N.: Networked discontent: the anatomy of protest campaigns in social media. Soc.l Netw. 44, 95–104 (2016)
Gu, Y., Chen, T., Sun, Y., Wang, B.: Ideology Detection for twitter users via link analysis. In: Lee, D., Lin, Y.-R., Osgood, N., Thomson, R. (eds.) SBP-BRiMS 2017. LNCS, vol. 10354, pp. 262–268. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60240-0_32
Gurganus, J.: Russia: Playing a Geopolitical Game in Latin America. Carnegie Endowment for International Peace (2018)
Hovy, D., Spruit, S.L.: The social impact of natural language processing. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 591–598 (2016)
KhudaBukhsh, A.R., Sarkar, R., Kamlet, M.S., Mitchell, T.M.: We don’t speak the same language: interpreting polarization through machine translation. In: AAAI 2021, pp. 14893–14901 (2021)
KhudaBukhsh, A.R., Sarkar, R., Kamlet, M.S., Mitchell, T.M.: Fringe news networks: dynamics of US news viewership following the 2020 presidential election. In: WebSci 2022: 14th ACM Web Science Conference 2022, pp. 269–278. ACM (2022)
Kim, Y.: Convolutional neural networks for sentence classification. In: EMNLP 2014, pp. 1746–1751, October 2014
Koutra, D., Bennett, P.N., Horvitz, E.: Events and controversies: Influences of a shocking news event on information seeking. CoRR abs/1405.1486 (2014). https://arxiv.org/abs/1405.1486
Ling, R.: Confirmation bias in the era of mobile news consumption: the social and psychological dimensions. Digit Journal. 8, 1–9 (2020)
McConnell, C., Margalit, Y., Malhotra, N., Levendusky, M.: Research: Political Polarization Is Changing How Americans Work and Shop. Harvard Business Review (2017)
Mohammad, S., Kiritchenko, S., Sobhani, P., Zhu, X., Cherry, C.: Semeval-2016 task 6: detecting stance in tweets. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 31–41 (2016)
Olteanu, A., Castillo, C., Diaz, F., Kıcıman, E.: Social data: biases, methodological pitfalls, and ethical boundaries. Front. Big Data 2, 13 (2019)
Poole, K.T., Rosenthal, H.: The polarization of American politics. J. Polit. 46(4), 1061–1079 (1984)
Prior, M.: Media and political polarization. Annu. Rev. Polit. Sci. 16, 101–127 (2013)
Rouvinski, V.: Understanding Russian priorities in Latin America. Kennan Cable 20 (2017)
Smith, S.L., Turban, D.H.P., Hamblin, S., Hammerla, N.Y.: Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In: 5th International Conference on Learning Representations, ICLR 2017 (2017)
Spohr, D.: Fake news and ideological polarization: filter bubbles and selective exposure on social media. Bus. Inf. Rev. 34(3), 150–160 (2017)
Swamy, S., Ritter, A., de Marneffe, M.C.: “i have a feeling trump will win..................": forecasting winners and losers from user predictions on twitter. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1583–1592 (2017)
Tsakalidis, A., Aletras, N., Cristea, A.I., Liakata, M.: Nowcasting the stance of social media users in a sudden vote: the case of the greek referendum. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 367–376 (2018)
Wong, F.M.F., Tan, C.W., Sen, S., Chiang, M.: Quantifying political leaning from tweets, retweets, and retweeters. IEEE Trans. Knowl. Data Eng. 28(8), 2158–2172 (2016)
Xiao, Z., Song, W., Xu, H., Ren, Z., Sun, Y.: TIMME: Twitter ideology-detection via multi-task multi-relational embedding. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2258–2268 (2020)
Appendices
A Data Collection
The dataset consists of 100 million tweets from 15+ million users, collected using Twitter’s API v1 around the protests that transpired in the countries studied. For each event, we built the queries by first identifying the most prominent hashtags/terms (using Twitter’s trending terms in the country). After some days of streaming, we determined the most frequent relevant hashtags not yet included, taking special care to include hashtags used by different groups (for and against each government). We added these to our query and also collected them via weekly REST grabs (to ensure their collection from the start). By repeating this process each week, we built up a set of more than 500 hashtags. To improve the quality of the conversational structure present in the data, we also re-hydrated any missing targets or ancestors (up to 5 levels above in the conversation tree) of replies or quotes. Table 7 presents the relevant descriptive statistics for the collection. To better contextualize our work, we first present a brief overview of the main events that transpired in each of the countries.
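The ancestor re-hydration step above can be sketched as a walk up the conversation tree. This is a minimal illustration, not the authors’ code: `fetch_tweet` is a hypothetical stand-in for a Twitter REST lookup, and tweets are represented as plain dicts with assumed `in_reply_to`/`quoted_id` keys.

```python
# Sketch: recover missing ancestors of a reply/quote chain, up to 5
# levels above, as described in the appendix. `fetch_tweet` is a
# hypothetical API lookup (returns None for deleted/protected tweets).

def collect_missing_ancestors(tweet, corpus, fetch_tweet, max_levels=5):
    """Return ancestors of `tweet` that were absent from `corpus`,
    adding them to `corpus` as they are fetched."""
    recovered = []
    current = tweet
    for _ in range(max_levels):
        parent_id = current.get("in_reply_to") or current.get("quoted_id")
        if parent_id is None:          # reached the conversation root
            break
        if parent_id in corpus:        # ancestor already collected
            current = corpus[parent_id]
            continue
        parent = fetch_tweet(parent_id)
        if parent is None:             # deleted or inaccessible tweet
            break
        recovered.append(parent)
        corpus[parent_id] = parent
        current = parent
    return recovered
```

In a real collection the same routine would be batched over all replies and quotes with missing targets before each weekly grab.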
A.1 Ethical Considerations
We make our data publicly available and, to adhere to Twitter’s terms and conditions for sharing data, we do not share the full JSON of the collected tweets (Footnote 5). Instead, we provide their respective tweet or user IDs, the type of tweet (Original, Reply, or Quote), and, in the case of weakly labeled users or tweets, their assigned label. Since the tweets will have to be re-hydrated, any tweet (or account) a user deletes will not be available for analysis, ensuring that the user’s right to be forgotten is preserved. However, for the hand-labeled political figures (described later), given their public role during these events, we provide not only their user ID but also their user name and user type. We also release the full set of labeled stance-tags.
B Weak-Labeled Dataset
We determine a user’s stance based on how prominently they tweet (or retweet) a hashtag from a given stance or retweet a labeled political figure. In this appendix, we provide further details of the validation methodology used to prune the set of stance-tags and of the weak labels obtained from each signal.
B.1 Stance-Tags Validation
Our weak-labeling methodology relies on the hypothesis that users are more likely to tweet (or retweet) hashtags or political figures aligned with their stances during these events. Hence, a weak stance label is assigned to a user if the percentage of their tweets with a consistent stance-tag is above a given threshold. To test this hypothesis, we apply our methodology (based on stance-tags alone) to predict the stance of the labeled political figures. We can also use this exercise to determine a suitable threshold for the stance assignment. We limit our analysis to the 88.1% of labeled users that tweeted (or retweeted) at least 5 tweets containing a stance-tag. We also present results excluding the set of 229 extra stance-tags obtained using this set of users, in order to better assess performance in the wild. Figure 2 presents the accuracy of the methodology at different probability thresholds. As expected, higher thresholds are more conservative in the assignment of a label (the percentage of undetermined users increases) but also decrease the likelihood of misclassification. However, even at the most aggressive classification threshold, only 2.6% of the users are misclassified, which supports our starting hypothesis.
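The thresholding rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors’ implementation: the input is the list of stances of a user’s stance-tagged tweets, and the 90% threshold and 5-tweet minimum are the values used in the paper.

```python
from collections import Counter

def weak_stance(tag_stances, threshold=0.90, min_tweets=5):
    """Assign a weak stance label from the stances of a user's
    stance-tagged tweets; return None if the user stays undetermined
    (too few tagged tweets, or no stance reaches the threshold)."""
    if len(tag_stances) < min_tweets:
        return None
    counts = Counter(tag_stances)
    stance, n = counts.most_common(1)[0]
    return stance if n / len(tag_stances) >= threshold else None
```

For example, a user with 9 pro-government and 1 anti-government tagged tweets is labeled pro-government (90% consistency), while an 8/2 split is left undetermined.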
For the construction of the dataset released in this work, we opt for a conservative 90% threshold, which results in 74.2% correctly classified users but only a 0.3% (2 users) classification error. The reason for this conservative approach is that our validation set is composed of highly political users, which could result in a higher likelihood of misclassification among more casual users [10].
Nonetheless, we are able to considerably increase the performance of this methodology (with the 90% threshold) by first including the aforementioned 229 hashtags, used exclusively by users of each side, which improves the accuracy to 80.0%. Lastly, we prune our hashtag set by removing tags that were used too frequently by users of a different stance. This removes 46 hashtags and brings the final classification accuracy of our proposed weak-labeling methodology to 88.6%.
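The pruning step can be sketched as a filter on cross-stance usage. This is an illustrative sketch only: the paper does not state the exact cutoff, so `max_cross` and the `{"own", "other"}` count structure are assumptions.

```python
def prune_cross_stance_tags(tag_usage, max_cross=0.2):
    """Drop stance-tags used too often by users of the opposite stance.

    `tag_usage` maps tag -> {"own": uses by same-stance users,
                             "other": uses by opposite-stance users}.
    `max_cross` is an assumed cutoff on the opposite-stance share.
    """
    kept = {}
    for tag, u in tag_usage.items():
        total = u["own"] + u["other"]
        if total and u["other"] / total <= max_cross:
            kept[tag] = u
    return kept
```

A tag used 90/10 by its own side survives, while a 50/50 tag (carrying no stance signal) is removed.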
B.2 Assignment of User Stances
We assign the user stance based on how prominently they tweet (or retweet) a stance-tag or retweet a labeled political figure. The threshold used to determine the stance was obtained during the hashtag validation procedure described above and set at 90%.
Hashtag Usage. Users were assigned a stance if they used stance-tags either in their tweets (or retweets) or in their user description. In both cases, a stance was assigned to a tweet (or description) if it contained hashtags of a single stance; otherwise it was deemed inconsistent. As before, we only proceeded with users that had at least 5 tweets with a consistent stance, or at least one consistent description. As less than 1% of labeled users were labeled based on their descriptions, we do not disaggregate results by the origin of the label. A stance was assigned to a user if at least 90% of their tweets had the same stance. The number of users classified and their distribution are presented in Table 8.
Endorsement of Political Figures. The procedure for assigning a stance to a user based on their endorsement of political figures follows the same logic as before. Users were assigned a stance if at least 90% of their retweets of labeled political figures were of figures with the same stance. As before, we only proceeded with users that had at least 5 such retweets. The number of users classified and their distribution are presented in Table 9.
C Filtering Irrelevant Media Tweets
We started with a dataset of news agencies and journalists for the countries explored, obtained from the NetMapper software (Footnote 6). As it had several limitations, we expanded it by searching for the most important news agencies operating in each country, manually checking whom they follow, and adding agencies that were not included. This resulted in a list of 853 news agencies (or major reporters) detailing their Twitter handles and main URL (if available). Notably, the list included agencies from Venezuela and Russia that predominantly operate in the region; this is important as we explore influence campaigns on the protests. We then proceeded to identify the agencies that were either directly retweeted by a user or that had a user tweet/retweet a URL corresponding to their domain. The number of news agencies from each country resulting from this process is shown in Table 10.
However, the news articles identified in our dataset cover topics ranging from the protests to sports. When studying the polarization of news consumption during a political event, it is important to first remove tweets that are irrelevant to the protests. Whether a tweet from a news agency is relevant is not obvious, but many tweets in our dataset contain the URL of the article they reference. For this reason, we determined the relevance to the protests of a small set of the 900 most-tweeted URLs in our dataset, distributed among the different countries. We complemented this dataset with an additional set of URLs labeled by extracting subsection metadata from them: if the subsection referenced sports, culture, or technology, the URL was labeled as irrelevant to the protests. We then assigned the URL label to any tweet that used it. The final sample distribution is presented in Table 11. We note that even though we are able to assign a label to more than 100k tweets, most of them contained duplicated text (as news media tend to tweet the same thing multiple times). The classification was done with the deduplicated dataset.
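The subsection-based labeling can be sketched as follows. This is an illustrative sketch, not the authors’ code: it assumes outlets encode the section in the first URL path segment, and the section names listed are hypothetical examples (mixing Spanish and English forms).

```python
from urllib.parse import urlparse

# Assumed section names marking articles as irrelevant to the protests.
IRRELEVANT_SECTIONS = {"deportes", "sports", "cultura", "culture",
                       "tecnologia", "technology"}

def label_url(url):
    """Return 'irrelevant' if the URL's subsection metadata points to
    sports, culture, or technology; otherwise None (left for manual or
    model-based labeling)."""
    segments = urlparse(url).path.strip("/").split("/")
    if segments and segments[0].lower() in IRRELEVANT_SECTIONS:
        return "irrelevant"
    return None
```

Any tweet containing a labeled URL then inherits that URL’s label, as described above.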
To classify the relevance of the tweets, we built a CNN text classifier [20] using 300-dimensional FastText embeddings trained on the combined datasets (both by stance and country) used to analyze the language polarization. We used 100 filters in each of 3 convolutional layers with filter sizes 3, 4, and 5, and a dropout rate of 50%. We achieved an accuracy and F1-score of 92% on a held-out test set. After predicting the labels of tweets (relevant or irrelevant to the protests), we obtain a dataset of 1,024,166 relevant and 675,496 irrelevant tweets. The distribution of the dataset is shown in Table 11. The analysis of polarization in news consumption patterns presented in this work was done only on tweets relevant to the protests.
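Because news media tend to tweet the same headline many times, deduplication precedes classification. A minimal sketch of such near-verbatim deduplication, assuming a simple normalization (strip URLs, collapse whitespace, lowercase) that is our own illustrative choice rather than the paper’s exact procedure:

```python
import re

def deduplicate(tweets):
    """Collapse near-verbatim duplicate tweets by normalizing away
    URLs, whitespace, and case; keep the first occurrence of each."""
    seen, unique = set(), []
    for text in tweets:
        key = re.sub(r"https?://\S+", "", text)    # drop shortened URLs
        key = re.sub(r"\s+", " ", key).strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique
```

Two tweets of the same headline that differ only in their shortened link collapse to one entry, so the classifier sees each distinct text once.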
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Villa-Cox, R., Zeng, H.S., KhudaBukhsh, A.R., Carley, K.M. (2022). Linguistic and News-Sharing Polarization During the 2019 South American Protests. In: Hopfgartner, F., Jaidka, K., Mayr, P., Jose, J., Breitsohl, J. (eds) Social Informatics. SocInfo 2022. Lecture Notes in Computer Science, vol 13618. Springer, Cham. https://doi.org/10.1007/978-3-031-19097-1_5
DOI: https://doi.org/10.1007/978-3-031-19097-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19096-4
Online ISBN: 978-3-031-19097-1
eBook Packages: Computer Science; Computer Science (R0)