Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Authors: Danielly Sorato, Renato Fileto

SBSI '19: Proceedings of the XV Brazilian Symposium on Information Systems

Article No.: 19, Pages 1 - 8

https://doi.org/10.1145/3330204.3330228

Published: 20 May 2019

Abstract

Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other subjects. However, social media is also frequently used to promote online disinformation and manipulation. Analyzing the characteristics of such discourse in social media is essential for understanding and countering these actions. Extracting recurrent fragments of text, i.e., word sequences that are semantically similar, can lead to the discovery of linguistic patterns used in certain kinds of discourse. We therefore aim to use such patterns to capture discourses that are frequently expressed in microblog posts. In this paper, we propose to exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique that we call Short Semantic Pattern (SSP) mining, we extract sequences of words that share a similar meaning in their word embedding representation. In the experiments, we investigate the incidence of SSP instances regarding political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. Experimental results show a high preponderance of certain statements by Donald Trump about his adversaries, as well as expressions that often appeared in such tweets.
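The abstract does not specify how SSP mining compares word sequences, so the following is only a minimal sketch of one plausible ingredient: grouping fixed-length word sequences by the cosine similarity of their averaged word vectors. The toy embeddings, the helper names (`embed_sequence`, `similar_ngrams`), the averaging step, and the 0.95 threshold are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter
from math import sqrt

# Toy 3-dimensional vectors, invented purely for illustration.
# A real experiment would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "crooked": (0.9, 0.1, 0.0),
    "dishonest": (0.8, 0.2, 0.0),
    "media": (0.1, 0.9, 0.0),
    "press": (0.2, 0.8, 0.0),
    "great": (0.0, 0.1, 0.9),
    "rally": (0.1, 0.0, 0.8),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed_sequence(words, embeddings):
    """Average the word vectors of a sequence; None if any word is unknown."""
    vectors = [embeddings.get(w) for w in words]
    if any(v is None for v in vectors):
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def similar_ngrams(posts, seed, embeddings, threshold=0.95):
    """Count n-grams whose averaged vector is close to the seed sequence's."""
    seed_vec = embed_sequence(seed, embeddings)
    if seed_vec is None:
        raise ValueError("seed contains a word without an embedding")
    n = len(seed)
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            vec = embed_sequence(ngram, embeddings)
            if vec is not None and cosine(seed_vec, vec) >= threshold:
                counts[ngram] += 1
    return counts


# Hypothetical mini-corpus; each string stands in for one tokenizable post.
posts = ["crooked media", "dishonest press", "great rally", "dishonest press"]
matches = similar_ngrams(posts, ("crooked", "media"), TOY_EMBEDDINGS)
```

In this toy setup, "dishonest press" is grouped with the seed "crooked media" because their averaged vectors coincide, while "great rally" falls below the threshold. Averaging ignores word order, so a faithful implementation of the paper's technique would likely need a more careful sequence comparison.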

Cited By

Donato Soares N, Braga R, Maria N. David J, Beatriz Siqueira K, da Silva Nogueira T, Wendelin Campos E, Augusto Priamo Moares E, Vanessa Zabala Capriles Goliatt P. (2021). REDIC: Recommendation of Digital Influencers of Brazilian Artisanal Cheese. Proceedings of the XVII Brazilian Symposium on Information Systems, 1-8. DOI: 10.1145/3466933.3466987. Online publication date: 7-Jun-2021
https://dl.acm.org/doi/10.1145/3466933.3466987
Wignell P, Tan S, O’Halloran K, Chai K. (2020). The Twittering Presidents. Journal of Language and Politics 20:2, 197-225. DOI: 10.1075/jlp.19046.wig. Online publication date: 18-Nov-2020
https://doi.org/10.1075/jlp.19046.wig
Goularte F, Sorato D, Nassar S, Fileto R, Saggion H. (2020). M S C +. Knowledge-Based Systems 188:C. DOI: 10.1016/j.knosys.2019.105017. Online publication date: 5-Jan-2020
https://dl.acm.org/doi/10.1016/j.knosys.2019.105017

Index Terms

Information systems → Information retrieval → Specialized information retrieval → Environment-specific retrieval → Web and social media search

Published In

SBSI '19: Proceedings of the XV Brazilian Symposium on Information Systems

May 2019

623 pages

ISBN:9781450372374

DOI:10.1145/3330204

Copyright © 2019 ACM.

© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

SBC: Brazilian Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conference

SBSI'19

SBSI'19: XV Brazilian Symposium on Information Systems

May 20 - 24, 2019

Aracaju, Brazil

Acceptance Rates

Overall Acceptance Rate 181 of 557 submissions, 32%

Article Metrics

Total Citations: 3
Total Downloads: 136
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 2

Reflects downloads up to 13 Feb 2025
