DOI: 10.1145/3330204.3330228
research-article

Linguistic Pattern Mining for Data Analysis in Microblog Texts using Word Embeddings

Published: 20 May 2019

Abstract

Microblog posts (e.g., tweets) often contain users' opinions and thoughts about events, products, people, and organizations, among other subjects. However, social media is not uncommonly used to promote online disinformation and manipulation. Analyzing the characteristics of such discourse in social media is essential for understanding and countering these actions. Extracting recurrent, semantically similar fragments of text, i.e., word sequences, can lead to the discovery of linguistic patterns used in certain kinds of discourse. We therefore aim to use such patterns to capture frequent discourse expressed in microblog posts. In this paper, we exploit linguistic patterns in the context of the 2016 United States presidential election. Through a technique we call Short Semantic Pattern (SSP) mining, we extract word sequences that share a similar meaning in their word embedding representation. In our experiments, we investigate the incidence of SSP instances referring to political adversaries and the media in tweets posted by Donald Trump during the presidential election campaign. The results show a high prevalence of certain statements by Donald Trump about his adversaries, as well as of expressions that frequently appeared in these tweets.
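The abstract does not specify how SSP mining builds phrase representations or groups them, so the following is only a minimal sketch of the general idea under stated assumptions: each word n-gram is represented by the average of its word vectors, and n-grams are grouped greedily when their cosine similarity to a group's seed vector reaches a threshold. The names `EMBEDDINGS`, `mine_patterns`, `threshold`, and `min_support`, the three-dimensional toy vectors, and the sample strings are all hypothetical and invented for illustration; they are not the authors' implementation, data, or parameters.

```python
import math

# Hypothetical toy embeddings (assumption); a real setup would load vectors
# trained with word2vec or a similar model.
EMBEDDINGS = {
    "weak": [0.9, 0.1, 0.0],
    "failing": [0.85, 0.15, 0.05],
    "opponent": [0.1, 0.9, 0.0],
    "rival": [0.12, 0.88, 0.02],
    "fake": [0.0, 0.1, 0.9],
    "dishonest": [0.05, 0.15, 0.85],
    "media": [0.1, 0.0, 0.95],
    "press": [0.12, 0.02, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_vector(words, emb):
    """Average the word vectors of a phrase; None if any word is out of vocabulary."""
    if any(w not in emb for w in words):
        return None
    dim = len(next(iter(emb.values())))
    return [sum(emb[w][i] for w in words) / len(words) for i in range(dim)]


def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mine_patterns(texts, emb, n=2, threshold=0.95, min_support=2):
    """Greedily group n-grams whose averaged embeddings are cosine-similar.

    Each group keeps the vector of its first n-gram as a seed; a group counts
    as a pattern only if it has at least `min_support` occurrences.
    """
    clusters = []  # each entry: [seed_vector, list_of_ngram_occurrences]
    for text in texts:
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            vec = phrase_vector(gram, emb)
            if vec is None:
                continue  # skip n-grams containing unknown words
            for cluster in clusters:
                if cosine(vec, cluster[0]) >= threshold:
                    cluster[1].append(gram)
                    break
            else:
                clusters.append([vec, [gram]])
    return [members for _, members in clusters if len(members) >= min_support]


# Invented sample strings, not real tweets.
SAMPLE_TWEETS = [
    "My weak opponent lies again",
    "A failing rival and the fake media",
    "The dishonest press is unfair",
    "My weak opponent is done",
]

for pattern in mine_patterns(SAMPLE_TWEETS, EMBEDDINGS):
    print(pattern)
```

Here "weak opponent" and "failing rival" fall into one group and "fake media" and "dishonest press" into another, because their averaged vectors point in nearly the same direction. Averaging and single-pass seed clustering are chosen for brevity; other phrase representations or clustering schemes would fit the same idea.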


Cited By

  • (2021) REDIC: Recommendation of Digital Influencers of Brazilian Artisanal Cheese. Proceedings of the XVII Brazilian Symposium on Information Systems, 1-8. DOI: 10.1145/3466933.3466987. Online publication date: 7-Jun-2021
  • (2020) The Twittering Presidents. Journal of Language and Politics 20:2, 197-225. DOI: 10.1075/jlp.19046.wig. Online publication date: 18-Nov-2020
  • (2020) MSC+. Knowledge-Based Systems 188:C. DOI: 10.1016/j.knosys.2019.105017. Online publication date: 5-Jan-2020


    Published In

    ACM Other conferences
    SBSI '19: Proceedings of the XV Brazilian Symposium on Information Systems
    May 2019
    623 pages
    ISBN:9781450372374
    DOI:10.1145/3330204
    © 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    In-Cooperation

    • SBC: Brazilian Computer Society

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Data Analysis
    2. Linguistic Pattern Recognition
    3. Natural Language Processing
    4. Twitter
    5. Word Embeddings

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

    Conference

    SBSI'19

    Acceptance Rates

    Overall Acceptance Rate 181 of 557 submissions, 32%
