DOI: 10.1145/3511808.3557503 (CIKM conference proceedings)
tutorial

Information Extraction from Social Media: A Hands-on Tutorial on Tasks, Data, and Open Source Tools

Published: 17 October 2022

ABSTRACT

Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured data from unstructured data. One application domain of IE is Information Retrieval (IR), which relies on accurate and high-performance IE to retrieve high-quality results from massive datasets. One example of an IE task is identifying named entities in a text. For example, in the sentence "Katy Perry lives in the USA", Katy Perry and USA are named entities of types PERSON and LOCATION, respectively. Identifying the sentiment expressed in a text is another instance of IE: in the sentence "This movie was awesome", the expressed sentiment is positive. Finally, IE is concerned with identifying various linguistic aspects of text data, e.g., parts of speech of words, noun phrases, and dependency parses, which can serve as features for additional IE tasks. This tutorial introduces participants to a) the usage of Python-based, open-source tools that support IE from social media data (mainly Twitter), and b) best practices for ensuring the responsible use of IE and research data. Participants will learn and practice various lexical, semantic, and syntactic IE techniques that are commonly used for analyzing tweets. Participants will also be familiarized with the landscape of publicly available social media data (including popular NLP and IE benchmarks) and methods for collecting and preparing them for analysis. Furthermore, participants will be trained to use a suite of open-source tools (SAIL for active learning, TwitterNER for named entity recognition, TweetNLP for transformer-based NLP, and SocialMediaIE for multi-task learning), which utilize advanced machine learning techniques (e.g., deep learning, active learning with human-in-the-loop, multi-lingual, and multi-task learning) to perform IE on their own or existing datasets.
Participants will also learn how the social contexts of text production and the usage of results can be integrated into IE systems to improve these systems, and how to consider the role of time in improving social media IE quality. Finally, participants will learn about the governance of social media data for research purposes. The tools introduced in the tutorial will focus on the three main stages of IE, namely, collection of data (including annotation), data processing and analytics, and visualization of the extracted information. More details can be found at: https://socialmediaie.github.io/tutorials/
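
The two example sentences in the abstract can be made concrete with a minimal Python sketch. It is a toy, dictionary-lookup illustration of what an IE system outputs, not the method used by any of the tools named above (SAIL, TwitterNER, TweetNLP, SocialMediaIE), which rely on machine learning. The names `GAZETTEER`, `SENTIMENT_LEXICON`, `extract_entities`, and `classify_sentiment` are hypothetical and invented for this illustration.

```python
import re

# Hypothetical toy resources, invented for this illustration only.
GAZETTEER = {"katy perry": "PERSON", "usa": "LOCATION"}
SENTIMENT_LEXICON = {"awesome": "positive", "terrible": "negative"}


def extract_entities(text):
    """Return (surface form, type) pairs for gazetteer entries found in text."""
    found = []
    for name, label in GAZETTEER.items():
        # Whole-word, case-insensitive match of the gazetteer entry.
        pattern = r"\b" + re.escape(name) + r"\b"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            found.append((match.start(), match.group(0), label))
    # Report entities in the order they appear in the text.
    return [(surface, label) for _, surface, label in sorted(found)]


def classify_sentiment(text):
    """Return the label of the first lexicon word in text, else 'neutral'."""
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in SENTIMENT_LEXICON:
            return SENTIMENT_LEXICON[token]
    return "neutral"


print(extract_entities("Katy Perry lives in the USA"))
# [('Katy Perry', 'PERSON'), ('USA', 'LOCATION')]
print(classify_sentiment("This movie was awesome"))
# positive
```

A lookup like this fails on misspellings, unseen names, and negation ("not awesome"), all common in tweets, which is one reason learned models of the kind covered in the tutorial are typically preferred.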
