tutorial

Information Extraction from Social Media: A Hands-on Tutorial on Tasks, Data, and Open Source Tools

Authors:
Shubhanshu Mishra, Twitter Inc., Chicago, IL, USA
Rezvaneh Rezapour, Drexel University, Philadelphia, PA, USA
Jana Diesner, University of Illinois at Urbana-Champaign, Champaign, IL, USA
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management, October 2022, Pages 5148–5151. https://doi.org/10.1145/3511808.3557503

Published: 17 October 2022


ABSTRACT

Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured data in unstructured data. One application domain of IE is Information Retrieval (IR), which relies on accurate and high-performance IE to retrieve high-quality results from massive datasets. One example of an IE task is identifying named entities in a text: in the sentence "Katy Perry lives in the USA", Katy Perry and USA are named entities of type PERSON and LOCATION, respectively. Identifying the sentiment expressed in a text is another instance of IE: in the sentence "This movie was awesome", the expressed sentiment is positive. Finally, IE is concerned with identifying various linguistic aspects of text data, e.g., parts of speech of words, noun phrases, and dependency parses, which can serve as features for additional IE tasks. This tutorial introduces participants to a) the usage of Python-based, open-source tools that support IE from social media data (mainly Twitter), and b) best practices for ensuring the responsible use of IE and research data. Participants will learn and practice various lexical, semantic, and syntactic IE techniques that are commonly used for analyzing tweets. Participants will also be familiarized with the landscape of publicly available social media data (including popular NLP and IE benchmarks) and methods for collecting and preparing them for analysis. Furthermore, participants will be trained to use a suite of open-source tools (SAIL for active learning, TwitterNER for named entity recognition, TweetNLP for transformer-based NLP, and SocialMediaIE for multi-task learning), which utilize advanced machine learning techniques (e.g., deep learning, active learning with human-in-the-loop, multi-lingual, and multi-task learning) to perform IE on their own or existing datasets.
Participants will also learn how the social contexts of text production and the usage of results can be integrated into IE systems to improve them, and how to consider the role of time in improving social media IE quality. Finally, participants will learn about the governance of social media data for research purposes. The tools introduced in the tutorial will focus on the three main stages of IE, namely, collection of data (including annotation), data processing and analytics, and visualization of the extracted information. More details can be found at: https://socialmediaie.github.io/tutorials/
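
The two examples in the abstract (named entities in "Katy Perry lives in the USA" and sentiment in "This movie was awesome") can be illustrated with a minimal Python sketch. This is not how SAIL, TwitterNER, TweetNLP, or SocialMediaIE work; those tools rely on trained machine learning models. The gazetteer, the sentiment lexicon, and the function names `extract_entities` and `classify_sentiment` below are hypothetical and exist only to show what the input and output of each task look like.

```python
import re

# Toy gazetteer and sentiment lexicon: illustrative assumptions only,
# not resources shipped with any of the tools named in the abstract.
GAZETTEER = {
    "katy perry": "PERSON",
    "usa": "LOCATION",
}
SENTIMENT_LEXICON = {"awesome": 1, "great": 1, "terrible": -1, "awful": -1}


def tokenize(text):
    """Split text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def extract_entities(text, gazetteer=GAZETTEER, max_len=3):
    """Return (surface form, type) pairs via greedy longest-match lookup."""
    tokens = tokenize(text)
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            label = gazetteer.get(span.lower())
            if label is not None:
                entities.append((span, label))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


def classify_sentiment(text, lexicon=SENTIMENT_LEXICON):
    """Sum per-word polarity scores and map the total to a label."""
    score = sum(lexicon.get(tok.lower(), 0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(extract_entities("Katy Perry lives in the USA"))
print(classify_sentiment("This movie was awesome"))
```

A lookup-based approach like this breaks down quickly on tweets, where spelling, casing, and vocabulary vary widely, which is one reason learned models are generally preferred for social media text.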



Published in
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
October 2022
5274 pages
ISBN: 9781450392365
DOI: 10.1145/3511808
General Chairs:
Mohammad Al Hasan, Indiana University Purdue University, Indianapolis, USA
Li Xiong, Emory University, Atlanta, USA
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
chunking
data governance
deep learning
information extraction
machine learning
machine learning bias
multitask learning
named entity recognition
natural language processing
open data
open source tool
part of speech tagging
social media
supersense tagging
text classification
twitter
Qualifiers
- tutorial
Acceptance Rates
CIKM '22 Paper Acceptance Rate: 621 of 2,257 submissions, 28%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%