ABSTRACT
Information extraction (IE) is a common sub-area of natural language processing that focuses on identifying structured data from unstructured data. One application domain of IE is Information Retrieval (IR), which relies on accurate and high-performance IE to retrieve high-quality results from massive datasets. One example of an IE task is identifying named entities in a text: in the sentence "Katy Perry lives in the USA", Katy Perry and USA are named entities of types PERSON and LOCATION, respectively. Identifying the sentiment expressed in a text is another instance of IE: in the sentence "This movie was awesome", the expressed sentiment is positive. Finally, IE is concerned with identifying various linguistic aspects of text data, e.g., the part of speech of words, noun phrases, and dependency parses, which can serve as features for additional IE tasks. This tutorial introduces participants to a) the usage of Python-based, open-source tools that support IE from social media data (mainly Twitter), and b) best practices for ensuring the responsible use of IE and research data. Participants will learn and practice various lexical, semantic, and syntactic IE techniques that are commonly used for analyzing tweets. Participants will also be familiarized with the landscape of publicly available social media data (including popular NLP and IE benchmarks) and with methods for collecting and preparing these data for analysis. Furthermore, participants will be trained to use a suite of open-source tools (SAIL for active learning, TwitterNER for named entity recognition, TweetNLP for transformer-based NLP, and SocialMediaIE for multi-task learning), which utilize advanced machine learning techniques (e.g., deep learning, active learning with human-in-the-loop, multi-lingual learning, and multi-task learning) to perform IE on their own or existing datasets.
Participants will also learn how the social contexts in which text is produced and results are used can be integrated into IE systems to improve them, and will consider the role of time in improving social media IE quality. Finally, participants will learn about the governance of social media data for research purposes. The tools introduced in the tutorial will focus on the three main stages of IE, namely, collection of data (including annotation), data processing and analytics, and visualization of the extracted information. More details can be found at: https://socialmediaie.github.io/tutorials/
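The two examples given in the abstract (named entities in "Katy Perry lives in the USA" and sentiment in "This movie was awesome") can be illustrated with a minimal Python sketch. It is a toy, dictionary-based illustration only: the `GAZETTEER` and `SENTIMENT_LEXICON` tables and both function names are hypothetical and made up for this example. It does not reflect how SAIL, TwitterNER, TweetNLP, or SocialMediaIE work, since those tools rely on learned models rather than lookups.

```python
import re

# Hypothetical toy resources, invented for this illustration only.
GAZETTEER = {"Katy Perry": "PERSON", "USA": "LOCATION"}
SENTIMENT_LEXICON = {"awesome": 1, "great": 1, "terrible": -1, "awful": -1}


def extract_entities(text):
    """Return (surface form, type) pairs for gazetteer entries, in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        # Whole-word match so that "USA" does not fire inside a longer token.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]


def classify_sentiment(text):
    """Sum lexicon scores over lowercased tokens and map the total to a label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(SENTIMENT_LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(extract_entities("Katy Perry lives in the USA"))
print(classify_sentiment("This movie was awesome"))
```

In both cases unstructured text goes in and a structured record (typed entity pairs, a sentiment label) comes out, which is the sense of IE used above. Lookups of this kind break down quickly on noisy, fast-changing social media text, which is part of the motivation for the learned approaches the tutorial covers.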
Index Terms
- Information Extraction from Social Media: A Hands-on Tutorial on Tasks, Data, and Open Source Tools