research-article
Public Access

Impact of Annotator Demographics on Sentiment Dataset Labeling

Published: 11 November 2022

Abstract

As machine learning methods become more powerful and capture more nuances of human behavior, biases in a dataset can shape what a model learns and how it is evaluated. This paper explores and attempts to quantify the uncertainties and biases that annotator demographics introduce when creating sentiment analysis datasets. We ask more than 1,000 crowdworkers to provide their demographic information and to annotate multimodal sentiment data and its component modalities. We show that demographic differences among annotators have a significant effect on their ratings, and that these effects also occur in each component modality. We compare the predictions of several state-of-the-art multimodal machine learning algorithms against annotations provided by different demographic groups, and find that changing annotator demographics can cause a difference of more than 4.5 in accuracy when determining positive versus negative sentiment. Our findings underscore the importance of accounting for crowdworker attributes, such as demographics, when building datasets, evaluating algorithms, and interpreting results for sentiment analysis.

      • Published in

        Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2
        November 2022
        8205 pages
        EISSN: 2573-0142
        DOI: 10.1145/3571154

        Copyright © 2022 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States
