Learning from Multi-annotator Data: A Noise-aware Classification Framework

Published: 21 February 2019

Abstract

In sentiment analysis and emotion detection on social media, as in other supervised learning tasks such as text classification, researchers rely heavily on large, accurately labelled training datasets. However, obtaining large-scale labelled datasets is time-consuming, and high-quality labelled datasets are expensive and scarce. Online crowdsourcing systems address these problems by distributing annotation tasks to many annotators, which speeds up the collection of training data and yields large amounts of labelled data at an affordable cost. These crowdsourcing platforms are in particularly high demand for social media text, since social network platforms (e.g., Twitter) generate huge amounts of textual data every day. However, people from different social and knowledge backgrounds hold different views on the same texts, which may lead to noisy labels. Existing noisy label aggregation/refinement algorithms mostly focus on aggregating labels from noisy annotations, which does not guarantee their effectiveness on the subsequent classification/ranking tasks. In this article, we propose a noise-aware classification framework that integrates the steps of noisy label aggregation and classification. The aggregated noisy crowd labels are fed into a classifier for training, while the predicted labels are employed as feedback for adjusting the parameters at the label aggregation stage. The framework can be run directly on crowdsourcing datasets and applies to various kinds of classification algorithms. The feedback strategy makes it possible to find optimal parameters without relying on known data for parameter selection. Simulation experiments demonstrate that our method achieves significant label aggregation performance for both binary and multi-class classification tasks under various noisy environments. Experiments on real-world data validate the feasibility of our framework on real noisy data and help verify the reasonableness of the simulated experiment settings.
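The aggregate-train-feedback loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the weighted-vote aggregator, the nearest-centroid stand-in classifier, the log-odds annotator weights, the function names, and the toy data are all hypothetical. It only shows the general shape of the idea, in which classifier predictions are fed back to adjust aggregation parameters.

```python
import math
from collections import defaultdict


def aggregate(votes, weights, classes):
    """Weighted vote. `votes` holds one {annotator: label} dict per item."""
    labels = []
    for item_votes in votes:
        score = {c: 0.0 for c in classes}
        for annotator, label in item_votes.items():
            score[label] += weights[annotator]
        # Ties go to the class listed first (arbitrary choice for this sketch).
        labels.append(max(classes, key=lambda c: score[c]))
    return labels


def fit_centroids(features, labels):
    """Stand-in classifier (assumption): one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, features):
    """Assign each item to the class with the nearest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda y: dist2(x, centroids[y])) for x in features]


def noise_aware_fit(features, votes, classes, rounds=5):
    """Alternate aggregation and classification, feeding predictions back."""
    annotators = sorted({a for v in votes for a in v})
    weights = {a: 1.0 for a in annotators}  # round 1 is plain majority voting
    for _ in range(rounds):
        labels = aggregate(votes, weights, classes)
        predicted = predict(fit_centroids(features, labels), features)
        # Feedback step: score each annotator by agreement with the classifier.
        for a in annotators:
            pairs = [(v[a], p) for v, p in zip(votes, predicted) if a in v]
            agree = sum(1 for lab, p in pairs if lab == p)
            rate = (agree + 1) / (len(pairs) + 2)  # Laplace smoothing
            weights[a] = math.log(rate / (1.0 - rate))  # log-odds weight
    return weights, aggregate(votes, weights, classes)


# Hypothetical toy data: one feature per item, three annotators A, B, C.
classes = ["neg", "pos"]
features = [[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]
truth = ["neg"] * 4 + ["pos"] * 4
b_votes = ["neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
c_votes = ["neg", "pos", "pos", "neg", "pos", "pos", "neg", "pos"]
votes = [{"A": a, "B": b, "C": c} for a, b, c in zip(truth, b_votes, c_votes)]

majority = aggregate(votes, {"A": 1.0, "B": 1.0, "C": 1.0}, classes)
weights, refined = noise_aware_fit(features, votes, classes)
```

In this toy setup, plain majority voting mislabels the item on which annotators B and C both err. Once the feedback step has lowered their weights relative to A, the weighted vote follows A on that item. The log-odds weighting is one common choice for weighted voting and is used here only as an assumption; any aggregator with tunable parameters could take its place.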

Published In

ACM Transactions on Information Systems, Volume 37, Issue 2
April 2019
410 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/3306215
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 February 2019
Accepted: 01 January 2019
Revised: 01 November 2018
Received: 01 February 2018
Published in TOIS Volume 37, Issue 2

Author Tags

  1. Social media
  2. crowdsourcing
  3. emotion detection
  4. sentiment analysis

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • GRF grant from the Research Grants Council of the Hong Kong Special Administrative Region
  • ITF grant from the Innovation and Technology Commission
  • National Natural Science Foundation of China

Article Metrics

  • Downloads (last 12 months): 44
  • Downloads (last 6 weeks): 3
Reflects downloads up to 20 Jan 2025

Cited By

  • (2024) MinJoT: Multimodal infusion Joint Training for noise learning in text and multimodal classification problems. Information Fusion 102 (102071). DOI: 10.1016/j.inffus.2023.102071. Online publication date: Feb-2024
  • (2023) Much Ado About Gender. Proceedings of the 2023 Conference on Human Information Interaction and Retrieval (269-279). DOI: 10.1145/3576840.3578316. Online publication date: 19-Mar-2023
  • (2023) Emotion Detection in Online Social Network- A Multilabel Learning Process. 2023 2nd International Conference on Vision Towards Emerging Trends in Communication and Networking Technologies (ViTECoN) (1-5). DOI: 10.1109/ViTECoN58111.2023.10157859. Online publication date: 5-May-2023
  • (2021) Towards Robustness to Label Noise in Text Classification via Noise Modeling. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (3024-3028). DOI: 10.1145/3459637.3482204. Online publication date: 26-Oct-2021
  • (2021) CrowdGP: a Gaussian Process Model for Inferring Relevance from Crowd Annotations. Proceedings of the Web Conference 2021 (1821-1832). DOI: 10.1145/3442381.3450047. Online publication date: 19-Apr-2021
  • (2020) OutdoorSent. ACM Transactions on Information Systems 38:3 (1-28). DOI: 10.1145/3385186. Online publication date: 21-Apr-2020
  • (2020) Emotion Detection in Online Social Networks: A Multilabel Learning Approach. IEEE Internet of Things Journal 7:9 (8133-8143). DOI: 10.1109/JIOT.2020.3004376. Online publication date: Sep-2020
  • (2020) Social Media Sentiment Analysis with Context Space Model. Electronic Governance and Open Society: Challenges in Eurasia (399-412). DOI: 10.1007/978-3-030-39296-3_29. Online publication date: 23-Jan-2020
