DOI: 10.1145/3459637.3482100
short-paper

Evaluating the Prediction Bias Induced by Label Imbalance in Multi-label Classification

Published: 30 October 2021

ABSTRACT

Prediction bias is a well-known problem in classification algorithms, which tend to be skewed towards more represented classes. This phenomenon is even more pronounced in multi-label scenarios, where the number of underrepresented classes is usually larger. In light of this, we present the Prediction Bias Coefficient (PBC), a novel measure that aims to assess the bias induced by label imbalance in multi-label classification. The approach leverages Spearman's rank correlation coefficient between the label frequencies and the F-scores obtained for each label individually. After describing the theoretical properties of the proposed indicator, we illustrate its behaviour on a classification task performed with state-of-the-art methods on two real-world datasets, and we compare it experimentally with other metrics described in the literature.
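The abstract describes the PBC only at a high level: a Spearman rank correlation between label frequencies and per-label F-scores. The Python sketch below illustrates that idea under several assumptions the abstract does not settle: label frequencies are counts of positive examples per label (for instance, in the training set), the per-label score is F1, tied values receive average ranks, and the coefficient is reported with its raw sign. The names `prediction_bias_coefficient`, `per_label_f1`, `spearman_rho` and `average_ranks` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5


def per_label_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> List[float]:
    """F1 of each label column in binary indicator matrices (rows = samples, columns = labels)."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 = 0 when a label is neither present nor predicted.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def prediction_bias_coefficient(label_counts: Sequence[float],
                                y_true: Sequence[Sequence[int]],
                                y_pred: Sequence[Sequence[int]]) -> float:
    """Hypothetical PBC sketch: Spearman's rho between label frequency and per-label F1.

    Values near +1 suggest per-label performance tracks label frequency,
    i.e. a bias towards well-represented labels.
    """
    return spearman_rho(label_counts, per_label_f1(y_true, y_pred))


if __name__ == "__main__":
    # Toy data: 3 samples, 4 labels; label_counts are hypothetical training-set frequencies.
    label_counts = [100, 50, 10, 2]
    y_true = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
    y_pred = [[1, 1, 1, 0], [1, 0, 1, 0], [0, 0, 0, 0]]
    print(per_label_f1(y_true, y_pred))  # per-label F1: 1.0, 2/3, 0.5, 0.0
    print(prediction_bias_coefficient(label_counts, y_true, y_pred))  # → 1.0
```

In this toy example F1 falls strictly as label frequency falls, so the ranks coincide and the coefficient is 1.0, the most biased value under this reading. Working on ranks rather than raw values makes the measure insensitive to the scale of the label counts, which in multi-label datasets can span several orders of magnitude.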


Supplemental Material

Evaluating the Prediction Bias Induced by Label Imbalance in Multi-label Classification.mp4 (MP4 video, 27.1 MB)


        Published in

        CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
        October 2021
        4966 pages
        ISBN: 9781450384469
        DOI: 10.1145/3459637

        Copyright © 2021 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall acceptance rate: 1,861 of 8,427 submissions, 22%


        Article Metrics

        • Downloads (last 12 months): 32
        • Downloads (last 6 weeks): 4
