short-paper

Evaluating the Prediction Bias Induced by Label Imbalance in Multi-label Classification

Authors:
Luca Piras, Eurecat (Centre Tecnològic de Catalunya), Barcelona, Spain
Ludovico Boratto, University of Cagliari, Cagliari, Italy
Guilherme Ramos, University of Porto, Porto, Portugal

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, October 2021, Pages 3368–3372. https://doi.org/10.1145/3459637.3482100

Published: 30 October 2021

ABSTRACT

Prediction bias is a well-known problem in classification algorithms, which tend to be skewed towards better-represented classes. This phenomenon is even more pronounced in multi-label scenarios, where the number of underrepresented classes is usually larger. In light of this, we present the Prediction Bias Coefficient (PBC), a novel measure for assessing the bias induced by label imbalance in multi-label classification. The approach leverages Spearman's rank correlation coefficient between the label frequencies and the F-scores obtained for each label individually. After describing the theoretical properties of the proposed indicator, we illustrate its behaviour on a classification task performed with state-of-the-art methods on two real-world datasets, and we compare it experimentally with other metrics described in the literature.
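
The core computation named in the abstract, a Spearman rank correlation between label frequencies and per-label F-scores, can be illustrated with a minimal sketch. This is not the authors' implementation: the exact definition of PBC (any normalization, sign convention, or tie handling) is given in the full paper and is not reproduced here. The function names below, including `rank_correlation_bias`, are hypothetical, and the sketch assumes the F1 variant of the F-score, average ranks for ties, and binary indicator matrices as input.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one vector is constant
    return cov / (vx * vy) ** 0.5


def label_frequencies(y_true: Sequence[Sequence[int]]) -> List[int]:
    """Number of positive examples for each label (column)."""
    return [sum(row[j] for row in y_true) for j in range(len(y_true[0]))]


def per_label_f1(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> List[float]:
    """F1 score computed independently for each label (column)."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def rank_correlation_bias(y_true: Sequence[Sequence[int]],
                          y_pred: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: raw Spearman correlation between label
    frequencies and per-label F1, without any further normalization."""
    return spearman(label_frequencies(y_true), per_label_f1(y_true, y_pred))


if __name__ == "__main__":
    # Toy data: 4 examples, 3 labels with frequencies 4, 2 and 1.
    y_true = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [1, 0, 0]]
    y_pred = [[1, 1, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
    print(rank_correlation_bias(y_true, y_pred))
```

In the toy data the classifier does better on more frequent labels, so the correlation is strongly positive; a value near zero would indicate that per-label performance is unrelated to label frequency.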

Index Terms

Evaluating the Prediction Bias Induced by Label Imbalance in Multi-label Classification
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
    2. Retrieval tasks and goals
      1. Clustering and classification

Published in
CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021, 4966 pages
ISBN: 9781450384469
DOI: 10.1145/3459637
General Chairs: Gianluca Demartini (The University of Queensland, Australia), Guido Zuccon (The University of Queensland, Australia)
Program Chairs: J. Shane Culpepper (RMIT University, Australia), Zi Huang (The University of Queensland, Australia), Hanghang Tong (University of Illinois at Urbana-Champaign, USA)
Copyright © 2021 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
classification bias
evaluation
imbalance
multi-label classification
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%