Neurocomputing

Volume 436, 14 May 2021, Pages 232-247

Multi-label thresholding for cost-sensitive classification

https://doi.org/10.1016/j.neucom.2020.12.004

Abstract

Multi-label classification associates each instance with a set of labels, reflecting the nature of a wide range of real-world applications. However, existing approaches assume that all labels have the same misclassification cost, whereas in real-world problems different types of misclassification error incur different costs, which are generally unknown at training time or may change from one context to another. There is therefore a demand for cost-sensitive classification methods that minimise the average misclassification cost rather than error rates or counts. In this paper, we adopt a simple yet general method, called thresholding, which applies to most classification algorithms and adapts them to cost-sensitive multi-label classification. The paper investigates current threshold choice approaches for multi-label classification, explores the choice of single and multiple thresholds, and extends some current techniques to support multi-label problems. Moreover, it proposes cost curves and scatter diagrams for performance evaluation in the multi-label setting. Experimental evaluation on 13 multi-label datasets demonstrates that, even when misclassification costs differ across labels, there is no significant loss from tuning a single global threshold rather than one threshold per label. Although tuning multiple thresholds is the obvious solution, a global threshold can also be valid.

Section snippets

Introduction and motivation

Classification is one of the most prominent learning tasks in machine learning, in which the task is to assign a new instance to one or more potential classes. Traditional classification allows only a single label (binary or multi-class), whereas multi-label classification considers more than one label for each instance simultaneously [1], [2], [3]. Often it is inadequate to classify an instance under a single label, because several labels may describe its content concurrently.

Background and related work

Cost-sensitive learning is “a type of learning in data mining that takes misclassification costs and possibly other types of cost into consideration” [8]. The key idea in cost-sensitive learning is to treat different misclassification costs differently, minimising the expected cost of decisions rather than the error rate. Costs are not necessarily monetary; they can, for example, be wasted time or harm to a patient's health. Many real-world applications such as medical decision making, fraud detection, target marketing and object
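
As a concrete illustration of the idea (a sketch following the classical cost-sensitive decision rule discussed by Elkan in the references, not code from this paper), the optimal probability threshold for a single binary label follows directly from the two misclassification costs: with a false-positive cost c_fp and a false-negative cost c_fn, predicting positive minimises expected cost whenever the estimated probability exceeds c_fp / (c_fp + c_fn). The function names below are illustrative.

```python
# Classical cost-sensitive decision rule for one binary label:
# predict positive iff the expected cost of doing so is lower than
# the expected cost of predicting negative. This yields the
# threshold t* = c_fp / (c_fp + c_fn) on the estimated probability.

def cost_sensitive_threshold(c_fp: float, c_fn: float) -> float:
    """Probability threshold that minimises expected misclassification cost."""
    return c_fp / (c_fp + c_fn)

def predict(score: float, c_fp: float, c_fn: float) -> int:
    """Classify positive (1) iff the score exceeds the cost-derived threshold."""
    return int(score > cost_sensitive_threshold(c_fp, c_fn))

# Equal costs recover the usual 0.5 threshold; when false negatives
# are nine times as costly, the threshold drops to 0.1.
print(cost_sensitive_threshold(1, 1))  # 0.5
print(cost_sensitive_threshold(1, 9))  # 0.1
print(predict(0.2, 1, 9))              # 1 (0.2 > 0.1)
```

Note that the classifier itself is unchanged; only the decision threshold moves with the costs, which is the premise behind the thresholding methods studied in this paper.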

Preliminaries

In this section, we define key concepts and notation used throughout the paper. Let X be the instance space and Y be the label space. The number of labels is denoted q: in single-label classification (binary or multi-class) q = 1, while in multi-label classification q > 1, since each instance can be associated with several labels at once. Label sets are subsets of Y.

D = {(x1, y1), (x2, y2), (x3, y3), …, (xm, ym)} is a dataset with m instances, where each xi ∈ X is an instance and yi ⊆ Y its set of relevant labels.
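
The notation above can be made concrete with the usual binary-matrix encoding of label sets, where entry Y[i][j] is 1 exactly when label j belongs to the label set of instance xi. The data values below are illustrative, not from the paper.

```python
# A toy multi-label dataset with m = 4 instances and q = 3 labels.
# Label sets are encoded as a binary matrix Y: Y[i][j] == 1 iff
# label j+1 is relevant for instance x_{i+1}.
m, q = 4, 3
X = [[0.2, 1.1], [0.7, 0.3], [1.5, 0.9], [0.1, 0.4]]  # instance features
Y = [[1, 0, 1],   # x_1 carries labels {1, 3}
     [0, 1, 0],   # x_2 carries label  {2}
     [1, 1, 1],   # x_3 carries all three labels
     [0, 0, 0]]   # x_4 carries the empty label set

# Recover each instance's label set from the binary encoding.
label_sets = [{j + 1 for j in range(q) if Y[i][j]} for i in range(m)]
print(label_sets)  # [{1, 3}, {2}, {1, 2, 3}, set()]
```

This encoding also makes the single-label case a special instance of the framework: with q = 1, Y collapses to a single binary column.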

Evaluating multi-label classifiers

As mentioned in the introduction, in multi-label learning a prediction is obtained for each (instance, label) pair, which is typically a real-valued score. A thresholding strategy is then used to convert the predicted scores into actual labels. Multi-label classification evaluation measures are divided into two main categories: 1) instance-based methods, which compute the average difference between the actual and the predicted labels over all instances; and 2) label-based methods, which break down the
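
The two evaluation families can be sketched with two standard measures: Hamming loss, averaged over all (instance, label) pairs, and accuracy computed separately per label. This is a minimal illustration with made-up matrices, not results or code from the paper.

```python
# Y_true and Y_pred are binary label matrices (rows = instances,
# columns = labels), as in the multi-label encoding used above.

def hamming_loss(Y_true, Y_pred):
    """Fraction of misclassified (instance, label) pairs."""
    m, q = len(Y_true), len(Y_true[0])
    errors = sum(Y_true[i][j] != Y_pred[i][j]
                 for i in range(m) for j in range(q))
    return errors / (m * q)

def per_label_accuracy(Y_true, Y_pred):
    """Label-based view: accuracy computed separately for each label."""
    m, q = len(Y_true), len(Y_true[0])
    return [sum(Y_true[i][j] == Y_pred[i][j] for i in range(m)) / m
            for j in range(q)]

Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]
print(hamming_loss(Y_true, Y_pred))        # 2 errors / 9 pairs ≈ 0.222
print(per_label_accuracy(Y_true, Y_pred))  # ≈ [1.0, 0.667, 0.667]
```

The contrast matters for thresholding: a per-label threshold can be tuned against a label-based measure, whereas a single global threshold is naturally assessed against measures averaged over all pairs.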

Threshold choice methods for multi-label problems

In this section, we present different approaches to perform the final labelling of a multi-label dataset given a set of scores for the potential labels. Assume we have trained a multi-label model on a given training dataset D. We then run the classifier on a test set S and obtain a confidence score matrix CM that gives, for each test instance, the predicted score for every label. The final task is to decide the best labels for the test instances. We distinguish between the number of thresholds
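
The basic distinction between a single global threshold and per-label thresholds can be sketched as follows; the score matrix and threshold values are illustrative, not taken from the paper's experiments.

```python
# CM is a confidence score matrix: rows are test instances, columns
# are labels. A thresholding method turns CM into a binary label
# matrix, either with one threshold t for all scores (global) or
# with one threshold ts[j] per label j (local).

def threshold_global(CM, t):
    """Apply a single threshold t to every (instance, label) score."""
    return [[int(s >= t) for s in row] for row in CM]

def threshold_per_label(CM, ts):
    """Apply threshold ts[j] to the scores of label j."""
    return [[int(s >= ts[j]) for j, s in enumerate(row)] for row in CM]

CM = [[0.9, 0.4, 0.2],
      [0.3, 0.8, 0.6]]
print(threshold_global(CM, 0.5))                 # [[1, 0, 0], [0, 1, 1]]
print(threshold_per_label(CM, [0.5, 0.5, 0.1]))  # [[1, 0, 1], [0, 1, 1]]
```

The per-label variant has q free parameters to tune instead of one, which is the trade-off examined experimentally in this paper.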

Experimental evaluation

Thirteen multi-label datasets were used in our experiments to clarify the differences among the thresholding techniques. We used the train/test splits provided in the Meka and Mulan repositories [43]. Neither repository provides train/test splits for three of the datasets, namely Cal500, Language log and Slashdot; for these, we used the random splits (75% train, 25% test) made available by the KDIS research group.

Concluding remarks

There is a great deal of literature on multi-label learning. To the best of our knowledge, none of these studies has considered misclassification costs that change across contexts. In this paper, we explore multi-label threshold choice methods: fixed, rate-driven, optimal, RCut and MCut. In addition, we introduce two novel thresholding methods for multi-label classification: score-driven and global optimal. The score-driven threshold can be adjusted globally (per dataset) or locally (per label).

We

CRediT authorship contribution statement

Reem Alotaibi: Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Software, Writing - original draft. Peter Flach: Conceptualization, Formal analysis, Investigation, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This project is funded by the Deanship of Scientific Research (DSR), King Abdulaziz University, Jeddah, Saudi Arabia, under Grant No. (J-159-612-1440).


References (45)

  • D.H. Wolpert

    Stacked generalization

    Neural Networks

    (1992)
  • G. Tsoumakas, I. Katakis, I. Vlahavas, Mining multi-label data, in: Data Mining and Knowledge Discovery Handbook, 2010,...
  • M.-L. Zhang et al.

    A review on multi-label learning algorithms

    IEEE Trans. Knowl. Data Eng.

    (2014)
  • E. Gibaja, S. Ventura, A tutorial on multilabel learning, ACM Comput. Surveys 47 (3) (2015) 52:1–52:38. ISSN 0360-0300,...
  • R. Al-Otaibi, M. Kull, P. Flach, Declaratively capturing local label correlations with multi-label trees, in: G.A....
  • E.K. Yapp, X. Li, W.F. Lu, P.S. Tan, Comparison of base classifiers for multi-label learning, Neurocomputing. ISSN...
  • R. Al-Otaibi, P.A. Flach, M. Kull, Multi-label classification: a comparative study on threshold selection methods,...
  • A. Rivolli, A. de Carvalho, The utiml package: multi-label classification in R, R J. 10 (2019) 24....
  • C.X. Ling, V.S. Sheng, Cost-Sensitive Learning and the Class Imbalance Problem, 2008, Springer, pp. 869–875. ISBN...
  • C. Elkan, The foundations of cost-sensitive learning, in: Proceedings of the Seventeenth International Joint Conference...
  • H.-T. Lin, Cost-sensitive classification: status and beyond, in: Proceedings of Workshop Machine Learning Research in...
  • Z.-H. Zhou, X.-Y. Liu, On multi-class cost-sensitive learning, in: Proceedings of the 21st National Conference on...
  • C.X. Ling et al.

Cost-sensitive Learning and the Class Imbalance Problem

    (2007)
  • J. Li, X. Li, X. Yao, Cost-sensitive classification with genetic programming, in: Proceedings of the 2005 IEEE Congress...
  • N. Cesa-Bianchi, M. Re, G. Valentini, Synergy of multi-label hierarchical ensembles, data fusion, and cost-sensitive...
  • C. Li, H. Lin, Condensed filter tree for cost-sensitive multi-label classification, in: Proceedings of the 31st...
  • Y.-P. Wu, H.-T. Lin, Progressive random k-labelsets for cost-sensitive multi-label classification, Mach. Learn. (2016)...
  • H.-Y. Lo, J.-C. Wang, H.-M. Wang, S.-D. Lin, Cost-sensitive multi-label learning for audio tag annotation and...
  • G. Tsoumakas, I. Vlahavas, Random k-Labelsets: an ensemble method for multilabel classification, in: Proceedings of the...
  • P. Cao, X. Liu, D. Zhao, O. Zaiane, Cost Sensitive Ranking Support Vector Machine for Multi-label Data Learning, in: A....
  • K.-H. Huang, H.-T. Lin, Cost-sensitive label embedding for multi-label classification, Mach. Learn. 106 (9) (2017)...
  • C.-Y. Hsieh, Y.-A. Lin, H.-T. Lin, A deep model with local surrogate loss for general cost-sensitive multi-label...

Dr. Reem Alotaibi received her PhD in computer science from the University of Bristol, Bristol, U.K., in 2017. She is an assistant professor at the Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia. Her research interests include Artificial Intelligence, Machine Learning, Data Mining and Crowd Management. Dr. Alotaibi's research has been funded by several sources in Saudi Arabia including the Deputyship for Research & Innovation, Ministry of Education, King Abdulaziz City for Science and Technology (KACST) and the Deanship of Scientific Research (DSR), King Abdulaziz University.

    Peter Flach received the PhD degree in Computer Science from Tilburg University, the Netherlands in 1995. He is a Professor of artificial intelligence at the University of Bristol. His research interests include mining highly structured data and the evaluation and improvement of machine learning models. Flach has been Editor-in-Chief of the Machine Learning journal since 2010, and is the author of Machine Learning: The Art and Science of Algorithms That Make Sense of Data (Cambridge University Press, 2012).
