Abstract
Multi-label imbalanced data are data in which the number of samples is disproportionate across classes. Traditional classifiers are better suited to balanced data, because their classification performance declines sharply when class sizes in multi-label data are imbalanced. In this study, we propose an algorithm that assesses the cost of the majority class and the value of the minority classes to address the multi-label imbalanced classification problem. The main idea of the algorithm is to quantify the cost of the majority class and the value of the minority class based on an imbalance ratio. In the data preprocessing step, we employ a penalty function to determine how many majority class instances to eliminate, and the contribution of each instance determines whether it is eliminated. In the classification step, we propose a metric to control the cost of the majority class and the value of the minority class. Experiments showed that this algorithm improves the performance of multi-label imbalanced data classification.
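The preprocessing step described above — using an imbalance ratio and a penalty function to decide how many majority class instances to eliminate — can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the per-label imbalance ratio follows the common IRLbl-style definition (most frequent label count divided by each label's count), and the penalty function `1 - 1/mean_ir` and the scaling factor `alpha` are hypothetical choices for illustration only.

```python
import numpy as np

def imbalance_ratios(Y):
    """Per-label imbalance ratio: count of the most frequent label
    divided by this label's count (IRLbl-style definition)."""
    counts = Y.sum(axis=0)
    counts = np.where(counts == 0, 1, counts)  # avoid division by zero
    return counts.max() / counts

def majority_elimination_count(Y, alpha=1.0):
    """Hypothetical penalty function: the number of majority-only
    instances to eliminate grows with the mean imbalance ratio.

    Y     : (n_samples, n_labels) binary label matrix
    alpha : hypothetical scaling factor in [0, 1]
    Returns (number to eliminate, boolean mask of candidate instances).
    """
    ir = imbalance_ratios(Y)
    mean_ir = ir.mean()
    majority = ir <= mean_ir  # labels at least as frequent as average
    # Candidates: instances whose active labels are all majority labels,
    # so removing them cannot shrink any minority label.
    majority_only = np.array([
        Y[i].astype(bool).any() and not (Y[i].astype(bool) & ~majority).any()
        for i in range(len(Y))
    ])
    penalty = 1.0 - 1.0 / mean_ir  # in [0, 1); 0 when perfectly balanced
    n_eliminate = int(alpha * penalty * majority_only.sum())
    return n_eliminate, majority_only
```

On a perfectly balanced label matrix the penalty is zero and nothing is removed; as the mean imbalance ratio grows, a larger fraction of the majority-only candidates is marked for elimination. The paper's actual criterion additionally ranks candidates by each instance's contribution before removal.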
Acknowledgments
The authors thank the editor and the anonymous reviewers for their helpful comments and suggestions. This study was supported by the National Natural Science Foundation of China (Grant No. 61573266).
Cite this article
Ding, M., Yang, Y. & Lan, Z. Multi-label imbalanced classification based on assessments of cost and value. Appl Intell 48, 3577–3590 (2018). https://doi.org/10.1007/s10489-018-1156-8