Multi-label imbalanced classification based on assessments of cost and value

Applied Intelligence

Abstract

Multi-label imbalanced data are data in which the classes contain disproportionate numbers of samples. Traditional classifiers are better suited to balanced data: their performance declines sharply when the class sizes in multi-label data are imbalanced. In this study, we propose an algorithm that assesses the cost of the majority class and the value of the minority classes to handle the multi-label imbalanced classification problem. The main idea is to quantify the cost of the majority class and the value of the minority class based on an imbalance ratio. In the data preprocessing step, we employ a penalty function to determine how many majority class instances to eliminate, and the contribution of each instance determines whether that instance is eliminated. In the classification step, we propose a metric to control the cost of the majority class and the value of the minority class. Experiments showed that this algorithm can improve classification performance on multi-label imbalanced data.
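
The abstract bases its cost and value assessments on an imbalance ratio but does not define it in this preview. As a minimal sketch, assuming the per-label ratio commonly used in the multi-label imbalance literature (the count of the most frequent label divided by the count of each label, as in Charte et al. 2015), the quantity could be computed as follows. The function names and the toy label matrix are hypothetical and are not taken from the paper.

```python
def label_counts(Y):
    """Count positive instances per label in a 0/1 label matrix (list of rows)."""
    return [sum(column) for column in zip(*Y)]


def imbalance_ratios(Y):
    """Per-label imbalance ratio: count of the most frequent label / count of each label.

    Assumption: this is the ratio from the general literature, not necessarily
    the exact definition used in the paper.
    """
    counts = label_counts(Y)
    if min(counts) == 0:
        raise ValueError("every label needs at least one positive instance")
    most_frequent = max(counts)
    return [most_frequent / count for count in counts]


# Hypothetical toy data: 6 instances, 3 labels (columns).
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

ratios = imbalance_ratios(Y)
mean_ratio = sum(ratios) / len(ratios)
print(ratios)                # [1.0, 3.0, 6.0]
print(round(mean_ratio, 3))  # 3.333
```

Labels whose ratio lies well above the mean would be treated as minority labels under this assumption. The penalty function and the instance contribution measure that the paper builds on the ratio are not reproduced here.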

Notes

  1. http://www.imdb.com/interfaces/plain

  2. http://mulan.sourceforge.net

References

  1. Bielza C, Li G, Larrañaga P (2011) Multi-dimensional classification with Bayesian network. Int J Approx Reason 52:705–727

  2. Zhang M, Zhou Z (2014) A review on multi-label learning algorithms. IEEE Trans 8:1819–1831

  3. Ying Y, Pedrycz W, Miao D (2014) Multi-label classification by exploiting label correlations. Expert Syst Appl 41:2989–3004

  4. Vens C, Struyf J, Schietgat L (2008) Decision trees for hierarchical multi-label classification. Mach Learn 73:185–214. https://doi.org/10.1007/s10994-008-5077-3

  5. Blockeel H, Schietgat L, Struyf J, Džeroski S et al (2006) Decision tree for hierarchical multilabel classification: a case study in functional genomics, vol 2006. Springer, Berlin, pp 18–29

  6. Gonçalves T, Quaresma P (2008) A preliminary approach to the multilabel classification problem of Portuguese juridical documents. In: Progress in artificial intelligence, EPIA 2003. Springer, Berlin, pp 435–444

  7. Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artif Intell 172(16–17):1897–1916

  8. Tsoumakas G, Vlahavas I (2007) Random k-Labelsets: an ensemble method for multilabel classification. In: Machine learning ECML 2007. Lecture notes in computer science, vol 4701. Springer, Berlin, Heidelberg

  9. Schapire RE, Singer Y (2000) BoosTexter: a boosting-based system for text categorization. Mach Learn 39:135–168. https://doi.org/10.1023/A:1007649029923

  10. Zhang M-L, Zhou Z-H (2007) ML-KNN: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

  11. Menardi G, Torelli N (2014) Training and assessing classification rules with imbalanced data. Data Min Knowl Disc 28:92–122. https://doi.org/10.1007/s10618-012-0295-5

  12. Márquez-Vera C, Cano A, Romero C et al (2013) Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Appl Intell 38:315–330. https://doi.org/10.1007/s10489-012-0374-8

  13. Giraldo-Forero AF, Jaramillo-Garzón JA, Ruiz-Muñoz JF, Castellanos-Domínguez CG (2013) Managing imbalanced data sets in multi-label problems: a case study with the SMOTE algorithm. In: Proceedings of the 18th Iberoamerican congress, CIARP 2013. Springer, pp 334–342

  14. Lin W, Xu D (2016) Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types. Bioinformatics. https://doi.org/10.1093/bioinformatics/btw560

  15. Charte F, Rivera A, del Jesus MJ, Herrera F (2013) A first approach to deal with imbalance in multi-label datasets. Springer, Berlin, pp 150–160

  16. Akkasi A, Varoglu E, Dimililer N (2017) Balanced undersampling: a novel sentence-based undersampling method to improve recognition of named entities in chemical and biomedical text. Appl Intell. https://doi.org/10.1007/s10489-017-0920-5

  17. Fang M, Xiao Y, Wang C, Xie J (2014) Multi-label classification: dealing with imbalance by combining labels. In: IEEE international conference on tools with artificial intelligence, pp 233–237

  18. Zhang M-L, Li Y-K, Liu X-Y (2015) Towards class-imbalance aware multi-label learning. In: Proceedings of the twenty-fourth international joint conference on artificial intelligence, pp 4041–4147

  19. Zhang X, Song Q, Wang G et al (2015) A dissimilarity-based imbalance data classification algorithm. Appl Intell 42:544–565. https://doi.org/10.1007/s10489-014-0610-5

  20. Murphey YL, Guo H (2004) Neural learning from unbalanced data. Appl Intell 21:117–128

  21. Varando G, Bielza C, Larrañaga P (2016) Decision function for chain classifiers based on Bayesian network for multi-label classification. Int J Approx Reason 68:164–178

  22. Varando G, Bielza C, Larrañaga P (2014) Expressive power of binary relevance and chain classifiers based on Bayesian networks for multi-label classification. Springer, Berlin, pp 519–534

  23. Varando G, Bielza C, Larrañaga P (2015) Decision boundary for discrete Bayesian network classifiers. J Mach Learn Res 16:2725–2749

  24. Yang Y, Yan W (2012) On the properties of concept classes induced by multivalued Bayesian network. Inform Sci 184(1):155–165

  25. Godbole S, Sarawagi S (2004) Discriminative methods for multi-labeled classification. Springer, Berlin, pp 22–30

  26. Read J, Pfahringer B, Holmes G et al (2011) Classifier chains for multi-label classification. Mach Learn 85:333–359. https://doi.org/10.1007/s10994-011-5256-5

  27. Sucar LE, Bielza C, Morales EF et al (2014) Multi-label classification with Bayesian network-based chain classifiers. Pattern Recogn Lett 41:14–22

  28. O’Donnell R, Servedio RA (2010) New degree bounds for polynomial threshold functions. Combinatorica 30(3):327–358. https://doi.org/10.1007/s00493-010-2173-3

  29. Devi D, Biswas S, Purkayastha B (2017) Redundancy-driven modified Tomek-link based undersampling: a solution to class imbalance. Pattern Recogn Lett 93:3–12

  30. Cano A, Luna JM, Gibaja EL, Ventura S (2016) LAIM discretization for multi-label data. Inform Sci 330(C):370–384

  31. Jiang L, Li C, Wang S et al (2016) Deep feature weighting for naive Bayes and its application to text classification. Eng Appl Artif Intell 52:26–39

  32. Jiang L, Cai Z, Wang D et al (2012) Improving tree augmented naive Bayes for class probability estimation. Knowl-Based Syst 26:239–245

  33. Melki G, Cano A, Kecman V et al (2017) Multi-target support vector regression via correlation regressor chains. Inform Sci 415–416:53–69

  34. Petterson J, Caetano T (2010) Reverse multi-label learning. Advan Neural Inform Process Syst 23:1912–1920

  35. Charte F, Rivera AJ, del Jesus MJ et al (2015) Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing 163:3–16

  36. Charte F, Rivera AJ, del Jesus MJ et al (2014) MLeNN: a first approach to heuristic multilabel undersampling. In: International conference on intelligent data engineering and automated learning. Springer International Publishing, pp 1–9

Acknowledgments

The authors thank the editor and anonymous reviewers for their helpful comments and suggestions. This study was supported by the National Natural Science Foundation of China (Grant No. 61573266).

Author information

Corresponding author

Correspondence to Mengxiao Ding.

About this article

Cite this article

Ding, M., Yang, Y. & Lan, Z. Multi-label imbalanced classification based on assessments of cost and value. Appl Intell 48, 3577–3590 (2018). https://doi.org/10.1007/s10489-018-1156-8

  • DOI: https://doi.org/10.1007/s10489-018-1156-8
