Cost-sensitive hierarchical classification for imbalance classes

Applied Intelligence

Abstract

Hierarchical classification with imbalanced classes is a challenge in machine learning; the imbalance arises from data with an uneven class distribution. Learning from an imbalanced dataset can degrade the performance of the classifier. Cost-sensitive learning is a useful way to handle the gap between the probabilities of majority and minority classes. This paper proposes cost-sensitive hierarchical classification for imbalance classes (CSHCIC), which constructs a cost-sensitive factor to balance the relationship between majority and minority classes. First, we divide a large hierarchical classification task into several small subclassification tasks according to the class hierarchy. Second, we establish a cost-sensitive factor from the number of samples of the different classes in each subclassification task. Then, we calculate the probability of every node using logistic regression. Lastly, we update the cost-sensitive factor using the flexibility factor and the number of samples. The experimental results show that the cost-sensitive hierarchical classification method achieves excellent performance on imbalanced-class datasets, and that its running time is lower than that of most state-of-the-art methods.
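The four steps above can be illustrated with a small sketch. It is not the authors' implementation (their Matlab code is linked in the Notes), and the abstract does not state the formula for the cost-sensitive factor. The form used here, (largest sibling count / own count) raised to a flexibility exponent, is an assumption chosen only to show the idea. The hierarchy, the names `cost_factors`, `child_probabilities`, and `predict_path`, and the hand-set linear parameters that stand in for trained logistic regression models are all hypothetical.

```python
import math

# Hypothetical class hierarchy: each internal node maps to its children.
# Every internal node defines one small subclassification task.
HIERARCHY = {
    "root": ["animal", "plant"],
    "animal": ["cat", "dog"],
    "plant": ["oak", "fern"],
}


def cost_factors(counts, flexibility):
    """Assumed cost-sensitive factor: (largest sibling count / own count) ** flexibility.

    flexibility = 0 gives every class a factor of 1 (no cost sensitivity);
    larger values favour minority classes more strongly.
    """
    largest = max(counts.values())
    return {label: (largest / n) ** flexibility for label, n in counts.items()}


def child_probabilities(x, child_params):
    """Multinomial logistic (softmax) probabilities over the children of one node.

    child_params maps child -> (coefficients, bias); these stand in for
    parameters that would normally be learned by logistic regression.
    """
    scores = {
        child: sum(w * xi for w, xi in zip(coef, x)) + bias
        for child, (coef, bias) in child_params.items()
    }
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {child: math.exp(s - peak) for child, s in scores.items()}
    total = sum(exps.values())
    return {child: e / total for child, e in exps.items()}


def predict_path(x, node_params, node_counts, flexibility, root="root"):
    """Top-down prediction: reweight child probabilities at each node, then descend."""
    node, path = root, []
    while node in HIERARCHY:
        probs = child_probabilities(x, node_params[node])
        factors = cost_factors(node_counts[node], flexibility)
        node = max(probs, key=lambda c: probs[c] * factors[c])
        path.append(node)
    return path


# Made-up parameters and training-sample counts for a one-feature example.
NODE_PARAMS = {
    "root": {"animal": ([0.2], 0.0), "plant": ([0.0], 0.0)},
    "animal": {"cat": ([0.0], 0.0), "dog": ([0.5], 0.0)},
    "plant": {"oak": ([1.0], 0.0), "fern": ([0.0], 0.0)},
}
NODE_COUNTS = {
    "root": {"animal": 900, "plant": 100},  # imbalanced subclassification task
    "animal": {"cat": 450, "dog": 450},
    "plant": {"oak": 50, "fern": 50},
}

print(predict_path([1.0], NODE_PARAMS, NODE_COUNTS, flexibility=0.0))
print(predict_path([1.0], NODE_PARAMS, NODE_COUNTS, flexibility=0.5))
```

In this toy setup the root classifier slightly prefers the majority branch, so the unweighted call follows it, while a flexibility of 0.5 shifts the decision to the minority branch. Keeping the factor local to each node mirrors the decomposition into subclassification tasks, since imbalance is measured only among siblings.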

Notes

  1. Datasets and Matlab code in this research have been uploaded to GitHub. They are accessible by the following link: https://github.com/fhqxa//APIN-D-19-01226.

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61703196, the Natural Science Foundation of Fujian Province under Grant No. 2018J01549, and the President’s Fund of Minnan Normal University under Grant No. KJ19021.

Author information

Corresponding author

Correspondence to Hong Zhao.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, W., Zhao, H. Cost-sensitive hierarchical classification for imbalance classes. Appl Intell 50, 2328–2338 (2020). https://doi.org/10.1007/s10489-019-01624-z

  • DOI: https://doi.org/10.1007/s10489-019-01624-z
