Measure oriented training: a targeted approach to imbalanced classification problems

Yuan, Bo; Liu, Wenhuang

doi:10.1007/s11704-012-2943-8

Research Article
Published: 22 June 2012

Volume 6, pages 489–497 (2012)

Frontiers of Computer Science

Bo Yuan¹ & Wenhuang Liu¹

Abstract

Since the overall prediction error of a classifier on imbalanced problems can be misleading and biased, alternative performance measures such as G-mean and F-measure have been widely adopted. Various techniques, including sampling and cost-sensitive learning, are often employed to improve the performance of classifiers in such situations. However, the training process of classifiers is still largely driven by traditional error-based objective functions. As a result, there is a clear gap between the measure according to which a classifier is evaluated and the way it is trained. This paper investigates the prospect of bridging this gap by explicitly using the appropriate measure itself to search the hypothesis space. In the case studies, a standard three-layer neural network is used as the classifier and is evolved by genetic algorithms (GAs) with G-mean as the objective function. Experimental results on eight benchmark problems show that the proposed method can achieve consistently favorable outcomes in comparison with a commonly used sampling technique. The effectiveness of multi-objective optimization in handling imbalanced problems is also demonstrated.
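The core idea of the abstract, searching the hypothesis space with the evaluation measure itself rather than an error-based loss, can be illustrated with a short Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: labels are binary with the minority class labeled 1, the three-layer network has a tanh hidden layer and a linear output thresholded at 0, and the GA uses truncation selection, uniform crossover, and Gaussian mutation on the flattened weight vector. The function names (g_mean, f_measure, predict, evolve) and the toy data are hypothetical. G-mean is taken as the geometric mean of sensitivity and specificity; F-measure as the weighted harmonic mean of minority-class precision and recall.

```python
import math
import random

# Illustrative sketch (assumption): binary labels, minority class = 1.

def confusion_counts(y_true, y_pred):
    """Return (TP, FN, TN, FP), treating label 1 as the positive (minority) class."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sens * spec)

def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the minority class (harmonic mean of precision and recall when beta=1)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    if prec + rec == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * prec * rec / (b2 * prec + rec)

def predict(weights, x, n_hidden):
    """Hypothetical three-layer net: inputs -> tanh hidden layer -> linear output thresholded at 0."""
    n_in = len(x)
    idx = 0
    hidden = []
    for _ in range(n_hidden):
        s = weights[idx + n_in]  # hidden-unit bias
        for i in range(n_in):
            s += weights[idx + i] * x[i]
        hidden.append(math.tanh(s))
        idx += n_in + 1
    out = weights[idx + n_hidden]  # output bias
    for j in range(n_hidden):
        out += weights[idx + j] * hidden[j]
    return 1 if out > 0 else 0

def evolve(X, y, n_hidden=3, pop_size=30, generations=40, sigma=0.3, seed=0):
    """Simple GA (assumed operators) that maximizes G-mean on the training data."""
    rng = random.Random(seed)
    n_w = n_hidden * (len(X[0]) + 1) + n_hidden + 1

    def fitness(w):
        # The evaluation measure itself is the objective, not classification error.
        return g_mean(y, [predict(w, x, n_hidden) for x in X])

    pop = [[rng.gauss(0, 1) for _ in range(n_w)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [(wa if rng.random() < 0.5 else wb) + rng.gauss(0, sigma)
                     for wa, wb in zip(a, b)]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical toy data: 90 majority points, 10 minority points.
    X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
    X += [[data_rng.gauss(2.5, 0.7), data_rng.gauss(2.5, 0.7)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    print("G-mean of always predicting majority:", g_mean(y, [0] * len(y)))  # 0.0
    weights, best = evolve(X, y)
    print("best G-mean found by the GA:", round(best, 3))
```

On the toy data, always predicting the majority class is 90% accurate yet scores a G-mean of 0, which is the kind of misleading result the abstract describes; using G-mean as the fitness rewards only networks that recognize both classes.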

Author information

Authors and Affiliations

Division of Informatics, Graduate School at Shenzhen, Tsinghua University, Shenzhen, 518055, China
Bo Yuan & Wenhuang Liu

Corresponding author

Correspondence to Bo Yuan.

Additional information

Dr. Bo Yuan received his BEng from Nanjing University of Science and Technology, China, in 1998, and his MSc and PhD from the University of Queensland, Australia, in 2002 and 2006, respectively. From 2006 to 2007, he was a research officer on a project funded by the Australian Research Council at the University of Queensland. He is currently an associate professor in the Division of Informatics, Graduate School at Shenzhen, Tsinghua University, China, and a member of the IEEE and the IEEE Computational Intelligence Society. He is mostly interested in data mining, evolutionary computation, and parallel computing.

Prof. Wenhuang Liu received his BEng from Tsinghua University, China, in 1970 and has been a faculty member of Tsinghua University for more than forty years. He was the deputy director of the National CIMS Engineering Research Center and the deputy dean of the Graduate School at Shenzhen, Tsinghua University. His research interests include CIMS, operations research, and decision support systems.

About this article

Cite this article

Yuan, B., Liu, W. Measure oriented training: a targeted approach to imbalanced classification problems. Front. Comput. Sci. 6, 489–497 (2012). https://doi.org/10.1007/s11704-012-2943-8

Received: 10 August 2011
Accepted: 11 December 2011
Published: 22 June 2012
Issue Date: October 2012
DOI: https://doi.org/10.1007/s11704-012-2943-8
