Predicting the Influence of Additional Training Data on Classification Performance for Imbalanced Data

Kockentiedt, Stephen; Tönnies, Klaus; Gierke, Erhardt

doi:10.1007/978-3-319-11752-2_30

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8753)

Included in the following conference series:

German Conference on Pattern Recognition

Abstract

Because generating samples is often costly, it is desirable to predict how additional training data will influence classification performance. Current methods can only predict performance as measured by accuracy, which is unsuitable when one class is much rarer than another. We propose an approach that can also predict other measures, such as G-mean and F-measure, which are used for imbalanced data. Using a highly imbalanced real-world dataset of scanning electron microscopy images of nanoparticles, we show that our method leads to more correct decisions about whether to generate more training samples.
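To illustrate why accuracy can mislead when one class is rare, the following minimal sketch computes accuracy, G-mean and F-measure from a binary confusion matrix. It assumes the common textbook definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the harmonic mean of precision and recall); the abstract does not state which variants the paper uses. The function name `imbalance_metrics` and all counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return accuracy, G-mean and F-measure for a binary confusion matrix.

    Assumed definitions (not confirmed by the paper):
      G-mean    = sqrt(sensitivity * specificity)
      F-measure = harmonic mean of precision and recall (F1)
    Ratios with a zero denominator are treated as 0.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(recall * specificity)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical data: 10 rare-class samples and 990 common-class samples.
# A classifier that always predicts the common class:
trivial = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)

# A classifier that finds 8 of the 10 rare samples at the cost of 40 false alarms:
useful = imbalance_metrics(tp=8, fn=2, fp=40, tn=950)

print(trivial)
print(useful)
```

With these made-up counts, the classifier that never detects the rare class has the higher accuracy, while its G-mean and F-measure are both zero. This is the kind of situation in which a prediction of accuracy alone says little about whether more training data would help.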

Author information

Authors and Affiliations

Department of Simulation and Graphics, Faculty of Computer Science, University of Magdeburg, Magdeburg, Germany
Stephen Kockentiedt & Klaus Tönnies
Federal Institute for Occupational Safety and Health, Berlin, Germany
Stephen Kockentiedt & Erhardt Gierke

Corresponding author

Correspondence to Stephen Kockentiedt.

Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Münster, Münster, Germany
Xiaoyi Jiang
Computer Science Department 5, University of Erlangen-Nürnberg, Erlangen, Germany
Joachim Hornegger
Department of Computer Science, University of Kiel, Kiel, Germany
Reinhard Koch

About this paper

Cite this paper

Kockentiedt, S., Tönnies, K., Gierke, E. (2014). Predicting the Influence of Additional Training Data on Classification Performance for Imbalanced Data. In: Jiang, X., Hornegger, J., Koch, R. (eds) Pattern Recognition. GCPR 2014. Lecture Notes in Computer Science, vol 8753. Springer, Cham. https://doi.org/10.1007/978-3-319-11752-2_30

DOI: https://doi.org/10.1007/978-3-319-11752-2_30
Published: 15 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11751-5
Online ISBN: 978-3-319-11752-2
eBook Packages: Computer Science, Computer Science (R0)