Predicting the Influence of Additional Training Data on Classification Performance for Imbalanced Data

  • Conference paper
Pattern Recognition (GCPR 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8753)

Abstract

Predicting how additional training data will influence classification performance is desirable because generating samples is often costly. Current methods can only predict performance as measured by accuracy, which is unsuitable when one class is much rarer than another. We propose an approach that can also predict other measures, such as G-mean and F-measure, which are used for imbalanced data. Using a highly imbalanced real-world dataset of scanning electron microscopy images of nanoparticles, we show that our method leads to more correct decisions about whether to generate more training samples.
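
To illustrate why accuracy can mislead when one class is rare, the following minimal sketch computes accuracy, G-mean, and F-measure from binary confusion-matrix counts. It assumes the common definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the balanced F1 score); the paper itself may use different variants. The function name `imbalance_metrics` and all counts are hypothetical and are not taken from the paper's dataset.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Return accuracy, G-mean and F-measure (F1) from binary confusion counts.

    Assumed definitions (not confirmed by the source):
      G-mean    = sqrt(sensitivity * specificity)
      F-measure = harmonic mean of precision and sensitivity (recall)
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}


# Hypothetical data: 1000 samples, of which only 10 belong to the rare class.
# A trivial classifier that always predicts the majority class:
trivial = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)
# A classifier that finds 8 of the 10 rare samples at the cost of 40 false alarms:
useful = imbalance_metrics(tp=8, fn=2, fp=40, tn=950)

for name, scores in (("trivial", trivial), ("useful", useful)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this hypothetical case the trivial classifier has the higher accuracy even though it never detects the rare class, while its G-mean and F-measure are both zero. This is the kind of situation in which a prediction of accuracy alone gives little guidance on whether more training samples are worth generating.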

Author information

Corresponding author

Correspondence to Stephen Kockentiedt.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Kockentiedt, S., Tönnies, K., Gierke, E. (2014). Predicting the Influence of Additional Training Data on Classification Performance for Imbalanced Data. In: Jiang, X., Hornegger, J., Koch, R. (eds) Pattern Recognition. GCPR 2014. Lecture Notes in Computer Science, vol 8753. Springer, Cham. https://doi.org/10.1007/978-3-319-11752-2_30

  • DOI: https://doi.org/10.1007/978-3-319-11752-2_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11751-5

  • Online ISBN: 978-3-319-11752-2

  • eBook Packages: Computer Science, Computer Science (R0)
