Big Data Classification – Aspects on Many Features

Chapter in: Solving Large Scale Learning Tasks. Challenges and Algorithms

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9580)

Abstract

In this paper we discuss the performance of classical classification methods on Big Data. We concentrate on the case with many more features than observations and discuss how classification methods depend on the distance between the classes and how they behave in the presence of many noise features. The examples in this paper show that standard classification methods should be rechecked for Big Data.
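To make the setting of the abstract concrete, the following is a minimal sketch, not the chapter's own simulation (which, per the notes, was run in R with BatchJobs and mlr). It assumes two Gaussian classes with unit-variance, independent features: the first `n_informative` features separate the classes by a mean shift `delta`, and `n_noise` further features carry no class information. The classifier is a nearest-centroid rule, which under these assumptions coincides with a naive-Bayes-style independence rule with unit variances. All function and parameter names are hypothetical and chosen for illustration only.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(x, c):
    """Squared Euclidean distance between a vector and a centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def draw(rng, n, n_informative, n_noise, shift):
    """Draw n observations with independent unit-variance Gaussian features.

    The first n_informative features have mean `shift`; the remaining
    n_noise features are pure noise with mean 0 (assumed model).
    """
    return [
        [rng.gauss(shift, 1.0) for _ in range(n_informative)]
        + [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        for _ in range(n)
    ]


def nearest_centroid_error(n_train, n_informative, n_noise, delta,
                           n_test=200, seed=1):
    """Estimate the test error of a nearest-centroid rule on simulated data.

    Class 0 has mean 0 in every feature; class 1 has mean `delta` in the
    informative features and 0 in the noise features. n_train observations
    per class are used for training, n_test per class for testing.
    """
    rng = random.Random(seed)
    c0 = centroid(draw(rng, n_train, n_informative, n_noise, 0.0))
    c1 = centroid(draw(rng, n_train, n_informative, n_noise, delta))

    errors = 0
    for label, shift in ((0, 0.0), (1, delta)):
        for x in draw(rng, n_test, n_informative, n_noise, shift):
            # Assign x to the class whose training centroid is closer.
            predicted = 0 if sq_dist(x, c0) <= sq_dist(x, c1) else 1
            errors += predicted != label
    return errors / (2 * n_test)


if __name__ == "__main__":
    # Fixed class distance, growing number of noise features (p >> n).
    for n_noise in (0, 100, 1000, 2000):
        err = nearest_centroid_error(n_train=10, n_informative=10,
                                     n_noise=n_noise, delta=1.0)
        print(f"noise features: {n_noise:5d}  test error: {err:.3f}")
```

Under these assumptions, one would expect the test error at a fixed class distance to drift toward chance level (0.5) as noise features are added, because each noise feature contributes estimation error to the centroids but no signal. The printed numbers come from this toy model only and are not results from the chapter.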

Notes

  1. Thanks to T. Glasmachers for suggesting this definition.

  2. We build on the corresponding part of an earlier version of this paper [7].

  3. This simulation was carried out using the R packages BatchJobs [2] and mlr on the SLURM cluster of the Statistics Department of TU Dortmund University.

References

  1. Bickel, P.J., Levina, E.: Some theory for Fisher's linear discriminant function, naive Bayes, and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010 (2004)

  2. Bischl, B., Lang, M., Mersmann, O., Rahnenfuehrer, J., Weihs, C.: BatchJobs and BatchExperiments: abstraction mechanisms for using R in batch environments. J. Stat. Softw. 64, 1–25 (2015). doi:10.18637/jss.v064.i11. http://www.jstatsoft.org/index.php/jss/article/view/v064i11

  3. Fan, J., Fan, Y., Wu, Y.: High-dimensional classification. In: Cai, T.T., Shen, X. (eds.) High-dimensional Data Analysis, pp. 3–37. World Scientific, New Jersey (2011)

  4. Kiiveri, H.T.: A general approach to simultaneous model fitting and variable elimination in response models for biological data with many more variables than observations. BMC Bioinform. 9, 195 (2008). doi:10.1186/1471-2105-9-195

  5. McLachlan, G.J.: Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York (1992)

  6. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2014). http://www.R-project.org/

  7. Weihs, C., Horn, D., Bischl, B.: Big data classification: aspects on many features and many observations. In: Wilhelm, A.F.X., Kestler, H.A. (eds.) Analysis of Large and Complex Data, pp. 113–122. Springer (2016)

Author information

Correspondence to Claus Weihs.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Weihs, C. (2016). Big Data Classification – Aspects on Many Features. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_6

  • DOI: https://doi.org/10.1007/978-3-319-41706-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41705-9

  • Online ISBN: 978-3-319-41706-6

  • eBook Packages: Computer Science, Computer Science (R0)