Abstract
In this paper we discuss the performance of classical classification methods on Big Data. We concentrate on the case with many more features than observations, and examine how the performance of classification methods depends on the distance between the classes and how the methods behave when many noise features are present. The examples in this paper show that standard classification methods should be rechecked before they are applied to Big Data.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Weihs, C. (2016). Big Data Classification – Aspects on Many Features. In: Michaelis, S., Piatkowski, N., Stolpe, M. (eds) Solving Large Scale Learning Tasks. Challenges and Algorithms. Lecture Notes in Computer Science, vol 9580. Springer, Cham. https://doi.org/10.1007/978-3-319-41706-6_6
DOI: https://doi.org/10.1007/978-3-319-41706-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41705-9
Online ISBN: 978-3-319-41706-6
eBook Packages: Computer Science (R0)