Observing a Naïve Bayes Classifier’s Performance on Multiple Datasets

Brumen, Boštjan; Rozman, Ivan; Černezel, Aleš

doi:10.1007/978-3-319-10933-6_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8716)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

General theories describing the performance of artificial learners are of little help when a user is confronted with a selection of datasets and a given artificial classifier. The objective of this paper is to find the best description of the learning curves produced by a Naïve Bayes classifier. The performance of Naïve Bayes was measured on 121 datasets using k-fold cross-validation. Power, linear, logarithmic and exponential functions were fitted to the data. The exponential function was the better descriptor of the error rate in 44 of the 60 useful cases. Its average mean squared error differs significantly from that of the power and linear functions (P = 0.000) and from that of the logarithmic function (P = 0.001). The exponential function's rank is also significantly different from the ranks of the other models (P = 0.000). The results can be used to forecast the future performance of the learner, or to check where on the learning curve the current measurement lies.
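The curve-fitting comparison described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: each model is taken in a two-parameter form, the power and exponential curves are fitted by ordinary least squares on log-transformed values (a simplification, not necessarily the fitting procedure the authors used), and the learning-curve data are synthetic. The names `ols` and `fit_models` are hypothetical helpers introduced for illustration only.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_models(sizes, errors):
    """Fit four candidate curves; return {name: (predict_fn, mse)}.

    Assumes all sizes and error rates are strictly positive, because the
    linearised fits take logarithms of both.
    """
    ln_x = [math.log(x) for x in sizes]
    ln_y = [math.log(y) for y in errors]

    a1, b1 = ols(sizes, errors)  # linear:      y = a + b*x
    a2, b2 = ols(ln_x, errors)   # logarithmic: y = a + b*ln(x)
    a3, b3 = ols(ln_x, ln_y)     # power:       ln(y) = a + b*ln(x)
    a4, b4 = ols(sizes, ln_y)    # exponential: ln(y) = a + b*x

    models = {
        "linear": lambda x: a1 + b1 * x,
        "logarithmic": lambda x: a2 + b2 * math.log(x),
        "power": lambda x: math.exp(a3) * x ** b3,
        "exponential": lambda x: math.exp(a4 + b4 * x),
    }

    def mse(predict):
        # Mean squared error measured on the original (untransformed) scale.
        return sum((predict(x) - y) ** 2 for x, y in zip(sizes, errors)) / len(sizes)

    return {name: (predict, mse(predict)) for name, predict in models.items()}


# Synthetic learning curve (NOT data from the paper): error rate versus
# training-set size, generated from an exponential decay.
sizes = list(range(10, 210, 10))
errors = [0.5 * math.exp(-0.01 * n) for n in sizes]

results = fit_models(sizes, errors)
best = min(results, key=lambda name: results[name][1])

# Forecast the error rate at a training-set size not yet observed.
forecast = results[best][0](300)
print(best, forecast)
```

Because the synthetic data are generated from an exponential curve, the exponential model attains the lowest mean squared error here; on real learning curves the ranking would have to be established per dataset, as the paper does across its 60 useful cases.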

Author information

Authors and Affiliations

Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
Boštjan Brumen, Ivan Rozman & Aleš Černezel

Editor information

Editors and Affiliations

Department of Informatics, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
Yannis Manolopoulos
EECS Department, Northwestern University, 2145 Sheridan Road, 60208, Evanston, IL, USA
Goce Trajcevski
Faculty of Computer Sciences and Engineering, University Ss. Cyril and Methodius Skopje, Rugjer Boshkovikj 16, 1000, Skopje, Macedonia
Margita Kon-Popovska

About this paper

Cite this paper

Brumen, B., Rozman, I., Černezel, A. (2014). Observing a Naïve Bayes Classifier’s Performance on Multiple Datasets. In: Manolopoulos, Y., Trajcevski, G., Kon-Popovska, M. (eds) Advances in Databases and Information Systems. ADBIS 2014. Lecture Notes in Computer Science, vol 8716. Springer, Cham. https://doi.org/10.1007/978-3-319-10933-6_20

DOI: https://doi.org/10.1007/978-3-319-10933-6_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10932-9
Online ISBN: 978-3-319-10933-6
eBook Packages: Computer Science, Computer Science (R0)
