Abstract
The naive Bayes classifier is widely used because of its simplicity, speed, and accuracy. However, this approach fails when, for at least one attribute value in a test sample, there are no corresponding training samples with that attribute value. This is known as the zero-frequency problem and is typically addressed using Laplace smoothing. However, Laplace smoothing does not take into account the statistical characteristics of the neighbourhood of the test sample's attribute values. Gaussian naive Bayes addresses this, but the resulting Gaussian model is formed from global information. We instead propose an approach that estimates conditional probabilities using information in the neighbourhood of the test sample. In this case we no longer need to assume independence of the attribute values, and can therefore consider the joint probability distribution conditioned on the given class. Hence, unlike the Laplace and Gaussian approaches, our approach takes dependencies among the attribute values into account. We illustrate the performance of the proposed approach on a wide range of datasets taken from the University of California at Irvine (UCI) Machine Learning Repository. We also include results for the k-NN classifier and demonstrate that the proposed approach is simple, robust, and outperforms standard approaches.
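The contrast the abstract draws can be sketched in code. The snippet below is an illustrative toy, not the authors' exact method: `laplace_conditional` shows standard per-attribute add-alpha smoothing, while `neighbourhood_joint_score` scores a test point from its k nearest same-class neighbours over the full attribute vector, so attribute dependencies are retained. All function names, the distance-based score, and the parameter choices are assumptions made for illustration.

```python
import numpy as np

def laplace_conditional(train_values, test_value, n_categories, alpha=1.0):
    """P(attribute = test_value | class) with Laplace (add-alpha) smoothing.

    Even when test_value never occurs in the training data, the count of
    zero is lifted to alpha, avoiding the zero-frequency problem.
    """
    count = np.sum(train_values == test_value)
    return (count + alpha) / (len(train_values) + alpha * n_categories)

def neighbourhood_joint_score(class_samples, x, k=5):
    """Score a test point x against one class using the mean distance to
    its k nearest neighbours within that class; smaller distances give a
    higher score. The joint attribute vector is used directly, so no
    attribute-independence assumption is made.
    """
    dists = np.linalg.norm(class_samples - x, axis=1)
    nearest = np.sort(dists)[:k]
    return 1.0 / (1.0 + nearest.mean())

# Toy usage: two well-separated Gaussian classes in two dimensions.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))
x = np.array([0.2, -0.1])  # test point near class A

score_a = neighbourhood_joint_score(class_a, x)
score_b = neighbourhood_joint_score(class_b, x)
pred = "A" if score_a > score_b else "B"
```

With equal class priors, comparing the neighbourhood scores across classes plays the role of comparing class-conditional likelihoods; here the test point is assigned to class A.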
References
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)
Jiang, L., Zhang, H., Cai, Z.: A novel Bayes model: hidden naive Bayes. IEEE Trans. Knowl. Data Eng. 21(10), 1361–1371 (2009)
Yu, L., Gan, S., Chen, Y., Luo, D.: A novel hybrid approach: instance weighted hidden naive Bayes. Mathematics 9(22), 2982 (2021)
Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: UAI (1994)
Lee, C.-H., Gutierrez, F., Dou, D.: Calculating feature weights in naive Bayes with Kullback-Leibler measure. In: 2011 IEEE 11th International Conference on Data Mining, pp. 1146–1151 (2011)
Foo, L.-K., Chua, S.-L., Ibrahim, N.: Attribute weighted naïve Bayes classifier. Comput. Mater. Continua 71(1), 1945–1957 (2022)
Xie, Z., Hsu, W., Liu, Z., Lee, M.L.: SNNB: a selective neighborhood based naive Bayes for lazy learning. In: Chen, M.S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS, vol. 2336, pp. 104–114. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47887-6_10
Gweon, H., Schonlau, M., Steiner, S.H.: The k conditional nearest neighbor algorithm for classification and class probability estimation. PeerJ Comput. Sci. 5, e194 (2019)
Frank, E., Hall, M., Pfahringer, B.: Locally weighted naive Bayes. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI 2003, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco (2003)
Chandra, B., Gupta, M., Gupta, M.P.: Robust approach for estimating probabilities in naive-Bayes classifier. In: Ghosh, A., De, R.K., Pal, S.K. (eds.) PReMI 2007. LNCS, vol. 4815, pp. 11–16. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77046-6_2
Baboolal, K.: GitHub repository (2022)
Dua, D., Graff, C.: UCI machine learning repository (2017)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7(2), 179–188 (1936)
da Silva, J.E., de Sá, J.P.M., Jossinet, J.: Classification of breast tissue by electrical impedance spectroscopy. Med. Biol. Eng. Comput. 38(1), 26–30 (2000)
Abid, F., Izeboudjen, N.: Predicting forest fire in Algeria using data mining techniques: case study of the decision tree algorithm. In: Ezziyyani, M. (ed.) AI2SD 2019. AISC, vol. 1105, pp. 363–370. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36674-2_37
Quinlan, J.R.: Simplifying decision trees. Int. J. Man-Mach. Stud. 27(3), 221–234 (1987)
Aeberhard, S., Coomans, D., De Vel, O.: Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recogn. 27(8), 1065–1077 (1994)
Zwitter, M., Soklic, M.: UCI machine learning repository (1988)
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
Aha, D.W.: Incremental constructive induction: an instance-based approach. In: ML (1991)
Nakai, K., Kanehisa, M.: A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14(4), 897–911 (1992)
Cinar, I., Koklu, M., Tasdemir, S.: Classification of raisin grains using machine vision and artificial intelligence methods. Gazi Muhendislik Bilimleri Dergisi (GMBD) 6(3), 200–209 (2020)
Evett, I.W., Spiehler, E.J.: Rule induction in forensic science. In: KBS in Government, pp. 107–118. Online Publications (1987)
Silva, P.F.B., Marçal, A.R.S., da Silva, R.M.A.: Evaluation of features for leaf discrimination. In: Kamel, M., Campilho, A. (eds.) ICIAR 2013. LNCS, vol. 7950, pp. 197–204. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39094-4_23
Lohweg, V., Derksen, H.: UCI machine learning repository (2012)
Koklu, M., Ozkan, I.A.: Multiclass classification of dry beans using computer vision and machine learning techniques. Comput. Electron. Agric. 174, 105507 (2020)
Nash, W.J., Tasmania. Marine Research Laboratories: The Population Biology of Abalone (Haliotis Species) in Tasmania: Blacklip abalone (H. rubra) from the north coast and the islands of Bass Strait. Number v. 1 in Technical report (Tasmania. Sea Fisheries Division). Sea Fisheries Division, Marine Research Laboratories - Taroona, Department of Primary Industry and Fisheries, Tasmania (1994)
Banerjee, P.: Comprehensive guide on feature selection (2020)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ethics declarations
Disclosure of Interests
The authors have no financial or proprietary interests in any material discussed in this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hosein, P., Baboolal, K. (2024). Bayes Classification Using an Approximation to the Joint Probability Distribution of the Attributes. In: Fred, A., Hadjali, A., Gusikhin, O., Sansone, C. (eds.) Deep Learning Theory and Applications. DeLTA 2024. Communications in Computer and Information Science, vol 2172. Springer, Cham. https://doi.org/10.1007/978-3-031-66705-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-66704-6
Online ISBN: 978-3-031-66705-3