Abstract
The naive Bayes classifier is widely used because of its simplicity, speed, and accuracy. However, this approach fails when, for at least one attribute value in a test sample, there are no corresponding training samples with that attribute value. This is known as the zero-frequency problem and is typically addressed using Laplace smoothing. However, Laplace smoothing does not take into account the statistical characteristics of the neighbourhood of the test sample's attribute values. Gaussian naive Bayes addresses this, but the resulting Gaussian model is formed from global information. We instead propose an approach that estimates conditional probabilities using information in the neighbourhood of the test sample. In this case we no longer need to assume independence of the attribute values, and can therefore consider the joint probability distribution conditioned on the given class. Hence, unlike the Laplace and Gaussian approaches, our approach takes dependencies among the attribute values into account. We illustrate the performance of the proposed approach on a wide range of datasets taken from the University of California at Irvine (UCI) Machine Learning Repository. We also include results for the k-NN classifier and demonstrate that the proposed approach is simple, robust, and outperforms standard approaches.
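The contrast the abstract draws can be sketched in code. The snippet below is an illustrative toy, not the authors' exact method: `laplace_conditional` shows standard per-attribute add-alpha smoothing, while `neighbourhood_joint_score` scores a test point from its k nearest same-class neighbours over the full attribute vector, so attribute dependencies are retained. All function names, the distance-based score, and the parameter choices are assumptions made for illustration.

```python
import numpy as np

def laplace_conditional(train_values, test_value, n_categories, alpha=1.0):
    """P(attribute = test_value | class) with Laplace (add-alpha) smoothing.

    Even when test_value never occurs in the training data, the count of
    zero is lifted to alpha, avoiding the zero-frequency problem.
    """
    count = np.sum(train_values == test_value)
    return (count + alpha) / (len(train_values) + alpha * n_categories)

def neighbourhood_joint_score(class_samples, x, k=5):
    """Score a test point x against one class using the mean distance to
    its k nearest neighbours within that class; smaller distances give a
    higher score. The joint attribute vector is used directly, so no
    attribute-independence assumption is made.
    """
    dists = np.linalg.norm(class_samples - x, axis=1)
    nearest = np.sort(dists)[:k]
    return 1.0 / (1.0 + nearest.mean())

# Toy usage: two well-separated Gaussian classes in two dimensions.
rng = np.random.default_rng(0)
class_a = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))
x = np.array([0.2, -0.1])  # test point near class A

score_a = neighbourhood_joint_score(class_a, x)
score_b = neighbourhood_joint_score(class_b, x)
pred = "A" if score_a > score_b else "B"
```

With equal class priors, comparing the neighbourhood scores across classes plays the role of comparing class-conditional likelihoods; here the test point is assigned to class A.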
References
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Friedman, N., Geiger, D., Goldszmidt, M.: Bayesian network classifiers. Mach. Learn. 29, 131–163 (1997)
Jiang, L., Zhang, H., Cai, Z.: A novel Bayes model: hidden naive Bayes. IEEE Trans. Knowl. Data Eng. 21(10), 1361–1371 (2009)
Yu, L., Gan, S., Chen, Y., Luo, D.: A novel hybrid approach: instance weighted hidden naive Bayes. Mathematics 9(22), 2982 (2021)
Langley, P., Sage, S.: Induction of selective Bayesian classifiers. In: UAI (1994)
Lee, C.-H., Gutierrez, F., Dou, D.: Calculating feature weights in naive Bayes with Kullback-Leibler measure. In: 2011 IEEE 11th International Conference on Data Mining, pp. 1146–1151 (2011)
Foo, L.-K., Chua, S.-L., Ibrahim, N.: Attribute weighted naïve Bayes classifier. Comput. Mater. Continua 71(1), 1945–1957 (2022)
Xie, Z., Hsu, W., Liu, Z., Lee, M.L.: SNNB: a selective neighborhood based naive Bayes for lazy learning. In: Chen, M.S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS, vol. 2336, pp. 104–114. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47887-6_10
Gweon, H., Schonlau, M., Steiner, S.H.: The k conditional nearest neighbor algorithm for classification and class probability estimation. PeerJ Comput. Sci. 5, e194 (2019)
Frank, E., Hall, M., Pfahringer, B.: Locally weighted naive Bayes. In: Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, UAI 2003, pp. 249–256. Morgan Kaufmann Publishers Inc., San Francisco (2003)
Chandra, B., Gupta, M., Gupta, M.P.: Robust approach for estimating probabilities in naive-Bayes classifier. In: Ghosh, A., De, R.K., Pal, S.K. (eds.) PReMI 2007. LNCS, vol. 4815, pp. 11–16. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77046-6_2
Baboolal, K.: GitHub repository (2022)
Dua, D., Graff, C.: UCI machine learning repository (2017)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7(2), 179–188 (1936)
da Silva, J.E., de Sá, J.P.M., Jossinet, J.: Classification of breast tissue by electrical impedance spectroscopy. Med. Biol. Eng. Comput. 38(1), 26–30 (2000)
Abid, F., Izeboudjen, N.: Predicting forest fire in Algeria using data mining techniques: case study of the decision tree algorithm. In: Ezziyyani, M. (ed.) AI2SD 2019. AISC, vol. 1105, pp. 363–370. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36674-2_37
Quinlan, J.R.: Simplifying decision trees. Int. J. Man-Mach. Stud. 27(3), 221–234 (1987)
Aeberhard, S., Coomans, D., De Vel, O.: Comparative analysis of statistical pattern recognition methods in high dimensional settings. Pattern Recogn. 27(8), 1065–1077 (1994)
Zwitter, M., Soklic, M.: UCI machine learning repository (1988)
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J.: Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47(4), 547–553 (2009)
Aha, D.W.: Incremental constructive induction: an instance-based approach. In: ML (1991)
Nakai, K., Kanehisa, M.: A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 14(4), 897–911 (1992)
Cinar, I., Koklu, M., Tasdemir, S.: Classification of raisin grains using machine vision and artificial intelligence methods. Gazi Muhendislik Bilimleri Dergisi (GMBD) 6(3), 200–209 (2020)
Evett, I.W., Spiehler, E.J.: Rule induction in forensic science. In: KBS in Government, pp. 107–118. Online Publications (1987)
Silva, P.F.B., Marçal, A.R.S., da Silva, R.M.A.: Evaluation of features for leaf discrimination. In: Kamel, M., Campilho, A. (eds.) ICIAR 2013. LNCS, vol. 7950, pp. 197–204. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39094-4_23
Lohweg, V., Derksen, H.: UCI machine learning repository (2012)
Koklu, M., Ozkan, I.A.: Multiclass classification of dry beans using computer vision and machine learning techniques. Comput. Electron. Agric. 174, 105507 (2020)
Nash, W.J., Tasmania. Marine Research Laboratories: The Population Biology of Abalone (Haliotis Species) in Tasmania: Blacklip abalone (H. rubra) from the north coast and the islands of Bass Strait. Number v. 1 in Technical report (Tasmania. Sea Fisheries Division). Sea Fisheries Division, Marine Research Laboratories - Taroona, Department of Primary Industry and Fisheries, Tasmania (1994)
Banerjee, P.: Comprehensive guide on feature selection (2020)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Ethics declarations
Disclosure of Interests
The authors have no financial or proprietary interests in any material discussed in this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Hosein, P., Baboolal, K. (2024). Bayes Classification Using an Approximation to the Joint Probability Distribution of the Attributes. In: Fred, A., Hadjali, A., Gusikhin, O., Sansone, C. (eds.) Deep Learning Theory and Applications. DeLTA 2024. Communications in Computer and Information Science, vol 2172. Springer, Cham. https://doi.org/10.1007/978-3-031-66705-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-66704-6
Online ISBN: 978-3-031-66705-3