Abstract
In learning a naive Bayes classifier, estimating probabilities from a given set of training samples is crucial. When the training samples are inadequate, however, any probability estimation method inevitably suffers from the zero-frequency problem. Laplace-estimate and M-estimate are the two main methods used to avoid this problem. The settings of two key parameters in these methods, m (an integer) and p (a probability), have a direct impact on the resulting performance. In this paper, we study the existing probability estimation methods and carry out a parameter cross-test, experimentally analyzing the performance of M-estimate under different settings of m and p. These experiments show that the optimal parameter values vary across data sets. Motivated by this analysis, we propose an estimation model based on self-adaptive differential evolution, together with an approach that computes optimal m and p values for each conditional probability so as to avoid the zero-frequency problem. We experimentally evaluate our approach in terms of classification accuracy on 36 benchmark data sets from the machine learning repository, comparing it to naive Bayes with Laplace-estimate and with M-estimate under a variety of parameter settings taken from the literature, as well as the near-optimal settings identified by our experimental analysis. The results show that the estimation model is efficient and that our approach significantly outperforms the traditional probability estimation approaches, especially on large data sets (those with many instances and attributes).
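As a concrete illustration of the two estimators the abstract compares, the sketch below implements the standard Laplace-estimate and M-estimate smoothing formulas for a conditional probability P(a|c). This is a minimal sketch of the textbook formulas only, not the paper's self-adaptive variant; the function and variable names (n_ac, n_c, v) are our own illustrative choices.

```python
def laplace_estimate(n_ac, n_c, v):
    """Laplace-estimate: add one pseudo-count for each of the v
    possible values of the attribute, so P(a|c) = (n_ac + 1) / (n_c + v)."""
    return (n_ac + 1) / (n_c + v)

def m_estimate(n_ac, n_c, m, p):
    """M-estimate: blend the observed frequency with a prior probability p,
    weighted by m virtual samples: P(a|c) = (n_ac + m * p) / (n_c + m)."""
    return (n_ac + m * p) / (n_c + m)

# Zero-frequency case: an attribute value never observed with this class
# (n_ac = 0, n_c = 10 class instances, v = 3 attribute values).
# Both estimators return a small but nonzero probability, so the
# product of conditional probabilities in naive Bayes never collapses to 0.
print(laplace_estimate(0, 10, 3))       # 1/13
print(m_estimate(0, 10, 1.0, 1 / 3))    # (1/3)/11
```

Note that with m = 0 the M-estimate degenerates to the raw relative frequency n_ac / n_c, which is exactly where the zero-frequency problem arises; the paper's contribution is choosing m and p per conditional probability rather than fixing them globally.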
Acknowledgements
This work was supported by the National Natural Science Foundation of China under Grant No. 61075063, the Fund for Outstanding Doctoral Dissertation of CUG (No. 2235122), and the Self-Determined and Innovative Research Fund of CUG (No. 1210491B16).
Cite this article
Wu, J., Cai, Z. A naive Bayes probability estimation model based on self-adaptive differential evolution. J Intell Inf Syst 42, 671–694 (2014). https://doi.org/10.1007/s10844-013-0279-y