
Expert Systems with Applications

Volume 51, 1 June 2016, Pages 207-217

Bayesian network classifiers based on Gaussian kernel density

https://doi.org/10.1016/j.eswa.2015.12.031

Highlights

  • We construct ENBC by imposing dependency extension on NBC with continuous attributes.

  • We combine smoothing parameter adjustment with structure learning.

  • We control and optimize the degree of fit between the classifier and the data.

  • We show that the attributes of ENBC provide three types of information about the class.

  • The latter two types of information effectively improve the classification accuracy.

Abstract

When learning a Bayesian network classifier, continuous attributes usually need to be discretized. However, discretization can cause loss of information, introduce noise, and reduce the sensitivity of the class variable to changes in the attributes. In this paper, we use a Gaussian kernel function with a smoothing parameter to estimate the density of the attributes. A Bayesian network classifier with continuous attributes is then constructed by dependency extension of the Naive Bayes classifier. We also analyze the information that each attribute provides to the class, as a basis for this dependency extension. Experimental studies on UCI data sets show that Bayesian network classifiers using the Gaussian kernel function achieve good classification accuracy compared with other approaches for handling continuous attributes.

Introduction

The Naive Bayes classifier (NBC) (Bouckaert, 2005, Duda, Hart, et al., 1973, Ramoni, Sebastiani, 2001) is an important probabilistic classifier, well known for its simplicity, high efficiency, and good classification accuracy. It can directly process continuous attributes and has been widely used in medical diagnosis, text categorization, mail filtering, information retrieval, and other areas. However, the classifier rests on a rather strong assumption: the attributes are conditionally independent given the class. This leads to poor utilization of the dependency information between attributes, even though such information is also crucial for classification. To address this problem, a series of studies on the dependency extension of NBCs has been conducted. This line of research can be traced back to the dependence tree of Chow and Liu (1968), on which Friedman, Geiger, and Goldszmidt (1997) later built the well-known TAN (Tree-Augmented Naive Bayes) classifier. Grossman and Domingos (2004) learned Bayesian network classifiers with conditional likelihood as the criterion. Jing, Pavlović, and Rehg (2008) took classification accuracy as the criterion and performed attribute selection and parameter ensembling on TAN classifiers. Dependency extension of these classifiers can effectively improve classification accuracy, but these studies considered only NBCs with discrete attributes.

For NBCs with continuous attributes (Bouckaert, 2005), two processing methods can be applied: the first discretizes the continuous attributes, turning the task into a classification problem with discrete attributes (Boullé, 2006, Fayyad, Irani, 1993, Friedman, Goldszmidt, et al., 1996, Yang, Webb, 2009); the second does not discretize the continuous attributes but instead estimates their conditional densities. Each method has its own advantages and disadvantages. The first is suitable for large data sets with few classes, where the conditional probabilities of the attributes can be estimated reliably. The second is better suited to comparatively small data sets with many classes, because conditional density estimation does not require many examples. This paper explores the second method, assuming that all attributes are continuous; the findings can nevertheless also be applied to the mixed-attribute case.

The core of processing continuous attributes is the estimation of conditional density. John and Langley (1995) studied NBCs and Flexible Bayes Classifiers (FBCs), which estimate the conditional densities of the attributes with a Gaussian function and a Gaussian kernel function, respectively. Although the accuracy of these two classifiers is not very high, they laid the foundation for research on Bayesian network classifiers with continuous attributes. Building on John and Langley's work, Pérez, Larrañaga, and Inza (2009) improved the Gaussian kernel estimate by introducing and optimizing a smoothing parameter. They optimized the smoothing parameter with the classical MISE (Mean Integrated Square Error) statistical criterion (Kobos, 2009) and named the classifiers constructed with the optimized smoothing parameter Flexible Naive Bayes Classifiers (FNBCs), whose classification accuracy is better than that of FBCs.
Bounhas, Mellouli, Prade, and Serrurier (2013), He, Wang, Kwong, and Wang (2014), Dong and Zhou (2014), and Pavani, Delgado-Gomez, and Frangi (2014) studied Bayes classifiers that estimate the joint density of the attributes with Gaussian functions and Gaussian kernel functions. These classifiers cannot effectively exploit the conditional independence relationships between attributes, which lowers their accuracy and reliability.
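
To make the kernel-based estimation concrete, the sketch below shows a naive Bayes classifier whose class-conditional attribute densities are Gaussian kernel density estimates sharing a single smoothing parameter h, in the spirit of FBC/FNBC. The class name, the shared bandwidth, and all implementation details are illustrative assumptions, not the authors' code.

```python
import numpy as np

class GaussianKDENaiveBayes:
    """Naive Bayes whose class-conditional attribute densities are Gaussian
    kernel density estimates with a shared smoothing parameter h.
    Illustrative sketch only, not the formulation used in the paper."""

    def __init__(self, h=0.5):
        self.h = h  # smoothing (bandwidth) parameter of the Gaussian kernel

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # keep the training samples of each class; priors are relative frequencies
        self.samples_ = {c: X[y == c] for c in self.classes_}
        self.priors_ = {c: float(np.mean(y == c)) for c in self.classes_}
        return self

    def _log_likelihood(self, x, c):
        # p(x_i | c) = (1 / (N_c * h)) * sum_m phi((x_i - x_i^m) / h) per attribute,
        # combined in log space under the conditional-independence assumption of NBC
        Z = self.samples_[c]                          # shape (N_c, n_attributes)
        diffs = (x[None, :] - Z) / self.h
        kernel = np.exp(-0.5 * diffs ** 2) / (np.sqrt(2.0 * np.pi) * self.h)
        dens = kernel.mean(axis=0)                    # one KDE value per attribute
        return np.sum(np.log(dens + 1e-300))

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            scores = {c: np.log(self.priors_[c]) + self._log_likelihood(x, c)
                      for c in self.classes_}
            preds.append(max(scores, key=scores.get))
        return np.array(preds)
```

A small h lets the estimated densities follow the training samples closely, while a large h smooths them out; this is the degree of fit that the present paper controls with the classification accuracy criterion rather than MISE.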

This paper uses a Gaussian kernel function with a smoothing parameter to estimate conditional densities. The smoothing parameter is first optimized with classification accuracy as the criterion (classification accuracy better measures the degree of fit between a Bayesian network classifier with continuous attributes and the data), and the parent nodes of the attributes are then greedily selected with the same criterion. The Extended Naive Bayes Classifiers (ENBCs) are thus constructed. Then, based on Bayesian network theory, the information provided to the class by the attributes is analyzed from the viewpoint of dependency extension. Finally, experiments and analysis on UCI data sets (Murphy & Aha, 2014) with continuous attributes verify the necessity of dependency extension and the effectiveness of our methods.
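
The greedy selection of attribute parent nodes described above can be illustrated as a simple hill-climbing search that adds one extra dependency edge at a time. The hypothetical evaluate_accuracy callback, the one-parent limit, and the cycle check in this sketch are our assumptions for illustration, not the paper's exact procedure.

```python
def _creates_cycle(parents, child, parent):
    # Adding the edge parent -> child closes a cycle iff child is already an
    # ancestor of parent (walk up through the current parent sets).
    stack, seen = [parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents[node])
    return False

def greedy_parent_selection(evaluate_accuracy, n_attributes, max_parents=1):
    """Greedily add attribute parents (beyond the class node) while the
    classification accuracy criterion keeps improving.  Illustrative sketch:
    evaluate_accuracy(parents) is a hypothetical callback that trains the
    classifier with the given structure and returns, e.g., cross-validated
    accuracy on the training data."""
    parents = {i: set() for i in range(n_attributes)}   # start from the NBC structure
    best = evaluate_accuracy(parents)
    improved = True
    while improved:
        improved, best_move = False, None
        for child in range(n_attributes):
            if len(parents[child]) >= max_parents:
                continue
            for parent in range(n_attributes):
                if parent == child or parent in parents[child]:
                    continue
                if _creates_cycle(parents, child, parent):
                    continue
                parents[child].add(parent)               # tentatively add one edge
                score = evaluate_accuracy(parents)
                parents[child].remove(parent)
                if score > best:
                    best, best_move, improved = score, (child, parent), True
        if best_move is not None:
            child, parent = best_move
            parents[child].add(parent)                   # commit the best single edge
    return parents, best
```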

The main contributions of this paper are as follows:

  • (1)

    We propose a new method to construct the ENBC by imposing dependency extension on the NBC with continuous attributes, which extracts the conditional dependency information among attributes for classification in a locally optimal way.

  • (2)

    We combine smoothing parameter adjustment, which determines the shapes of the Gaussian density curves in the Gaussian kernel function, with structure learning, which determines the decomposition and calculation of the attribute joint density, in order to control and optimize the degree of fit between the classifier and the data and thereby improve the generalization ability.

  • (3)

    Based on Bayesian network theory (Cooper, Herskovits, 1992, Heckerman, Geiger, Chickering, 1995, Langley, Iba, Thompson, 1992, Olesen, 1993, Pearl, 1988), we show that the attributes of the ENBC can provide three types of information about the class: transitive dependency information, direct induced dependency information, and indirect induced dependency information. The NBC provides only the first type. Through dependency extension, the other two types of information can be effectively utilized, and thus the classification accuracy can be improved.

The remainder of this paper is organized as follows: Section 2 presents the structure of the ENBC, the methods for estimating the conditional densities of the attributes, the analysis of the information provided to the class by the attributes, and the selection of attribute parent nodes; Section 3 presents experiments and analysis comparing the classification accuracy of different classifiers and the contributions of different attributes to classification; Section 4 summarizes the research findings and outlines further work.

Section snippets

Dependency extension of NBC

The dependency extension of the NBC means that the attributes may have other parent nodes besides the class. Its purpose is to effectively utilize the dependency information between attributes. Let $X_1, \ldots, X_n$ and $C$ denote the continuous attributes and the class, with values $x_1, \ldots, x_n$ and $c$; let $D$ be a data set with $N$ samples generated randomly from a probability distribution $P$; and let $x_i^m$ and $c^m$ denote the values of $X_i$ ($1 \le i \le n$) and $C$ in sample $m$ ($1 \le m \le N$) of $D$, respectively. Variables
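
The snippet is truncated here. To make the setup concrete, the following hedged reconstruction gives the standard forms of the classification rule and of the kernel estimator for such classifiers; the paper's exact formulas are not shown in this snippet, and $pa_i$ (the values of the attribute parents of $X_i$) and $N_c$ (the number of samples of class $c$) are our notation.

```latex
% Classification rule of a Bayesian network classifier with continuous attributes:
% besides the class C, each attribute X_i may have attribute parents with values pa_i.
c^{*} \;=\; \arg\max_{c}\; P(c)\,\prod_{i=1}^{n} p\bigl(x_i \mid pa_i, c\bigr)

% Gaussian kernel density estimate with smoothing parameter h, written for the
% parent-free (NBC) case; N_c is the number of training samples with class c.
\hat{p}(x_i \mid c) \;=\; \frac{1}{N_c}\sum_{m:\,c^{m}=c}
  \frac{1}{\sqrt{2\pi}\,h}\,
  \exp\!\left(-\frac{1}{2}\left(\frac{x_i - x_i^{m}}{h}\right)^{2}\right)
```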

Experiment and analysis

In the experiments, 28 UCI data sets are selected (these data sets are chosen based on the distributions of the class and the attributes, the conditional dependence between attributes, and their use in the literature). For the data sets in Table 2, records with missing data are deleted, the attribute data are standardized, the record order is randomized, and several comparatively large data sets are truncated (the truncated data sets are marked
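
A minimal sketch of this preprocessing, assuming pandas as the tooling and a generic class column; the paper does not specify how these steps were implemented.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, class_column: str, max_records=None, seed=0):
    """Delete records with missing data, standardize the continuous attributes,
    randomize the record order, and optionally truncate large data sets.
    Illustrative sketch only; not the paper's actual preprocessing code."""
    df = df.dropna()                                                 # delete records with missing data
    attrs = [col for col in df.columns if col != class_column]
    df[attrs] = (df[attrs] - df[attrs].mean()) / df[attrs].std()     # standardize attributes
    df = df.sample(frac=1.0, random_state=seed)                      # randomize record order
    if max_records is not None:
        df = df.iloc[:max_records]                                   # truncate comparatively big data sets
    return df.reset_index(drop=True)
```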

Conclusion and further work

The classification technique is an important part of expert and intelligent systems, and Bayesian network classifiers are core members of the classifier family. In this paper, we develop the ENBC by combining Bayesian network theory, the Gaussian kernel function, and greedy selection of attribute parent nodes under the classification accuracy criterion. We expect the ENBC to become one of the representative Bayesian network classifiers.

We use the Gaussian kernel function with a smoothing parameter to estimate

Acknowledgments

This work was supported by the Shanghai Natural Science Foundation (15ZR1429700) and the Innovation Program of the Shanghai Municipal Education Commission (15zz099).

References

  • Bouckaert, R. R. (2005). Naive Bayes classifiers that perform well with continuous variables. Advances in Artificial Intelligence, AI 2004.

  • Boullé, M. (2006). MODL: A Bayes optimal discretization method for continuous attributes. Machine Learning.

  • Bounhas, M., et al. (2013). Possibilistic classifiers for numerical data. Soft Computing.

  • Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. The Journal of Machine Learning Research.

  • Chow, C., et al. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory.

  • Cooper, G. F., et al. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning.

  • Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research.

  • Dong, W., et al. (2014). Gaussian classifier-based evolutionary strategy for multimodal optimization. IEEE Transactions on Neural Networks & Learning Systems.

  • Duda, R. O., et al. (1973). Pattern classification and scene analysis.

  • Fayyad, U. M., et al. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI).

  • Friedman, N., et al. (1997). Bayesian network classifiers. Machine Learning.

  • He, Y. L., et al. (2014). Bayesian classifiers based on probability density estimation and their applications to simultaneous fault diagnosis. Information Sciences.

  • Pérez, A., et al. (2009). Bayesian classifiers based on kernel density estimation: Flexible classifiers. International Journal of Approximate Reasoning.

  • Ramoni, M., et al. (2001). Robust Bayes classifiers. Artificial Intelligence.