FMDBN: A first-order Markov dynamic Bayesian network classifier with continuous attributes

https://doi.org/10.1016/j.knosys.2020.105638

Abstract

With the development of data-driven decision making and prediction, time-series data have become ubiquitous and the demand for their classification is vast. Although a large body of research has been reported in the literature, it is mainly oriented to situations in which the class and the attributes change simultaneously. In practice, however, changes in the class and the attributes are not always synchronous, so further study of asynchronous classification problems is necessary. In this paper, a first-order Markov dynamic Bayesian network classifier is proposed to address the asynchronous issue by combining time-series data preprocessing, time-delayed and dislocated transformation of variables, and initial and evolutionary learning. The attribute density in this classifier is estimated with a Gaussian function, and a classification accuracy criterion for time-series progressiveness is also considered. The classifier has a relatively simple structure, which helps avoid overfitting. In addition, data can be classified effectively by utilizing three kinds of classification information in multivariate time-series datasets, namely time-delayed, non-time-delayed and mixed information. The proposed method is also able to accumulate classification information via iterative evolution and thus improve the generalization of the classifier. Experiments were carried out using standard time-series datasets from the UCI repository and from the financial and macroeconomic domains. The experimental results show that the proposed first-order Markov dynamic Bayesian network classifier is more accurate in dealing with these dynamic classification problems.

Introduction

A Bayesian network [1] is a graphical model that describes dependencies among random variables (variables for short in this paper). It is versatile, effective, and open, which makes it a powerful tool for dealing with uncertainty. Classical Bayesian networks are mainly used to represent causal knowledge and to perform uncertain reasoning. When used for classification, a Bayesian network is generally called a Bayesian network classifier [2]. Bayesian network classifiers have been studied extensively and fall mainly into two groups: those with discrete attributes and those with continuous attributes. Regarding the former, for example, Chow and Liu (1968) [2] proposed the Dependency Tree classifier. Friedman et al. (1997) [3] put forward the TAN (tree-augmented naïve Bayes) classifier, which was enhanced by Jing et al. (2008) [4] using an edge-selection technique. Wang et al. (2013) [5] presented a restricted Bayesian classification network. Martínez et al. (2016) [6] studied the scalable learning of Bayesian network classifiers. Arias et al. (2017) [7] addressed the distributed Bayesian network classifier. Sardinha et al. (2018) [8] discussed how to modify the structure of Bayesian network classifiers with missing data. When continuous attributes are present, two approaches can be used: one is to convert them into discrete attributes, the other is to estimate the attribute density.

In terms of attribute density estimation, John and Langley (1995) [9] established two naïve Bayes classifiers that use the classical Gaussian function and the Gaussian kernel function, respectively, to estimate the marginal density of attributes. This work is widely regarded as the foundation for the study of continuous attributes based on density estimation. Gaussian functions and Gaussian kernel functions are both widely used for attribute density estimation, but they have different characteristics. The former emphasizes overall fitting ability and generalizes well, while the latter emphasizes local fitting ability and can be used to estimate complex densities. Pérez et al. (2006, 2009) [10], [11] extended the two classifiers proposed by John and Langley (1995) [9] by adding edges between attributes to capture dependencies. He et al. (2014) [12] presented a naïve Bayesian classifier based on the Gaussian function, and Luis et al. (2014) [13] and Wang et al. (2016) [14] put forward a complete Bayesian classifier and a Bayesian network classifier that use the Gaussian kernel function to estimate the attribute density, and applied them to spectral analysis, fault detection and root identification. Chen et al. (2018) [15] proposed using kernel density estimation to estimate the probability density function, instead of learning parameters as in traditional Bayesian network classifiers, and applied it to fault detection and root identification. Although the above-mentioned Bayesian network classifiers are not directly suitable for classifying time-series data, they lay the foundation for research on dynamic Bayesian network classifiers.
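The contrast drawn above between a single Gaussian (global fit) and a Gaussian kernel estimate (local fit) can be illustrated with a minimal sketch. The function names, the synthetic bimodal sample and the bandwidth of 0.3 are assumptions chosen for illustration; none of them comes from the cited works.

```python
import numpy as np

def gaussian_density(x, samples):
    """Parametric estimate: one Gaussian fitted to all samples."""
    mu = samples.mean()
    sigma = samples.std(ddof=1)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gaussian_kernel_density(x, samples, bandwidth):
    """Kernel estimate: average of Gaussians centred on each sample."""
    z = (x - samples[:, None]) / bandwidth  # shape (n_samples, n_points)
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=0)

# Hypothetical attribute values for one class, drawn from two well-separated modes.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(2.0, 0.5, 200)])
grid = np.linspace(-6.0, 6.0, 1201)  # step 0.01; index 600 is x = 0, index 800 is x = 2

p_gauss = gaussian_density(grid, samples)
p_kernel = gaussian_kernel_density(grid, samples, bandwidth=0.3)

# Compare the two estimates in the valley (x = 0) and at one mode (x = 2).
print("x=0:", p_gauss[600], p_kernel[600])
print("x=2:", p_gauss[800], p_kernel[800])
```

On data like this, the single Gaussian smooths over the two modes, whereas the kernel estimate follows them; the price of the kernel estimate is a bandwidth that has to be chosen and a greater sensitivity to individual samples.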

A dynamic Bayesian network [16], [17] is an extension of the traditional Bayesian network and is applicable to time-related uncertainty problems. Research into dynamic Bayesian networks began in 1998, when Friedman et al. [16] proposed a learning method based on search and scoring under the stationarity and Markov assumptions. Later, Murphy (2002) discussed theoretical methods for the systematic application of dynamic Bayesian networks. Early researchers mainly focused on theoretical studies of the hidden Markov model, the Kalman filtering model, and variants of the two, as well as their applications in speech recognition, video analysis, and information filtering. In recent years, more attention has been paid to the application of dynamic Bayesian networks to dynamic assessment, recognition, diagnosis, prediction and early warning. For example, Ma et al. (2019) [18] evaluated vehicle driving risk from on-road experimental driving data using the dynamic Bayesian network approach. This work contributes significantly to the development of safety research for advanced driving assistance systems. Yang et al. (2010) [19] built a driver fatigue recognition model and demonstrated its effectiveness experimentally. Cai et al. (2017) [20] applied the dynamic approach to fault diagnosis, whilst Zhang et al. (2018) [21] applied it to enhance fault detection and maintenance for intelligent connected vehicles. Dabrowski et al. (2016) [22] built an early warning system for systemic banking crises using the dynamic Bayesian network; their experimental results indicated that it can provide more precise early warnings than the signal extraction and logit methods. These dynamic Bayesian networks rely mainly on expert knowledge. Because the directed edges in their structures primarily express causality rather than channels or paths of information transmission, they are well suited to dynamic analysis and reasoning but are not ideal for direct classification.
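As a point of reference, the stationarity and first-order Markov assumptions mentioned above are commonly written as below. The symbol \mathbf{Z}_t, standing for all variables (class and attributes) at time t, and the horizon T are notation introduced here for illustration and are not taken from the paper.

```latex
P(\mathbf{Z}_1, \dots, \mathbf{Z}_T)
  = P(\mathbf{Z}_1) \prod_{t=2}^{T} P(\mathbf{Z}_t \mid \mathbf{Z}_{t-1}),
\qquad
P(\mathbf{Z}_t \mid \mathbf{Z}_{t-1}) \text{ identical for all } t .
```

The first factor corresponds to an initial network and the repeated factor to a transition network, which is the usual two-part description of such models.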

Among studies of dynamic Bayesian network classifiers, Kafai and Bhanu (2012) [23] built a dynamic Bayesian network classifier based on expert knowledge and applied it to the classification of vehicles in video scenes. Experimental results showed that the proposed classifier outperforms the K-Nearest Neighbor (kNN) classifier, Linear Discriminant Analysis (LDA), and the Support Vector Machine (SVM) in terms of reliability. Premebida et al. (2017) [24] adopted a dynamic Bayesian mixture model, an improved variant of the dynamic Bayesian network, for semantic place classification in mobile robotics; the results indicated that the model is effective and competitive under different scenarios and conditions.

Rishu et al. (2019) [25] performed smartphone-based, context-aware driver behavior classification using a dynamic Bayesian network. Li et al. (2019) [26] proposed a solution to HVAC system fault detection and diagnosis (FDD) based on SVM. All the dynamic Bayesian network classifiers mentioned above lack an evolution mechanism; as a result, they do not extract enough classification information, which limits their classification effectiveness. In recent years, Recurrent Neural Networks (RNN) [27], [28] have been widely used for the classification of multivariate time series. An RNN takes time-series data as input, recurs in the direction of time-series evolution, and connects all nodes (recurrent units) in a chain. Subsequently, the Echo State Network (ESN) [29], Long Short-Term Memory networks (LSTM) [30], [31], and Gated Recurrent Unit networks (GRU) were put forward to deal with multiple univariate time series and achieved excellent results. By evolving from one time point to the next, RNNs can fully fit the changes in a time series. However, they are susceptible to noise and outliers, which reduces the generalization performance of the classifier.

The main contributions of this paper are as follows:

(1) We propose a first-order Markov dynamic Bayesian network classifier with continuous attributes (FMDBN). Methods and algorithms for both initial learning and evolutionary learning are presented, based on the time-delayed transformation of variables (including attributes and classes) and the dislocated transformation of variables (between attributes and classes). The time-delayed transformation unifies delayed and non-delayed information, while the dislocated transformation enables asynchronous classification and prediction.

(2) We make the class nodes parents of all attribute nodes in order to make full use of the most important transitive dependency information provided by the attributes. By establishing a tree (or forest) structure with time-delayed nodes (not including class nodes) among the attributes, both direct and indirect induced dependency information is extracted via a local optimization method. Because this tree (or forest) structure is simple, we adopt the Gaussian function to estimate the density of attributes, which effectively avoids overfitting.

(3) To improve performance and generalizability, we combine structural adjustment, model averaging and evolutionary classification calculation. Experiments and analysis examine classification accuracy on UCI, financial and macroeconomic time-series datasets from three aspects: the comparison between different classifiers, the influence of time-delayed information, and the influence of attribute-dependent information. The results indicate that the FMDBN classifier can effectively utilize time-delayed, non-time-delayed and mixed information to improve classification accuracy.
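This excerpt does not give the formal definitions of the two transformations named in contribution (1), so the following is only a rough sketch under stated assumptions. It assumes that the time-delayed transformation pairs each time point with the values one step earlier (a first-order lag), and that the dislocated transformation aligns the attributes at time t with the class at time t + shift. The function names `time_delayed` and `dislocated`, the `shift` parameter and the toy data are hypothetical.

```python
import numpy as np

def time_delayed(attrs, classes):
    """Assumed first-order lag: pair each time t >= 1 with the values at t-1.

    Returns (X_prev, c_prev, X_curr, c_curr), each of length T-1.
    """
    attrs = np.asarray(attrs, dtype=float)
    classes = np.asarray(classes)
    return attrs[:-1], classes[:-1], attrs[1:], classes[1:]

def dislocated(attrs, classes, shift):
    """Assumed dislocation: align attributes at time t with the class at t + shift."""
    attrs = np.asarray(attrs, dtype=float)
    classes = np.asarray(classes)
    if shift < 0 or shift >= len(classes):
        raise ValueError("shift must be in [0, T)")
    if shift == 0:
        return attrs, classes
    return attrs[:-shift], classes[shift:]

# Hypothetical toy series: 4 time points, 2 continuous attributes, binary class.
attrs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
classes = [0, 1, 1, 0]

X_prev, c_prev, X_curr, c_curr = time_delayed(attrs, classes)
X_d, c_d = dislocated(attrs, classes, shift=1)
```

Under these assumptions, the lagged arrays carry the delayed and non-delayed information side by side, and the shifted pairing lets attributes observed now be matched with a class observed later, which is one way to read "asynchronous classification and prediction".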

This paper is organized as follows: Section 1 reviews and analyzes the development of both Bayesian network classifiers and dynamic Bayesian network classifiers; Section 2 studies the structure of FMDBN; Section 2.5 develops the initial and evolutionary learning of FMDBN; Section 3 carries out experiments and analysis of classification accuracy with UCI, financial and macroeconomic datasets; Section 4 concludes this work and outlines future directions.

FMDBN

FMDBN is a dynamic Bayesian network classifier that is implemented in two stages: establishing FMDBN (FMDBN learning) and using FMDBN for classification calculation (FMDBN classification). The basis of FMDBN learning and classification is the transformation of time-series data.

Experiments and analysis

We select 45 standard time-series datasets (21 from the UCI Machine Learning Repository, and 18 financial and 6 macroeconomic datasets from the Wind database) as inputs for experiments on three aspects: the comparison of classification accuracy, and the influence of time-delayed information and of attribute-dependent information on the classification accuracy of FMDBN. In this process, missing data are filled in by the sliding average method, time series of classes are discretized in chronological order,

Conclusions and future work

In this paper, we develop a dynamic Bayesian network classifier named FMDBN for the classification and prediction of multivariate time-series datasets. It combines time-series preprocessing, the time-delayed and dislocated transformations of variables, tree (or forest) structures among attributes, and the classification accuracy metric. In addition, initial learning, evolutionary learning and evolutionary classification algorithms are developed.

In FMDBN, class nodes are the

CRediT authorship contribution statement

Shuangcheng Wang: Conceptualization, Methodology, Funding acquisition. Siwen Zhang: Data curation, Formal analysis, Writing - original draft. Tao Wu: Investigation, Resources, Funding acquisition. Yongrui Duan: Validation, Writing - review & editing, Supervision, Funding acquisition. Liang Zhou: Data curation, Writing - review & editing. Hao Lei: Software, Visualization.

Acknowledgments

This work is supported by the National Social Science Fund of China [Grant number 18BTJ020]; the National Natural Science Foundation of China [Grant numbers 71771179, 71532015]; the Foundation of Shanghai Municipal Health Commission, China [Grant number 2018HYL0211]; and the Foundation of Shanghai Municipal Commission of Economy and Informatization, China [Grant number 2018-RGZN-02042].

References (44)

  • AlKhateeb, Jawad H. Performance of hidden Markov model and dynamic Bayesian network classifiers on handwritten Arabic word recognition. Knowl.-Based Syst. (2011)
  • Liu, Ying et al. Online semi-supervised support vector machine. Inform. Sci. (2018)
  • García, Salvador. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inform. Sci. (2010)
  • Loyola-González, Octavio. PBC4cip: A new contrast pattern-based classifier for class imbalance problems. Knowl.-Based Syst. (2017)
  • Pearl, Judea. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. (1988)
  • Chow, C. et al. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory (1968)
  • Friedman, Nir et al. Bayesian network classifiers. Mach. Learn. (1997)
  • Jing, Yushi et al. Boosted Bayesian network classifiers. Mach. Learn. (2008)
  • Wang, ShuangCheng et al. Restricted Bayesian classification networks. Sci. China Inf. Sci. (2013)
  • Martínez, Ana M. Scalable learning of Bayesian network classifiers. J. Mach. Learn. Res. (2016)
  • John, George H. et al. Estimating continuous distributions in Bayesian classifiers.
  • Chen, Xiaolu et al. Probability density estimation and Bayesian causal analysis based fault detection and root identification. Ind. Eng. Chem. Res. (2018)

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105638.

1 Contributed equally to this work.
