FMDBN: A first-order Markov dynamic Bayesian network classifier with continuous attributes

doi:10.1016/j.knosys.2020.105638

Knowledge-Based Systems

Volume 195, 11 May 2020, 105638

https://doi.org/10.1016/j.knosys.2020.105638

Abstract

With the development of data-driven decision making and prediction, time-series data have become ubiquitous and the demand for their classification is vast. Although a large body of research has been reported in the literature, it is mainly oriented to situations in which the class and the attributes change simultaneously. In practice, however, class and attribute changes are not always synchronous, so further study of asynchronous classification problems is necessary. In this paper, a first-order Markov dynamic Bayesian network classifier is proposed to address the asynchronous issue by combining time-series data preprocessing, time-delayed and dislocated transformation of variables, and initial and evolutionary learning. The attribute density in this classifier is estimated with a Gaussian function, and a classification accuracy criterion for time-series progressiveness is also considered. The classifier has a relatively simple structure, which helps to avoid overfitting. In addition, data can be classified effectively by utilizing three kinds of classification information in multivariate time-series datasets, namely time-delayed, non-time-delayed and mixed information. The proposed method is also able to accumulate classification information via iterative evolution and thus improve the generalization of classifiers. Experiments were carried out using standard time-series datasets from the UCI repository and from the financial and macroeconomic domains. The experimental results show that the proposed first-order Markov dynamic Bayesian network classifier is more accurate in dealing with these dynamic classification problems.

Introduction

A Bayesian network [1] is a graphical model that describes dependencies among random variables (variables for short in this paper) and is characterized by versatility, effectiveness, and openness. This makes it a powerful tool for dealing with uncertainty. Classical Bayesian networks are mainly used to represent causal knowledge and to perform reasoning under uncertainty. When used for classification, such a network is generally called a Bayesian network classifier [2]. Bayesian network classifiers have been studied extensively and fall mainly into two classes: those with discrete attributes and those with continuous attributes. Regarding the former, for example, Chow and Liu (1968) [2] proposed the Dependency Tree classifier. Friedman et al. (1997) [3] put forward the TAN (Tree augmented naïve Bayes) classifier, which was enhanced by Jing et al. (2008) [4] by using an edge selecting technique. Wang et al. (2013) [5] presented a restricted Bayesian classification network. Martínez et al. (2016) [6] studied the scalable learning of Bayesian network classifiers. Arias et al. (2017) [7] addressed the distributed Bayesian network classifier. Sardinha et al. (2018) [8] discussed how to modify the structure of Bayesian network classifiers with missing data. When continuous attributes exist, two methods can be used: one is to convert them into discrete attributes, and the other is to estimate the attribute density.

In terms of attribute density estimation, John and Langley (1995) [9] established two naïve Bayes classifiers by using the classical Gaussian function and the Gaussian kernel function to estimate the marginal density of the attributes. This work is widely perceived as the foundation for the study of continuous attributes based on density estimation. Gaussian functions and Gaussian kernel functions are both widely used for attribute density estimation, but they have different characteristics. The former emphasizes overall fitting ability and generalizes well, while the latter emphasizes local fitting ability and can be adopted to estimate complex densities. Pérez et al. (2006, 2009) [10], [11] developed the two classifiers proposed by John and Langley (1995) [9] by extending dependencies via adding edges between attributes. He et al. (2014) [12] presented a naïve Bayesian classifier based on the Gaussian function, and Gutiérrez et al. (2014) [13] and Wang et al. (2016) [14] put forward a complete Bayesian classifier and a Bayesian network classifier, respectively, that use the Gaussian kernel function to estimate the attribute density, and applied them to spectral analysis, fault detection and root identification. Chen (2018) [15] proposed a kernel density estimation method that estimates the probability density function instead of learning parameters as in traditional Bayesian network classifiers, and applied it to fault detection and root identification. Although the above-mentioned Bayesian network classifiers are not directly suitable for the classification of time-series data, they lay the foundation for research on dynamic Bayesian network classifiers.
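The contrast between the two estimators can be illustrated with a minimal sketch. It is not taken from the cited works or from this paper: the function names, the toy data and the bandwidth of 0.3 are assumptions chosen for illustration only. A single Gaussian summarizes each class with one mean and one standard deviation, whereas the Gaussian kernel estimate averages one kernel per training sample and can therefore follow a bimodal class.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def fit_gaussian(samples):
    """Parametric estimate: one mean and one standard deviation per class."""
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1)

def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel estimate: average of kernels centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    return gaussian_pdf(x, samples, bandwidth).mean()

def classify(x, class_samples, density):
    """Pick the class that maximises prior * class-conditional density."""
    total = sum(len(s) for s in class_samples.values())
    scores = {c: len(s) / total * density(x, s) for c, s in class_samples.items()}
    return max(scores, key=scores.get)

def gaussian_density(x, samples):
    # Global fit: smooth, few parameters, cannot represent two modes.
    return gaussian_pdf(x, *fit_gaussian(samples))

def kernel_density(x, samples):
    # Local fit: the bandwidth (assumed here) controls the smoothness.
    return kde_pdf(x, samples, bandwidth=0.3)

# Toy one-attribute data (hypothetical): class "high" is bimodal.
rng = np.random.default_rng(0)
class_samples = {
    "low": rng.normal(0.0, 1.0, 200),
    "high": np.concatenate([rng.normal(4.0, 0.5, 100), rng.normal(7.0, 0.5, 100)]),
}

print(classify(0.2, class_samples, gaussian_density))
print(classify(7.1, class_samples, kernel_density))
```

The single Gaussian has only two parameters per class and attribute, which is the source of the generalization advantage described above; the kernel estimate keeps every training sample and depends on the choice of bandwidth.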

A dynamic Bayesian network [16], [17] is an extension of the traditional Bayesian network and is applicable to time-related uncertainty problems. Research into dynamic Bayesian networks began in 1998, when Friedman et al. [16] proposed a learning method based on search and scoring under the stationarity and Markov assumptions. Later, Murphy (2002) discussed theoretical methods for the systematic application of dynamic Bayesian networks. Early researchers mainly focused on theoretical studies of the hidden Markov model, the Kalman filtering model, and variants of the two models, as well as their applications in speech recognition, video analysis, and information filtering. In recent years, more attention has been paid to the application of dynamic Bayesian networks to dynamic assessment, recognition, diagnosis, prediction and early warning. For example, Ma et al. (2019) [18] evaluated vehicle driving risk from on-road experimental driving data by using the dynamic Bayesian network approach. This work contributes significantly to the development of safety research for advanced driving assistance systems. Yang et al. (2010) [19] built a driver fatigue recognition model and demonstrated its effectiveness experimentally. Cai et al. (2017) [20] applied the dynamic approach to fault diagnosis, whilst Zhang et al. (2018) [21] applied it to enhance fault detection and maintenance for intelligent connected vehicles. Dabrowski et al. (2016) [22] built an early warning system for systemic banking crises by using a dynamic Bayesian network; their experimental results indicated that it can provide more precise early warnings than the signal extraction and logit methods. These dynamic Bayesian networks rely mainly on expert knowledge. Because the directed edges in their structures express causality more than they describe channels or paths of information transmission, the networks are better suited to dynamic analysis and reasoning calculations than to direct classification calculation.
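For reference, the stationarity and first-order Markov assumptions mentioned above correspond to the standard factorization of a dynamic Bayesian network over time slices. The notation is an assumption introduced here, not taken from the paper: X_t is the vector of all variables in slice t, X_t^i is its i-th component, and Pa(X_t^i) is its parent set, drawn from slices t and t-1 only.

```latex
P(X_1, \dots, X_T) = P(X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1}),
\qquad
P(X_t \mid X_{t-1}) = \prod_{i=1}^{n} P\bigl(X_t^{i} \mid \mathrm{Pa}(X_t^{i})\bigr)
```

Stationarity means that the transition distribution P(X_t | X_{t-1}) is the same for every t, so a single two-slice network describes the whole sequence.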

Among the researchers on dynamic Bayesian network classifiers, Kafai and Bhanu (2012) [23] built a dynamic Bayesian network classifier based on expert knowledge and applied it to the classification of vehicles in video scenes. Experimental results showed that, in terms of reliability, the proposed classifier performs better than the K-Nearest Neighbor classifier (kNN), the Linear Discriminant Analysis (LDA) method, and the Support Vector Machine (SVM) approach. Premebida et al. (2017) [24] adopted a dynamic Bayesian mixture model, an improved variation of the dynamic Bayesian network, for semantic place classification in mobile robotics; the results indicated that the model is effective and competitive under different scenarios and conditions.

Rishu et al. (2019) [25] performed smartphone-based context-aware driver behavior classification using a dynamic Bayesian network. Li et al. (2019) [26] proposed a solution to HVAC system fault detection and diagnosis (FDD) based on SVM. All the dynamic Bayesian network classifiers mentioned above lack an evolution mechanism; thus, they do not extract classification information sufficiently, which limits their classification effectiveness. In recent years, Recurrent Neural Networks (RNNs) [27], [28] have been widely used in the classification of multivariate time series. An RNN takes time-series data as input, recurs in the direction of time-series evolution, and connects all nodes (recurrent units) in a chain. Subsequently, the Echo State Network (ESN) [29], Long Short-Term Memory networks (LSTM) [30], [31], and Gated Recurrent Unit networks (GRU) were put forward to deal with multiple univariate time series and achieved excellent results. By evolving from one time point to the next, RNNs can closely fit the changes in a time series. However, they are susceptible to noise and singular values, which reduces the generalization performance of the classifier.

The main contributions of this paper are as follows:

(1) We propose a first-order Markov dynamic Bayesian network classifier with continuous attributes (FMDBN). Both initial learning and evolutionary learning methods and algorithms are presented based on the time-delayed transformation of variables (including attributes and classes) and the dislocated transformation of variables (between attributes and classes). Among them the time-delayed transformation can realize the unification of delayed and non-delayed information, while the dislocated transformation can realize asynchronous classification and prediction.

(2) We make the class nodes parents of all attribute nodes, so as to make full use of the most important transitive dependency information provided by the attributes. By establishing a tree (or forest) structure with time-delayed nodes (not including class nodes) between attributes, both direct and indirect induced dependency information is extracted via a local optimization method. Because this tree (or forest) structure of attributes is simple, we adopt the Gaussian function to estimate the density of the attributes, which effectively avoids overfitting the data.

(3) To improve performance and generalizability, we combine structural adjustment, model averaging and evolutionary classification calculation. Experiments and analysis are performed to examine the classification accuracy on UCI, financial and macroeconomic time-series datasets in three aspects: the comparison between different classifiers, the influence of time-delayed information, and the influence of attribute-dependent information. The results indicate that the FMDBN classifier can effectively utilize time-delayed, non-time-delayed and mixed information to improve the classification accuracy.
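The structure described in contribution (2) can be written as a decision rule in the usual tree-augmented form. This is a sketch under stated assumptions, not the paper's own formula: c is a class value, x_1, ..., x_n are the attribute values (which may include time-delayed copies), and π(i) is a hypothetical index for the single attribute parent of x_i in the tree or forest. Each class-conditional density is taken to be the conditional of a bivariate Gaussian.

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} p\bigl(x_i \mid x_{\pi(i)}, c\bigr),
\qquad
p\bigl(x_i \mid x_{\pi(i)}, c\bigr)
  = \mathcal{N}\!\Bigl(x_i;\;
      \mu_{i|c} + \frac{\sigma_{i\pi|c}}{\sigma^{2}_{\pi|c}}\,\bigl(x_{\pi(i)} - \mu_{\pi|c}\bigr),\;
      \sigma^{2}_{i|c} - \frac{\sigma^{2}_{i\pi|c}}{\sigma^{2}_{\pi|c}}\Bigr)
```

For a root attribute with no attribute parent, the factor reduces to the univariate Gaussian N(x_i; μ_{i|c}, σ²_{i|c}). Each attribute then needs at most a mean, a variance and one covariance per class, which is what keeps the parameter count small.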

This paper is organized as follows: Section 1 reviews and analyzes the development of both Bayesian network classifiers and dynamic Bayesian network classifiers; Section 2 studies the structure of FMDBN; Section 2.5 develops the initial and evolutionary learning of FMDBN; Section 3 carries out experiments and analysis of classification accuracy with UCI, financial and macroeconomic datasets; Section 4 concludes this work and gives directions for future research.

FMDBN

FMDBN is a dynamic Bayesian network classifier that is implemented in two stages: establishing FMDBN (FMDBN learning) and using FMDBN for classification calculation (FMDBN classification). The basis of FMDBN learning and classification is the transformation of time-series data.
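This excerpt does not define the two transformations, so the following is only one plausible reading, offered as a sketch. It assumes that the time-delayed transformation appends lagged copies of each row, and that the dislocated transformation pairs the attributes at time t with the class observed a fixed number of steps later. The function names `time_delayed` and `dislocated`, the `order` and `shift` parameters, and the toy data are all hypothetical.

```python
import numpy as np

def time_delayed(series, order=1):
    """Stack each row with its `order` predecessors: [x_t, x_{t-1}, ...]."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]
    n = series.shape[0]
    if not 0 <= order < n:
        raise ValueError("order must satisfy 0 <= order < len(series)")
    # Block k holds the values delayed by k steps, aligned on t = order..n-1.
    blocks = [series[order - k : n - k] for k in range(order + 1)]
    return np.hstack(blocks)

def dislocated(attributes, classes, shift=1):
    """Pair the attributes at time t with the class label at time t + shift."""
    attributes = np.asarray(attributes)
    classes = np.asarray(classes)
    n = len(classes)
    if len(attributes) != n:
        raise ValueError("attributes and classes must have the same length")
    if not 0 <= shift < n:
        raise ValueError("shift must satisfy 0 <= shift < len(classes)")
    return attributes[: n - shift], classes[shift:]

# Toy series (hypothetical): two attributes and a binary class over four steps.
X = [[1, 10], [2, 20], [3, 30], [4, 40]]
y = [0, 0, 1, 1]

lagged = time_delayed(X, order=1)        # rows are [x_t, x_{t-1}] for t = 1..3
attrs, future_cls = dislocated(X, y, shift=1)
print(lagged)
print(attrs, future_cls)
```

Under this reading, a first-order model uses `order=1`, and a non-zero `shift` lets attributes observed now be used to predict a class that changes later, which is the asynchronous case the abstract describes.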

Experiments and analysis

We select 45 standard time-series datasets (21 from the UCI Machine Learning Repository, and 18 financial and 6 macroeconomic datasets from the Wind database) as inputs for experimental investigations of three aspects: the comparison of classification accuracy, and the influence of time-delayed information and of attribute-dependent information on the classification accuracy of FMDBN. In this process, missing data are filled in by the sliding average method, time series of classes are discretized in chronological order,

Conclusions and future work

In this paper, we develop a dynamic Bayesian network classifier named FMDBN for the classification and prediction of multivariate time-series datasets. It combines time-series preprocessing, the time-delayed and dislocated transformations of multivariate time series, tree (or forest) structures among attributes, and the classification accuracy metric. In addition, the initial learning, evolutionary learning and evolutionary classification algorithms are developed.

In FMDBN, class nodes are the

CRediT authorship contribution statement

Shuangcheng Wang: Conceptualization, Methodology, Funding acquisition. Siwen Zhang: Data curation, Formal analysis, Writing - original draft. Tao Wu: Investigation, Resources, Funding acquisition. Yongrui Duan: Validation, Writing - review & editing, Supervision, Funding acquisition. Liang Zhou: Data curation, Writing - review & editing. Hao Lei: Software, Visualization.

Acknowledgments

This work is supported by the National Social Science Fund of China [Grant number 18BTJ020]; the National Natural Science Foundation of China [Grant numbers 71771179, 71532015]; the Foundation of Shanghai Municipal Health Commission, China [Grant number 2018HYL0211]; and the Foundation of Shanghai Municipal Commission of Economy and Informatization, China [Grant number 2018-RGZN-02042].

References (44)

Arias, Jacinto et al. Learning distributed discrete Bayesian network classifiers under MapReduce with Apache Spark. Knowl.-Based Syst. (2017)
Sardinha, Roosevelt et al. Revising the structure of Bayesian network classifiers in the presence of missing data. Inform. Sci. (2018)
Pérez, Aritz et al. Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes. Internat. J. Approx. Reason. (2006)
Pérez, Aritz et al. Bayesian classifiers based on kernel density estimation: Flexible classifiers. Internat. J. Approx. Reason. (2009)
He, Yu-Lin. Bayesian classifiers based on probability density estimation and their applications to simultaneous fault diagnosis. Inform. Sci. (2014)
Gutiérrez, Luis et al. Bayesian nonparametric classification for spectroscopy data. Comput. Statist. Data Anal. (2014)
Wang, Shuang-cheng et al. Bayesian network classifiers based on Gaussian kernel density. Expert Syst. Appl. (2016)
Yang, Guosheng et al. A driver fatigue recognition model based on information fusion and dynamic Bayesian network. Inform. Sci. (2010)
Dabrowski, Joel Janek et al. Systemic banking crisis early warning systems using dynamic Bayesian networks. Expert Syst. Appl. (2016)
Wijnands, Jasper S. Identifying behavioural change among drivers using Long Short-Term Memory recurrent neural networks. Transp. Res. F: Traffic Psychol. Behav. (2018)
AlKhateeb, Jawad H. Performance of hidden Markov model and dynamic Bayesian network classifiers on handwritten Arabic word recognition. Knowl.-Based Syst. (2011)
Liu, Ying et al. Online semi-supervised support vector machine. Inform. Sci. (2018)
García, Salvador. Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inform. Sci. (2010)
Loyola-González, Octavio. PBC4cip: A new contrast pattern-based classifier for class imbalance problems. Knowl.-Based Syst. (2017)
Pearl, Judea. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. (1988)
Chow, C. et al. Approximating discrete probability distributions with dependence trees. IEEE Trans. Inf. Theory (1968)
Friedman, Nir et al. Bayesian network classifiers. Mach. Learn. (1997)
Jing, Yushi et al. Boosted Bayesian network classifiers. Mach. Learn. (2008)
Wang, ShuangCheng et al. Restricted Bayesian classification networks. Sci. China Inf. Sci. (2013)
Martínez, Ana M. Scalable learning of Bayesian network classifiers. J. Mach. Learn. Res. (2016)
John, George H. et al. Estimating continuous distributions in Bayesian classifiers.
Chen, Xiaolu et al. Probability density estimation and Bayesian causal analysis based fault detection and root identification. Ind. Eng. Chem. Res. (2018)


^☆: No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.knosys.2020.105638.

¹: Contributed equally to this work.
