Data-driven fault prediction and anomaly measurement for complex systems using support vector probability density estimation
Introduction
With the growing demand for system reliability, it is desirable not only to detect and isolate a fault when it occurs, but also to forecast it before it occurs. That is, the fault should be discovered, located, and eliminated at an early stage, before it has caused serious damage to the whole system. This leaves enough time to take preventive measures and avoid unnecessary loss. For systems requiring high reliability, such as those in aerospace and nuclear energy, fault prediction has become a very important problem in recent years (Zhou and Xu, 2009; Dai and Gao, 2013). In the fault prediction field, a system is commonly described as having a fault state and a failure state. The fault state means that an anomaly of the system index has occurred, but the system can still work normally. Correspondingly, the failure state means that the system index has exceeded some threshold, in which case the system can no longer work.
Different systems have different reliability requirements, so it is best if an anomaly index measuring the system’s degree of anomaly can be computed algorithmically. Whether a fault should then be predicted can be decided by the operator according to the practical security requirement. In data-driven fault prediction, one applicable method is probability density estimation from samples. On this basis, an evaluation index characterizing the system’s degree of anomaly can be found and utilized.
Probability density estimation from an observed dataset is a basic problem in machine learning. Current methods fall into two types: parametric estimation and non-parametric estimation. The maximum likelihood method is a representative parametric approach, but it has limitations; for example, it cannot be used to estimate a probability density composed of several normal distributions. By contrast, non-parametric methods have been more widely used. Parzen window density estimation (Parzen, 1962; Jenssen et al., 2006; Mohamed et al., 2004), a classical kernel density estimator, is the most representative non-parametric method. Its disadvantage is a lack of sparseness: estimating the probability density of a new sample involves every sample in the dataset, so the computational cost becomes large for big datasets. Researchers have therefore long sought a probability density estimation method that uses only the training samples with great influence on the estimate, rather than all of them. The essence of such a method is to seek a sparse solution, so as to reduce the computational cost and improve applicability. The support vector machine (SVM) provides a good approach for obtaining sparse solutions (Vapnik and Mukherjee, 2000), since the SVM solution depends only on the support vectors among the training samples. The SVM method can therefore be used to estimate probability density in two steps: first, starting from the definition of probability density, estimate an approximate distribution function from the values of the empirical cumulative distribution function; second, obtain the density function by differentiation.
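To make the lack of sparseness concrete, the following is a minimal sketch of a one-dimensional Parzen window estimator. The Gaussian kernel, the bandwidth `h`, the function name `parzen_density`, and the synthetic training data are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np


def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen window estimate of a 1-D density at points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    # Scaled differences: one row per query point, one column per training sample.
    u = (x[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Every training sample contributes to every query point,
    # so the cost is O(len(x) * len(samples)): the estimate is not sparse.
    return kernel.mean(axis=1) / h


rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=200)  # hypothetical training data
grid = np.linspace(-5.0, 5.0, 1001)
density = parzen_density(grid, samples, h=0.4)  # h = 0.4 is an assumed bandwidth
```

A sparse estimator of the kind discussed above would keep the same kernel-sum form but with nonzero weights on only a subset of the training samples.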
In this method, the SVM is in fact used to solve a linear operator equation, and the result is a sparse probability density estimate similar in form to the Parzen window estimate. By modifying the form of the constraint condition in the SVM probability density estimation model, this paper presents a single slack factor SVM probability density estimation model. On this basis, the system’s degree of anomaly is measured.
The remainder of this paper is organized as follows. Section 2 summarizes data-driven fault prediction methods for complex systems and introduces the quantitative measurement of system anomaly based on an anomaly index. Section 3 describes in detail the principle of the probability density model based on the single slack factor SVM and analyzes the complexity of the corresponding algorithm. Section 4 presents several experiments that verify the effectiveness of the proposed method. Finally, Section 5 draws conclusions and outlines future work.
Section snippets
Data-driven fault prediction for complex systems
In the operation of practical industrial systems, fault prediction and reliability evaluation technologies can be used to reduce system maintenance costs (Wang et al., 2008; Ding et al., 2014; Alghazzawi and Lennox, 2009). These technologies can also provide reliable evidence for determining when a system should be repaired; the blindness of device maintenance is thereby reduced, and the effective running time of the system can be greatly increased. Fault
SVM probability density estimation
The idea of SVM probability density estimation is to approximate the distribution function using support vector regression rather than to estimate the probability density function directly. For an observed sample , the empirical distribution function can be constructed as Vapnik and Mukherjee (2000), Wang et al. (2008), Ding et al. (2014) where satisfies
The relationship (Alghazzawi and Lennox, 2009) between
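As an illustration of the empirical distribution function used as the regression target, the following minimal sketch assumes the standard one-dimensional form F_l(x) = (1/l) Σ θ(x − x_i), where θ is the unit step function and θ(0) = 1 is taken by convention. The function name `empirical_cdf` and the sample values are hypothetical.

```python
import numpy as np


def empirical_cdf(x, samples):
    """F_l(x) = (1/l) * sum_i theta(x - x_i), with theta(0) = 1 (assumed convention)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    # side="right" counts the samples that are <= x.
    counts = np.searchsorted(sorted_samples, x, side="right")
    return counts / sorted_samples.size


samples = np.array([0.3, 1.2, 1.2, 2.5, 4.0])  # hypothetical observations
values = empirical_cdf([0.0, 1.2, 3.0, 4.0], samples)
# values is [0.0, 0.6, 0.8, 1.0]
```

In a regression-based density estimator, pairs of the form (x_i, F_l(x_i)) would serve as training data for the approximating distribution function, which is then differentiated to obtain the density.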
A simulation example and result analysis
Firstly, a complex system based on the Gaussian mixture model is introduced to evaluate the performance of the presented algorithm. Generate 100 random samples according to the distribution density (41) and let , , , . From (41), we can see that belongs to the two classes with the prior probabilities 0.2 and 0.8. Then estimate according to the condition on uncoupled data with the known prior probability and the sample
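The sampling step can be sketched as follows. Only the sample count (100) and the prior probabilities (0.2 and 0.8) come from the text; the component means and standard deviations are hypothetical placeholders, since the parameter values of density (41) are not given in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(42)

n_samples = 100                 # from the text
priors = np.array([0.2, 0.8])   # from the text
means = np.array([-2.0, 1.0])   # hypothetical component means
stds = np.array([0.5, 1.0])     # hypothetical component standard deviations

# Draw a component label for each sample, then sample from that component.
labels = rng.choice(2, size=n_samples, p=priors)
samples = rng.normal(loc=means[labels], scale=stds[labels])


def mixture_density(x):
    """True density of the assumed two-component Gaussian mixture."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = (x[:, None] - means[None, :]) / stds[None, :]
    components = np.exp(-0.5 * z**2) / (stds[None, :] * np.sqrt(2.0 * np.pi))
    return components @ priors
```

The function `mixture_density` gives a reference curve against which an estimated density could be compared on a grid of test points.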
Conclusions
The SVM probability density estimation method is discussed in this paper to evaluate the degree to which a system deviates from its normal running state. For some complex systems in practical applications, however, an accurate system model is commonly unavailable, and the distribution that the probability density obeys is also unknown. Under this circumstance, an approximate estimate of the actual probability density can be obtained through a regression on the collected samples, which
Acknowledgments
This work was jointly supported by the National Natural Science Foundation for Young Scientists of China (Grant No: 61202332, 61403397, 61503389), China Postdoctoral Science Foundation (Grant No: 2012M521905) and Natural Science Basic Research Plan in Shaanxi Province of China (Grant No: 2015JM6313, 2016JM6061).
References (28)
- et al. Model predictive control monitoring using multivariate statistics. J. Process Control (2009)
- et al. Data-driven realizations of kernel and image representations and their application to fault detection and control system design. Automatica (2014)
- et al. Intelligent ICA-SVM fault detector for non-Gaussian multivariate process monitoring. Expert Syst. Appl. (2010)
- Model-based fault-detection and diagnosis - status and applications. Annual Reviews in Control (2005)
- et al. The Cauchy–Schwarz divergence and Parzen windowing: Connections to graph theory and Mercer kernels. J. Franklin Inst. (2006)
- et al. Fault detection and diagnosis in process data using one-class support vector machines. J. Process Control (2009)
- et al. Data-driven and adaptive statistical residual evaluation for fault detection with an automotive application. Mech. Syst. Signal Process. (2014)
- et al. Data-driven fault diagnosis for an automobile suspension system by using a clustering based method. J. Franklin Inst. (2014)
- et al. Statistical MIMO controller performance monitoring, Part I: Data-driven covariance benchmark. J. Process Control (2008)
- et al. A data-driven methodology for fault detection in electromechanical actuators. Journal of Dynamic Systems, Measurement and Control (2014)
- Domain described support vector classifier for multi-classification problems. Pattern Recognit.
- From model, signal to knowledge: A data-driven perspective of fault detection and diagnosis. IEEE Trans. on Industrial Informatics
- Fault classification and section identification of an advanced series compensated transmission line using support vector machine. IEEE Trans. Power Deliv.
- Multiple sensor fault diagnosis by evolving data-driven approach. Information Science