Comparing a knowledge-based and a data-driven method in querying data streams for system fault detection: A hydraulic drive system application
Introduction
Industrial companies seek to increase the availability of their products and, thereby, improve product quality. Product availability can be improved in the design phase, in the testing and refinement phase, or in the operational phase, i.e. when the product is in use [1], [2]. The availability of industrial systems can be improved by detecting and diagnosing failures at an early stage [3].
Fault detection and diagnosis methods have been classified in different ways [4], [5], [6]. Zhang and Jiang [5] divided fault detection methods into model-based and data-based methods, as illustrated in Fig. 1, and then divided each of these into two groups: quantitative and qualitative methods. Chiang et al. [6], on the other hand, classified fault detection and diagnosis methods into three categories: data-driven methods, analytical (model-based) methods, and knowledge-based methods. Fig. 1 shows how the two classifications map onto each other: the model-based quantitative methods, the data-based quantitative methods, and the combination of model-based qualitative and data-based qualitative methods in the Zhang and Jiang [5] classification correspond to the analytical, data-driven, and knowledge-based methods, respectively, in the Chiang et al. [6] classification.
The analytical approach uses first principles to construct mathematical models of the system [6]. Analytical methods therefore incorporate a physical understanding of the system into the fault detection and diagnosis process [6], [7]. According to Ref. [6], most analytical methods are based on parameter estimation, observer-based design and parity relations. According to Chiang et al. [6], the analytical approach is not adequate for large-scale and complex systems. However, analytical methods, when applicable, outperform data-driven methods because they incorporate this physical understanding [6].
Knowledge-based methods use qualitative models in the fault detection and diagnosis process [6]. According to Ref. [8], most knowledge-based methods are rule-based expert systems. Because the inferred rules are based on historical failure cases and engineers’ experience, it is difficult to search a wide range of yield-loss cases beyond the engineers’ current knowledge [8]. The knowledge-based approach is suitable when a detailed mathematical model is not available and when the number of inputs, outputs and states of a system is relatively small [6]. However, knowledge-based methods have become more applicable to complex systems with the help of software packages [6]. Surveys of knowledge-based fault diagnosis methods were performed by Zhu and Yu [9] and Venkatasubramanian et al. [10].
Fig. 1 shows that knowledge-based methods may be either qualitative or quantitative. It also shows that knowledge-based methods can be either qualitative and model-based (such as causal models) or qualitative and data-based (such as expert systems). In this paper, we have used causal models because the industrial partners are experts on the function of their hydraulic drive system, have good historical knowledge of it, and are familiar with the fault tree analysis (FTA) method [11] used.
Data-driven methods use product lifecycle data for fault detection and do not require first-principles models. Data-driven methods can therefore be applied to large-scale and complex systems, and they save the time and cost that would otherwise be required to develop first-principles models [6]. Data-driven methods are preferred when product data is available but a system model is not [7]. In addition, data-driven methods can capture information and provide knowledge beyond the engineers’ current knowledge [8]. However, the performance of data-driven methods depends on the quality and quantity of the collected data [6]. Surveys of data-driven prognostics methods can be found in Refs. [10], [12]. According to Fig. 1, data-driven methods can be divided into two groups: statistical and non-statistical methods. Principal component analysis (PCA) [13] and partial least squares (PLS) [14] are examples of statistical data-driven methods. Artificial neural networks (ANN) [15], self-organized maps (SOM) [16], and K-nearest neighbors [17], on the other hand, are examples of non-statistical data-driven methods. In this work, principal component analysis was selected as the data-driven method because it has been used successfully for fault detection by researchers such as Villegas et al. [18], Tharrault et al. [19], Russell et al. [20], Kresta et al. [21], and Piovoso et al. [22].
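To make the idea of PCA-based fault detection concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that a PCA model is fitted on fault-free data and that a new sample is flagged when its squared prediction error (SPE, the part of the sample the retained components cannot explain) exceeds an empirical quantile of the training SPE. The monitoring statistic, the quantile-based limit, the function names (`fit_pca_monitor`, `spe_score`, `is_faulty`) and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_components, quantile=0.99):
    """Fit PCA on fault-free data and derive an empirical SPE control limit."""
    X_normal = np.asarray(X_normal, dtype=float)
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0, ddof=1)
    Z = (X_normal - mean) / std  # standardize each sensor variable
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T  # loadings, shape (n_variables, n_components)
    residual = Z - Z @ P @ P.T  # part of each sample the model cannot explain
    spe = np.sum(residual**2, axis=1)
    # Assumption: the limit is an empirical quantile of the training SPE.
    limit = float(np.quantile(spe, quantile))
    return {"mean": mean, "std": std, "P": P, "limit": limit}


def spe_score(model, x):
    """Squared prediction error of one sample under the fitted PCA model."""
    z = (np.asarray(x, dtype=float) - model["mean"]) / model["std"]
    r = z - model["P"] @ (model["P"].T @ z)
    return float(r @ r)


def is_faulty(model, x):
    """Flag a sample whose SPE exceeds the control limit."""
    return spe_score(model, x) > model["limit"]


# Synthetic stand-in for sensor data: three variables driven by one latent load.
rng = np.random.default_rng(0)
load = rng.normal(size=500)
X_train = np.column_stack([load, 2 * load, -load]) + 0.05 * rng.normal(size=(500, 3))
model = fit_pca_monitor(X_train, n_components=1)

print(is_faulty(model, [1.0, 2.0, -1.0]))  # sample that follows the learned correlation
print(is_faulty(model, [1.0, 2.0, 1.0]))   # third variable breaks the correlation
```

Because the model is built only from data, no first-principles description of the system is needed, which is the property that makes this family of methods attractive for complex systems.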
The volume of generated data is increasing, and it is expected to exceed the available storage [23], [24]. Fault detection methods must therefore not only achieve high classification accuracy in detecting failures, but also be fast enough to identify faults from continuous and fast-arriving data streams. A number of researchers have discussed the issue of detecting failures in data streams. A review of such applications can be found in Alzghoul and Löfstrand [23]. Examples of research concerning data stream mining include [3], [23], [25], [26], [27]. Karcal [26] used a multivariate statistical process monitoring technique to detect changes in a sensor data stream. Kargupta [25] developed the VEhicle DAta Stream mining (VEDAS) system for real-time vehicle-health monitoring. The proposed system was based on a data stream management system (DSMS) and a data stream mining (DSM) method. Matthews and Srivastava [27] applied different data-driven methods to a continuous data stream from a solid rocket motor for anomaly detection.
The authors of this paper have presented a number of related works [3], [23], [28], [29]; this paper reports verification and validation activities that build on them. In Alzghoul and Löfstrand [23], the authors investigated the possibility of increasing availability through the use of data stream mining and DSMS technologies. They reviewed the DSM algorithms and their applications in monitoring industrial systems, and developed a fault detection system based on the DSM and DSMS technologies. The system used three different DSM classification algorithms: a one-class support vector machine (OCSVM), a polygon-based classification algorithm and a grid-based classification algorithm. It was tested using data collected from hydraulic motors, and the results showed that all three algorithms achieved good performance in detecting faults from sensor data streams. In Alzghoul et al. [3], the authors used data stream prediction methods to forecast the future data stream, and developed a fault detection system based on data stream forecasting, data stream mining and data stream management systems. That system was able to predict system faults based on a forecasted data stream. Furthermore, Alzghoul et al. [28] discussed the potential industrial use of the proposed fault detection system. Löfstrand et al. [29] proposed a model for predicting and monitoring industrial system availability. They suggested using IF-THEN-ELSE statements based on FTA to develop queries for system monitoring; however, such queries were not developed or tested in Ref. [29]. In this work, queries based on the FTA and PCA methods were implemented and tested. Thus, the model presented in Ref. [29] is partly tested and verified, which can be considered a novelty of this work.
Another novelty of this paper is testing a data-driven method (PCA), which was not tested in Refs. [3], [23], and comparing it to a knowledge-based method (FTA) in querying data streams for monitoring a hydraulic drive system application.
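The idea of turning a fault tree into IF-THEN-ELSE monitoring rules, as suggested in Ref. [29], can be illustrated with a small sketch. The tree below is hypothetical: the two basic events, the OR gate that combines them, the variable names and the threshold values are assumptions chosen for illustration and are not taken from the fault tree used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One sample of hypothetical cooler sensor values."""
    oil_temp_in_c: float   # oil temperature entering the cooler
    oil_temp_out_c: float  # oil temperature leaving the cooler


# Hypothetical thresholds; in practice these would come from domain experts.
MAX_OUTLET_TEMP_C = 60.0
MIN_TEMP_DROP_C = 3.0


def outlet_too_hot(r: Reading) -> bool:
    """Basic event 1: the oil leaving the cooler is above the allowed limit."""
    return r.oil_temp_out_c > MAX_OUTLET_TEMP_C


def insufficient_cooling(r: Reading) -> bool:
    """Basic event 2: the temperature drop across the cooler is too small."""
    return (r.oil_temp_in_c - r.oil_temp_out_c) < MIN_TEMP_DROP_C


def classify(r: Reading) -> str:
    """Top event as an OR gate over the basic events, written as IF-THEN-ELSE."""
    if outlet_too_hot(r):
        return "fault: outlet too hot"
    elif insufficient_cooling(r):
        return "fault: insufficient cooling"
    else:
        return "normal"


print(classify(Reading(oil_temp_in_c=55.0, oil_temp_out_c=45.0)))
print(classify(Reading(oil_temp_in_c=50.0, oil_temp_out_c=49.0)))
```

Each branch corresponds to one path through the tree, so a rule that fires also indicates which basic event caused the top event. The thresholds are explicit model parameters that could be tuned against sensor data.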
Several authors, as identified above, discuss various fault detection methods and their advantages and disadvantages. However, few authors discuss applying knowledge-based and data-driven methods specifically to searching high-volume data streams. In addition, to our knowledge, no authors have compared the performance of different fault detection methods in searching high-volume data streams.
The authors have worked closely with Bosch Rexroth Mellansel AB [30] (BRMAB, formerly Hägglunds Drives AB), to such an extent that its representatives are co-authors of this paper. BRMAB manufactures low-speed, high-torque hydraulic drive systems and is interested in improving the availability of its drive systems through monitoring. In this industrial context, we investigated the functionality and performance of two different fault detection methods, as reported in Section 6.
The first method, principal component analysis (PCA), is a data-based quantitative method in the terms of Zhang and Jiang [5], while Chiang et al. [6] consider it a data-driven method. The second method, fault tree analysis (FTA), is a model-based qualitative method in the terms of Zhang and Jiang [5], while Chiang et al. [6] consider it a knowledge-based method.
The aim of this work is to partly validate the model presented by Löfstrand et al. [29], using sensor data collected from a real BRMAB hydraulic drive system. The aim is also to test and compare the functionality and performance of a knowledge-based method (FTA) and a data-driven method (PCA) when they are applied to query data streams for the purpose of fault detection. To this end, we developed a fault detection system based on DSMS technology to test the two fault detection methods. In Section 2 (research method), Fig. 2 shows the process of developing the fault detection functions, while Fig. 3 in Section 3 (fault detection system architecture) shows the architecture of the developed fault detection system. In addition, we discuss and summarize the advantages and disadvantages of using the knowledge-based and the data-driven methods (see Section 6).
The results suggest that both methods, when implemented in the DSMS, are sufficiently fast and can produce good classification accuracy. As reported in Section 5 and in Tables 1–5, the two approaches produce comparable and acceptable results in terms of accuracy and speed; thus, the developed models are verified and the model in Ref. [29] is partly verified. Furthermore, PCA is shown in this paper to be a suitable data-driven method for searching data streams from a hydraulic motor. In previous work [3], [23], the authors showed that three other data-driven methods are suitable. In Refs. [3], [23], the data originated from a laboratory test (a tank test hydraulic drive system), whereas in this paper the data originate from a shredder application in use. These tests indicate that data-driven methods are, in general, suitable for searching data streams. In particular, the authors consider the four data-driven methods tested (three in Refs. [3], [23] and one in this paper) to be validated when applied to data from hydraulic drive systems. The requirements for successful industrial implementation of the two methods were also identified and are described in Section 6. Furthermore, it was found that the performance of the fault detection models can be improved by investigating the misclassified data, and that sensor data can be used to improve the performance of the knowledge-based method by tuning its model parameters.
Research approach and case study
Bosch Rexroth Mellansel AB (BRMAB) is a Swedish company which manufactures low-speed, high-torque hydraulic drive systems. BRMAB hydraulic drive systems are used in industries including mining, recycling, pulp and paper, and construction. BRMAB is interested in improving the availability of its drive systems through system monitoring. The work in this paper relates to a full-scale BRMAB hydraulic drive system in a shredder application used to crush waste wood, as shown in Fig. 2.
Fault detection system architecture
As previously mentioned, some authors have discussed the issue of detecting failures in high-volume data streams from equipment in use. Alzghoul and Löfstrand [23], Alzghoul et al. [3], and Kargupta [25] found that data stream management system (DSMS) technology has the potential to address scalability issues. DSMSs can manage data streams and apply continuous queries over input data streams. DSMS technology can be used to implement and test different fault detection methods in
The data set
The example query developed and used in this paper concerns fault detection in an air–oil cooler, which is part of the hydraulic drive system powering the shredder application. Several variables associated with the functionality of the cooler system were considered. The cooler fan in the system is activated for different periods of time, depending on ambient temperature and system load. The cooler monitoring process is activated whenever the cooler fan is switched on, i.e. whenever cooling is needed.
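The activation condition described above, where monitoring runs only while the cooler fan is switched on, can be sketched as a simple continuous query over a stream of samples. This is an illustrative Python generator and not the DSMS query language used in this work. The `Sample` fields, the `monitor_cooler` function and the placeholder detector are hypothetical.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Tuple


class Sample(NamedTuple):
    """One stream element; the field names are illustrative assumptions."""
    timestamp: float
    fan_on: bool
    values: Tuple[float, ...]  # cooler-related sensor values


def monitor_cooler(
    stream: Iterable[Sample],
    detector: Callable[[Tuple[float, ...]], bool],
) -> Iterator[Tuple[float, bool]]:
    """Apply the detector only to samples that arrive while the fan is on."""
    for sample in stream:
        if not sample.fan_on:
            continue  # monitoring is inactive when no cooling is requested
        yield sample.timestamp, detector(sample.values)


# Placeholder detector: flags a sample when its first value exceeds 60.
def too_hot(values: Tuple[float, ...]) -> bool:
    return values[0] > 60.0


stream = [
    Sample(0.0, False, (70.0,)),  # fan off: skipped even though the value is high
    Sample(1.0, True, (55.0,)),
    Sample(2.0, True, (65.0,)),
]
for timestamp, faulty in monitor_cooler(stream, too_hot):
    print(timestamp, faulty)
```

Because the generator consumes one sample at a time and keeps no history, it can run over an unbounded stream. Any fault detection function, knowledge-based or data-driven, can be passed in as the detector.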
Results
This section presents, compares and discusses the results of testing the data-driven model (PCA) and the knowledge-based model (FTA).
Discussion
The developed models were tested using a data set containing both faulty and non-faulty data points. The times at which faults appear and disappear in the data set are known (i.e. the data label is known). Comparing the outputs of the developed models with the known data labels showed relatively good accuracy, as shown in Section 5, thus verifying the developed models.
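The comparison of model outputs with known labels can be sketched as follows. This is a generic evaluation helper and not the authors' evaluation code; the function name `evaluate` and the example values are made up for illustration. It also returns the positions of the misclassified points, since those are the points that would be inspected when improving a model.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[Dict[str, int], float, List[int]]:
    """Return confusion counts, accuracy and indices of misclassified points."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled data point is required")
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    misclassified = []
    for i, (p, a) in enumerate(zip(predicted, actual)):
        if p and a:
            counts["tp"] += 1  # fault correctly detected
        elif p and not a:
            counts["fp"] += 1  # false alarm
            misclassified.append(i)
        elif a:
            counts["fn"] += 1  # missed fault
            misclassified.append(i)
        else:
            counts["tn"] += 1  # normal point correctly passed
    accuracy = (counts["tp"] + counts["tn"]) / len(actual)
    return counts, accuracy, misclassified


# Made-up example: True means "faulty".
predicted = [True, True, False, False, True]
actual = [True, False, False, False, True]
counts, accuracy, misclassified = evaluate(predicted, actual)
print(counts, accuracy, misclassified)
```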
The results showed that both the knowledge-based and the data-driven method achieved high
Conclusions
As discussed previously, increasing availability of products is of great interest for industrial companies. Availability of industrial products can be improved by fault detection and diagnosis. Fault detection and diagnosis methods can be divided into three categories: data-driven, analytical-based, and knowledge-based methods.
In this work, monitoring cooler functionality of BRMAB hydraulic drive systems was investigated. Two fault detection techniques, i.e. data-driven (PCA) and
Acknowledgments
The authors wish to acknowledge the following three organizations: The EU FP7 Project SmartVortex, The Faste Laboratory at Luleå University of Technology funded by the Swedish Governmental Agency for Innovation Systems (VINNOVA: 2012-00705) and the SSPI project (Scalable search of product lifecycle information) funded by the Swedish Foundation for Strategic Research (SSF: RIT08-0041).
References (47)
- et al. Data stream forecasting for system fault prediction. Computers and Industrial Engineering (2012)
- et al. A review of process fault detection and diagnosis part I: quantitative model-based methods. Computers and Chemical Engineering (2003)
- et al. Bibliographical review on reconfigurable fault-tolerant control systems. Annual Reviews in Control (2008)
- et al. A review of process fault detection and diagnosis: part III: process history based methods. Computers and Chemical Engineering (2003)
- et al. Fault detection in industrial processes using canonical variate analysis and dynamic principal component analysis. Chemometrics and Intelligent Laboratory Systems (2000)
- et al. Increasing availability of industrial systems through data stream mining. Computers and Industrial Engineering (2011)
- et al. Artificial intelligence for monitoring and supervisory control of process systems. Engineering Applications of Artificial Intelligence (2007)
- et al. Data driven fault diagnosis and fault tolerant control: some advances and possible new directions. Acta Automatica Sinica (2009)
- et al. Soft computing applications in wind power systems: a review and analysis
- Fault Detection and Diagnosis in Industrial Systems
- Model-based and data-driven prognosis of automotive and electronic systems
- A Bayesian framework to integrate knowledge-based and data-driven inference tools for reliable yield diagnoses
- Survey of knowledge-based fault diagnosis methods. Anhui Gongye Daxue Xuebao
- Fault trees – a state of the art discussion. IEEE Transactions on Reliability
- A Survey of Data Driven Prognostics
- On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science
- Multivariate data analysis in chemistry. Chemometrics
- A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics
- Self-organized formation of topologically correct feature maps. Biological Cybernetics
- Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques
- Principal Component Analysis for Fault Detection and Diagnosis. Experience with a pilot plant
- Fault Detection and Isolation with Robust Principal Component Analysis
Ahmad Alzghoul is currently a researcher at Uppsala University, Department of Information Technology, Division of Computing Science. He has an educational background in the field of computer science and engineering. He received a Ph.D. in Computer Aided Design at Luleå University of Technology, Division of Product and Production Development, Sweden. He received his M.Sc. degree in Computer Engineering (Intelligent Systems) from Halmstad University, Sweden, and his M.Sc. degree in Software Engineering from Linnaeus University, Sweden. His main research interests include data mining and industrial data mining applications.
Björn Backe (M.Sc.) is a Ph.D. student in Computer Aided Design at Luleå University of Technology, Division of Product and Production Development. His main research interests include hardware reliability.
Magnus Löfstrand has an educational background in mechanical engineering and was awarded his Ph.D. in Computer Aided Design (CAD) at Luleå University of Technology, Sweden in 2007. After serving as assistant professor in CAD, he is now employed as a senior researcher at Uppsala University, Department of Information Technology, Division of Computing Science, Uppsala DataBase Laboratory. He has an academic background in the work process description, refinement and simulation based on product development literature. He also has experience of research concerning, and equipment management for, distributed collaboration over IP networks. He is involved in research concerning system availability (maintainability and reliability) and the use of DSMS (Data Stream Management System) and DSM (Data Stream Mining) in engineering applications, often in the context of enabling larger industrial service content in industrial systems.
Arne Byström, B.Sc., has more than 37 years of experience working in the Swedish high-technology industry, mainly within R&D. He previously served for over ten years as manager for customer control system order development. He has good knowledge of eliciting and interpreting customer needs and has experience of control-related problem solving in customer applications around the world. Today, he works for Bosch Rexroth Mellansel AB as Technical Product Manager Systems, in the application area of electrohydraulic drive control and drive monitoring systems.
Bengt Liljedahl has a B.Sc. in Mechanical Engineering from 1973 and a diploma in Mechanical Engineering at university level, received in 1978. He has over 38 years of experience working in Swedish industry, mainly within R&D. He currently serves as Technical Product Manager at Bosch Rexroth Mellansel AB (formerly Hägglunds Drives AB). Since 1989 he has coordinated and built a network of close cooperation with Swedish universities to increase the competence of Bosch Rexroth Mellansel AB in three main areas: tribology, product development and material science. For his longstanding and successful research collaboration, he was awarded an Honorary Doctorate at Luleå University of Technology in June 2009.