Elsevier

Computers in Industry

Volume 65, Issue 8, October 2014, Pages 1126-1135
Comparing a knowledge-based and a data-driven method in querying data streams for system fault detection: A hydraulic drive system application

https://doi.org/10.1016/j.compind.2014.06.003

Highlights

  • Streaming data from a Swedish industrial hydraulic drive system (Bosch Rexroth).

  • A knowledge-based method (FTA) is compared to a data-driven method (PCA); their performance is similar.

  • The two methods produce acceptable results, thus the developed models are verified.

  • Both methods generate queries fast enough to query the data stream online.

  • Both methods may improve the quality of the product when used in industry.

Abstract

The field of fault detection and diagnosis has been the subject of considerable interest in industry. Fault detection may increase the availability of products, thereby improving their quality. Fault detection and diagnosis methods can be classified into three categories: data-driven, analytically based, and knowledge-based methods.

In this work, we investigated the ability and the performance of applying two fault detection methods to query data streams produced from hydraulic drive systems. A knowledge-based method was compared to a data-driven method. A fault detection system based on a data stream management system (DSMS) was developed in order to test and compare the two methods using data from real hydraulic drive systems.

The knowledge-based method was based on causal models (fault trees), and principal component analysis (PCA) was used to build the data-driven model. The performance of the methods, in terms of accuracy and speed, was examined using normal and physically simulated fault data. The results show that both methods generate queries fast enough to query the data streams online, with a similar level of fault detection accuracy. The industrial applications of both methods include monitoring of individual industrial mechanical systems as well as fleets of such systems. One can conclude that both methods may be used to increase industrial system availability.

Introduction

Industrial companies seek to increase the availability of their product and, thereby, improve product quality. Product availability can be improved in the design phase, in the testing and refinement phase, or in the operational phase, i.e. when the product is in use [1], [2]. Availability of industrial systems can be improved by detecting and diagnosing the failures at an early stage [3].

Fault detection and diagnosis methods have been classified in different ways [4], [5], [6]. Zhang and Jiang [5] divided the fault detection methods into model-based and data-based methods, as illustrated in Fig. 1, and then divided each of these into two groups: quantitative and qualitative methods. Chiang et al. [6], on the other hand, classified fault detection and diagnosis methods into three categories: data-driven methods, analytically based (model-based) methods, and knowledge-based methods. Fig. 1 shows that the model-quantitative-based methods, the data-quantitative-based methods, and the combination of model-qualitative-based and data-qualitative-based methods in the Zhang and Jiang [5] model can be seen as analytically based, data-driven, and knowledge-based methods, respectively, in the Chiang et al. [6] model.

The analytical approach uses first principles to construct mathematical models of the system [6]. Therefore, the analytically based methods incorporate the physical understanding of the system into the fault detection and diagnosis process [6], [7]. According to Ref. [6] most of the analytically based methods are based on parameter estimation, observer-based design and parity relations. The analytical model is, according to Chiang et al. [6], not adequate for large-scale and complex systems. However, analytically based methods, when applicable, outperform the data-driven methods, as the former incorporate the physical understanding [6].

Knowledge-based methods use qualitative models in the fault detection and diagnosis process [6]. According to Ref. [8], most of the knowledge-based methods are rule-based expert systems. As the inferred rules are based on historical failure cases and engineers’ experience, it is difficult to search a wide range of yield-loss cases beyond the engineers’ current knowledge [8]. The knowledge-based method is suitable when a detailed mathematical model is not available and when the number of inputs, outputs and states of a system is relatively small [6]. However, the knowledge-based method has become more applicable to complex systems with the help of software packages [6]. Surveys of knowledge-based fault diagnosis methods were performed by Zhu and Yu [9] and Venkatasubramanian et al. [10].

Fig. 1 shows that knowledge-based methods may be either qualitative or quantitative. It also shows that knowledge-based methods can be either qualitative, model-based (such as causal models) or qualitative, data-based (such as expert systems). In this paper, we have used causal models since the industrial partners are experts concerning the function of their hydraulic drive system, have a good historical knowledge, and are familiar with the fault tree analysis method (FTA) [11] used.

Data-driven methods use the product lifecycle data for fault detection and do not require first-principles models. Therefore, data-driven methods can be applied to large-scale and complex systems, and they save the time and cost that would otherwise be required to develop first-principles models [6]. Data-driven methods are preferred when the product data is available while the system model is not [7]. In addition, data-driven methods have the ability to capture information and provide knowledge which is beyond the engineers’ current knowledge [8]. However, the performance of data-driven methods depends on the quality and the quantity of the collected data [6]. A survey of data-driven prognostics methods can be found in Refs. [10], [12]. Data-driven methods can, according to Fig. 1, be divided into two groups: statistical and non-statistical methods. Principal component analysis (PCA) [13] and partial least squares (PLS) [14] are examples of statistical data-driven methods. On the other hand, artificial neural networks (ANN) [15], self-organizing maps (SOM) [16], and K-nearest neighbors [17] are examples of non-statistical data-driven methods. In this work, principal component analysis was selected as the data-driven method because it has been successfully used to detect faults by researchers such as Villegas et al. [18], Tharrault et al. [19], Russell et al. [20], Kresta et al. [21], and Piovoso et al. [22].
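To make the idea of PCA-based fault detection concrete, the following is a minimal sketch and not the model developed in this paper. It assumes that a model is fitted on normal-operation data only and that a sample is flagged when its squared prediction error (SPE, the residual left after projection onto the retained principal component) exceeds an empirical threshold. The three synthetic signals, the single retained component, the power-iteration solver and the 99th-percentile threshold are all illustrative assumptions.

```python
import math
import random


def standardize(row, mean, std):
    """Scale one sample to zero mean and unit variance per variable."""
    return [(x - m) / s for x, m, s in zip(row, mean, std)]


def fit_pca(rows, iters=200):
    """Fit mean, standard deviation and the first principal direction.

    Power iteration on the correlation matrix is used to stay within the
    standard library; it assumes a clearly dominant first component.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / (n - 1))
           for j in range(d)]
    z = [standardize(r, mean, std) for r in rows]
    cov = [[sum(zr[a] * zr[b] for zr in z) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, std, v


def spe(row, model):
    """Squared prediction error: the part of the sample the model cannot explain."""
    mean, std, v = model
    z = standardize(row, mean, std)
    score = sum(zi * vi for zi, vi in zip(z, v))
    return sum((zi - score * vi) ** 2 for zi, vi in zip(z, v))


# Synthetic normal-operation data (hypothetical signals): three temperatures
# that all follow one common load factor plus small sensor noise.
random.seed(0)
normal = []
for _ in range(500):
    load = random.gauss(0.0, 1.0)
    normal.append([50 + 5 * load + random.gauss(0, 0.3),
                   40 + 4 * load + random.gauss(0, 0.3),
                   20 + 2 * load + random.gauss(0, 0.3)])

model = fit_pca(normal)
train_spe = sorted(spe(r, model) for r in normal)
threshold = train_spe[int(0.99 * len(train_spe))]  # assumed empirical limit


def is_faulty(row):
    """Flag a sample whose residual exceeds the threshold learned from normal data."""
    return spe(row, model) > threshold


print(is_faulty([55.0, 44.0, 22.0]))  # follows the normal correlation pattern
print(is_faulty([55.0, 60.0, 22.0]))  # second signal breaks the pattern
```

Because the threshold is learned from normal data alone, a sketch of this kind needs no labelled fault examples, which is one reason such methods are attractive when a system model is unavailable.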

Nowadays, the volume of generated data is increasing, and it is expected to exceed the available storage [23], [24]. Therefore, fault detection methods must not only achieve high classification accuracy in detecting failures, but must also be fast enough to identify faults from continuous and fast-arriving data streams. A number of researchers have discussed the issue of detecting failures in data streams. A review of such applications can be found in Alzghoul and Löfstrand [23]. Examples of research concerning data stream mining include [3], [23], [25], [26], [27]. Karcal [26] used a multivariate statistical process monitoring technique to detect change in the sensor data stream. Kargupta [25] developed the VEhicle DAta Stream mining (VEDAS) system for real-time vehicle-health monitoring. The proposed system was based on a DSMS and a data stream mining (DSM) method. Matthews and Srivastava [27] applied different data-driven methods to a continuous data stream from a solid rocket motor for anomaly detection.

The authors of this paper have presented a number of related works [3], [23], [28], [29], and in this paper we report some verification and validation activities. In Alzghoul and Löfstrand [23], the authors investigated the possibility of increasing availability through the use of data stream mining and DSMS technologies. They reviewed the DSM algorithms and their applications in monitoring industrial systems, and developed a fault detection system based on the DSM and DSMS technologies. The fault detection system was built on three different DSM classification algorithms: a one-class support vector machine (OCSVM), a polygon-based classification algorithm and a grid-based classification algorithm. It was tested using data collected from hydraulic motors. The results showed that the three algorithms achieved good performance in detecting faults from sensor data streams. In Alzghoul et al. [3], the authors utilized data stream prediction methods to forecast the future data stream and developed a fault detection system based on data stream forecasting, data stream mining and data stream management systems. The developed fault detection system was able to predict system faults based on a forecasted data stream. Furthermore, in Ref. [28], Alzghoul et al. presented the proposed fault detection system and discussed its potential industrial use. Löfstrand et al. [29] proposed a model for predicting and monitoring industrial system availability. They suggested using IF-THEN-ELSE statements, which are based on FTA, for the development of queries for system monitoring. However, such queries were not developed or tested in Ref. [29]. In this work, queries based on the FTA and PCA methods were implemented and tested. Thus, the model presented in Ref. [29] is partly tested and verified, which can be considered a novelty of this work. Another novelty of this paper is testing and comparing a data-driven method (PCA), which was not tested in Refs. [3], [23], with a knowledge-based method (FTA) in querying data streams for monitoring a hydraulic drive system application.
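The idea of turning a fault tree into an IF-THEN-ELSE query over a data stream can be illustrated with a minimal sketch. It is not the query implemented in this work: the tree structure, the event names, the signal names (`fan_on`, `fan_current`, `oil_in`, `oil_out`) and the threshold values are hypothetical and chosen only to show how AND/OR gates over basic events map onto a per-reading rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple, Union

Reading = Dict[str, float]


@dataclass
class BasicEvent:
    """Leaf of the fault tree: a condition checked on a single reading."""
    name: str
    condition: Callable[[Reading], bool]

    def occurs(self, reading: Reading) -> bool:
        return self.condition(reading)


@dataclass
class Gate:
    """Intermediate or top event combining its inputs with AND or OR."""
    name: str
    kind: str
    inputs: List[Union["Gate", BasicEvent]]

    def __post_init__(self) -> None:
        if self.kind not in ("AND", "OR"):
            raise ValueError(f"unsupported gate type: {self.kind}")

    def occurs(self, reading: Reading) -> bool:
        results = [node.occurs(reading) for node in self.inputs]
        return all(results) if self.kind == "AND" else any(results)


# Hypothetical basic events with assumed thresholds.
fan_commanded = BasicEvent("fan commanded on", lambda r: r["fan_on"] >= 0.5)
low_current = BasicEvent("fan current low", lambda r: r["fan_current"] < 1.0)
small_drop = BasicEvent("oil temperature drop small",
                        lambda r: r["oil_in"] - r["oil_out"] < 5.0)

# Hypothetical tree: the top event occurs if either intermediate event occurs.
top_event = Gate("cooler fault", "OR", [
    Gate("fan not running", "AND", [fan_commanded, low_current]),
    Gate("insufficient cooling", "AND", [fan_commanded, small_drop]),
])


def fault_query(stream: Iterable[Reading]) -> Iterator[Tuple[float, str]]:
    """IF the top event occurs THEN report FAULT ELSE report OK, per reading."""
    for reading in stream:
        if top_event.occurs(reading):
            yield reading["t"], "FAULT"
        else:
            yield reading["t"], "OK"


stream = [
    {"t": 0, "fan_on": 1, "fan_current": 4.0, "oil_in": 55, "oil_out": 45},
    {"t": 1, "fan_on": 1, "fan_current": 0.2, "oil_in": 55, "oil_out": 54},
    {"t": 2, "fan_on": 0, "fan_current": 0.0, "oil_in": 40, "oil_out": 40},
    {"t": 3, "fan_on": 1, "fan_current": 4.0, "oil_in": 60, "oil_out": 58},
]
results = list(fault_query(stream))
print(results)
```

Keeping the thresholds inside the basic events means they can be adjusted without changing the tree structure.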

Several authors, as identified above, discuss various fault detection methods and their advantages and disadvantages. However, few authors discuss applying knowledge-based and data-driven methods specifically to searching high-volume data streams. In addition, to our knowledge, no authors have compared the performance of different fault detection methods in searching high-volume data streams.

The authors have worked closely with Bosch Rexroth Mellansel AB [30] (BRMAB, formerly Hägglunds Drives AB), to such an extent that its representatives are co-authors of this paper. BRMAB manufactures low-speed, high-torque hydraulic drive systems and is interested in improving the availability of its drive systems through monitoring. In this industrial context we investigated the functionality and performance of two different fault detection methods, as reported in Section 6.

The first method, principal component analysis (PCA), is a data-based quantitative method in the terms of Zhang and Jiang [5], while Chiang et al. [6] consider it a data-driven method. The second method, fault tree analysis (FTA), is a model-based qualitative method in the terms of Zhang and Jiang [5], while Chiang et al. [6] consider it a knowledge-based method.

The aim of this work is to partly validate the model presented by Löfstrand et al. [29], using sensor data collected from a real BRMAB hydraulic drive system. The aim is also to test and compare the functionality and performance of a knowledge-based method (FTA) and a data-driven method (PCA) when they are applied to query data streams for the purpose of fault detection. Therefore, we developed a fault detection system based on DSMS technology to test the two fault detection methods. In Section 2 (research method), Fig. 2 shows the process of developing the fault detection functions, while Fig. 3 in Section 3 (fault detection system architecture) shows the architecture of the developed fault detection system. In addition, we discuss and summarize the advantages and disadvantages of using the knowledge-based and the data-driven methods (see Section 6).

The results suggest that both methods, when implemented in the DSMS, are sufficiently fast and can produce good classification accuracy. As reported in Section 5, the two approaches produce comparable and acceptable results in terms of accuracy and speed. This is also evident from Tables 1 to 5; thus, the developed models are verified and the model in Ref. [29] is partly verified. Furthermore, PCA is shown in this paper to be a suitable data-driven method for searching data streams from a hydraulic motor. In previous work [3], [23], the authors showed that three other data-driven methods are suitable. In Refs. [3], [23], the data originated from a laboratory test (tank test hydraulic drive system), whereas in this paper the data originate from a shredder application in use. These tests indicate that data-driven methods are, in general, suitable for searching data streams. In particular, the authors consider the four data-driven methods tested (three in Refs. [3], [23] and one in this paper) to be validated when applied to data from hydraulic drive systems. The requirements for successful industrial implementation of the two methods were also identified and are described in Section 6. Furthermore, it was found that the performance of the fault detection models can be improved by investigating the misclassified data. It was also shown that sensor data can be used to improve the performance of the knowledge-based method by tuning its model parameters.

Research approach and case study

Bosch Rexroth Mellansel AB (BRMAB) is a Swedish company that manufactures low-speed, high-torque hydraulic drive systems. BRMAB hydraulic drive systems are used in industries including mining, recycling, pulp and paper, and construction. BRMAB is interested in improving the availability of its drive systems through system monitoring. The work in this paper relates to a full-scale BRMAB hydraulic drive system, a shredder application used to crush waste wood as shown in Fig. 2.

Fault detection system architecture

As previously mentioned, some authors have discussed the issue of detecting failures in high volume data streams from equipment in use. Alzghoul and Löfstrand [23], Alzghoul et al. [3], and Kargupta [25], found that data stream management system (DSMS) technology has the potential to support scalability issues. DSMSs have the ability to manage data streams and apply continuous queries over input data streams. DSMS technology can be used to implement and test different fault detection methods in

The data set

The example query developed and used in this paper concerns fault detection in the air–oil cooler, which is part of the hydraulic drive system powering the shredder application. Several variables associated with the cooler system functionality were considered. The cooler fan in the system is activated for different periods of time, depending on ambient temperature and system load. The cooler monitoring process is activated whenever the cooler fan is switched on, that is, whenever cooling is needed.
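The activation condition described above, where monitoring runs only while the cooler fan is switched on, can be sketched as a filter placed in front of any fault detection function. This is an illustrative sketch rather than the DSMS query used in this work; the field names (`t`, `fan_on`, `oil_out`) and the placeholder detector with its 70-degree limit are assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Sample = Dict[str, float]


def monitor_cooler(stream: Iterable[Sample],
                   detector: Callable[[Sample], bool]) -> Iterator[Tuple[float, bool]]:
    """Apply the detector only to samples taken while the fan is on.

    Samples recorded while the fan is off are skipped, so the detector is
    never asked to judge the cooler when no cooling is requested.
    """
    for sample in stream:
        if sample["fan_on"] < 0.5:
            continue
        yield sample["t"], detector(sample)


def too_hot(sample: Sample) -> bool:
    """Placeholder detector (assumed limit): oil leaving the cooler is too warm."""
    return sample["oil_out"] > 70.0


samples = [
    {"t": 0, "fan_on": 0, "oil_out": 75.0},  # fan off: not monitored
    {"t": 1, "fan_on": 1, "oil_out": 62.0},  # fan on, cooling works
    {"t": 2, "fan_on": 1, "oil_out": 74.0},  # fan on, oil stays hot
    {"t": 3, "fan_on": 0, "oil_out": 60.0},  # fan off: not monitored
]
flags = list(monitor_cooler(samples, too_hot))
print(flags)
```

Because the function is a generator, it processes one sample at a time and never needs the whole stream in memory, which matches the way a continuous query consumes its input.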

Results

This section presents, compares and discusses the results of testing the data-driven model (PCA) and the knowledge-based model (FTA).

Discussion

The developed models were tested using a data set containing both faulty and non-faulty data points. The times at which faults appear and disappear in the data set are known (i.e. the data label is known). Comparing the outputs of the developed models with the known data labels showed relatively good accuracy, as reported in Section 5, thus verifying the developed models.
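Comparing model outputs against known labels amounts to counting agreements and disagreements per data point. The sketch below shows one common way to do this; the label and prediction sequences are made up for illustration and are not results from this study, and the three summary measures are an assumed choice rather than the measures reported in Section 5.

```python
def confusion_counts(labels, predictions):
    """Count true/false positives and negatives (1 = faulty, 0 = normal)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(labels, predictions):
    """Return accuracy, detection rate and false alarm rate."""
    tp, tn, fp, fn = confusion_counts(labels, predictions)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly faulty points that were flagged.
        "detection_rate": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of truly normal points that were flagged by mistake.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else float("nan"),
    }


# Made-up example: a fault is present at positions 3 to 5.
known_labels = [0, 0, 0, 1, 1, 1, 0, 0]
model_output = [0, 0, 1, 1, 1, 0, 0, 0]
summary = summarize(known_labels, model_output)
print(summary)
```

Listing the misclassified positions alongside these counts is a natural next step, since the false alarms and missed detections show where a model's parameters may need tuning.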

The results showed that both the knowledge-based and the data-driven method achieved high

Conclusions

As discussed previously, increasing availability of products is of great interest for industrial companies. Availability of industrial products can be improved by fault detection and diagnosis. Fault detection and diagnosis methods can be divided into three categories: data-driven, analytical-based, and knowledge-based methods.

In this work, monitoring cooler functionality of BRMAB hydraulic drive systems was investigated. Two fault detection techniques, i.e. data-driven (PCA) and

Acknowledgments

The authors wish to acknowledge the following three organizations: The EU FP7 Project SmartVortex, The Faste Laboratory at Luleå University of Technology funded by the Swedish Governmental Agency for Innovation Systems (VINNOVA: 2012-00705) and the SSPI project (Scalable search of product lifecycle information) funded by the Swedish Foundation for Strategic Research (SSF: RIT08-0041).

References (47)

  • L.H. Chiang et al.

    Fault Detection and Diagnosis in Industrial Systems

    (2001)
  • C. Sankavaram et al.

    Model-based and data-driven prognosis of automotive and electronic systems

  • F. Chih-Min et al.

    A Bayesian framework to integrate knowledge-based and data-driven inference tools for reliable yield diagnoses

  • D. Zhu et al.

    Survey of knowledge-based fault diagnosis methods

    Anhui Gongye Daxue Xuebao

    (2002)
  • J.B. Fussell et al.

    Fault trees – a state of the art discussion

    IEEE Transactions on Reliability

    (1974)
  • M.A. Schwabacher

    A Survey of Data Driven Prognostics

    (2005)
  • K. Pearson

    On lines and planes of closest fit to systems of points in space

    The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science

    (1901)
  • S. Wold et al.

    Multivariate data analysis in chemistry

    Chemometrics

    (1984)
  • W.S. McCulloch et al.

    A logical calculus of the ideas immanent in nervous activity

    Bulletin of Mathematical Biophysics

    (1943)
  • T. Kohonen

    Self-organized formation of topologically correct feature maps

    Biological Cybernetics

    (1982)
  • B.V. Dasarathy

    Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques

    (1991)
  • T. Villegas et al.

    Principal Component Analysis for Fault Detection and Diagnosis. Experience with a pilot plant

    (2010)
  • Y. Tharrault et al.

    Fault Detection and Isolation with Robust Principal Component Analysis

    (2008)

    Ahmad Alzghoul is currently a researcher at Uppsala University, Department of Information Technology, Division of Computing Science. He has an educational background in the field of computer science and engineering. He received a Ph.D. in Computer Aided Design at Luleå University of Technology, Division of Product and Production Development, Sweden. He received his M.Sc. degree in Computer Engineering (Intelligent Systems) from Halmstad University, Sweden, and his M.Sc. degree in Software Engineering from Linnaeus University, Sweden. His main research interests include data mining and industrial data mining applications.

    Björn Backe (M.Sc.) is a Ph.D. student in Computer Aided Design at Luleå University of Technology, Division of Product and Production Development. His main research interests include hardware reliability.

    Magnus Löfstrand has an educational background in mechanical engineering and was awarded his Ph.D. in Computer Aided Design (CAD) at Luleå University of Technology, Sweden in 2007. After serving as assistant professor in CAD, he is now employed as a senior researcher at Uppsala University, Department of Information Technology, Division of Computing Science, Uppsala DataBase Laboratory. He has an academic background in the work process description, refinement and simulation based on product development literature. He also has experience of research concerning, and equipment management for, distributed collaboration over IP networks. He is involved in research concerning system availability (maintainability and reliability) and the use of DSMS (Data Stream Management System) and DSM (Data Stream Mining) in engineering applications, often in the context of enabling larger industrial service content in industrial systems.

    Arne Byström, B.Sc., has more than 37 years of experience in the Swedish high-technology industry, mainly within R&D. He previously served for over ten years as manager for customer control system order development. He has good knowledge of eliciting and interpreting customer needs and has experience of control-related problem solving in customer applications around the world. Today, he works for Bosch Rexroth Mellansel AB as Technical Product Manager Systems, in the application area of electrohydraulic drive control and drive monitoring systems.

    Bengt Liljedahl has a B.Sc. in Mechanical Engineering from 1973 and a diploma in Mechanical Engineering at university level, received in 1978. He has over 38 years of experience in Swedish industry, mainly within R&D. He currently serves as Technical Product Manager at Bosch Rexroth Mellansel AB (formerly Hägglunds Drives AB). Since 1989 he has coordinated and built a network of close cooperation with Swedish universities to increase the competence of Bosch Rexroth Mellansel AB in three main areas: tribology, product development and material science. For his longstanding and successful research collaboration, he was awarded an Honorary Doctorate at Luleå University of Technology in June 2009.

    1 Tel.: +46 0920 49 2439.

    2 Tel.: +46 018 471 4020.

    3 Tel.: +46 660 87062.

    4 Tel.: +46 660 87123.