Abstract
This study addresses data-driven predictive maintenance, an area in which machine learning has received considerable attention. Traditionally, a machine learning model is trained on static data before being put into production to predict failures on incoming data. However, new data typically present novelties that were not included in the training data, such as unexpected anomalies or faults. Such novelties reduce the model accuracy and require model retraining, which we consider to be a suboptimal practice. Therefore, we propose to leverage online machine learning as an adaptive and continuous alternative to implement efficient predictive maintenance on systems that produce data continuously. The literature on predictive maintenance concentrates primarily on failure prediction, whereas a standard predictive maintenance framework comprises multiple stages, such as data preprocessing and diagnostics, that also require attention. In this study, we propose a modular pipeline consisting of three modules that cover several stages of a predictive maintenance solution. Each module represents one of our original contributions. Firstly, because a system generates repeating patterns in the form of cycles when performing its functions, we construct an online active learning-based framework to extract these cycles from a stream of sensor data (cycle extraction with InterCE). Secondly, we implement an autoencoder for encoding the extracted cycles into feature vectors (feature learning with LSTM-AE). Thirdly, we develop an adaptive scoring function to compute the health of any system at any time using online clustering on the stream of feature vectors (health detection with CheMoc). These three contributions establish our framework for processing raw sensor data for predictive maintenance. We evaluate our methods using a real-world data set provided by SNCF, the French national railway company.
For each experiment, we simulate a data stream consisting of sequentially arriving data from the provided data set to test our online algorithms. The experimental results demonstrate that (i) InterCE is able to extract cycles from a high-speed stream with greater accuracy than a hand-crafted expert system, (ii) LSTM-AE can identify meaningful features from the extracted cycles, and (iii) CheMoc can discover clusters that represent physical anomalies of the systems and capture the health evolution of the monitored systems. Due to a lack of ground-truth data at the time of writing, we have not implemented the prognostics method and reserve this for future work. This study confirms the potential of online machine learning as an adaptive and lifelong learning solution for predictive maintenance.
Data Availability
Due to the confidentiality obligation imposed by the data supplier, we cannot publish our data.
Code Availability
The source code implementing the methods described in this article is available at https://tinyurl.com/ms4bvj5k.
Notes
Failure reporting, analysis, and corrective action system.
It does not necessarily mean the failure will never occur again, but if it does, the domain experts will likely not label it again.
As cycles may differ slightly in length, we pad them with 0.0 so that all cycles have equal length.
In this context, an expert indicator is short for a vector of expert indicators, and similarly for the LSTM-AE feature vectors and features.
The profile of a set of cycles is the average computed at each timestep over all the cycles.
A curve is a univariate time series in a cycle. The AUC of one cycle is the average of the areas of all its univariate series.
This can be carried out for every new data point or by micro-batch. Updating on every point is compatible with the principle of real-time monitoring, but it can cause a data communication bottleneck if the traces must be saved in a database. This choice is therefore application-dependent.
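The notes on zero-padding, cycle profiles, and the AUC of a cycle can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of padding at the end of each cycle, the trapezoidal rule, and the unit time step `dt` are all assumptions made for illustration. Each cycle is assumed to be an array of shape (timesteps, channels).

```python
import numpy as np


def pad_cycles(cycles, pad_value=0.0):
    """Pad cycles of shape (timesteps, channels) to a common length.

    Assumption: padding is appended at the end of each cycle.
    Returns an array of shape (n_cycles, max_len, channels).
    """
    max_len = max(c.shape[0] for c in cycles)
    n_channels = cycles[0].shape[1]
    padded = np.full((len(cycles), max_len, n_channels), pad_value, dtype=float)
    for i, c in enumerate(cycles):
        padded[i, : c.shape[0], :] = c
    return padded


def profile(padded):
    """Profile of a set of cycles: the average at each timestep over all cycles.

    Note that padded values take part in the average in this sketch.
    """
    return padded.mean(axis=0)


def cycle_auc(cycle, dt=1.0):
    """AUC of one cycle: the average of the areas of its univariate series.

    Assumption: each area is computed with the trapezoidal rule and a
    constant time step dt.
    """
    areas = ((cycle[1:] + cycle[:-1]) / 2.0 * dt).sum(axis=0)
    return float(areas.mean())


# Two hypothetical bivariate cycles of different lengths.
cycles = [
    np.array([[1.0, 2.0], [3.0, 4.0]]),
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
]
padded = pad_cycles(cycles)
print(padded.shape)
print(profile(padded))
print(cycle_auc(cycles[1]))
```

Because padded zeros enter the per-timestep average, the tail of the profile is pulled toward zero whenever cycle lengths differ; whether that is acceptable depends on how much the lengths vary in practice.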
Acknowledgements
We are grateful to the SNCF for providing us with the data used to develop and evaluate our methods.
Funding
This work was funded by the Association Nationale de la Recherche et de la Technologie (ANRT), France.
Author information
Contributions
All authors contributed equally to the manuscript.
Ethics declarations
Competing interests
We declare that there are no competing interests among the authors.
Author approbation
All authors have approved the manuscript for submission.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Le-Nguyen, MH., Turgis, F., Fayemi, PE. et al. Exploring the potentials of online machine learning for predictive maintenance: a case study in the railway industry. Appl Intell 53, 29758–29780 (2023). https://doi.org/10.1007/s10489-023-05092-4
DOI: https://doi.org/10.1007/s10489-023-05092-4