Exploring the potentials of online machine learning for predictive maintenance: a case study in the railway industry

Applied Intelligence

Abstract

This study addresses data-driven predictive maintenance, an area in which machine learning has received considerable attention. Traditionally, a machine learning model is trained on static data before being put into production to predict failures on incoming data. However, new data typically contain novelties that were absent from the training data, such as unexpected anomalies or faults. Such novelties reduce model accuracy and require model retraining, which we consider a suboptimal practice. Therefore, we propose to leverage online machine learning as an adaptive and continuous alternative for implementing efficient predictive maintenance on systems that produce data continuously. The literature on predictive maintenance concentrates primarily on failure prediction, whereas other stages of a standard predictive maintenance framework, such as data preprocessing and diagnostics, also require attention. In this study, we propose a modular pipeline consisting of three modules that cover several stages of a predictive maintenance solution. Each module represents one of our original contributions. First, because a system generates repeating patterns in the form of cycles when performing its functions, we construct an online active learning-based framework to extract these cycles from a stream of sensor data (cycle extraction with InterCE). Second, we implement an autoencoder that encodes the extracted cycles into feature vectors (feature learning with LSTM-AE). Third, we develop an adaptive scoring function that computes the health of any system at any time using online clustering on the stream of feature vectors (health detection with CheMoc). These three contributions establish our framework for processing raw sensor data for predictive maintenance. We evaluate our methods using a real-world data set provided by SNCF, the French national railway company.
For each experiment, we simulate a data stream consisting of sequentially arriving data from the provided data set to test our online algorithms. The experimental results demonstrate that (i) InterCE is able to extract cycles from a high-speed stream with greater accuracy than a hand-crafted expert system, (ii) LSTM-AE can identify meaningful features from the extracted cycles, and (iii) CheMoc can discover clusters that represent physical anomalies of the systems and capture the health evolution of the monitored systems. Due to a lack of ground-truth data at the time of writing, we have not implemented the prognostics method and reserve it for future work. This study confirms the potential of online machine learning as an adaptive and lifelong learning solution for predictive maintenance.

Data Availability

Due to confidentiality obligations imposed by the data supplier, we cannot publish our data.

Code Availability

The source code implementing the methods described in this article is available at https://tinyurl.com/ms4bvj5k.

Notes

  1. Failure reporting, analysis, and corrective action system.

  2. It does not necessarily mean the failure will never occur again, but if it does, the domain experts will likely not label it again.

  3. As cycles may differ slightly in length, we pad them with 0.0 to have equal-length cycles.

  4. In this context, an expert indicator is short for a vector of expert indicators, and similarly for the LSTM-AE feature vectors and features.

  5. The profile of a set of cycles is the average computed at each timestep over all the cycles.

  6. A curve is a univariate time series in a cycle. The AUC of one cycle is the average of the areas of all its univariate series.

  7. We refer the reader to the surveys [7, 39] for more details on online clustering algorithms.

  8. This can be carried out for every new data point or by micro-batch. Updating on every point is compatible with the principle of real-time monitoring, but it can cause a data communication bottleneck if the traces must be saved in a database. This choice is therefore application-dependent.
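
Notes 3, 5, and 6 describe simple array operations on cycles. The following is a minimal NumPy sketch of how they could be computed, not the authors' implementation. The function names, the (timesteps, channels) array layout, the unit time step, and the use of the trapezoidal rule for the area are all illustrative assumptions.

```python
import numpy as np


def pad_cycles(cycles, pad_value=0.0):
    """Pad variable-length cycles with pad_value to a common length (note 3).

    Assumption: each cycle is an array of shape (timesteps, channels).
    Returns an array of shape (n_cycles, max_timesteps, channels).
    """
    cycles = [np.asarray(c, dtype=float) for c in cycles]
    max_len = max(len(c) for c in cycles)
    n_channels = cycles[0].shape[1]
    padded = np.full((len(cycles), max_len, n_channels), pad_value)
    for i, c in enumerate(cycles):
        padded[i, : len(c), :] = c
    return padded


def cycle_profile(padded):
    """Profile of a set of cycles: the average at each timestep (note 5)."""
    return padded.mean(axis=0)


def cycle_auc(cycle, dt=1.0):
    """AUC of one cycle: mean area under its univariate curves (note 6).

    Assumption: areas are computed with the trapezoidal rule and a
    constant time step dt.
    """
    cycle = np.asarray(cycle, dtype=float)
    areas = ((cycle[1:] + cycle[:-1]) / 2.0 * dt).sum(axis=0)  # one area per curve
    return float(areas.mean())


# Two toy cycles with two sensor channels each (made-up values).
cycles = [
    np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]),
    np.array([[0.0, 0.0], [2.0, 2.0]]),
]
padded = pad_cycles(cycles)
print(padded.shape)
print(cycle_profile(padded))
print(cycle_auc(cycles[0]))
```

In this sketch the padded zeros are included in the profile average, which lowers the profile at timesteps that only some cycles reach. Whether the article masks the padding is not stated in the notes.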

References

  1. Abadi M, Agarwal A, Barham P et al (2015) TensorFlow: Large-scale machine learning on heterogeneous systems. https://www.tensorflow.org/, software available from tensorflow.org

  2. Aminikhanghahi S, Cook DJ (2017) A survey of methods for time series change point detection. Knowl Inf Syst 51(2):339–367. https://doi.org/10.1007/s10115-016-0987-z

  3. Aydemir G, Acar B (2020) Anomaly monitoring improves remaining useful life estimation of industrial machinery. J Manuf Syst 56:463–469

  4. Ben Ali J, Saidi L, Harrath S et al (2018) Online automatic diagnosis of wind turbine bearings progressive degradations under real experimental conditions based on unsupervised machine learning. Appl Acoust 132:167–181. https://doi.org/10.1016/j.apacoust.2017.11.021

  5. Canizo M, Onieva E, Conde A et al (2017) Real-time predictive maintenance for wind turbines using big data frameworks. In: 2017 IEEE international conference on prognostics and health management (ICPHM). pp 70–77. https://doi.org/10.1109/ICPHM.2017.7998308

  6. Cao F, Ester M, Qian W et al (2006) Density-based clustering over an evolving data stream with noise. In: Proceedings of the 6th SIAM international conference on data mining, April 20-22, 2006, Bethesda, MD, USA, https://doi.org/10.1137/1.9781611972764.29

  7. Carnein M, Trautmann H (2019) Optimizing data stream representation: An extensive survey on stream clustering algorithms. Business & Information Systems Engineering 61(3):277–297. https://doi.org/10.1007/s12599-019-00576-5

  8. Carvalho TP, Soares FAAMN, Vita R et al (2019) A systematic literature review of machine learning methods applied to predictive maintenance. Computers & Industrial Engineering 137:106024

  9. Chen Y, Tu L (2007) Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, San Jose, California, USA, KDD ’07. pp 133–142. https://doi.org/10.1145/1281192.1281210

  10. Davari N, Veloso B, De Assis Costa G et al (2021) A survey on data-driven predictive maintenance for the railway industry. Sensors 21:5739. https://doi.org/10.3390/s21175739

  11. Ester M, Kriegel HP, Sander J et al (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd international conference on knowledge discovery and data mining. AAAI Press, Portland, Oregon, KDD’96, pp 226–231

  12. Feng X, Weng C, He X et al (2019) Online state-of-health estimation for li-ion battery using partial charging segment based on support vector machine. IEEE Transactions on Vehicular Technology 68(9). https://doi.org/10.1109/TVT.2019.2927120

  13. Forestiero A, Pizzuti C, Spezzano G (2009) FlockStream: a bio-inspired algorithm for clustering evolving data streams. In: 2009 21st IEEE international conference on tools with artificial intelligence. pp 1–8, https://doi.org/10.1109/ICTAI.2009.60

  14. Gomes HM, Read J, Bifet A et al (2019) Machine learning for streaming data: state of the art, challenges, and opportunities. ACM SIGKDD Explorations Newsl 21(2):6–22. https://doi.org/10.1145/3373464.3373470

  15. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735

  16. Inturi V, Shreyas N, Chetti K et al (2021) Comprehensive fault diagnostics of wind turbine gearbox through adaptive condition monitoring scheme. Appl Acoust 174:107738

  17. Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20(4):422–446. https://doi.org/10.1145/582415.582418

  18. Kranen P, Assent I, Baldauf C et al (2009) Self-adaptive anytime stream clustering. In: 2009 9th IEEE international conference on data mining. pp 249–258. https://doi.org/10.1109/ICDM.2009.47

  19. Le Nguyen MH, Turgis F, Fayemi PE et al (2021) A complete streaming pipeline for real-time monitoring and predictive maintenance. In: Proceedings of the 31st European safety and reliability conference. pp 2119, https://doi.org/10.3850/978-981-18-2016-8_400-cd

  20. Lebold M, Reichard K (2002) OSA-CBM architecture development with emphasis on XML implementations

  21. Li Y, Li H, Wang Z et al (2020) ESA-Stream: efficient self-adaptive online data stream clustering. IEEE Transactions on Knowledge and Data Engineering pp 1–1. https://doi.org/10.1109/TKDE.2020.2990196

  22. Lin J, Keogh E, Lonardi S et al (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. Association for Computing Machinery, New York, NY, USA, DMKD ’03, pp 2–11, https://doi.org/10.1145/882082.882086

  23. Liu Lx, Huang H, Guo Yf et al (2009) rDenStream, A clustering algorithm over an evolving data stream. In: 2009 international conference on information engineering and computer science. pp 1–4, https://doi.org/10.1109/ICIECS.2009.5363379

  24. Mitici M, Hennink B, Pavel M et al (2023) Prognostics for Lithium-ion batteries for electric Vertical Take-off and Landing aircraft using data-driven machine learning. Energy and AI 12:100233

  25. Polikar R (2012) Ensemble learning. In: Zhang C, Ma Y (eds) Ensemble machine learning: methods and applications. Springer US, Boston, MA, pp 1–34. https://doi.org/10.1007/978-1-4419-9326-7_1

  26. Putina A, Rossi D (2021) Online anomaly detection leveraging stream-based clustering and real-time telemetry. IEEE Transactions on Network and Service Management 18(1):839–854. https://doi.org/10.1109/TNSM.2020.3037019

  27. Ren J, Ma R (2009) Density-based data streams clustering over sliding windows. In: 2009 6th international conference on fuzzy systems and knowledge discovery. pp 248–252. https://doi.org/10.1109/FSKD.2009.553

  28. Ribeiro RP, Pereira P, Gama J (2016) Sequential anomalies: a study in the railway industry. Mach Learn 105(1):127–153. https://doi.org/10.1007/s10994-016-5584-6

  29. Ruiz C, Menasalvas E, Spiliopoulou M (2009) C-DenStream: using domain knowledge on a data stream. In: Gama J, Costa VS, Jorge AM et al (eds) Discovery science. Springer, Berlin, Heidelberg, Lecture Notes in Computer Science, pp 287–301, https://doi.org/10.1007/978-3-642-04747-3_23

  30. Sahal R, Breslin JG, Ali MI (2020) Big data and stream processing platforms for Industry 4.0 requirements mapping for a predictive maintenance use case. J Manuf Syst 54:138–151

  31. Settles B (2009) Active learning literature survey. Technical Report, University of Wisconsin-Madison Department of Computer Sciences

  32. Shen J, Li S, Jia F et al (2020) A deep multi-label learning framework for the intelligent fault diagnosis of machines. IEEE Access 8:113557–113566. https://doi.org/10.1109/ACCESS.2020.3002826

  33. Su CJ, Huang SF (2018) Real-time big data analytics for hard disk drive predictive maintenance. Computers & Electrical Engineering 71:93–101

  34. Tian H, Khoa NLD, Anaissi A et al (2019) Concept drift adaption for online anomaly detection in structural health monitoring. In: Proceedings of the 28th international conference on information and knowledge management, CIKM ’19. pp 2813–2821, https://doi.org/10.1145/3357384.3357816

  35. Torkamani S, Lohweg V (2017) Survey on time series motif discovery. WIREs Data Mining and Knowledge Discovery 7(2):e1199. https://doi.org/10.1002/widm.1199

  36. Turgis F, Audier P, Nemoz V et al (2022) Health state characterization using clustering algorithms for railway maintenance. In: World Congress on Railway Research 2022, Birmingham, United Kingdom

  37. Zhao R, Yan R, Chen Z et al (2019) Deep learning and its applications to machine health monitoring. Mech Syst Signal Process 115:213–237

  38. Zonta T, da Costa CA, da Rosa Righi R et al (2020) Predictive maintenance in the Industry 4.0: A systematic literature review. Computers & Industrial Engineering 150:106889. https://doi.org/10.1016/j.cie.2020.106889

  39. Zubaroğlu A, Atalay V (2021) Data stream clustering: a review. Artif Intell Rev 54(2):1201–1236. https://doi.org/10.1007/s10462-020-09874-x

Acknowledgements

We are grateful to the SNCF for providing us with the data used to develop and evaluate our methods.

Funding

This work was funded by the Association Nationale de la Recherche et de la Technologie (ANRT), France.

Author information

Contributions

All authors contributed equally to the manuscript.

Corresponding author

Correspondence to Minh-Huong Le-Nguyen.

Ethics declarations

Competing interests

We declare that there are no competing interests among the authors.

Author approbation

All authors have approved the manuscript for submission.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Le-Nguyen, MH., Turgis, F., Fayemi, PE. et al. Exploring the potentials of online machine learning for predictive maintenance: a case study in the railway industry. Appl Intell 53, 29758–29780 (2023). https://doi.org/10.1007/s10489-023-05092-4

  • DOI: https://doi.org/10.1007/s10489-023-05092-4
