Abstract
In recent years, one of the main challenges in Industry 4.0 has been the optimization of maintenance operations, which has been widely addressed through predictive maintenance frameworks aiming to jointly reduce maintenance costs and downtime intervals. Nevertheless, the most recent and effective frameworks mainly rely on deep learning models, whose internal representations (black boxes) are too complex for human understanding, making it difficult to explain their predictions. This issue can be tackled with eXplainable artificial intelligence (XAI) methodologies, whose aim is to explain the decisions of data-driven AI models, characterizing the strengths and weaknesses of the decision-making process and making results more understandable to humans. In this paper, we focus on explaining the predictions of a recurrent neural network based model that requires a three-dimensional dataset, since it exploits spatial and temporal features to estimate the remaining useful life (RUL) of hard disk drives (HDDs). In particular, we analyze in depth how the explanations of RUL predictions provided by different XAI tools, compared using different metrics and illustrated through the generated dashboards, can effectively support the predictive maintenance task by means of both global and local explanations. To this end, we have realized an explanation framework that investigates the local interpretable model-agnostic explanations (LIME) and SHapley Additive exPlanations (SHAP) tools on the Backblaze dataset with a long short-term memory (LSTM) prediction model. The achieved results show that SHAP outperforms LIME on almost all the considered metrics, proving to be a suitable and effective solution for HDD predictive maintenance applications.























Appendices
In these appendices, we provide further details on our evaluation of the SHAP explanations of HDD health status predictions, in terms of both global and local explanations.
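For reference, SHAP attributes to each feature its Shapley value, i.e. its average marginal contribution over all possible feature coalitions; in the standard formulation, with F the full feature set and f_S the model restricted to the subset S,

$$\phi_i(f,x)=\sum_{S\subseteq F\setminus\{i\}}\frac{|S|!\,(|F|-|S|-1)!}{|F|!}\Big[f_{S\cup\{i\}}\big(x_{S\cup\{i\}}\big)-f_S\big(x_S\big)\Big].$$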
Appendix 1: global explanation
For each class, we draw the Dependence Plots of the four most important features of that class; the colour scale highlights the interaction effect between the plotted feature and another one (Fig. 24).
Observing the plot at the bottom left, we can notice that the two samples with the highest SHAP values are those lying between 0.4 and 0.8 for the Raw Read Error Rate (RRER) and between 0.2 and 0.4 for SER; their contributions are 0.020 and 0.025, respectively. In particular, the highlighted samples (in yellow) have the same values of both RRER and SER, yet both have negative SHAP values, as shown in Fig. 28.
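A plot of this kind can be reproduced with the SHAP Python library. The following is only a minimal sketch: the model handle, the variable names, the background-sample size and the time-aggregation step are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np
import shap

# Assumed setup: `model` is the trained Keras LSTM classifier and X_train / X_test
# are 3-D arrays of shape (samples, timesteps, features) built from SMART attributes.
background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values_3d = explainer.shap_values(X_test)   # one 3-D array per output class

# Aggregate attributions over the time dimension to obtain one value per feature,
# and average the inputs over time so that they can be shown on the x-axis.
feature_names = ["RRER", "SER", "SUT", "TC", "POH"]   # placeholder SMART names
class_idx = 0                                         # e.g. the Alert class
shap_values = [sv.sum(axis=1) for sv in shap_values_3d]
X_expl = X_test.mean(axis=1)

# Dependence of the selected class on RRER, coloured by the interaction with SER.
shap.dependence_plot("RRER", shap_values[class_idx], X_expl,
                     feature_names=feature_names, interaction_index="SER")
```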
Appendix 2: local explanation
1.1 Alert
1.1.1 Force plot single element
The Force plot for an individual item shows the features that push the model output away from the base value (the average model output over the training dataset) towards the actual prediction. In particular, features that provide a positive contribution to the prediction are shown in red, while those with a negative contribution are shown in blue.
Figure 29 shows the Force plot for sample 8223, in which it is easy to note that TC, RRER and POH are the three main features contributing positively to the output. In turn, Fig. 30 indicates that the TC and POH features drove the misclassification of sample 8371 as Alert, despite SER and RRER pushing in the right direction.
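As an illustration, and reusing the explainer, shap_values and X_expl names from the sketch in Appendix 1 (again an assumption, not the authors' code), a Force plot for a single sample can be rendered as follows:

```python
import shap

i = 8223          # index of the sample to explain, as in Fig. 29
class_idx = 0     # index of the Alert class (hypothetical ordering)

shap.force_plot(
    explainer.expected_value[class_idx],   # base value: average model output for the class
    shap_values[class_idx][i],             # per-feature SHAP values of sample i
    X_expl[i],                             # the sample's (time-averaged) feature values
    feature_names=["RRER", "SER", "SUT", "TC", "POH"],  # placeholder names
    matplotlib=True,                       # static rendering instead of the JS widget
)
```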
1.1.2 Bar plot single element
The last plot we analyze is the Bar Plot (see Fig. 31a, b), which does not add information with respect to the plots described above but is easier to read. The features are ordered by importance and depicted in red or blue depending on whether they provide a positive or negative contribution to the model output, respectively. In particular, the Shapley value of each feature is reported on the x-axis, showing how it impacts the model output.
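A single-sample Bar Plot of this kind can be obtained through the shap.plots.bar API by wrapping the values in an Explanation object; the sketch below again reuses the hypothetical names introduced above.

```python
import shap

i = 8223          # sample shown in the Bar Plot (hypothetical index)
explanation = shap.Explanation(
    values=shap_values[class_idx][i],                  # per-feature SHAP values
    base_values=explainer.expected_value[class_idx],   # base value of the class
    data=X_expl[i],                                    # the sample's feature values
    feature_names=["RRER", "SER", "SUT", "TC", "POH"], # placeholder names
)
# Horizontal bars sorted by magnitude; positive contributions in red, negative in blue.
shap.plots.bar(explanation)
```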
1.2 Warning
1.2.1 Force plot single element
Figure 32 shows that TC, RRER, POH and SER are the most important features for predicting the Warning class, whereas the features providing a negative contribution have such a low impact that they are not visible. In turn, RRER, SUT and POH provide negative contributions, leading to the misclassification of sample 1135 as Warning (see Fig. 33).
1.2.2 Bar plot single element
Figure 34a and b investigates, through the bar plots, how our model predicts the health status of two samples, respectively correctly classified (479) and misclassified (1135).
1.3 Very fair
1.3.1 Force plot single element
Although almost all features (except RRER) contribute positively to the prediction, the most important one is undoubtedly SUT, as shown in Fig. 35. Nevertheless, as is even more evident in Fig. 36, TC clearly pushes the model in the wrong direction, causing sample 904 to be classified as Very Fair.
1.3.2 Bar plot single element
The bar plots (Fig. 37a, b) show the contribution of each feature in terms of its Shapley value.
1.4 Good
1.4.1 Force plot single element
Figure 38 provides a clear picture of the main features (reported in red) contributing to the prediction made by the model; the most important ones are TC, POH, SER and SUT. From Fig. 39, we can clearly understand why the model made a wrong prediction, classifying sample 7425 as belonging to the Good class: the main feature is POH, which has a very high value (as visible in the decision plot in Fig. 15b) and corresponds to an extremely large SHAP value, as we can see from the force plot.
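The decision plot mentioned above (Fig. 15b) can be reproduced with shap.decision_plot; the following sketch, under the same naming assumptions as before, draws the path of the misclassified sample:

```python
import shap

i = 7425          # the misclassified sample discussed above
class_idx = 3     # index of the Good class (hypothetical ordering)

shap.decision_plot(
    explainer.expected_value[class_idx],   # base value for the Good class
    shap_values[class_idx][i],             # per-feature SHAP values of the sample
    X_expl[i],                             # feature values shown alongside the path
    feature_names=["RRER", "SER", "SUT", "TC", "POH"],  # placeholder names
)
```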
1.4.2 Bar plot single element
We show the Bar Plots of the two samples, 28 (correctly classified) and 7425 (misclassified), in Fig. 40a and b.