Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework

Rahal, Manal; Ahmed, Bestoun S.; Samuelsson, Jörgen

doi:10.1007/978-3-031-49252-5_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14390)

Included in the following conference series:

International Conference on Engineering of Computer-Based Systems

Abstract

Building resilient machine learning (ML) systems has become necessary to ensure that production-ready ML systems gain user confidence. In data-sensitive systems, the quality of both the input data and the model strongly influences the success of end-to-end testing. However, approaches for testing input data are fewer and less systematic than those for testing models. To address this gap, this paper presents the Fault Injection for Undesirable Learning in input Data (FIUL-Data) testing framework, which tests the resilience of ML models to multiple intentionally triggered data faults. Data mutators explore the vulnerabilities of ML systems to the effects of different fault injections. The proposed framework is based on three main ideas: the mutators are not random, one data mutator is applied at a time, and the selected ML models are optimized beforehand. This paper evaluates the FIUL-Data framework using data from analytical chemistry, comprising retention time measurements of antisense oligonucleotides. The empirical evaluation is carried out in a two-step process in which the responses of the selected ML models to data mutation are analyzed individually and then compared with each other. The results show that the FIUL-Data framework allows the evaluation of the resilience of ML models. In most experimental cases, ML models show higher resilience with larger training datasets, and gradient boost performed better than support vector regression on smaller training sets. Overall, the mean squared error metric is useful in evaluating the resilience of models due to its higher sensitivity to data mutation.
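The three design ideas named in the abstract (non-random mutators, one mutator applied at a time, and a model fixed before mutation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two mutators `shift_every_kth` and `drop_every_kth`, the synthetic linear data, and the least-squares line that stands in for the optimized gradient boost and support vector regression models are hypothetical and are not taken from the FIUL-Data framework itself.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for an optimized model."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def predict(model, xs):
    a, b = model
    return [a * x + b for x in xs]

def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

# Hypothetical deterministic (non-random) mutators acting on training data.
def shift_every_kth(xs, ys, k=5, offset=3.0):
    """Add a fixed offset to every k-th target value."""
    return xs, [y + offset if i % k == 0 else y for i, y in enumerate(ys)]

def drop_every_kth(xs, ys, k=5):
    """Remove every k-th training sample."""
    kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != 0]
    return [x for x, _ in kept], [y for _, y in kept]

def resilience_report(train, test, mutators):
    """Apply each mutator on its own and compare clean vs. mutated test error."""
    (train_x, train_y), (test_x, test_y) = train, test
    clean_pred = predict(fit_line(train_x, train_y), test_x)
    report = {}
    for name, mutate in mutators.items():
        mut_x, mut_y = mutate(train_x, train_y)  # exactly one mutator per run
        mut_pred = predict(fit_line(mut_x, mut_y), test_x)
        report[name] = {
            "mse": (mse(test_y, clean_pred), mse(test_y, mut_pred)),
            "mae": (mae(test_y, clean_pred), mae(test_y, mut_pred)),
        }
    return report

# Synthetic, noise-free data (an assumption; not the chromatography dataset).
train_x = [float(i) for i in range(20)]
train_y = [2.0 * x + 1.0 for x in train_x]
test_x = [0.5 + i for i in range(10)]
test_y = [2.0 * x + 1.0 for x in test_x]

report = resilience_report(
    (train_x, train_y),
    (test_x, test_y),
    {"shift_every_kth": shift_every_kth, "drop_every_kth": drop_every_kth},
)
for name, metrics in report.items():
    print(name, metrics)
```

Each entry in `report` pairs the test error of the model trained on clean data with the error of the same model trained on data altered by a single mutator, so the effect of each fault can be read in isolation. Repeating the run with different training set sizes or with other regressors would mirror the comparison described in the abstract, but the numbers from this toy data say nothing about the paper's results.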

Acknowledgements

This work has been funded by the Knowledge Foundation of Sweden (KKS) through the Synergy project - Improved Methods for Process and Quality Controls using Digital Tools (IMPAQCDT) grant number (20210021). In this project, we acknowledge Gergely Szabados, Jakob Häggström, and Patrik Forssén from the Department of Engineering and Chemical Sciences/Chemistry at Karlstad University for their contribution to the acquisition and preprocessing of data.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Karlstad University, Karlstad, Sweden
Manal Rahal & Bestoun S. Ahmed
Department of Computer Science, Faculty of Electrical Engineering, Czech Technical University in Prague, 16627, Prague, Czech Republic
Bestoun S. Ahmed
Department of Engineering and Chemical Sciences, Karlstad University, Karlstad, Sweden
Jörgen Samuelsson

Corresponding author

Correspondence to Bestoun S. Ahmed.

Editor information

Editors and Affiliations

Charles University, Praha, Czech Republic
Jan Kofroň
University of Limerick, Limerick, Ireland
Tiziana Margaria
Mälardalen University, Västerås, Sweden
Cristina Seceleanu

About this paper

Cite this paper

Rahal, M., Ahmed, B.S., Samuelsson, J. (2024). Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework. In: Kofroň, J., Margaria, T., Seceleanu, C. (eds) Engineering of Computer-Based Systems. ECBS 2023. Lecture Notes in Computer Science, vol 14390. Springer, Cham. https://doi.org/10.1007/978-3-031-49252-5_5

DOI: https://doi.org/10.1007/978-3-031-49252-5_5
Published: 29 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49251-8
Online ISBN: 978-3-031-49252-5
eBook Packages: Computer Science, Computer Science (R0)