Abstract
Tandem mass spectrometry (MS/MS) is an advanced biochemical analysis method that has been widely used in screening for inherited metabolic disorders (IMDs). The resulting measurements are filtered by cutoff values and then interpreted by physicians, drawing on their knowledge, to reach a diagnosis. However, cutoff-based approaches cannot readily capture correlations among multiple metabolites, and diagnostic decisions also depend on each physician's experience. The rapidly growing availability of newborn screening data (1.5M cases in this study) makes it possible to apply machine learning (ML) techniques to provide more accurate IMD diagnoses than simple cutoff values. This study investigates two tasks: uncovering complex patterns among metabolites and developing better auxiliary diagnostic tools. Experimental results show that the novel metabolic patterns identified in this study are effective and meaningful. Integrating ML techniques with these patterns improved predictive performance over existing diagnostic methods, suggesting that ML techniques are becoming valuable auxiliary diagnostic tools.
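To illustrate why fixed per-metabolite cutoffs can miss correlated patterns, the sketch below compares a cutoff rule with a decision stump (a one-level decision tree) learned on the ratio of two analytes. It is not the authors' method: the two analytes, the cutoff values (`a_high`, `b_low`), the ratio feature, and every data point are hypothetical, chosen so that affected samples look unremarkable on each analyte alone but have an elevated ratio. The stump is fit and scored on the same toy data purely for illustration.

```python
"""Toy comparison: fixed per-analyte cutoffs vs. a learned rule on a metabolite ratio.

All values are synthetic and hypothetical; they are not newborn screening data.
"""
from typing import List, Sequence, Tuple

# Each sample: (analyte_a, analyte_b, label), label 1 = affected, 0 = healthy.
# Hypothetical data: affected samples are unremarkable on each analyte alone
# but have a clearly elevated a/b ratio.
SAMPLES: List[Tuple[float, float, int]] = [
    (1.0, 2.0, 0), (2.0, 4.0, 0), (3.0, 5.0, 0), (4.0, 7.0, 0), (5.0, 9.0, 0),
    (2.0, 1.0, 1), (3.0, 1.5, 1), (4.0, 2.5, 1), (5.0, 3.0, 1),
]


def cutoff_rule(a: float, b: float, a_high: float = 4.5, b_low: float = 0.9) -> int:
    """Flag a sample if any single analyte crosses its own (hypothetical) cutoff."""
    return int(a > a_high or b < b_low)


def fit_stump(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, int]:
    """Fit a decision stump on one feature by exhaustive threshold search.

    Returns (threshold, polarity); predict 1 when polarity * (x - threshold) > 0.
    """
    ordered = sorted(set(values))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct feature values")
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]
    best, best_correct = (candidates[0], 1), -1
    for t in candidates:
        for polarity in (1, -1):
            correct = sum(int(polarity * (x - t) > 0) == y for x, y in zip(values, labels))
            if correct > best_correct:  # keep the first threshold with the most hits
                best, best_correct = (t, polarity), correct
    return best


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [y for _, _, y in SAMPLES]
cutoff_preds = [cutoff_rule(a, b) for a, b, _ in SAMPLES]

ratios = [a / b for a, b, _ in SAMPLES]  # the hypothetical "pattern" feature
threshold, polarity = fit_stump(ratios, labels)
ratio_preds = [int(polarity * (r - threshold) > 0) for r in ratios]

print(f"cutoff rule accuracy: {accuracy(cutoff_preds, labels):.3f}")
print(f"ratio stump: threshold={threshold:.2f}, accuracy={accuracy(ratio_preds, labels):.3f}")
```

On this toy data the cutoff rule is correct for 5 of 9 samples, while the stump on the ratio separates both classes. Real screening data would also require held-out evaluation and handling of the strong class imbalance between healthy and affected newborns.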
Acknowledgement
This work was supported in part by the National Key Research and Development Program of China (Grant No. 2017YFC1001703, 2018YFC1002700), in part by the National Natural Science Foundation of China (Grant No. 61825205, 61772459) and in part by the National Science and Technology Major Project of China (Grant No. 50-D36B02-9002-16/19).
A MS/MS Biomarkers
Copyright information
© 2019 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Lin, B. et al. (2019). Integration of Machine Learning Techniques as Auxiliary Diagnosis of Inherited Metabolic Disorders: Promising Experience with Newborn Screening Data. In: Wang, X., Gao, H., Iqbal, M., Min, G. (eds) Collaborative Computing: Networking, Applications and Worksharing. CollaborateCom 2019. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 292. Springer, Cham. https://doi.org/10.1007/978-3-030-30146-0_23
DOI: https://doi.org/10.1007/978-3-030-30146-0_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30145-3
Online ISBN: 978-3-030-30146-0
eBook Packages: Computer Science, Computer Science (R0)