Abstract
Despite their widespread use, machine learning (ML) methods often exhibit sub-optimal performance. The accuracy of these models is primarily hindered by insufficient training data and poor data quality, with particularly severe consequences in critical areas such as medical diagnosis prediction. Our hypothesis is that enhancing ML pipelines with semantic information, such as that available in knowledge graphs (KGs), can address these challenges and improve ML prediction accuracy. To that end, we extend the state of the art through a novel approach that uses KG embeddings to augment tabular data in several ways within ML pipelines. Concretely, we introduce and examine several techniques for integrating KG embeddings, and we study the influence of KG characteristics on model performance, specifically accuracy and F2 scores. We evaluate our approach with four ML algorithms and two embedding techniques, applied to heart and chronic kidney disease prediction. Our results indicate consistent improvements in model performance across various ML models and tasks, thus confirming our hypothesis. For example, for heart disease prediction, the F2 score increased from 70% to 82.22% for KNN and from 74.53% to 81.71% for SVM.
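To make the two central notions of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that one possible integration technique is plain concatenation of a per-record KG embedding with the tabular features, and it computes the F2 score as the standard F-beta measure with beta = 2, which weights recall more heavily than precision. The function names `augment_rows` and `fbeta`, the entity-id lookup, and the toy values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def augment_rows(
    rows: Sequence[Sequence[float]],
    entity_ids: Sequence[str],
    embeddings: Dict[str, Sequence[float]],
) -> List[List[float]]:
    """Append each record's KG embedding to its tabular features.

    Assumption: every record maps to exactly one KG entity whose embedding
    is available in `embeddings`. Concatenation is only one of several
    conceivable integration strategies.
    """
    if len(rows) != len(entity_ids):
        raise ValueError("rows and entity_ids must have the same length")
    return [list(row) + list(embeddings[eid]) for row, eid in zip(rows, entity_ids)]


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """F-beta score for binary labels (1 = positive); beta=2 gives F2."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positive labels or predictions at all.
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the paper.
    rows = [[63.0, 1.0], [45.0, 0.0]]
    toy_embeddings = {"p1": [0.1, -0.2], "p2": [0.3, 0.4]}
    print(augment_rows(rows, ["p1", "p2"], toy_embeddings))
    print(fbeta([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because F2 penalises false negatives more than false positives, it is a natural choice for diagnosis tasks where a missed case is costlier than a false alarm. The augmented rows can be passed to any classifier that accepts fixed-length numeric vectors.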
Notes
- 1.
The importance of the preprocessing step for achieving good performance is shown by Hassler et al. [14].
References
Alfrjani, R., Osman, T., Cosma, G.: A hybrid semantic knowledgebase-machine learning approach for opinion mining. Data Knowl. Eng. 121, 88–108 (2019)
Ali, L., et al.: An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 7, 54007–54014 (2019)
Bhatt, S., Sheth, A., Shalin, V., Zhao, J.: Knowledge graph semantic enhancement of input data for improving AI. IEEE Internet Comput. 24(2), 66–72 (2020)
Chen, J., Alghamdi, G., Schmidt, R.A., Walther, D., Gao, Y.: Ontology extraction for large ontologies via modularity and forgetting. In: Proceedings of the 10th International Conference on Knowledge Capture, pp. 45–52 (2019)
Chittora, P., et al.: Prediction of chronic kidney disease-a machine learning perspective. IEEE Access 9, 17312–17334 (2021)
Chute, C.G., Çelik, C.: Overview of ICD-11 architecture and structure. BMC Med. Inform. Decis. Mak. 21(6), 1–7 (2021)
Confalonieri, R., Weyde, T., Besold, T.R., del Prado Martín, F.M.: Using ontologies to enhance human understandability of global post-hoc explanations of black-box models. Artif. Intell. 296, 103471 (2021)
Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: A review of some techniques for inclusion of domain-knowledge into deep neural networks. Sci. Rep. 12(1), 1040 (2022)
El-Sappagh, S., Franda, F., Ali, F., Kwak, K.S.: SNOMED CT standard ontology based on the ontology for general medical science. BMC Med. Inform. Decis. Mak. 18, 1–19 (2018)
Garcez, A.D., Lamb, L.C.: Neurosymbolic AI: the 3rd wave. Artif. Intell. Rev. 1–20 (2023)
Gaur, M., et al.: “Let me tell you about your mental health!” Contextualized classification of Reddit posts to DSM-5 for web-based intervention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 753–762 (2018)
Gazzotti, R., Faron-Zucker, C., Gandon, F., Lacroix-Hugues, V., Darmon, D.: Injecting domain knowledge in electronic medical records to improve hospitalization prediction. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 116–130. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_8
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
Hassler, A.P., Menasalvas, E., García-García, F.J., Rodríguez-Mañas, L., Holzinger, A.: Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome. BMC Med. Inform. Decis. Mak. 19, 1–17 (2019)
Herron, D., Jiménez-Ruiz, E., Weyde, T.: On the benefits of OWL-based knowledge graphs for neural-symbolic systems. In: Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, vol. 3432, pp. 327–335. CEUR Workshop Proceedings (2023)
Hitzler, P., Eberhart, A., Ebrahimi, M., Sarker, M.K., Zhou, L.: Neuro-symbolic approaches in artificial intelligence. Natl. Sci. Rev. 9(6), nwac035 (2022)
Huang, Y.X., et al.: Enabling abductive learning to exploit knowledge graph. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3839–3847 (2023)
Ivanović, M., Budimac, Z.: An overview of ontologies and data resources in medical domains. Expert Syst. Appl. 41(11), 5158–5166 (2014)
Jovic, A., Prcela, M., Gamberger, D.: Ontologies in medical knowledge representation. In: 2007 29th International Conference on Information Technology Interfaces, pp. 535–540. IEEE (2007)
Katarya, R., Meena, S.K.: Machine learning techniques for heart disease prediction: a comparative study and analysis. Heal. Technol. 11, 87–97 (2021)
Kursuncu, U., Gaur, M., Sheth, A.: Knowledge infused learning (k-il): towards deep incorporation of knowledge in deep learning. arXiv preprint arXiv:1912.00512 (2019)
Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6(2), 167–195 (2015)
Llugiqi, M., Ekaputra, F.J., Sabou, M.: Leveraging knowledge graphs for enhancing machine learning-based heart disease prediction. In: The Knowledge Graphs and Neurosymbolic AI (KG-NeSy) 2024 Workshop co-located with AIRoV – The First Austrian Symposium on AI, Robotics, and Vision (accepted for publication) (2024). https://semantic-systems.org/sites/KG-NeSy/papers/P28.pdf
Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019)
Pisanelli, D.M.: Ontologies in Medicine, vol. 102. IOS press (2004)
Poulinakis, K., Drikakis, D., Kokkinakis, I.W., Spottswood, S.M.: Machine-learning methods on noisy and sparse data. Mathematics 11(1), 236 (2023)
Rady, E.H.A., Anwar, A.S.: Prediction of kidney disease stages using data mining algorithms. Inf. Med. Unlocked 15, 100178 (2019)
Rani, P., Kumar, R., Ahmed, N.M.S., Jain, A.: A decision support system for heart disease prediction based upon machine learning. J. Reliable Intell. Environ. 7(3), 263–275 (2021)
Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., Simperl, E., Gray, A., Sabou, M., Krötzsch, M., Lecue, F., Flöck, F., Gil, Y. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30
Ruiz, C., Ren, H., Huang, K., Leskovec, J.: High dimensional, tabular deep learning with an auxiliary knowledge graph. Adv. Neural Inf. Process. Syst. 36 (2024)
Sarker, M.K., Zhou, L., Eberhart, A., Hitzler, P.: Neuro-symbolic artificial intelligence. AI Commun. 34(3), 197–209 (2021)
Shah, D., Patel, S., Bharti, S.K.: Heart disease prediction using machine learning techniques. SN Comput. Sci. 1, 1–6 (2020)
Szilagyi, I., Wira, P.: An intelligent system for smart buildings using machine learning and semantic technologies: a hybrid data-knowledge approach. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 20–25. IEEE (2018)
Vijayarani, S., Dhayanand, S., Phil, M.: Kidney disease prediction using SVM and ANN algorithms. Int. J. Comput. Bus. Res. (IJCBR) 6(2), 1–12 (2015)
Yadav, A.L., Soni, K., Khare, S.: Heart diseases prediction using machine learning. In: 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2023)
Yildirim, P.: Chronic kidney disease prediction on imbalanced data by multilayer perceptron: chronic kidney disease prediction. In: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 193–198 (2017). https://doi.org/10.1109/COMPSAC.2017.84
Yin, C., Zhao, R., Qian, B., Lv, X., Zhang, P.: Domain knowledge guided deep learning with electronic health records. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 738–747. IEEE (2019)
Ziegler, K., et al.: Injecting semantic background knowledge into neural networks using graph embeddings. In: 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 200–205. IEEE (2017)
Acknowledgements
This work was supported by the FWF HOnEst project (V 745-N), FFG SENSE project (894802) and FAIR-AI project (904624).
A Appendix: Additional Experimental Analysis
The tables in this appendix describe the experimental setup.
- Table 3 details the ontologies used for the heart and kidney disease domains. Classes (or concepts) represent distinct groups, e.g., ‘Patient’ or ‘Disease’; object properties describe relationships between two classes, e.g., ‘hasSymptom’ connecting ‘Patient’ to ‘Symptom’; and data properties define characteristics or attributes of classes, e.g., the age of a ‘Patient’.
- Table 4 lists the parameters of the embedding algorithms. Embeddings are generated with vector sizes of 64, 100, and 128 to better capture the semantic knowledge in the KG. The reported results are the average performance across these three vector dimensions.
- Table 5 lists the parameters used for the ML models.
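The averaging over embedding dimensions described for Table 4 can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: `evaluate` is a hypothetical callback that trains and scores one model for a given vector size and returns a dictionary of metrics, and `average_over_dimensions` is a name introduced here. Only the three vector sizes (64, 100, 128) come from the text.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

# Vector sizes stated in the appendix text.
VECTOR_SIZES = (64, 100, 128)


def average_over_dimensions(
    evaluate: Callable[[int], Dict[str, float]],
    vector_sizes: Sequence[int] = VECTOR_SIZES,
) -> Dict[str, float]:
    """Average each metric over runs with different embedding sizes.

    `evaluate(dim)` is a hypothetical callback: it is assumed to build the
    embeddings with `dim` dimensions, train the model, and return metrics
    such as {"accuracy": ..., "f2": ...}.
    """
    if not vector_sizes:
        raise ValueError("at least one vector size is required")
    runs = [evaluate(dim) for dim in vector_sizes]
    # Use the metric names of the first run; all runs must report them.
    return {name: mean(run[name] for run in runs) for name in runs[0]}
```

Reporting the mean over several dimensions reduces the chance that a single, favourable embedding size drives the reported score. Reporting the spread across dimensions alongside the mean would be a natural complement.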
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Llugiqi, M., Ekaputra, F.J., Sabou, M. (2024). Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings. In: Besold, T.R., d’Avila Garcez, A., Jimenez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B. (eds) Neural-Symbolic Learning and Reasoning. NeSy 2024. Lecture Notes in Computer Science, vol 14979. Springer, Cham. https://doi.org/10.1007/978-3-031-71167-1_15
DOI: https://doi.org/10.1007/978-3-031-71167-1_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71166-4
Online ISBN: 978-3-031-71167-1
eBook Packages: Computer Science, Computer Science (R0)