Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings

Llugiqi, Majlinda; Ekaputra, Fajar J.; Sabou, Marta

doi:10.1007/978-3-031-71167-1_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14979)

Included in the following conference series:

International Conference on Neural-Symbolic Learning and Reasoning

Abstract

Despite their widespread use, machine learning (ML) methods often exhibit sub-optimal performance. The accuracy of these models is primarily hindered by insufficient training data and poor data quality, with particularly severe consequences in critical areas such as medical diagnosis prediction. Our hypothesis is that enhancing ML pipelines with semantic information, such as that available in knowledge graphs (KGs), can address these challenges and improve ML prediction accuracy. To that end, we extend the state of the art through a novel approach that uses KG embeddings to augment tabular data in various innovative ways within ML pipelines. Concretely, we introduce and examine several techniques for integrating KG embeddings, as well as the influence of KG characteristics on model performance, specifically accuracy and F2 scores. We evaluate our approach with four ML algorithms and two embedding techniques, applied to heart and chronic kidney disease prediction. Our results indicate consistent improvements in model performance across various ML models and tasks, thus confirming our hypothesis. For example, for heart disease prediction, we increased the F2 score for KNN from 70% to 82.22% and the F2 score for SVM from 74.53% to 81.71%.
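
The F2 score mentioned in the abstract is the F-beta measure with beta = 2, which weights recall more heavily than precision: F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). As a minimal sketch, it can be computed directly from confusion-matrix counts. The function name and the example counts below are illustrative assumptions, not values from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta=2 weights recall above precision."""
    b2 = beta ** 2
    # Equivalent to (1 + b2) * P * R / (b2 * P + R), written in terms of counts.
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Illustrative counts only (not from the paper): 8 true positives,
# 4 false positives, 2 false negatives.
f1 = f_beta(8, 4, 2, beta=1.0)
f2 = f_beta(8, 4, 2, beta=2.0)
print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

With these counts recall (0.80) exceeds precision (about 0.67), so F2 comes out higher than F1; a model that misses many positive cases would be penalised more under F2 than under F1.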

Notes

1. The importance of the preprocessing step for achieving good performance is shown by Hassler et al. [14].
2. https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset.
3. https://www.kaggle.com/datasets/mansoordaku/ckdisease?select=kidney_disease.csv.
4. https://bioportal.bioontology.org/ontologies/HFO.
5. https://www.snomed.org.
6. https://termbrowser.nhs.ukmar.

References

Alfrjani, R., Osman, T., Cosma, G.: A hybrid semantic knowledgebase-machine learning approach for opinion mining. Data Knowl. Eng. 121, 88–108 (2019)
Ali, L., et al.: An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 7, 54007–54014 (2019)
Bhatt, S., Sheth, A., Shalin, V., Zhao, J.: Knowledge graph semantic enhancement of input data for improving AI. IEEE Internet Comput. 24(2), 66–72 (2020)
Chen, J., Alghamdi, G., Schmidt, R.A., Walther, D., Gao, Y.: Ontology extraction for large ontologies via modularity and forgetting. In: Proceedings of the 10th International Conference on Knowledge Capture, pp. 45–52 (2019)
Chittora, P., et al.: Prediction of chronic kidney disease-a machine learning perspective. IEEE Access 9, 17312–17334 (2021)
Chute, C.G., Çelik, C.: Overview of ICD-11 architecture and structure. BMC Med. Inform. Decis. Mak. 21(6), 1–7 (2021)
Confalonieri, R., Weyde, T., Besold, T.R., del Prado Martín, F.M.: Using ontologies to enhance human understandability of global post-hoc explanations of black-box models. Artif. Intell. 296, 103471 (2021)
Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: A review of some techniques for inclusion of domain-knowledge into deep neural networks. Sci. Rep. 12(1), 1040 (2022)
El-Sappagh, S., Franda, F., Ali, F., Kwak, K.S.: SNOMED CT standard ontology based on the ontology for general medical science. BMC Med. Inform. Decis. Mak. 18, 1–19 (2018)
Garcez, A.D., Lamb, L.C.: Neurosymbolic AI: the 3rd wave. Artif. Intell. Rev. 1–20 (2023)
Gaur, M., et al.: “Let me tell you about your mental health!” contextualized classification of Reddit posts to DSM-5 for web-based intervention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 753–762 (2018)
Gazzotti, R., Faron-Zucker, C., Gandon, F., Lacroix-Hugues, V., Darmon, D.: Injecting domain knowledge in electronic medical records to improve hospitalization prediction. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 116–130. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_8
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
Hassler, A.P., Menasalvas, E., García-García, F.J., Rodríguez-Mañas, L., Holzinger, A.: Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome. BMC Med. Inform. Decis. Mak. 19, 1–17 (2019)
Herron, D., Jiménez-Ruiz, E., Weyde, T.: On the benefits of OWL-based knowledge graphs for neural-symbolic systems. In: Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, vol. 3432, pp. 327–335. CEUR Workshop Proceedings (2023)
Hitzler, P., Eberhart, A., Ebrahimi, M., Sarker, M.K., Zhou, L.: Neuro-symbolic approaches in artificial intelligence. Natl. Sci. Rev. 9(6), nwac035 (2022)
Huang, Y.X., et al.: Enabling abductive learning to exploit knowledge graph. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3839–3847 (2023)
Ivanović, M., Budimac, Z.: An overview of ontologies and data resources in medical domains. Expert Syst. Appl. 41(11), 5158–5166 (2014)
Jovic, A., Prcela, M., Gamberger, D.: Ontologies in medical knowledge representation. In: 2007 29th International Conference on Information Technology Interfaces, pp. 535–540. IEEE (2007)
Katarya, R., Meena, S.K.: Machine learning techniques for heart disease prediction: a comparative study and analysis. Heal. Technol. 11, 87–97 (2021)
Kursuncu, U., Gaur, M., Sheth, A.: Knowledge infused learning (k-il): towards deep incorporation of knowledge in deep learning. arXiv preprint arXiv:1912.00512 (2019)
Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6(2), 167–195 (2015)
Llugiqi, M., Ekaputra, F.J., Sabou, M.: Leveraging knowledge graphs for enhancing machine learning-based heart disease prediction. In: The Knowledge Graphs and Neurosymbolic AI (KG-NeSy) 2024 Workshop co-located with AIRoV – The First Austrian Symposium on AI, Robotics, and Vision (accepted for publication) (2024). https://semantic-systems.org/sites/KG-NeSy/papers/P28.pdf
Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019)
Pisanelli, D.M.: Ontologies in Medicine, vol. 102. IOS press (2004)
Poulinakis, K., Drikakis, D., Kokkinakis, I.W., Spottswood, S.M.: Machine-learning methods on noisy and sparse data. Mathematics 11(1), 236 (2023)
Rady, E.H.A., Anwar, A.S.: Prediction of kidney disease stages using data mining algorithms. Inf. Med. Unlocked 15, 100178 (2019)
Rani, P., Kumar, R., Ahmed, N.M.S., Jain, A.: A decision support system for heart disease prediction based upon machine learning. J. Reliable Intell. Environ. 7(3), 263–275 (2021)
Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., Simperl, E., Gray, A., Sabou, M., Krötzsch, M., Lecue, F., Flöck, F., Gil, Y. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30
Ruiz, C., Ren, H., Huang, K., Leskovec, J.: High dimensional, tabular deep learning with an auxiliary knowledge graph. Adv. Neural Inf. Process. Syst. 36 (2024)
Sarker, M.K., Zhou, L., Eberhart, A., Hitzler, P.: Neuro-symbolic artificial intelligence. AI Commun. 34(3), 197–209 (2021)
Shah, D., Patel, S., Bharti, S.K.: Heart disease prediction using machine learning techniques. SN Comput. Sci. 1, 1–6 (2020)
Szilagyi, I., Wira, P.: An intelligent system for smart buildings using machine learning and semantic technologies: a hybrid data-knowledge approach. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 20–25. IEEE (2018)
Vijayarani, S., Dhayanand, S., Phil, M.: Kidney disease prediction using SVM and ANN algorithms. Int. J. Comput. Bus. Res. (IJCBR) 6(2), 1–12 (2015)
Yadav, A.L., Soni, K., Khare, S.: Heart diseases prediction using machine learning. In: 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2023)
Yildirim, P.: Chronic kidney disease prediction on imbalanced data by multilayer perceptron: chronic kidney disease prediction. In: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 193–198 (2017). https://doi.org/10.1109/COMPSAC.2017.84
Yin, C., Zhao, R., Qian, B., Lv, X., Zhang, P.: Domain knowledge guided deep learning with electronic health records. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 738–747. IEEE (2019)
Ziegler, K., et al.: Injecting semantic background knowledge into neural networks using graph embeddings. In: 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 200–205. IEEE (2017)

Acknowledgements

This work was supported by the FWF HOnEst project (V 745-N), FFG SENSE project (894802) and FAIR-AI project (904624).

Author information

Authors and Affiliations

Vienna University of Economics and Business, Vienna, Austria
Majlinda Llugiqi, Fajar J. Ekaputra & Marta Sabou

Corresponding author

Correspondence to Majlinda Llugiqi.

Editor information

Editors and Affiliations

Sony AI, Barcelona, Spain
Tarek R. Besold
City, University of London, London, UK
Artur d’Avila Garcez
City, University of London, London, UK
Ernesto Jimenez-Ruiz
University of Padova, Padova, Italy
Roberto Confalonieri
City, University of London, London, UK
Pranava Madhyastha
City, University of London, London, UK
Benedikt Wagner

A Appendix: Additional Experimental Analysis

The tables in this appendix describe the experimental setup.

Table 3 shows the details of the ontologies used for the heart and kidney disease domains, where classes (or concepts) represent distinct groups, e.g., ‘Patient’ or ‘Disease’; object properties describe the relationships between two classes, e.g., ‘hasSymptom’ connecting ‘Patient’ to ‘Symptom’; and data properties define characteristics or attributes of classes, e.g., a ‘Patient’s’ age.
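
To make the three kinds of ontology elements concrete, the following is a minimal in-memory sketch. It is not the paper's tooling or any ontology library's API: the `OntologySchema` class and the property name `hasAge` are hypothetical, and only ‘Patient’, ‘Disease’, ‘Symptom’, and ‘hasSymptom’ come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class OntologySchema:
    """Tiny in-memory stand-in for an ontology schema (illustrative only)."""
    classes: Set[str] = field(default_factory=set)
    # object property name -> (domain class, range class)
    object_properties: Dict[str, Tuple[str, str]] = field(default_factory=dict)
    # data property name -> (domain class, literal datatype name)
    data_properties: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add_object_property(self, name: str, domain: str, range_: str) -> None:
        # An object property links two classes, so both must already exist.
        for cls in (domain, range_):
            if cls not in self.classes:
                raise ValueError(f"unknown class: {cls}")
        self.object_properties[name] = (domain, range_)

    def add_data_property(self, name: str, domain: str, datatype: str) -> None:
        # A data property attaches a literal value (e.g. an integer) to a class.
        if domain not in self.classes:
            raise ValueError(f"unknown class: {domain}")
        self.data_properties[name] = (domain, datatype)

    def summary(self) -> Dict[str, int]:
        # Counts of the three kinds of schema elements.
        return {
            "classes": len(self.classes),
            "object_properties": len(self.object_properties),
            "data_properties": len(self.data_properties),
        }


schema = OntologySchema(classes={"Patient", "Disease", "Symptom"})
schema.add_object_property("hasSymptom", "Patient", "Symptom")
schema.add_data_property("hasAge", "Patient", "integer")  # "hasAge" is a hypothetical name
print(schema.summary())
```

The `summary()` counts correspond to the kind of per-ontology statistics (numbers of classes, object properties, and data properties) that such a table typically reports.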
Table 4 shows the parameters used for the embedding algorithms. The embeddings are generated with vector sizes of 64, 100, and 128 to better capture the semantic knowledge from the KG. The reported results are the average performance across these three vector dimensions.
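
The averaging over vector sizes can be sketched as below. This is an assumption-laden illustration rather than the authors' code: `average_over_sizes` and `placeholder_evaluate` are hypothetical names, and the scores are made-up placeholders standing in for a real train-and-evaluate run at each embedding size.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

VECTOR_SIZES = (64, 100, 128)  # embedding sizes stated in the appendix


def average_over_sizes(
    evaluate: Callable[[int], Dict[str, float]],
    sizes: Sequence[int] = VECTOR_SIZES,
) -> Dict[str, float]:
    """Run one evaluation per embedding size and average each metric."""
    runs = [evaluate(size) for size in sizes]
    return {metric: mean(run[metric] for run in runs) for metric in runs[0]}


def placeholder_evaluate(size: int) -> Dict[str, float]:
    # Hypothetical stand-in for: generate embeddings of this size, train a
    # model, and score it. The numbers are placeholders, not paper results.
    scores = {64: (0.80, 0.78), 100: (0.82, 0.81), 128: (0.84, 0.81)}
    accuracy, f2 = scores[size]
    return {"accuracy": accuracy, "f2": f2}


averaged = average_over_sizes(placeholder_evaluate)
print({metric: round(value, 4) for metric, value in averaged.items()})
```

Passing the evaluation step in as a callable keeps the averaging logic independent of which embedding technique or ML model is being scored.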
Table 5 shows the parameters used for the ML models.
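
A parameter grid of this kind enumerates every combination of candidate values per model. The sketch below shows how such a grid expands into individual configurations; the grid values are placeholder assumptions (the actual contents of Table 5 are not reproduced here), and `expand_grid` is a hypothetical helper. The parameter names follow scikit-learn's `KNeighborsClassifier` and `SVC`, which is also an assumption about the tooling.

```python
from itertools import product
from typing import Any, Dict, List

# Placeholder grids: parameter names follow scikit-learn conventions,
# values are illustrative and NOT taken from Table 5.
PARAM_GRID: Dict[str, Dict[str, List[Any]]] = {
    "knn": {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    "svm": {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
}


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Return one dict per combination of parameter values (Cartesian product)."""
    keys = sorted(grid)  # fixed key order makes the output deterministic
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


for model_name, grid in PARAM_GRID.items():
    candidates = expand_grid(grid)
    print(model_name, len(candidates), "candidate configurations")
```

Each candidate dictionary can then be passed to the corresponding model constructor and scored, keeping the best configuration per model.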

Table 3. Details of the ontologies for the heart and kidney disease domains.

Table 4. Node2Vec and RDF2Vec parameters for different KGs.

Table 5. Parameter grid for ML methods.

Cite this paper

Llugiqi, M., Ekaputra, F.J., Sabou, M. (2024). Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings. In: Besold, T.R., d’Avila Garcez, A., Jimenez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B. (eds) Neural-Symbolic Learning and Reasoning. NeSy 2024. Lecture Notes in Computer Science, vol 14979. Springer, Cham. https://doi.org/10.1007/978-3-031-71167-1_15

DOI: https://doi.org/10.1007/978-3-031-71167-1_15
Published: 10 September 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71166-4
Online ISBN: 978-3-031-71167-1
eBook Packages: Computer Science, Computer Science (R0)