Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings

  • Conference paper
Neural-Symbolic Learning and Reasoning (NeSy 2024)

Abstract

Despite their widespread use, machine learning (ML) methods often exhibit sub-optimal performance. The accuracy of these models is primarily hindered by insufficient training data and poor data quality, with particularly severe consequences in critical areas such as medical diagnosis prediction. Our hypothesis is that enhancing ML pipelines with semantic information, such as that available in knowledge graphs (KGs), can address these challenges and improve ML prediction accuracy. To that end, we extend the state of the art with a novel approach that uses KG embeddings to augment tabular data in several ways within ML pipelines. Concretely, we introduce and examine several techniques for integrating KG embeddings, and we study the influence of KG characteristics on model performance, specifically accuracy and F2 score. We evaluate our approach with four ML algorithms and two embedding techniques, applied to heart disease and chronic kidney disease prediction. Our results indicate consistent improvements in model performance across various ML models and tasks, thus confirming our hypothesis. For example, for heart disease prediction, the F2 score increased from 70% to 82.22% for KNN and from 74.53% to 81.71% for SVM.
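As a rough illustration of the two ingredients named in the abstract, the sketch below appends a KG embedding to a tabular feature row and computes the F2 score. The abstract does not specify the integration techniques, so plain concatenation is an assumption here, and the names `augment_row`, `fbeta`, and the toy entity key are hypothetical. The F-beta formula itself is the standard one, with beta = 2 weighting recall more heavily than precision.

```python
from typing import Dict, List, Sequence


def augment_row(features: Sequence[float], entity: str,
                embeddings: Dict[str, Sequence[float]]) -> List[float]:
    """Append the KG embedding of `entity` to a tabular feature row.

    Concatenation is only one conceivable integration strategy (an assumption,
    not necessarily the method used in the paper).
    """
    return list(features) + list(embeddings[entity])


def fbeta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 2.0) -> float:
    """Standard F-beta score for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: two tabular features plus a 3-dimensional embedding.
    toy_embeddings = {"patient_1": [0.1, -0.2, 0.3]}
    print(augment_row([63.0, 1.0], "patient_1", toy_embeddings))
    print(fbeta([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
```

Because a missed diagnosis is usually costlier than a false alarm, a recall-weighted metric such as F2 is a natural fit for the disease prediction tasks described above.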

Notes

  1. The importance of the preprocessing step for achieving good performance is shown by Hassler et al. [14].

  2. https://www.kaggle.com/datasets/johnsmith88/heart-disease-dataset.

  3. https://www.kaggle.com/datasets/mansoordaku/ckdisease?select=kidney_disease.csv.

  4. https://bioportal.bioontology.org/ontologies/HFO.

  5. https://www.snomed.org.

  6. https://termbrowser.nhs.ukmar.

References

  1. Alfrjani, R., Osman, T., Cosma, G.: A hybrid semantic knowledgebase-machine learning approach for opinion mining. Data Knowl. Eng. 121, 88–108 (2019)

  2. Ali, L., et al.: An optimized stacked support vector machines based expert system for the effective prediction of heart failure. IEEE Access 7, 54007–54014 (2019)

  3. Bhatt, S., Sheth, A., Shalin, V., Zhao, J.: Knowledge graph semantic enhancement of input data for improving AI. IEEE Internet Comput. 24(2), 66–72 (2020)

  4. Chen, J., Alghamdi, G., Schmidt, R.A., Walther, D., Gao, Y.: Ontology extraction for large ontologies via modularity and forgetting. In: Proceedings of the 10th International Conference on Knowledge Capture, pp. 45–52 (2019)

  5. Chittora, P., et al.: Prediction of chronic kidney disease - a machine learning perspective. IEEE Access 9, 17312–17334 (2021)

  6. Chute, C.G., Çelik, C.: Overview of ICD-11 architecture and structure. BMC Med. Inform. Decis. Mak. 21(6), 1–7 (2021)

  7. Confalonieri, R., Weyde, T., Besold, T.R., del Prado Martín, F.M.: Using ontologies to enhance human understandability of global post-hoc explanations of black-box models. Artif. Intell. 296, 103471 (2021)

  8. Dash, T., Chitlangia, S., Ahuja, A., Srinivasan, A.: A review of some techniques for inclusion of domain-knowledge into deep neural networks. Sci. Rep. 12(1), 1040 (2022)

  9. El-Sappagh, S., Franda, F., Ali, F., Kwak, K.S.: SNOMED CT standard ontology based on the ontology for general medical science. BMC Med. Inform. Decis. Mak. 18, 1–19 (2018)

  10. Garcez, A.D., Lamb, L.C.: Neurosymbolic AI: the 3rd wave. Artif. Intell. Rev. 1–20 (2023)

  11. Gaur, M., et al.: “Let me tell you about your mental health!” Contextualized classification of Reddit posts to DSM-5 for web-based intervention. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 753–762 (2018)

  12. Gazzotti, R., Faron-Zucker, C., Gandon, F., Lacroix-Hugues, V., Darmon, D.: Injecting domain knowledge in electronic medical records to improve hospitalization prediction. In: Hitzler, P., et al. (eds.) ESWC 2019. LNCS, vol. 11503, pp. 116–130. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-21348-0_8

  13. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)

  14. Hassler, A.P., Menasalvas, E., García-García, F.J., Rodríguez-Mañas, L., Holzinger, A.: Importance of medical data preprocessing in predictive modeling and risk factor discovery for the frailty syndrome. BMC Med. Inform. Decis. Mak. 19, 1–17 (2019)

  15. Herron, D., Jiménez-Ruiz, E., Weyde, T.: On the benefits of OWL-based knowledge graphs for neural-symbolic systems. In: Proceedings of the 17th International Workshop on Neural-Symbolic Learning and Reasoning, vol. 3432, pp. 327–335. CEUR Workshop Proceedings (2023)

  16. Hitzler, P., Eberhart, A., Ebrahimi, M., Sarker, M.K., Zhou, L.: Neuro-symbolic approaches in artificial intelligence. Natl. Sci. Rev. 9(6), nwac035 (2022)

  17. Huang, Y.X., et al.: Enabling abductive learning to exploit knowledge graph. In: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3839–3847 (2023)

  18. Ivanović, M., Budimac, Z.: An overview of ontologies and data resources in medical domains. Expert Syst. Appl. 41(11), 5158–5166 (2014)

  19. Jovic, A., Prcela, M., Gamberger, D.: Ontologies in medical knowledge representation. In: 2007 29th International Conference on Information Technology Interfaces, pp. 535–540. IEEE (2007)

  20. Katarya, R., Meena, S.K.: Machine learning techniques for heart disease prediction: a comparative study and analysis. Heal. Technol. 11, 87–97 (2021)

  21. Kursuncu, U., Gaur, M., Sheth, A.: Knowledge infused learning (K-IL): towards deep incorporation of knowledge in deep learning. arXiv preprint arXiv:1912.00512 (2019)

  22. Lehmann, J., et al.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web 6(2), 167–195 (2015)

  23. Llugiqi, M., Ekaputra, F.J., Sabou, M.: Leveraging knowledge graphs for enhancing machine learning-based heart disease prediction. In: The Knowledge Graphs and Neurosymbolic AI (KG-NeSy) 2024 Workshop co-located with AIRoV – The First Austrian Symposium on AI, Robotics, and Vision (accepted for publication) (2024). https://semantic-systems.org/sites/KG-NeSy/papers/P28.pdf

  24. Mohan, S., Thirumalai, C., Srivastava, G.: Effective heart disease prediction using hybrid machine learning techniques. IEEE Access 7, 81542–81554 (2019)

  25. Pisanelli, D.M.: Ontologies in Medicine, vol. 102. IOS press (2004)

  26. Poulinakis, K., Drikakis, D., Kokkinakis, I.W., Spottswood, S.M.: Machine-learning methods on noisy and sparse data. Mathematics 11(1), 236 (2023)

  27. Rady, E.H.A., Anwar, A.S.: Prediction of kidney disease stages using data mining algorithms. Inf. Med. Unlocked 15, 100178 (2019)

  28. Rani, P., Kumar, R., Ahmed, N.M.S., Jain, A.: A decision support system for heart disease prediction based upon machine learning. J. Reliable Intell. Environ. 7(3), 263–275 (2021)

  29. Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., Simperl, E., Gray, A., Sabou, M., Krötzsch, M., Lecue, F., Flöck, F., Gil, Y. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30

  30. Ruiz, C., Ren, H., Huang, K., Leskovec, J.: High dimensional, tabular deep learning with an auxiliary knowledge graph. Adv. Neural Inf. Process. Syst. 36 (2024)

  31. Sarker, M.K., Zhou, L., Eberhart, A., Hitzler, P.: Neuro-symbolic artificial intelligence. AI Commun. 34(3), 197–209 (2021)

  32. Shah, D., Patel, S., Bharti, S.K.: Heart disease prediction using machine learning techniques. SN Comput. Sci. 1, 1–6 (2020)

  33. Szilagyi, I., Wira, P.: An intelligent system for smart buildings using machine learning and semantic technologies: a hybrid data-knowledge approach. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 20–25. IEEE (2018)

  34. Vijayarani, S., Dhayanand, S., Phil, M.: Kidney disease prediction using SVM and ANN algorithms. Int. J. Comput. Bus. Res. (IJCBR) 6(2), 1–12 (2015)

  35. Yadav, A.L., Soni, K., Khare, S.: Heart diseases prediction using machine learning. In: 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–7. IEEE (2023)

  36. Yildirim, P.: Chronic kidney disease prediction on imbalanced data by multilayer perceptron: chronic kidney disease prediction. In: 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC), vol. 2, pp. 193–198 (2017). https://doi.org/10.1109/COMPSAC.2017.84

  37. Yin, C., Zhao, R., Qian, B., Lv, X., Zhang, P.: Domain knowledge guided deep learning with electronic health records. In: 2019 IEEE International Conference on Data Mining (ICDM), pp. 738–747. IEEE (2019)

  38. Ziegler, K., et al.: Injecting semantic background knowledge into neural networks using graph embeddings. In: 2017 IEEE 26th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), pp. 200–205. IEEE (2017)

Acknowledgements

This work was supported by the FWF HOnEst project (V 745-N), FFG SENSE project (894802) and FAIR-AI project (904624).

Author information

Corresponding author

Correspondence to Majlinda Llugiqi.

A Appendix: Additional Experimental Analysis

The tables in this appendix describe the experimental setup.

  • Table 3 shows the details of the ontologies used for the heart and kidney disease domains. Classes (or concepts) represent distinct groups, e.g., ‘Patient’ or ‘Disease’; object properties describe relationships between two classes, e.g., ‘hasSymptom’ connecting ‘Patient’ to ‘Symptom’; and data properties define characteristics or attributes of classes, e.g., the age of a ‘Patient’.

  • Table 4 shows the parameters used for the embedding algorithms. The embeddings are generated with vector sizes of 64, 100, and 128 to better capture the semantic knowledge in the KG. The reported results are the average performance across these three vector dimensions.

  • Table 5 shows the parameters used for the ML models.
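The ontology terminology and the averaging step described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the toy triples, the `DATA_PROPERTIES` set, the helper names, and the example scores are all hypothetical and are not taken from the paper's ontologies or results.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy KG, for illustration only.
TRIPLES: List[Triple] = [
    ("patient_1", "rdf:type", "Patient"),       # class membership
    ("chest_pain", "rdf:type", "Symptom"),      # class membership
    ("patient_1", "hasSymptom", "chest_pain"),  # object property: links two individuals
    ("patient_1", "age", "63"),                 # data property: links to a literal value
]

# Assumed to be known from the ontology schema.
DATA_PROPERTIES: Set[str] = {"age"}


def summarize(triples: List[Triple]) -> Tuple[List[str], List[str], List[str]]:
    """Return the sorted classes, object properties, and data properties in use."""
    classes = sorted({o for _, p, o in triples if p == "rdf:type"})
    data_props = sorted({p for _, p, _ in triples if p in DATA_PROPERTIES})
    object_props = sorted(
        {p for _, p, _ in triples if p != "rdf:type" and p not in DATA_PROPERTIES}
    )
    return classes, object_props, data_props


def average_over_dimensions(score_by_dim: Dict[int, float]) -> float:
    """Average one metric over the embedding vector sizes (e.g. 64, 100, 128)."""
    return mean(score_by_dim.values())


if __name__ == "__main__":
    print(summarize(TRIPLES))
    # Placeholder scores, not results from the paper.
    print(average_over_dimensions({64: 0.80, 100: 0.82, 128: 0.84}))
```

Reporting the mean across the three vector sizes gives a single figure per configuration, at the cost of hiding any sensitivity to the embedding dimension.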

Table 3. Details of the ontologies for heart and kidney disease domain.
Table 4. Node2Vec and RDF2Vec parameters for different KGs.
Table 5. Parameter grid for ML methods.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Llugiqi, M., Ekaputra, F.J., Sabou, M. (2024). Enhancing Machine Learning Predictions Through Knowledge Graph Embeddings. In: Besold, T.R., d’Avila Garcez, A., Jimenez-Ruiz, E., Confalonieri, R., Madhyastha, P., Wagner, B. (eds) Neural-Symbolic Learning and Reasoning. NeSy 2024. Lecture Notes in Computer Science, vol 14979. Springer, Cham. https://doi.org/10.1007/978-3-031-71167-1_15

  • DOI: https://doi.org/10.1007/978-3-031-71167-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71166-4

  • Online ISBN: 978-3-031-71167-1

  • eBook Packages: Computer Science, Computer Science (R0)