Challenges for Healthcare Data Analytics Over Knowledge Graphs

Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV

Abstract

Over the past decade, the volume of data has experienced a significant increase, and this growth is projected to accelerate in the coming years. Within the healthcare sector, various methods (such as liquid biopsies, medical images, and genome sequencing) generate substantial amounts of data, which can lead to the discovery of new biomarkers. Analyzing big data in healthcare holds the potential to advance precise diagnostics and effective treatments. However, healthcare data faces several complexity challenges, including volume, variety, and veracity, which necessitate innovative techniques for data management and knowledge discovery to ensure accurate insights and informed decision-making. This paper summarizes the results presented in the invited talk at BDA 2022 and addresses these challenges by proposing a knowledge-driven framework able to handle complexity issues associated with big data and their impact on analytics. In particular, we propose the use of Knowledge Graphs (KGs) as data structures that enable the integration of diverse healthcare data and facilitate the merging of data with ontologies that describe their meaning. We show the benefits of leveraging KGs to uncover patterns and associations among entities. Specifically, we illustrate the application of rule mining tasks that enhance the understanding of the role of biomarkers and previous cancers in lung cancer.
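To make the rule-mining idea concrete, the following minimal sketch scores one rule over a toy knowledge graph stored as triples. All entity names, predicates, and triples are hypothetical and are not data from the chapter. The sketch uses plain support and standard confidence, which may differ from the measures used by the rule-mining tools applied in the chapter.

```python
# Hypothetical toy triples (subject, predicate, object); not data from the chapter.
TRIPLES = [
    ("patient1", "hasBiomarker", "EGFR"),
    ("patient1", "hasDiagnosis", "LungCancer"),
    ("patient2", "hasBiomarker", "EGFR"),
    ("patient2", "hasDiagnosis", "LungCancer"),
    ("patient3", "hasBiomarker", "EGFR"),
    ("patient4", "hasBiomarker", "ALK"),
    ("patient4", "hasDiagnosis", "LungCancer"),
]


def subjects_with(triples, predicate, obj):
    """Return the set of subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def rule_metrics(triples, body, head):
    """Support and standard confidence of the rule body(x) => head(x).

    body and head are (predicate, object) pairs.
    """
    body_matches = subjects_with(triples, *body)
    head_matches = subjects_with(triples, *head)
    # Support: number of subjects satisfying both body and head.
    support = len(body_matches & head_matches)
    # Standard confidence: share of body matches that also satisfy the head.
    confidence = support / len(body_matches) if body_matches else 0.0
    return support, confidence


support, confidence = rule_metrics(
    TRIPLES,
    body=("hasBiomarker", "EGFR"),
    head=("hasDiagnosis", "LungCancer"),
)
print(f"support={support}, confidence={confidence:.2f}")
```

On this toy graph, two of the three subjects with the hypothetical EGFR biomarker also carry the diagnosis, so the rule has support 2 and confidence of about 0.67.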

Notes

  1. Unified Medical Language System: https://www.nlm.nih.gov/research/umls/index.html
  2. https://www.w3.org/TR/shacl/
  3. https://www.w3.org/TR/vocab-dcat-3/
  4. https://www.clarify2020.eu/
  5. https://go.drugbank.com/
  6. https://www.ncbi.nlm.nih.gov/mesh/
  7. https://labs.tib.eu/sdm/clarify_mappings_and_ontology/sparql
  8. https://www.ontotext.com/products/graphdb/
  9. https://rdf4j.org/documentation/programming/federation/

Acknowledgement

This work has been supported by the EU H2020 RIA project CLARIFY (GA No. 875160). Maria-Esther Vidal is partially supported by the Leibniz Association in the program “Leibniz Best Minds: Programme for Women Professors”, project TrustKG-Transforming Data in Trustable Insights, with grant P99/2020.

Author information

Corresponding author

Correspondence to Maria-Esther Vidal.

Appendices

A Federated SPARQL Queries

[Figures a–i: listings of the federated SPARQL queries, not reproduced here.]

B Results Federated Query Engines

See Table 4.

Table 4. Execution Times of Federated Query Engines. avg is the average execution time of a query over 10 runs; stdev is the standard deviation across those runs. FedX (RDF4J) outperforms GraphDB in all queries, and GraphDB times out for query Q6. DeTrusty performs best of the three engines in all nine queries, and its execution times are also the most stable, as its low standard deviation shows.
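The avg and stdev columns described in the caption can be computed as in the following minimal sketch. The engine names and timing values are hypothetical placeholders, not measurements from Table 4, and the use of the sample standard deviation (rather than the population one) is an assumption, since the caption does not say which was used.

```python
import statistics

# Hypothetical execution times in seconds for one query over 10 runs.
# Placeholder values only; these are NOT the measurements behind Table 4.
runs = {
    "engine_a": [1.0, 1.2, 0.8, 1.0, 1.0, 1.1, 0.9, 1.0, 1.0, 1.0],
    "engine_b": [2.0, 3.0, 1.0, 2.5, 1.5, 2.0, 2.0, 3.0, 1.0, 2.0],
}


def summarize(times):
    """Return (average, sample standard deviation) of a list of run times."""
    return statistics.mean(times), statistics.stdev(times)


summary = {engine: summarize(times) for engine, times in runs.items()}
for engine, (avg, stdev) in summary.items():
    print(f"{engine}: avg={avg:.3f}s stdev={stdev:.3f}s")

# The engine with the lowest standard deviation is the most stable across runs.
most_stable = min(summary, key=lambda engine: summary[engine][1])
print(f"most stable: {most_stable}")
```

A low standard deviation relative to the average indicates that repeated executions of the same query take similar time, which is the sense in which the caption calls an engine stable.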

Copyright information

© 2023 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Vidal, ME., Niazmand, E., Rohde, P.D., Iglesias, E., Sakor, A. (2023). Challenges for Healthcare Data Analytics Over Knowledge Graphs. In: Hameurlain, A., Tjoa, A.M., Boucelma, O., Toumani, F. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV. Lecture Notes in Computer Science, vol 14160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68014-8_4

  • DOI: https://doi.org/10.1007/978-3-662-68014-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-68013-1

  • Online ISBN: 978-3-662-68014-8

  • eBook Packages: Computer Science, Computer Science (R0)
