ABSTRACT
In this work, we use semantic knowledge sources, such as cross-domain knowledge graphs (KGs) and domain-specific ontologies, to enrich structured data for various AI applications. By enriching our understanding of the underlying data with semantics brought in from external ontologies and KGs, we can better interpret the data as well as the queries to answer more questions, provide more complete answers, and deal with entity disambiguation. To semantically enrich the data with external knowledge sources, we need to find the correspondences between the structured data and the entities in the cross-domain KGs and/or the domain-specific ontologies. In this paper, we break this problem into several steps, and provide detailed solutions for each step. We showcase the practical value of semantic enrichment of data using our proposed techniques in entity disambiguation, natural language querying and conversational interfaces to data, query relaxation, as well as query answering, with promising results.
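To illustrate what a correspondence between structured data and KG entities might look like, the following is a minimal sketch that links the values of a table column to entity labels by plain string similarity. It is an assumption-laden illustration, not the technique proposed in this paper: the `KG_LABELS` fragment, the `kg:` identifiers, the function names, and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical KG fragment: entity id -> known labels (illustrative only,
# not taken from any of the knowledge sources discussed in the paper).
KG_LABELS = {
    "kg:Aspirin": ["aspirin", "acetylsalicylic acid"],
    "kg:Ibuprofen": ["ibuprofen", "advil"],
    "kg:Paracetamol": ["paracetamol", "acetaminophen"],
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_value(value: str, kg_labels=KG_LABELS, threshold: float = 0.8):
    """Return the id of the best-matching entity, or None if no label
    reaches the (assumed) similarity threshold."""
    best_id, best_score = None, 0.0
    for entity_id, labels in kg_labels.items():
        score = max(similarity(value, label) for label in labels)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def link_column(values, kg_labels=KG_LABELS, threshold: float = 0.8):
    """Map every cell value of a column to an entity id (or None)."""
    return {v: link_value(v, kg_labels, threshold) for v in values}


if __name__ == "__main__":
    column = ["Aspirin", "Acetaminophen", "Ibuprofin", "warfarin"]
    for value, entity in link_column(column).items():
        print(f"{value!r} -> {entity}")
```

A purely lexical matcher like this one cannot separate entities that share a label, which is where the entity disambiguation mentioned above becomes necessary.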