DOI: 10.1145/3462462.3468881
research-article

Semantic enrichment of data for AI applications

Published: 20 June 2021

ABSTRACT

In this work, we use semantic knowledge sources, such as cross-domain knowledge graphs (KGs) and domain-specific ontologies, to enrich structured data for various AI applications. By enriching our understanding of the underlying data with semantics brought in from external ontologies and KGs, we can better interpret the data as well as the queries to answer more questions, provide more complete answers, and deal with entity disambiguation. To semantically enrich the data with external knowledge sources, we need to find the correspondences between the structured data and the entities in the cross-domain KGs and/or the domain-specific ontologies. In this paper, we break this problem into several steps, and provide detailed solutions for each step. We showcase the practical value of semantic enrichment of data using our proposed techniques in entity disambiguation, natural language querying and conversational interfaces to data, query relaxation, as well as query answering, with promising results.


          Published in

          DEEM '21: Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning
          June 2021
          52 pages
          ISBN: 9781450384865
          DOI: 10.1145/3462462

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          Overall acceptance rate: 23 of 37 submissions, 62%
