research-article

Semantic enrichment of data for AI applications

Authors:
Fatma Özcan (Google),
Chuan Lei (IBM Research - Almaden),
Abdul Quamar (IBM Research - Almaden),
Vasilis Efthymiou (FORTH - Institute of Computer Science, Heraklion, Crete, Greece)

DEEM '21: Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning, June 2021, Article No.: 4, Pages 1–7, https://doi.org/10.1145/3462462.3468881

Published: 20 June 2021

ABSTRACT

In this work, we use semantic knowledge sources, such as cross-domain knowledge graphs (KGs) and domain-specific ontologies, to enrich structured data for various AI applications. By enriching our understanding of the underlying data with semantics brought in from external ontologies and KGs, we can better interpret the data as well as the queries to answer more questions, provide more complete answers, and deal with entity disambiguation. To semantically enrich the data with external knowledge sources, we need to find the correspondences between the structured data and the entities in the cross-domain KGs and/or the domain-specific ontologies. In this paper, we break this problem into several steps, and provide detailed solutions for each step. We showcase the practical value of semantic enrichment of data using our proposed techniques in entity disambiguation, natural language querying and conversational interfaces to data, query relaxation, as well as query answering, with promising results.
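The correspondence-finding step mentioned in the abstract can be pictured with a toy sketch. This is not the authors' method, which the abstract does not detail; it only illustrates linking values of a structured column to knowledge-graph entities by label similarity. The entity IDs, labels, the `KG_ENTITIES` table, the function names, and the 0.8 threshold are all hypothetical choices made for this illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy knowledge graph: entity ID -> (label, type).
# A real KG (e.g. DBpedia or Wikidata) would be queried instead.
KG_ENTITIES: Dict[str, Tuple[str, str]] = {
    "kg:Aspirin": ("Aspirin", "Drug"),
    "kg:Ibuprofen": ("Ibuprofen", "Drug"),
    "kg:Headache": ("Headache", "Condition"),
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_cell(value: str, threshold: float = 0.8) -> Optional[str]:
    """Return the ID of the best-matching KG entity, or None if no
    label reaches the (assumed) similarity threshold."""
    best_id, best_score = None, 0.0
    for entity_id, (label, _entity_type) in KG_ENTITIES.items():
        score = similarity(value, label)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def link_column(values: Iterable[str], threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Link every cell value of a column to a KG entity (or None)."""
    return {value: link_cell(value, threshold) for value in values}


if __name__ == "__main__":
    # A column with an exact match, a misspelling, and an unknown value.
    print(link_column(["aspirin", "Ibuprofin", "xyzzy"]))
```

Plain string similarity is only a stand-in here; it tolerates small spelling variations but cannot resolve ambiguous names, which is the kind of case the entity disambiguation application in the abstract targets.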

Published in
DEEM '21: Proceedings of the Fifth Workshop on Data Management for End-To-End Machine Learning
June 2021
52 pages
ISBN: 9781450384865
DOI: 10.1145/3462462
Conference Chairs: Matthias Boehm, Julia Stoyanovich, Steven Whang
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 23 of 37 submissions, 62%