Semantic Annotation of Relational Schemas Using a Probabilistic Generative Model

Authors:
Debayan Mukherjee, TCS Research, India (ORCID: 0009-0007-3078-7587)
Atreya Bandyopadhyay, TCS Research, India (ORCID: 0009-0008-5725-1712)
Soham Datta, TCS Research, India (ORCID: 0009-0000-7151-8470)
Indrajit Bhattacharya, TCS Research, India (ORCID: 0009-0004-4279-538X)

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD), January 2024, Pages 127–135. https://doi.org/10.1145/3632410.3632414

Published: 04 January 2024

ABSTRACT

Data in enterprises resides largely under relational schemas. Annotating such schemas with a knowledge graph (KG) that represents knowledge of the domain is useful for semantic understanding of the data as well as for downstream processing by machines and humans. Existing approaches annotate only individual tables using small and simple KGs, and fail to generalize to KG entities and relationships unseen at test time. We propose a probabilistic model that generates complex relational schemas (tables, groupings of tables into neighborhoods, foreign key connections between tables, and fields associated with tables) by traversing paths in a knowledge graph. An efficient two-pass inference algorithm based on inverting this model jointly annotates schema elements such as fields, tables and neighborhoods with entities, and the associations between schema elements with relational paths in the KG. The algorithm also generalizes to unseen paths at test time. Experiments on a real-world schema and domain knowledge graph, as well as on benchmark datasets, show that the proposed approach significantly outperforms existing approaches while demonstrating better scalability.
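To make the notion of a "relational path in the KG" concrete, the following is a minimal sketch and not the paper's model or inference algorithm. It assumes a hypothetical toy knowledge graph (`TOY_KG`) and a hypothetical helper (`relational_paths`) that enumerates the relation sequences connecting two entities within a hop limit. Such sequences are the kind of object with which an association between two schema elements (for example, a foreign key between two tables) could be annotated.

```python
from collections import deque

# Hypothetical toy knowledge graph (an assumption for illustration only):
# each entity maps to a list of (relation, neighbor entity) edges.
TOY_KG = {
    "Customer": [("places", "Order")],
    "Order": [("contains", "Product"), ("shippedTo", "Address")],
    "Product": [],
    "Address": [],
}


def relational_paths(kg, source, target, max_hops=3):
    """Enumerate relation sequences leading from source to target.

    Breadth-first traversal bounded by max_hops, so it terminates
    even if the graph contains cycles.
    """
    paths = []
    queue = deque([(source, [])])
    while queue:
        node, relations = queue.popleft()
        if node == target and relations:
            paths.append(relations)
            continue
        if len(relations) == max_hops:
            continue
        for relation, neighbor in kg.get(node, []):
            queue.append((neighbor, relations + [relation]))
    return paths


# Candidate paths that could annotate a link between a table mapped to
# "Customer" and a table mapped to "Product".
print(relational_paths(TOY_KG, "Customer", "Product"))
```

In this toy graph the only path from `Customer` to `Product` passes through `Order`. A probabilistic model of the kind the abstract describes would score such candidate paths rather than merely enumerate them.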

Index Terms

Information systems

Published in
CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages
ISBN:9798400716348
DOI:10.1145/3632410
Editors: Sriraam Natarajan, Indrajit Bhattacharya, Richa Singh, Arun Kumar, Sayan Ranu, Kalika Bali, Abinaya K
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
knowledge graph
probabilistic generative model
relational schema
semantic annotation
Qualifiers
- research-article
- Research
- Refereed limited