Abstract
Schema matching aims to establish correspondences between the attributes of database schemas. It has been regarded as the most difficult and crucial stage in the development of many contemporary database and semantic web systems. Manual mapping is a lengthy and laborious process, yet a low-quality algorithmic matcher can cause even more trouble. Moreover, data privacy in certain domains, such as healthcare, poses a further challenge: instance-level data should not be used, so that sensitive information is not leaked. To address this issue, we propose CONSchema, a model that combines the textual attribute descriptions and the constraints of the schemas to learn a better matcher. We also propose a new experimental setting to assess the practical performance of schema matching models. Our results on six benchmark datasets across various domains, including healthcare and movies, demonstrate the robustness of CONSchema.
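To make the idea of matching on schema-level information alone more concrete, the following is a minimal illustrative sketch and not the CONSchema model itself. It assumes a hypothetical `Attribute` record that holds only a name, a description, and three constraints; it uses `difflib.SequenceMatcher` as a crude stand-in for a learned text encoder, and a fixed weighted sum (weights chosen arbitrarily) where a trained classifier such as an MLP would normally sit. No instance-level data is touched.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Sequence


@dataclass(frozen=True)
class Attribute:
    """Hypothetical schema-level record: metadata only, no instance data."""
    name: str
    description: str
    data_type: str
    nullable: bool
    is_key: bool


def text_similarity(a: Attribute, b: Attribute) -> float:
    """String similarity of name plus description (stand-in for a text encoder)."""
    left = f"{a.name} {a.description}".lower()
    right = f"{b.name} {b.description}".lower()
    return SequenceMatcher(None, left, right).ratio()


def constraint_features(a: Attribute, b: Attribute) -> List[float]:
    """One agreement indicator per constraint (1.0 if equal, else 0.0)."""
    return [
        float(a.data_type == b.data_type),
        float(a.nullable == b.nullable),
        float(a.is_key == b.is_key),
    ]


def pair_features(a: Attribute, b: Attribute) -> List[float]:
    """Concatenate the textual signal with the constraint signals."""
    return [text_similarity(a, b)] + constraint_features(a, b)


def match_score(
    a: Attribute,
    b: Attribute,
    weights: Sequence[float] = (0.7, 0.1, 0.1, 0.1),  # assumed, not learned
) -> float:
    """Fixed weighted sum; a trained classifier would replace this step."""
    return sum(w * f for w, f in zip(weights, pair_features(a, b)))


# Usage: rank candidate target attributes for one source attribute.
source = Attribute("dob", "date of birth of the patient", "date", False, False)
candidates = [
    Attribute("birth_date", "patient date of birth", "date", False, False),
    Attribute("person_id", "unique identifier of the patient", "integer", False, True),
]
best = max(candidates, key=lambda c: match_score(source, c))
print(best.name)
```

Keeping the textual and constraint features separate in `pair_features` means either signal can be swapped out, for example replacing the string similarity with embeddings from a language model, without changing how candidate pairs are scored and ranked.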
Notes
- 1.
- 2. We explored other models, such as random forest and logistic regression; the results follow similar trends, with MLP providing the largest performance boost.
Acknowledgements
This work was supported by the National Science Foundation award IIS-2145411.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, K., Zhang, J., Ho, J.C. (2023). CONSchema: Schema Matching with Semantics and Constraints. In: Abelló, A., et al. New Trends in Database and Information Systems. ADBIS 2023. Communications in Computer and Information Science, vol 1850. Springer, Cham. https://doi.org/10.1007/978-3-031-42941-5_21
DOI: https://doi.org/10.1007/978-3-031-42941-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42940-8
Online ISBN: 978-3-031-42941-5
eBook Packages: Computer Science, Computer Science (R0)