Schema Matching across Query Interfaces on the Deep Web

He, Zhongtian; Hong, Jun; Bell, David

doi:10.1007/978-3-540-70504-8_6


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5071)

Included in the following conference series:

British National Conference on Databases

Abstract

Schema matching is a crucial step in data integration, and many approaches to it have been proposed. Different types of information about schemas, including structures, linguistic features and data types, have been used to match attributes between schemas. Relying on a single aspect of this information is not sufficient, so approaches have been proposed that combine multiple matchers, each taking a different aspect into account. Weights are usually assigned to the individual matchers so that their match results can be combined according to their different levels of importance. However, these weights have to be set manually and are domain-dependent. We propose a new approach to combining multiple matchers using the Dempster-Shafer theory of evidence, which finds the top-k attribute correspondences in the target schema for each source attribute. We then use heuristics to resolve any conflicts between the attribute correspondences of different source attributes. Our experimental results show that our approach is highly effective.
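
To illustrate how evidence from several matchers can be combined without hand-tuned weights, the following is a minimal sketch of Dempster's rule of combination followed by a top-k selection. It is not the authors' implementation: the abstract does not say how masses are derived from matcher scores, so the attribute names, the two matchers (a string matcher and a data type matcher) and all mass values below are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses (here, candidate
    target attributes) to a mass; the masses of one function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        intersection = b & c
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_b * mass_c
        else:
            # Mass that falls on the empty set measures disagreement.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("the two mass functions are in total conflict")
    # Normalise by the non-conflicting mass so the result sums to 1.
    return {a: v / (1.0 - conflict) for a, v in combined.items()}


def top_k(mass, k):
    """Return the k single-attribute hypotheses with the highest mass."""
    singles = [(next(iter(a)), v) for a, v in mass.items() if len(a) == 1]
    return sorted(singles, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical frame of discernment: candidate target attributes for one
# source attribute. Mass on the whole frame expresses ignorance.
FRAME = frozenset({"title", "author", "isbn"})

# Hypothetical evidence from a string matcher.
m_string = {frozenset({"title"}): 0.6, frozenset({"author"}): 0.1, FRAME: 0.3}
# Hypothetical evidence from a data type matcher.
m_type = {frozenset({"title"}): 0.5, frozenset({"isbn"}): 0.2, FRAME: 0.3}

combined = combine(m_string, m_type)
best = top_k(combined, k=2)
print(best)
```

In this sketch each matcher expresses its uncertainty by leaving part of its mass on the whole frame, which takes the place of a manually assigned weight. Resolving conflicts between the top-k lists of different source attributes is a separate step that is not shown here, since the abstract does not describe the heuristics used.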

Author information

Authors and Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, BT7 1NN, UK
Zhongtian He, Jun Hong & David Bell

Editor information

Alex Gray, Keith Jeffery, Jianhua Shao

About this paper

Cite this paper

He, Z., Hong, J., Bell, D. (2008). Schema Matching across Query Interfaces on the Deep Web. In: Gray, A., Jeffery, K., Shao, J. (eds) Sharing Data, Information and Knowledge. BNCOD 2008. Lecture Notes in Computer Science, vol 5071. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70504-8_6

DOI: https://doi.org/10.1007/978-3-540-70504-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70503-1
Online ISBN: 978-3-540-70504-8
eBook Packages: Computer Science, Computer Science (R0)
