Abstract
Web databases, which dynamically provide information in response to user queries, form a significant part of today’s Web. To help users submit queries to different Web databases, the query interface matching problem must be addressed. To solve this problem, we propose a new complex schema matching approach, Holistic Schema Matching (HSM). By examining the query interfaces of real Web databases, we observe that attribute matchings can be discovered from attribute-occurrence patterns. For example, in the Books domain, First Name often appears together with Last Name, while it is rarely co-present with Author. Based on this observation, we design a count-based greedy algorithm to identify which attributes in the query interfaces are most likely to match. In particular, HSM can identify both simple matchings, i.e., 1:1 matchings, and complex matchings, i.e., 1:n or m:n matchings, between attributes. Our experiments on real data sets show that HSM discovers both simple and complex matchings accurately and efficiently.
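The attribute-occurrence intuition above can be illustrated with a minimal sketch. It is not HSM's actual algorithm: the grouping rule (pair co-presence divided by the larger attribute count, threshold 0.9), the matching score (product of occurrence counts for units that are never co-present), the function names, and the toy Books interfaces are all hypothetical assumptions chosen only to show how co-occurrence counts can drive a greedy choice of 1:1 and 1:n matchings.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy query interfaces from a Books domain (not real data).
INTERFACES = [
    ["title", "author", "isbn"],
    ["title", "first name", "last name"],
    ["title", "author", "publisher"],
    ["first name", "last name", "isbn", "publisher"],
    ["title", "author"],
    ["title", "first name", "last name", "publisher"],
]

def occurrences(interfaces):
    """Map each attribute to the set of interface indices it appears in."""
    occ = defaultdict(set)
    for i, iface in enumerate(interfaces):
        for attr in iface:
            occ[attr].add(i)
    return dict(occ)

def find_groups(interfaces, threshold=0.9):
    """Group attributes that are almost always co-present (assumed rule)."""
    occ = occurrences(interfaces)
    group_of = {a: {a} for a in occ}
    for a, b in combinations(sorted(occ), 2):
        together = len(occ[a] & occ[b])
        # Assumption: co-presence relative to the more frequent attribute.
        if together / max(len(occ[a]), len(occ[b])) >= threshold:
            merged = group_of[a] | group_of[b]
            for x in merged:
                group_of[x] = merged
    return {frozenset(g) for g in group_of.values()}

def greedy_match(interfaces, threshold=0.9):
    """Greedily pair units (single attributes or groups) that never co-occur."""
    occ = occurrences(interfaces)
    units = sorted(find_groups(interfaces, threshold), key=sorted)
    # A unit "occurs" in an interface if any of its members does.
    unit_occ = {u: set().union(*(occ[a] for a in u)) for u in units}
    candidates = []
    for u, v in combinations(units, 2):
        if not unit_occ[u] & unit_occ[v]:  # never co-present
            # Assumed count-based score: favor frequent, disjoint units.
            candidates.append((len(unit_occ[u]) * len(unit_occ[v]), u, v))
    candidates.sort(key=lambda c: c[0], reverse=True)
    matched, result = set(), []
    for score, u, v in candidates:
        if u not in matched and v not in matched:  # greedy, non-overlapping
            matched.update((u, v))
            result.append((sorted(u), sorted(v), score))
    return result

print(greedy_match(INTERFACES))
# [(['author'], ['first name', 'last name'], 9)]
```

Here First Name and Last Name are grouped because they always appear together, and the group never co-occurs with Author, so the sketch reports a 1:2 matching. The greedy pass accepts the highest-scoring candidates first and skips any unit that is already matched.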
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Su, W., Wang, J., Lochovsky, F. (2006). Holistic Schema Matching for Web Query Interfaces. In: Ioannidis, Y., et al. Advances in Database Technology - EDBT 2006. EDBT 2006. Lecture Notes in Computer Science, vol 3896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11687238_8
DOI: https://doi.org/10.1007/11687238_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32960-2
Online ISBN: 978-3-540-32961-9
eBook Packages: Computer Science, Computer Science (R0)