Abstract
In recent years it has become apparent that schema matching is a labor-intensive and costly process, which has led to the development of various automated tools intended to replace the human experts involved in it. To this end we propose two new ideas. The first, which we call two-phase schema matching, is the separation of matching techniques into strong and weak ones. The second is the use of information that a human expert provides to the system during schema matching to determine how the various matching techniques are combined. A system encompassing both ideas is easily tunable and allows the human expert to take part in the matching process and help the system choose the best techniques to use. In extensive experiments we demonstrate that this approach outperforms contemporary state-of-the-art systems on relational databases. We also demonstrate that single-purpose (or niche) matchers can be helpful in such a system, since the system can opt to use them when appropriate.
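As a rough illustration of the two ideas, the following Python sketch runs strong matchers first and then applies a weighted combination of weak matchers to the remaining columns. It is not the system described in the paper: the abstract does not specify the matchers, the combination rule, or the form of the expert's input, so the function names, the choice of matchers, the `weights` parameter standing in for expert guidance, and the `threshold` value are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def exact_name(a, b):
    """Hypothetical strong matcher: case-insensitive name equality."""
    return 1.0 if a.lower() == b.lower() else 0.0


def name_similarity(a, b):
    """Hypothetical weak matcher: character-level similarity of names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_overlap(a, b):
    """Hypothetical weak matcher: Jaccard overlap of underscore-separated tokens."""
    ta, tb = set(a.lower().split("_")), set(b.lower().split("_"))
    return len(ta & tb) / len(ta | tb)


def two_phase_match(source, target, strong, weak, weights, threshold=0.4):
    """Match column names in two phases (illustrative only).

    weights: one number per weak matcher, assumed here to come from the
    human expert; the paper's actual form of expert input is not known.
    """
    matches = {}

    # Phase 1: accept any pair that a strong matcher fully agrees on.
    for s in source:
        for t in target:
            if t in matches.values():
                continue
            if any(m(s, t) == 1.0 for m in strong):
                matches[s] = t
                break

    # Phase 2: score the leftovers with a weighted average of weak matchers.
    remaining_targets = [t for t in target if t not in matches.values()]
    for s in source:
        if s in matches:
            continue
        best, best_score = None, threshold
        for t in remaining_targets:
            score = sum(w * m(s, t) for m, w in zip(weak, weights)) / sum(weights)
            if score > best_score:
                best, best_score = t, score
        if best is not None:
            matches[s] = best
            remaining_targets.remove(best)
    return matches


result = two_phase_match(
    ["customer_id", "cust_name", "zip"],
    ["customer_id", "customer_name", "postcode"],
    strong=[exact_name],
    weak=[name_similarity, token_overlap],
    weights=[1.0, 1.0],
)
print(result)
```

In this toy run the strong phase resolves the identically named column, the weak phase pairs `cust_name` with `customer_name`, and `zip` is left unmatched because no candidate clears the threshold. Changing `weights` is one simple way an expert's input could shift which weak matcher dominates.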
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bozovic, N., Vassalos, V. (2015). Two Phase User Driven Schema Matching. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_4
DOI: https://doi.org/10.1007/978-3-319-23135-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23134-1
Online ISBN: 978-3-319-23135-8
eBook Packages: Computer Science (R0)