
Two Phase User Driven Schema Matching

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

In recent years it has become apparent that schema matching is a labor-intensive process that is very costly in resources; this has led to the development of various automated tools to replace the human experts involved in it. To this end we propose two new ideas. The first is the separation of matching techniques into strong and weak ones, in what we call two phase schema matching. The second is the use of information that a human expert provides to the system during schema matching to determine how the various matching techniques are combined. A system encompassing both ideas is easily tunable and allows the human expert to become part of the matching process and help the system choose the best techniques to use. In extensive experiments we demonstrate that this approach outperforms contemporary state-of-the-art systems on relational databases. We also demonstrate that single-purpose (or niche) matchers can be helpful in such a system, which can opt to use them when appropriate.
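
The abstract does not say how the strong and weak matchers are ordered or combined, so the following is only an illustrative sketch under stated assumptions, not the authors' system. It assumes that strong matchers run first and fix high-confidence pairs, and that weak matchers are then combined on the remaining columns using weights supplied by the human expert. The matcher functions, the `weights` parameter, and the `threshold` value are all hypothetical names chosen for the example.

```python
from difflib import SequenceMatcher


def exact_name(a, b):
    """Hypothetical 'strong' matcher: fires only on identical normalized names."""
    norm = lambda s: s.lower().replace("_", "")
    return 1.0 if norm(a) == norm(b) else 0.0


def fuzzy_name(a, b):
    """Hypothetical 'weak' matcher: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def shared_prefix(a, b):
    """Hypothetical 'weak' matcher: 1.0 when the first three characters agree."""
    return 1.0 if a[:3].lower() == b[:3].lower() else 0.0


def two_phase_match(source, target, strong, weak, weights, threshold=0.6):
    """Assumed two-phase scheme: strong matchers first, weighted weak ones second."""
    matches = {}
    free_targets = list(target)

    # Phase 1 (assumption): accept a pair as soon as any strong matcher fires.
    for s in source:
        for t in free_targets:
            if any(m(s, t) == 1.0 for m in strong):
                matches[s] = t
                free_targets.remove(t)
                break

    # Phase 2 (assumption): score the remaining pairs with a weighted average
    # of weak matchers; the weights stand in for the expert's input.
    total = sum(weights)
    for s in source:
        if s in matches:
            continue
        best, best_score = None, threshold
        for t in free_targets:
            score = sum(w * m(s, t) for m, w in zip(weak, weights)) / total
            if score > best_score:
                best, best_score = t, score
        if best is not None:
            matches[s] = best
            free_targets.remove(best)
    return matches


result = two_phase_match(
    source=["customer_id", "cust_name", "zip"],
    target=["CustomerID", "customer_name", "postcode"],
    strong=[exact_name],
    weak=[fuzzy_name, shared_prefix],
    weights=[0.7, 0.3],  # hypothetical expert-chosen weights
)
print(result)
```

In this toy run the strong matcher settles the identifier column, the weighted weak matchers pair the two name columns, and `zip` stays unmatched because no candidate clears the threshold. Changing `weights` is the sketch's stand-in for the expert steering which techniques dominate.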



Author information

Corresponding author

Correspondence to Nick Bozovic.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bozovic, N., Vassalos, V. (2015). Two Phase User Driven Schema Matching. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_4

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science, Computer Science (R0)
