Tuning for Schema Matching

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Schema matching has long been heading towards complete automation. However, the difficulties arising from heterogeneous data sources, domain specificity, and structural complexity have led to a plethora of semi-automatic matching tools. Moreover, giving users the possibility to tune a tool provides more flexibility, for instance to increase matching quality. In recent years, much work has been carried out to support users in the tuning process, specifically at the higher levels. Indeed, tuning occurs at every step of the matching process. At the lowest level, similarity measures include internal parameters that directly affect the computed similarity values. Furthermore, thresholds applied to these values are a common filter for deciding which mappings are presented to users. At the mid-level, users can adopt one or more strategies, depending on the matching tool they use. These strategies aim at combining similarity measures efficiently. Several tools support users in this task, mainly by providing state-of-the-art graphical user interfaces. Automatically tuning a matching tool at this level is also possible, but only a few matching tools support it. The highest level deals with the choice of the matching tool. Because these approaches have proliferated, the first issue for the user is to find the one that best satisfies their criteria. Although benchmarking the available matching tools on datasets can be useful, we show that several approaches have recently been designed to solve this problem.
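To make the lower two tuning levels concrete, the following is a minimal Python sketch and not the method of any tool discussed in the chapter. It assumes two stand-in similarity measures (a character-based ratio from the standard library's `difflib` and a token-level Jaccard coefficient), a weighted average as the combination strategy, and a single threshold as the filter. The function names, the default weights, and the default threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def char_similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1] (stand-in measure from difflib)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard coefficient over underscore/space-separated tokens."""
    tokens_a = set(a.lower().replace("_", " ").split())
    tokens_b = set(b.lower().replace("_", " ").split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


def combine(scores, weights):
    """Mid-level tuning: weighted average of several similarity values."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def match(source, target, weights=(0.5, 0.5), threshold=0.6):
    """Return (source, target, score) triples whose combined score
    reaches the threshold (lowest-level tuning: the filter)."""
    mappings = []
    for s in source:
        for t in target:
            score = combine(
                (char_similarity(s, t), token_jaccard(s, t)), weights
            )
            if score >= threshold:
                mappings.append((s, t, score))
    return mappings


if __name__ == "__main__":
    # Hypothetical element names, for illustration only.
    for mapping in match(["first_name", "zip"], ["name_first", "zipcode"]):
        print(mapping)
```

Under these assumptions, changing `weights` shifts the balance between the two measures, while raising `threshold` can only shrink the set of mappings shown to the user. That is the trade-off between fewer irrelevant mappings and more missed ones that tuning has to settle.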

Notes

  1. SecondString (May 2010): http://sourceforge.net/projects/secondstring/.

  2. SimMetrics (May 2010): http://www.dcs.shef.ac.uk/~sam/stringmetrics.html.

Acknowledgements

We thank our reviewers for their comments and corrections on this chapter. We are also grateful to the colleagues who agreed to let us publish pictures of their tools.

Corresponding author

Correspondence to Zohra Bellahsene.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bellahsene, Z., Duchateau, F. (2011). Tuning for Schema Matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds) Schema Matching and Mapping. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16518-4_10

  • DOI: https://doi.org/10.1007/978-3-642-16518-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16517-7

  • Online ISBN: 978-3-642-16518-4

  • eBook Packages: Computer Science, Computer Science (R0)
