Boosting Schema Matchers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5331)

Abstract

Schema matching is recognized as one of the basic operations required by the process of data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB). This approach is based on a well-known machine learning technique called boosting. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding better-than-random schema matchers. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers, and with several ensembling approaches including the Meta-Learner of LSD, over real-world data of 230 schemata. Moreover, SMB is shown to be consistently dominant, far beyond any individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall and F-Measure.
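
To make the idea of boosting over matchers concrete, the following is a minimal sketch of a generic AdaBoost-style loop in which each weak learner is a pre-computed schema matcher. It is not the SMB algorithm from the paper, whose details are not given in the abstract. The matcher names, the toy decisions, and the functions boost_matchers and ensemble_decision are hypothetical. The sketch assumes each matcher outputs +1 (match) or -1 (no match) for every candidate attribute pair, and that correct labels are available for training.

```python
import math

def boost_matchers(decisions, labels, rounds=3):
    """Generic AdaBoost-style selection over pre-computed matcher decisions.

    decisions: dict mapping a (hypothetical) matcher name to a list of +1/-1
               decisions, one per candidate attribute pair.
    labels:    list of +1/-1 correct answers for the same pairs.
    Returns a list of (matcher_name, alpha) in the order they were selected.
    """
    n = len(labels)
    weights = [1.0 / n] * n          # start with uniform weights over pairs
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every matcher under the current weights.
        errors = {
            name: sum(w for w, d, y in zip(weights, preds, labels) if d != y)
            for name, preds in decisions.items()
        }
        best = min(errors, key=errors.get)
        err = errors[best]
        if err >= 0.5:               # no matcher is better than random: stop
            break
        err = max(err, 1e-10)        # avoid division by zero for a perfect matcher
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((best, alpha))
        # Raise the weight of pairs the chosen matcher got wrong, then normalize.
        weights = [
            w * math.exp(-alpha * y * d)
            for w, d, y in zip(weights, decisions[best], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_decision(ensemble, decisions, i):
    """Weighted vote of the selected matchers on candidate pair i."""
    score = sum(alpha * decisions[name][i] for name, alpha in ensemble)
    return 1 if score > 0 else -1

# Toy data (made up for illustration): six candidate pairs, three matchers.
labels = [1, 1, 1, -1, -1, -1]
decisions = {
    "name":   [1, 1, -1, -1, -1, -1],   # wrong on one pair
    "type":   [1, -1, 1, -1, -1, 1],    # wrong on two pairs
    "random": [-1, 1, -1, 1, -1, 1],    # wrong on four pairs
}
ensemble = boost_matchers(decisions, labels, rounds=3)
selected = {name for name, _ in ensemble}
```

In this toy run the matcher that is worse than random is never selected, which illustrates the kind of matcher selection the abstract refers to: only matchers that beat random guessing under the current weighting receive a vote.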

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marie, A., Gal, A. (2008). Boosting Schema Matchers. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88871-0_20

  • DOI: https://doi.org/10.1007/978-3-540-88871-0_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88870-3

  • Online ISBN: 978-3-540-88871-0

  • eBook Packages: Computer Science, Computer Science (R0)