Boosting Schema Matchers

Marie, Anan; Gal, Avigdor

doi:10.1007/978-3-540-88871-0_20

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5331)

Abstract

Schema matching is recognized as one of the basic operations required by data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB), which is based on boosting, a well-known machine learning technique. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding better-than-random schema matchers. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers and with several ensembling approaches, including the Meta-Learner of LSD, over real-world data of 230 schemata. Moreover, SMB is shown to be consistently dominant, far beyond any individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall and F-Measure.
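The abstract does not spell out the SMB algorithm, so the following is only a minimal sketch of the general idea it describes: a generic AdaBoost-style loop in which each existing matcher is treated as a weak classifier over candidate attribute pairs, and each round selects the matcher with the lowest weighted error. It should not be read as the authors' method. The matcher names (`name`, `type`, `random`), the toy decisions and labels, and the function names are all hypothetical.

```python
import math

def boost_matchers(decisions, labels, rounds=5):
    """AdaBoost-style selection of matchers (illustrative, not the paper's SMB).

    decisions: dict mapping a matcher name to a list of +1/-1 decisions,
               one per candidate attribute pair (+1 = "match").
    labels:    list of +1/-1 ground-truth labels for the same pairs.
    Returns a list of (matcher name, weight alpha) in selection order.
    """
    n = len(labels)
    weights = [1.0 / n] * n          # start with uniform pair weights
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every matcher under the current pair weights.
        errors = {
            name: sum(w for w, d, y in zip(weights, preds, labels) if d != y)
            for name, preds in decisions.items()
        }
        best = min(errors, key=errors.get)
        err = errors[best]
        if err >= 0.5:               # no matcher is better than random: stop
            break
        if err == 0.0:               # a perfect matcher needs no further rounds
            ensemble.append((best, 1.0))
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((best, alpha))
        # Raise the weight of pairs the chosen matcher got wrong, lower the rest.
        weights = [
            w * math.exp(-alpha * d * y)
            for w, d, y in zip(weights, decisions[best], labels)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_decision(ensemble, decisions, i):
    """Weighted vote of the selected matchers on candidate pair i."""
    score = sum(alpha * decisions[name][i] for name, alpha in ensemble)
    return 1 if score > 0 else -1

# Hypothetical toy data: six candidate attribute pairs, three matchers.
labels = [1, 1, -1, -1, 1, -1]
decisions = {
    "name":   [1, 1, -1, -1, -1, -1],
    "type":   [1, -1, -1, -1, 1, 1],
    "random": [-1, 1, 1, -1, 1, 1],
}
ensemble = boost_matchers(decisions, labels, rounds=2)
for name, alpha in ensemble:
    print(f"{name}: alpha = {alpha:.3f}")
```

In this sketch a matcher only has to beat a weighted error of 0.5 to be eligible, which mirrors the abstract's point about better-than-random matchers; matchers that never win a round simply do not participate in the ensemble.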

Author information

Authors and Affiliations

Technion – Israel Institute of Technology, 32000, Israel
Anan Marie & Avigdor Gal

Editor information

Editors and Affiliations

STARLab, Bldg G/10, Vrije Universiteit Brussel (VUB), Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, Bld 10.10, RMIT University, 376-392 Swanston Street, VIC 3001, Melbourne, Australia
Zahir Tari

Cite this paper

Marie, A., Gal, A. (2008). Boosting Schema Matchers. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88871-0_20

DOI: https://doi.org/10.1007/978-3-540-88871-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88870-3
Online ISBN: 978-3-540-88871-0
eBook Packages: Computer Science, Computer Science (R0)
