Database Schema Matching Using Machine Learning with Feature Selection

Berlin, Jacob; Motro, Amihai

doi:10.1007/3-540-47961-9_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2348)

Included in the following conference series:

International Conference on Advanced Information Systems Engineering

Abstract

Schema matching, the problem of finding mappings between the attributes of two semantically related database schemas, is an important aspect of many database applications such as schema integration, data warehousing, and electronic commerce. Unfortunately, schema matching remains largely a manual, labor-intensive process. Furthermore, the effort required is typically linear in the number of schemas to be matched; the next pair of schemas to match is not any easier than the previous pair. In this paper we describe a system, called Automatch, that uses machine learning techniques to automate schema matching. Based primarily on Bayesian learning, the system acquires probabilistic knowledge from examples that have been provided by domain experts. This knowledge is stored in a knowledge base called the attribute dictionary. When presented with a pair of new schemas that need to be matched (and their corresponding database instances), Automatch uses the attribute dictionary to find an optimal matching. We also report initial results from the Automatch project.
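The idea outlined in the abstract can be illustrated with a small sketch. It is not the Automatch implementation: the token-count model, the Laplace smoothing, the brute-force assignment search, and all names (train_dictionary, log_likelihood, match_columns) are assumptions made for illustration. The sketch learns per-attribute value statistics from expert-provided examples, scores each column of a new schema against every dictionary attribute in naive Bayes style, and picks the one-to-one assignment with the highest total score.

```python
import math
from collections import Counter
from itertools import permutations


def train_dictionary(examples):
    """Build a toy 'attribute dictionary': token counts per known attribute.

    examples maps an attribute name to a list of example values
    (hypothetical format, chosen for this sketch).
    """
    return {
        attr: Counter(tok for v in values for tok in str(v).lower().split())
        for attr, values in examples.items()
    }


def log_likelihood(values, counts, vocab_size):
    """Average per-token log P(token | attribute), Laplace-smoothed."""
    tokens = [tok for v in values for tok in str(v).lower().split()]
    if not tokens:
        return float("-inf")
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    ) / len(tokens)


def match_columns(dictionary, columns):
    """Return the one-to-one column -> attribute mapping with the best total score.

    Brute force over permutations; only sensible for a handful of columns.
    Assumes there are at least as many dictionary attributes as columns.
    """
    vocab = set()
    for counts in dictionary.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens

    cols, attrs = list(columns), list(dictionary)
    score = {
        c: {a: log_likelihood(columns[c], dictionary[a], vocab_size) for a in attrs}
        for c in cols
    }
    best_total, best_mapping = float("-inf"), {}
    for perm in permutations(attrs, len(cols)):
        total = sum(score[c][a] for c, a in zip(cols, perm))
        if total > best_total:
            best_total, best_mapping = total, dict(zip(cols, perm))
    return best_mapping


# Toy data, invented for this example.
dictionary = train_dictionary({
    "city": ["boston", "fairfax", "toronto"],
    "color": ["red", "green", "blue"],
})
mapping = match_columns(dictionary, {
    "c1": ["red", "blue"],
    "c2": ["toronto", "boston"],
})
print(mapping)
```

The exhaustive search over permutations stands in for whatever optimization procedure a real system would use; it grows factorially with the number of columns and is shown only because it makes the notion of an optimal matching explicit.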

Author information

Authors and Affiliations

Information and Software Engineering Department, George Mason University, Fairfax, VA 22030, USA
Jacob Berlin & Amihai Motro

Editor information

Editors and Affiliations

School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada
Anne Banks Pidduck & M. Tamer Ozsu
University of Toronto, Pratt Building, 6 King’s College Road, Toronto, Ontario, M5S 3H5, Canada
John Mylopoulos
Faculty of Commerce and Business Administration, University of British Columbia, 2053 Main Mall, Vancouver, B.C., V6T 1Z2, Canada
Carson C. Woo

Cite this paper

Berlin, J., Motro, A. (2002). Database Schema Matching Using Machine Learning with Feature Selection. In: Pidduck, A.B., Ozsu, M.T., Mylopoulos, J., Woo, C.C. (eds) Advanced Information Systems Engineering. CAiSE 2002. Lecture Notes in Computer Science, vol 2348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47961-9_32

DOI: https://doi.org/10.1007/3-540-47961-9_32
Published: 29 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43738-3
Online ISBN: 978-3-540-47961-1
eBook Packages: Springer Book Archive
