Abstract
Schema matching, the problem of finding mappings between the attributes of two semantically related database schemas, is an important aspect of many database applications such as schema integration, data warehousing, and electronic commerce. Unfortunately, schema matching remains largely a manual, labor-intensive process. Furthermore, the effort required is typically linear in the number of schemas to be matched; the next pair of schemas to match is not any easier than the previous pair. In this paper we describe a system, called Automatch, that uses machine learning techniques to automate schema matching. Based primarily on Bayesian learning, the system acquires probabilistic knowledge from examples that have been provided by domain experts. This knowledge is stored in a knowledge base called the attribute dictionary. When presented with a pair of new schemas that need to be matched (and their corresponding database instances), Automatch uses the attribute dictionary to find an optimal matching. We also report initial results from the Automatch project.
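The pipeline outlined above (learn probabilistic knowledge from expert-provided examples, store it in an attribute dictionary, then search for the best matching) can be illustrated with a minimal sketch. This is not the Automatch implementation: the function names `train_dictionary`, `log_likelihood`, and `best_matching`, the token-count representation, the Laplace smoothing, and the brute-force search over one-to-one assignments are all assumptions made for illustration, and the example data is invented.

```python
import math
from collections import Counter
from itertools import permutations


def train_dictionary(examples):
    """Build a toy attribute dictionary: token counts per known attribute.

    `examples` maps an attribute name to example values supplied by an expert.
    (Hypothetical representation; the abstract does not specify one.)
    """
    return {
        attr: Counter(tok for value in values for tok in value.lower().split())
        for attr, values in examples.items()
    }


def log_likelihood(values, counts, vocab_size):
    """Naive Bayes log-likelihood of a column's values under one attribute,
    with Laplace (add-one) smoothing so unseen tokens get nonzero probability."""
    total = sum(counts.values())
    score = 0.0
    for value in values:
        for tok in value.lower().split():
            score += math.log((counts[tok] + 1) / (total + vocab_size))
    return score


def best_matching(columns, dictionary):
    """Pick the one-to-one assignment of columns to dictionary attributes
    with the highest total log-likelihood.

    Brute force over permutations is an assumption that only suits tiny
    schemas; it assumes there are at least as many attributes as columns.
    """
    vocab = set().union(*dictionary.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen tokens
    col_names = list(columns)
    attrs = list(dictionary)
    scores = {
        (c, a): log_likelihood(columns[c], dictionary[a], vocab_size)
        for c in col_names
        for a in attrs
    }
    best = max(
        permutations(attrs, len(col_names)),
        key=lambda perm: sum(scores[c, a] for c, a in zip(col_names, perm)),
    )
    return dict(zip(col_names, best))


# Invented example data, for illustration only.
dictionary = train_dictionary({
    "city": ["Berlin", "Paris", "Rome"],
    "price": ["10 usd", "25 usd", "7 usd"],
})
columns = {"col_a": ["12 usd", "25 usd"], "col_b": ["Paris", "Berlin"]}
matching = best_matching(columns, dictionary)
print(matching)  # {'col_a': 'price', 'col_b': 'city'}
```

For schemas of realistic size, exhaustive search would be replaced by a polynomial-time assignment or network-flow method; the sketch only shows how per-attribute probabilities can drive a global matching decision.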
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Berlin, J., Motro, A. (2002). Database Schema Matching Using Machine Learning with Feature Selection. In: Pidduck, A.B., Ozsu, M.T., Mylopoulos, J., Woo, C.C. (eds) Advanced Information Systems Engineering. CAiSE 2002. Lecture Notes in Computer Science, vol 2348. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47961-9_32
DOI: https://doi.org/10.1007/3-540-47961-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43738-3
Online ISBN: 978-3-540-47961-1
eBook Packages: Springer Book Archive