ABSTRACT
Schema matching lies at the heart of database applications that involve data integration. Many instance-based solutions to the schema matching problem have been proposed. These approaches focus on analyzing attribute values, especially within a given application domain. This paper presents a two-step, domain-independent schema matching technique. The technique first measures the information shared between each pair of attributes using mutual information. Next, it constructs a graph representation with weighted links for each input schema. At this point, schema matching becomes a weighted graph matching problem, and a graduated assignment algorithm is applied to find the correspondence between the vertices of the two graphs. We perform experiments on two real-world data sets from different application domains to obtain a rough evaluation of the technique in terms of precision, recall, and running time.
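The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the toy tables, and the squared-difference cost are assumptions made for illustration. In particular, the matching step below uses an exhaustive search over permutations as a stand-in for the graduated assignment algorithm, which is only practical for a handful of attributes. The diagonal of each graph is simply MI(X, X), which equals the entropy of the column; the abstract does not say whether the paper uses it.

```python
import math
from collections import Counter
from itertools import permutations


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def dependency_graph(table):
    """Weighted adjacency matrix: entry [i][j] is the MI between columns i and j."""
    cols = list(zip(*table))
    k = len(cols)
    return [[mutual_information(cols[i], cols[j]) for j in range(k)]
            for i in range(k)]


def match_by_exhaustive_search(g1, g2):
    """Assumed stand-in for graduated assignment: try every permutation and
    keep the one whose edge weights differ least (sum of squared differences).
    Returns perm such that column i of schema 1 maps to column perm[i] of schema 2."""
    k = len(g1)

    def cost(perm):
        return sum((g1[i][j] - g2[perm[i]][perm[j]]) ** 2
                   for i in range(k) for j in range(k))

    return min(permutations(range(k)), key=cost)


# Hypothetical toy data: table_b holds the same rows as table_a, but with the
# columns reordered and the values renamed, so names and values are opaque.
table_a = [(0, 0, 0), (0, 0, 1), (0, 1, 2), (0, 1, 0),
           (1, 2, 1), (1, 2, 2), (1, 3, 0), (1, 3, 1)]
table_b = [("c%d" % c, "x%d" % a, "z%d" % b) for a, b, c in table_a]

mapping = match_by_exhaustive_search(dependency_graph(table_a),
                                     dependency_graph(table_b))
print(mapping)  # (1, 2, 0): a -> column 1, b -> column 2, c -> column 0 of table_b
```

Because mutual information depends only on how values co-occur and not on what the values are, the two graphs here are identical up to a reordering of vertices, which is what lets the match be recovered without any domain knowledge.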
Index Terms
- An instance-based approach for domain-independent schema matching