Abstract
Schema matching is a critical step in numerous database applications, such as integrating web data sources, loading data warehouses, and exchanging information among multiple authorities. Existing schema matching techniques are classified as schema-based, instance-based, or a combination of both. In this paper, we propose a new class of techniques, called schema matching based on source codes. The idea is to exploit the exterior schema extracted from application source code to find semantic correspondences between attributes of the schemas to be matched. The exterior schema is the schema exposed to end users in the outermost shell of an application. It therefore typically contains the complete semantics of the data, which is very helpful for schema matching. We present a framework for schema matching based on source codes that comprises three key components: extracting the exterior schema, evaluating the quality of a matching, and finding the optimal mapping. We also present features and rules of source code that are useful for implementing each component, and we address the corresponding challenges in detail.
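The abstract names the three framework components but not their concrete rules, so the following is only a toy sketch of how such a pipeline could fit together, not the authors' method. It assumes, hypothetically, that the exterior schema is visible in user-interface code as `<label for="column">Displayed text</label>` bindings. It substitutes Python's `difflib.SequenceMatcher` ratio as the label-similarity measure and an exhaustive search as the optimal-mapping step. All function names (`extract_exterior_schema`, `label_similarity`, `mapping_quality`, `optimal_mapping`) are illustrative.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical convention (assumption): the UI code binds each label shown to
# end users to a database column via <label for="column">Text</label>.
LABEL_PATTERN = re.compile(r'<label\s+for="([^"]+)">\s*([^<]+?)\s*</label>')


def extract_exterior_schema(source_code: str) -> dict[str, str]:
    """Component 1 (sketch): map each column name to its user-facing label."""
    return {col: text for col, text in LABEL_PATTERN.findall(source_code)}


def label_similarity(a: str, b: str) -> float:
    """Stand-in similarity measure (difflib ratio), not the paper's measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mapping_quality(mapping: list[tuple[str, str]],
                    ext_a: dict[str, str], ext_b: dict[str, str]) -> float:
    """Component 2 (sketch): score a mapping as the sum of label similarities."""
    return sum(label_similarity(ext_a[a], ext_b[b]) for a, b in mapping)


def optimal_mapping(ext_a: dict[str, str], ext_b: dict[str, str]):
    """Component 3 (sketch): exhaustive search for the best one-to-one mapping.

    Runtime grows factorially with schema size, so this only suits tiny inputs.
    """
    cols_a, cols_b = sorted(ext_a), sorted(ext_b)
    if len(cols_a) > len(cols_b):
        raise ValueError("sketch assumes schema A has no more columns than B")
    best: list[tuple[str, str]] = []
    best_score = float("-inf")
    for perm in permutations(cols_b, len(cols_a)):
        candidate = list(zip(cols_a, perm))
        score = mapping_quality(candidate, ext_a, ext_b)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    # Two applications whose column names are opaque but whose labels are not.
    app_a = ('<label for="c_nm">Customer name</label>\n'
             '<label for="c_tel">Phone number</label>')
    app_b = ('<label for="x1">Telephone number</label>\n'
             '<label for="x2">Name of customer</label>')
    mapping, score = optimal_mapping(extract_exterior_schema(app_a),
                                     extract_exterior_schema(app_b))
    print(mapping, round(score, 3))
```

In the usage example, the column names `c_nm`/`x2` and `c_tel`/`x1` share nothing, yet their user-facing labels are similar enough for the search to pair them. This illustrates, under the stated assumptions, why the exterior schema can help where column names alone are opaque. A realistic implementation would replace the brute-force search with an assignment algorithm and use richer extraction rules.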
Acknowledgments
This research was supported by the National Natural Science Foundation of China (Grant No. 61303016) and the Normal Project Foundation of the Education Department of Liaoning Province (Grant No. L2012045).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ding, G., Wang, G., Fan, C., Chen, S. (2015). Schema Matching Based on Source Codes. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_8
DOI: https://doi.org/10.1007/978-3-319-22324-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22323-0
Online ISBN: 978-3-319-22324-7
eBook Packages: Computer Science, Computer Science (R0)