Abstract
Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a schema label normalization method that increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We show empirically that our normalization method helps in identifying similarities among schema elements of different data sources, thus improving schema matching accuracy.
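To make the idea of label normalization concrete, the following minimal Python sketch splits a schema label into tokens, expands abbreviations and flags compound terms. It is not the authors' method. The abbreviation table `ABBREVIATIONS` and the word list `DICTIONARY` are hypothetical stand-ins (the paper relies on a lexical resource and semi-automatic abbreviation expansion), and the compound test simply treats any multi-token label as a compound whose head is its last token, following the usual right-headed pattern of English noun compounds.

```python
import re

# Hypothetical abbreviation table; the paper expands abbreviations
# semi-automatically, which this sketch does not attempt.
ABBREVIATIONS = {
    "qty": "quantity",
    "cust": "customer",
    "num": "number",
    "addr": "address",
}

# Hypothetical stand-in for a lexical resource such as WordNet.
DICTIONARY = {"quantity", "customer", "number", "address", "order", "name", "id"}


def tokenize(label: str) -> list[str]:
    """Split a label on underscores, hyphens, spaces and camelCase boundaries."""
    tokens = []
    for part in re.split(r"[_\-\s]+", label):
        # Capital runs (e.g. "ID", "XML"), capitalized/lowercase words, or digits.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def normalize(label: str) -> dict:
    """Expand known abbreviations and mark multi-token labels as compounds."""
    tokens = [ABBREVIATIONS.get(t, t) for t in tokenize(label)]
    return {
        "tokens": tokens,
        "compound": len(tokens) > 1,
        # Head of a right-headed English compound: the last token.
        "head": tokens[-1] if tokens else None,
        # Tokens the (hypothetical) dictionary cannot annotate.
        "unknown": [t for t in tokens if t not in DICTIONARY],
    }


if __name__ == "__main__":
    print(normalize("CustAddr"))
    # -> {'tokens': ['customer', 'address'], 'compound': True, 'head': 'address', 'unknown': []}
```

After normalization, a label such as `CustAddr` yields two dictionary words that a lexical annotator can look up, whereas the raw label matches nothing; this is the sense in which normalization increases the number of comparable labels.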
Acknowledgements: This work was partially supported by MUR FIRB Network Peer for Business project (http://www.dbgroup.unimo.it/nep4b) and by the IST FP6 STREP project 2006 STASIS (http://www.dbgroup.unimo.it/stasis).
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sorrentino, S., Bergamaschi, S., Gawinecki, M., Po, L. (2009). Schema Normalization for Improving Schema Matching. In: Laender, A.H.F., Castano, S., Dayal, U., Casati, F., de Oliveira, J.P.M. (eds) Conceptual Modeling - ER 2009. ER 2009. Lecture Notes in Computer Science, vol 5829. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04840-1_22
DOI: https://doi.org/10.1007/978-3-642-04840-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04839-5
Online ISBN: 978-3-642-04840-1
eBook Packages: Computer Science, Computer Science (R0)