Abstract
The growing size and widespread use of XML data and of different types of ontologies make integrating these data a major challenge. A critical step towards integration is to identify semantically corresponding elements across heterogeneous data sets, and this identification becomes increasingly difficult for large schemas and ontologies. Clustering-based matching substantially reduces the search space and thus improves matching efficiency. However, current methods for identifying similar clusters depend on literal term matching. To maintain high matching quality along with high matching efficiency, hidden semantic relationships among clusters’ elements should be discovered. To this end, we propose a Latent Semantic Indexing-based approach that retrieves the conceptual meaning shared between clusters. The experimental evaluation shows that the proposed approach yields encouraging and significant improvements towards building large-scale matching approaches.
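The contrast between literal term matching and latent semantic matching can be illustrated with a small sketch. It is not the implementation evaluated in the paper: the function name `lsi_cluster_similarity`, the toy clusters, the raw term counts, and the choice of rank `k=2` are all assumptions made for illustration. To stay dependency-free, it obtains the leading singular directions by power iteration with deflation rather than a library SVD.

```python
import math

def lsi_cluster_similarity(clusters, k=2, iters=200):
    """Cosine similarity between clusters in a rank-k latent space.

    clusters: list of token lists (hypothetical stand-ins for element names).
    """
    vocab = sorted({t for c in clusters for t in c})
    n = len(clusters)
    # Term-by-cluster count matrix A (rows: terms, columns: clusters).
    A = [[c.count(t) for c in clusters] for t in vocab]
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

    coords = [[] for _ in range(n)]  # latent coordinates, one list per cluster
    for _ in range(k):
        # Fixed, normalised start vector keeps the sketch deterministic.
        v = [1.0 / (i + 1) for i in range(n)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; its square root is the singular value.
        lam = sum(v[i] * sum(G[i][j] * v[j] for j in range(n)) for i in range(n))
        sigma = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(sigma * v[i])
        # Deflate so the next pass finds the next singular direction.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]

    def cosine(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        if na == 0.0 or nb == 0.0:
            return 0.0
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]

# Toy clusters: 0 and 2 share no literal term, but both overlap with cluster 1.
clusters = [
    ["author", "book"],
    ["author", "writer", "book", "publication"],
    ["writer", "publication"],
    ["street", "city"],
    ["city", "zip"],
]
sim = lsi_cluster_similarity(clusters, k=2)
print(round(sim[0][2], 3), round(sim[0][3], 3))
```

In this toy input, clusters 0 and 2 have an empty literal intersection, yet they land close together in the latent space because both co-occur with the terms of cluster 1, while the address-like clusters stay apart from them. A practical system would likely weight terms (for example with tf-idf) and use a library SVD routine.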
Notes
- 5. XML Schema - Data Types Quick Reference, http://www.xml.dvint.com/.
References
Abiteboul, S., Suciu, D., Buneman, P.: Data on the Web: From Relations to Semistructured Data and XML. Morgan Kaufmann, San Francisco (2000)
Algergawy, A., Massmann, S., Rahm, E.: A clustering-based approach for large-scale ontology matching. In: Eder, J., Bielikova, M., Tjoa, A.M. (eds.) ADBIS 2011. LNCS, vol. 6909, pp. 415–428. Springer, Heidelberg (2011)
Algergawy, A., Nayak, R., Saake, G.: Element similarity measures in XML schema matching. Inf. Sci. 180(24), 4975–4998 (2010)
Algergawy, A., Nayak, R., Siegmund, N., Köppen, V., Saake, G.: Combining schema and level-based matching for web service discovery. In: Benatallah, B., Casati, F., Kappel, G., Rossi, G. (eds.) ICWE 2010. LNCS, vol. 6189, pp. 114–128. Springer, Heidelberg (2010)
Algergawy, A., Schallehn, E., Saake, G.: Improving XML schema matching using Prüfer sequences. DKE 68(8), 728–747 (2009)
Aslan, G., McLeod, D.: Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution. VLDB J. 8(2), 120–132 (1999)
Bellahsene, Z., Bonifati, A., Rahm, E.: Schema Matching and Mapping. Springer, Heidelberg (2011)
Berry, M.W., Drmac, Z., Jessup, E.R.: Matrices, vector spaces, and information retrieval. SIAM Rev. 41(2), 335–362 (1999)
Bonifati, A., Mecca, G., Pappalardo, A., Raunich, S., Summa, G.: Schema mapping verification: the spicy way. In: EDBT 2008, France, pp. 85–96 (2008)
Chiticariu, L., Hernández, M.A., Kolaitis, P.G., Popa, L.: Semi-automatic schema integration in Clio. In: VLDB’07, pp. 1326–1329 (2007)
Cohen, W.W., Ravikumar, P., Fienberg, S.E.: A comparison of string distance metrics for name-matching tasks. In: IIWeb, pp. 73–78 (2003)
Deerwester, S., Dumais, S.T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 391–407 (1990)
Do, H.H., Melnik, S., Rahm, E.: Comparison of schema matching evaluations. In: The 2nd International Workshop on Web Databases (2002)
Do, H.H., Rahm, E.: Matching large schemas: approaches and evaluation. Inf. Syst. 32(6), 857–885 (2007)
Doan, A., Halevy, A.: Semantic integration research in the database community: a brief survey. AAAI AI Mag. 26(1), 83–94 (2005)
Doan, A., Halevy, A.Y., Ives, Z.G.: Principles of Data Integration. Morgan Kaufmann, San Francisco (2012)
Ehrig, M., Staab, S.: QOM – quick ontology mapping. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 683–697. Springer, Heidelberg (2004)
Halevy, A.Y., Ives, Z.G., Suciu, D., Tatarinov, I.: Schema mediation in peer data management systems. In: 19th International Conference on Data Engineering, pp. 505–516 (2003)
Hamdi, F., Safar, B., Reynaud, C., Zargayouna, H.: Alignment-based partitioning of large-scale ontologies. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds.) Advances in Knowledge Discovery and Management. SCI, vol. 292, pp. 251–269. Springer, Heidelberg (2010)
Hao, Y., Zhang, Y.: Web services discovery based on schema matching. In: ACSC 2007, pp. 107–113 (2007)
Hu, W., Qu, Y., Cheng, G.: Matching large ontologies: a divide-and-conquer approach. DKE 67, 140–160 (2008)
Landauer, T.: Handbook of Latent Semantic Analysis. Lawrence Erlbaum, Mahwah (2007)
Lee, D., Chu, W.W.: Comparative analysis of six XML schema languages. SIGMOD Rec. 29(3), 76–87 (2000)
Lee, M.L., Yang, L.H., Hsu, W., Yang, X.: Xclust: clustering XML schemas for effective integration. In: CIKM’02, pp. 63–74 (2002)
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
Moawed, S., Algergawy, A., Sarhan, A., Eldosouky, A., Saake, G.: A latent semantic indexing-based approach to determine similar clusters in large-scale schema matching. In: Catania, B., et al. (eds.) New Trends in Databases and Information Systems. AISC, vol. 241, pp. 267–276. Springer, Heidelberg (2014)
Peukert, E., Berthold, H., Rahm, E.: Rewrite techniques for performance optimization of schema matching processes. In: EDBT, pp. 453–464 (2010)
Peukert, E., Eberius, J., Rahm, E.: A self-configuring schema matching system. In: 28th International Conference on Data Engineering (ICDE), pp. 306–317 (2012)
Peukert, E., Massmann, S., Konig, K.: Comparing similarity combination methods for schema matching. In: GI-Workshop, pp. 692–701 (2010)
Rahm, E.: Towards large-scale schema and ontology matching. In: Bellahsene, Z., Bonifati, A., Rahm, E. (eds.) Schema Matching and Mapping. Data-Centric Systems and Applications, pp. 3–27. Springer, Heidelberg (2011)
Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10(4), 334–350 (2001)
Seddiqui, M.H., Aono, M.: An efficient and scalable algorithm for segmented alignment of ontologies of arbitrary size. Web Semant. 7(4), 344–356 (2009)
Shvaiko, P., Euzenat, J.: Ontology matching: state of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25(1), 158–176 (2013)
Thuy, P.: Hybrid similarity measure for XML data integration and transformation. Ph.D. thesis, Seoul, Korea (2012)
Wang, Z., Wang, Y., Zhang, S.-S., Shen, G., Du, T.: Matching large scale ontology effectively. In: Mizoguchi, R., Shi, Z.-Z., Giunchiglia, F. (eds.) ASWC 2006. LNCS, vol. 4185, pp. 99–105. Springer, Heidelberg (2006)
Zhong, Q., Li, H., Li, J., Xie, G.T., Tang, J., Zhou, L., Pan, Y.: A Gauss function based approach for unbalanced ontology matching. In: ACM SIGMOD International Conference on Management of Data (SIGMOD 2009), pp. 669–680 (2009)
Acknowledgments
This paper is a revised and extended version of the paper presented in [26]. A. Algergawy partially worked on this paper while at Magdeburg University.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Algergawy, A., Moawed, S., Sarhan, A., Eldosouky, A., Saake, G. (2014). Improving Clustering-Based Schema Matching Using Latent Semantic Indexing. In: Hameurlain, A., et al. Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Lecture Notes in Computer Science, vol 8920. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45761-0_4
DOI: https://doi.org/10.1007/978-3-662-45761-0_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45760-3
Online ISBN: 978-3-662-45761-0
eBook Packages: Computer Science, Computer Science (R0)