Abstract
Supporting similarity analysis of XML data is a major problem in the data fusion research area. Several approaches have been proposed in the literature, but their lack of flexibility remains a hard challenge, especially in modern Cloud Computing environments. Motivated by this, we propose SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. SemSynX returns several similarity scores over the input XML documents, thus enabling flexible management and “customization” of similarity tools over XML data. In particular, the proposed technique is highly customizable: it permits the specification of thresholds for the requested degree of similarity for paths and values, as well as for the degree of relevance for path and value matching. Selection of paths and semantics-based comparison of label content are also supported. This makes it possible to “adjust” the similarity analysis depending on the nature of the input XML documents.
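To make the idea of threshold-driven path and value similarity concrete, the following is a minimal sketch in Python. It is not the SemSynX implementation: the function names (`leaf_paths`, `ratio`, `compare`), the two scores, and the default thresholds are all hypothetical, the string measure is `difflib.SequenceMatcher` standing in for whatever syntactic measure the technique actually uses, and semantics-based label comparison is left out entirely.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(xml_text):
    """Map each root-to-leaf label path to the list of text values found there."""
    root = ET.fromstring(xml_text)
    found = {}

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            found.setdefault(path, []).append((node.text or "").strip())
        for child in children:
            walk(child, path)

    walk(root, "")
    return found


def ratio(a, b):
    """Syntactic similarity in [0, 1] (assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def compare(doc_a, doc_b, path_threshold=0.8, value_threshold=0.8):
    """Return (path_score, value_score) for two XML documents.

    path_score: fraction of leaf paths of doc_a whose best counterpart in
    doc_b reaches path_threshold.
    value_score: fraction of values on matched paths of doc_a that have a
    counterpart value in doc_b reaching value_threshold.
    """
    paths_a, paths_b = leaf_paths(doc_a), leaf_paths(doc_b)

    matched = []
    for pa in paths_a:
        best = max(paths_b, key=lambda pb: ratio(pa, pb))
        if ratio(pa, best) >= path_threshold:
            matched.append((pa, best))
    path_score = len(matched) / len(paths_a)

    total = hits = 0
    for pa, pb in matched:
        for va in paths_a[pa]:
            total += 1
            if any(ratio(va, vb) >= value_threshold for vb in paths_b[pb]):
                hits += 1
    value_score = hits / total if total else 0.0
    return path_score, value_score


doc_a = "<book><title>XML Data</title><year>2015</year></book>"
doc_b = "<book><title>XML Data</title><year>2014</year></book>"

print(compare(doc_a, doc_b))                       # strict value threshold
print(compare(doc_a, doc_b, value_threshold=0.7))  # relaxed value threshold
```

With the default thresholds the two documents agree on all paths but only on one of the two values; lowering `value_threshold` to 0.7 lets "2015" and "2014" count as a match. This is the kind of adjustment the abstract refers to when it speaks of tuning the analysis to the nature of the input documents.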
© 2015 Springer International Publishing Switzerland
Cite this paper
Almendros-Jiménez, J.M., Cuzzocrea, A. (2015). Towards Flexible Similarity Analysis of XML Data. In: Ciuciu, I., et al. On the Move to Meaningful Internet Systems: OTM 2015 Workshops. OTM 2015. Lecture Notes in Computer Science(), vol 9416. Springer, Cham. https://doi.org/10.1007/978-3-319-26138-6_61
DOI: https://doi.org/10.1007/978-3-319-26138-6_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26137-9
Online ISBN: 978-3-319-26138-6
eBook Packages: Computer Science, Computer Science (R0)