Towards Flexible Similarity Analysis of XML Data

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2015 Workshops (OTM 2015)

Abstract

Supporting similarity analysis of XML data is a major problem in the data fusion research area. Several approaches have been proposed in the literature, but their lack of flexibility remains a hard challenge, especially in modern Cloud Computing environments. Motivated by this, we propose SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. SemSynX retrieves several similarity scores over input XML documents, thus enabling flexible management and “customization” of similarity tools over XML data. In particular, the proposed technique is highly customizable: it permits the specification of thresholds for the requested degree of similarity for paths and values, as well as for the degree of relevance for path and value matching. Selection of paths and semantics-based comparison of label content are also supported. This makes it possible to “adjust” the similarity analysis depending on the nature of the input XML documents.
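To make the idea of threshold-driven path and value similarity concrete, the following is a minimal illustrative sketch in Python. It is not the SemSynX algorithm, whose details are not given in the abstract. The function names (`leaf_paths`, `ratio`, `similarity_scores`), the two threshold parameters, and the use of `difflib.SequenceMatcher` as the string measure are all assumptions made for illustration, and semantics-based label comparison is omitted entirely.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(xml_text):
    """Map each root-to-leaf label path to the list of text values found there."""
    root = ET.fromstring(xml_text)
    found = {}

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            # A leaf: record its (stripped) text under its label path.
            found.setdefault(path, []).append((node.text or "").strip())
        for child in children:
            walk(child, path)

    walk(root, "")
    return found


def ratio(a, b):
    """String similarity in [0, 1]; a stand-in for any syntactic measure."""
    return SequenceMatcher(None, a, b).ratio()


def similarity_scores(doc_a, doc_b, path_threshold=0.8, value_threshold=0.8):
    """Return (path_score, value_score) for two XML strings.

    path_score:  fraction of leaf paths in doc_a whose most similar path in
                 doc_b reaches path_threshold.
    value_score: among those matched paths, the fraction whose best pair of
                 values reaches value_threshold.
    Both thresholds are hypothetical knobs, not parameters taken from the paper.
    """
    paths_a, paths_b = leaf_paths(doc_a), leaf_paths(doc_b)

    matched = []
    for pa in paths_a:
        best = max(paths_b, key=lambda pb: ratio(pa, pb))
        if ratio(pa, best) >= path_threshold:
            matched.append((pa, best))
    path_score = len(matched) / len(paths_a)
    if not matched:
        return path_score, 0.0

    agreeing = 0
    for pa, pb in matched:
        best_value = max(ratio(va, vb) for va in paths_a[pa] for vb in paths_b[pb])
        if best_value >= value_threshold:
            agreeing += 1
    return path_score, agreeing / len(matched)
```

Under these assumptions, raising `path_threshold` makes the comparison stricter about structural agreement, while `value_threshold` controls how closely the content at matched paths must agree. Reporting the two scores separately reflects the abstract's point that several similarity scores, rather than a single one, are returned.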

Author information

Corresponding author

Correspondence to Alfredo Cuzzocrea.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Almendros-Jiménez, J.M., Cuzzocrea, A. (2015). Towards Flexible Similarity Analysis of XML Data. In: Ciuciu, I., et al. On the Move to Meaningful Internet Systems: OTM 2015 Workshops. OTM 2015. Lecture Notes in Computer Science, vol 9416. Springer, Cham. https://doi.org/10.1007/978-3-319-26138-6_61

  • DOI: https://doi.org/10.1007/978-3-319-26138-6_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26137-9

  • Online ISBN: 978-3-319-26138-6

  • eBook Packages: Computer Science, Computer Science (R0)
