Adaptive Similarity of XML Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8841)

Abstract

In this work, we explore the application of XML schema similarity mapping to the conceptual modeling of XML schemas. We expand upon our previous efforts to map XML schemas to a common platform-independent schema using similarity evaluation based on a decision tree. In particular, this paper implements a more versatile method, in which the decision tree is trained on a large set of user-annotated mapping decision samples. We propose several variations of the training that could improve the mapping results. The approach is implemented within eXolutio, a modeling and evolution management framework, and its variations are evaluated in a wide range of experiments.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jílková, E., Polák, M., Holubová, I. (2014). Adaptive Similarity of XML Data. In: Meersman, R., et al. On the Move to Meaningful Internet Systems: OTM 2014 Conferences. OTM 2014. Lecture Notes in Computer Science, vol 8841. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45563-0_32

  • DOI: https://doi.org/10.1007/978-3-662-45563-0_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45562-3

  • Online ISBN: 978-3-662-45563-0

  • eBook Packages: Computer Science (R0)
