A methodology for measuring structure similarity of fuzzy XML documents

Zhao, Zhen; Ma, Zongmin

doi:10.1007/s00607-017-0553-x

Published: 10 April 2017

Computing, volume 99, pages 493–506 (2017)

Abstract

Document matching has become a crucial task for data integration. A considerable number of algorithms for comparing XML documents have been proposed in the literature, yet existing approaches fall short in their ability to identify structural similarities between fuzzy XML documents. To fill this gap, we provide an integrated comparison approach for measuring the structural similarity of fuzzy XML documents. First, we propose a new fuzzy XML document tree model to represent fuzzy XML documents. Second, we offer an element/attribute feature similarity measure to identify matching nodes. Third, we present an effective algorithm, based on the tree edit distance, to detect structural similarities between fuzzy XML document trees represented with the proposed model. Finally, experimental results demonstrate that our approach can efficiently measure the structural similarity of fuzzy XML documents.
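To make the idea of a tree-edit-distance-based structural similarity concrete, the following is a minimal Python sketch. It is not the algorithm from the article, whose tree model and cost functions are not reproduced on this page. It uses the textbook recursive edit distance for ordered forests with unit insert and delete costs. The `Node` class, the `poss` field (a stand-in for a possibility degree attached to a node), the relabel cost, and the normalization by total node count are all assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node: an element/attribute label plus an
    assumed possibility degree in [0, 1] (1.0 means a crisp node)."""
    label: str
    poss: float = 1.0
    children: Tuple["Node", ...] = ()


def size(forest: Tuple[Node, ...]) -> int:
    """Number of nodes in an ordered forest."""
    return sum(1 + size(n.children) for n in forest)


def relabel_cost(a: Node, b: Node) -> float:
    """Assumed cost: 1 for different labels, otherwise the gap
    between the two possibility degrees."""
    if a.label != b.label:
        return 1.0
    return abs(a.poss - b.poss)


@lru_cache(maxsize=None)
def forest_distance(f1: Tuple[Node, ...], f2: Tuple[Node, ...]) -> float:
    """Edit distance between two ordered forests (unit insert/delete)."""
    if not f1 and not f2:
        return 0.0
    if not f1:
        return float(size(f2))  # insert every node of f2
    if not f2:
        return float(size(f1))  # delete every node of f1
    v, w = f1[-1], f2[-1]       # rightmost roots
    return min(
        # delete v: its children take its place in the forest
        forest_distance(f1[:-1] + v.children, f2) + 1.0,
        # insert w: its children take its place in the forest
        forest_distance(f1, f2[:-1] + w.children) + 1.0,
        # match v with w, then solve the two remaining subproblems
        forest_distance(v.children, w.children)
        + forest_distance(f1[:-1], f2[:-1])
        + relabel_cost(v, w),
    )


def structural_similarity(t1: Node, t2: Node) -> float:
    """Map the distance into [0, 1]; 1.0 means identical trees."""
    total = size((t1,)) + size((t2,))
    return 1.0 - forest_distance((t1,), (t2,)) / total


if __name__ == "__main__":
    doc1 = Node("a", children=(Node("b"), Node("c")))
    doc2 = Node("a", children=(Node("b"), Node("d")))
    print(forest_distance((doc1,), (doc2,)))
    print(structural_similarity(doc1, doc2))
```

The memoized recursion is only practical for small trees; a production implementation would more likely use a dynamic-programming formulation such as Zhang and Shasha's algorithm.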

Acknowledgements

This work was supported by the National Natural Science Foundation of China (61370075 & 61572118) and the Program for New Century Excellent Talents in University (NCET-05-0288).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Northeastern University, Shenyang, 110819, Liaoning, People’s Republic of China
Zhen Zhao
School of Information Science and Technology, Bohai University, Jinzhou, 121013, Liaoning, People’s Republic of China
Zhen Zhao
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, Jiangsu, People’s Republic of China
Zongmin Ma

Corresponding author

Correspondence to Zongmin Ma.

About this article

Cite this article

Zhao, Z., Ma, Z. A methodology for measuring structure similarity of fuzzy XML documents. Computing 99, 493–506 (2017). https://doi.org/10.1007/s00607-017-0553-x

Received: 25 February 2016
Accepted: 25 March 2017
Published: 10 April 2017
Issue Date: May 2017
DOI: https://doi.org/10.1007/s00607-017-0553-x

Mathematics Subject Classification

68Q42(05C05)
