XML document clustering: techniques and challenges

Asghari, Elaheh; KeyvanPour, MohammadReza

doi:10.1007/s10462-012-9379-2

Published: 04 January 2013

Artificial Intelligence Review, Volume 43, pages 417–436 (2015)

Abstract

The increasing availability of heterogeneous XML sources has raised a number of issues concerning how to represent and manage these semi-structured data. In recent years, because of the importance of managing these resources and extracting knowledge from them, many methods have been proposed to represent and cluster them in different ways. Different similarity measures have been extended, and in some contexts semantic issues have also been taken into account. In this paper, we review XML clustering methods, considering different representations, such as tree-based and vector-based ones, and the similarity measures used with them. We also propose a taxonomy of these methods.
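To make the idea of a vector-based representation concrete, the following is a minimal sketch and is not taken from the article. It assumes each document is summarised as a bag of root-to-element tag paths and that two documents are compared with cosine similarity; the function names `path_vector` and `cosine` are hypothetical and chosen only for illustration. Other representations (for example trees compared by edit distance) and other similarity measures would follow a different pattern.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def path_vector(xml_text):
    """Count every root-to-element tag path in one XML document.

    Illustrative assumption: structure only, text content is ignored.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        # Extend the path with the current tag and record it once per element.
        path = prefix + "/" + node.tag
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = "<book><title/><author/></book>"
    doc_b = "<book><title/><year/></book>"
    doc_c = "<song><artist/></song>"
    print(cosine(path_vector(doc_a), path_vector(doc_b)))
    print(cosine(path_vector(doc_a), path_vector(doc_c)))
```

A pairwise similarity matrix built this way could then be passed to any standard clustering algorithm. The choice of paths as features is only one option, and it ignores the semantic issues mentioned in the abstract.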

Author information

Authors and Affiliations

Department of Computer Engineering, Alzahra University, Tehran, Iran
Elaheh Asghari & MohammadReza KeyvanPour

Corresponding author

Correspondence to Elaheh Asghari.

About this article

Cite this article

Asghari, E., KeyvanPour, M. XML document clustering: techniques and challenges. Artif Intell Rev 43, 417–436 (2015). https://doi.org/10.1007/s10462-012-9379-2

Published: 04 January 2013
Issue Date: March 2015
DOI: https://doi.org/10.1007/s10462-012-9379-2
