Abstract
Clustering XML documents is an important task, but selecting appropriate parameter values for clustering algorithms is difficult. By integrating outlier detection with clustering, this paper takes a new approach to analyzing XML documents by structural distance. After defining the XML tree distance, the paper proposes a new clustering algorithm that stops automatically by using outlier information and needs only one parameter, whose appropriate value range can be determined during the outlier mining process. The algorithm is compared with other clustering algorithms on XML datasets with different structures and on other real-life datasets.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lv, Ty., Zhang, Xz., Zuo, Wl., Wang, Zx. (2006). XML Clustering Based on Common Neighbor. In: Shen, H.T., Li, J., Li, M., Ni, J., Wang, W. (eds) Advanced Web and Network Technologies, and Applications. APWeb 2006. Lecture Notes in Computer Science, vol 3842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11610496_18
DOI: https://doi.org/10.1007/11610496_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31158-4
Online ISBN: 978-3-540-32435-5
eBook Packages: Computer Science, Computer Science (R0)