Abstract
XML documents possess inherent semi-structured property, consisting of structural and content features. Most existing methods for XML documents clustering consider only one aspect of them. In this paper, we propose a fuzzy XML documents projected clustering algorithm, which can be used to cluster XML documents efficiently by combining the structural and content features. Another contribution is the adoption of some fuzzy techniques in a way that each frequent induced substructure has a fuzzy parameter associated with each cluster. Experimental results on both synthetic and real datasets show its effectiveness, especially when applying to large schemaless XML document collections.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Aggarwal, C.C., Ta, N., Wang, J., Feng, J., Zaki, M.: Xproj: a framework for projected structural clustering of xml documents. In: Proceeding of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007, pp. 46–55 (2007)
Kutty, S., Nayak, R., Li, Y.: XCFS - An XML Documents Clustering Approach using both the Structure and the Content. In: Proceeding of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, pp. 1729–1732 (2009)
Seeland, M., Girschick, T., Buchwald, F., Kramer, S.: Online Structural Graph Clustering using Frequent Subgraph Mining. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part III. LNCS, vol. 6323, pp. 213–228. Springer, Heidelberg (2010)
Tran, T., Nayak, R.: Document Clustering using Incremental and Pairwise Approaches. Focused Access to XML Documents. 222-232 (2008)
Doucet, A., Ahonen-Myka, H.: Naive clustering of a large XML document collection. In: Proceedings of the First Workshop of the INitiative for the Evaluation of XML Retrieval, INEX 2002, pp. 81–87 (2002)
Kutty, S., Nayak, R., Li, Y.: XML Documents Clustering using Tensor Space Model A Preliminary Study. In: Proceedings of the 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010, pp. 1167–1173 (2010)
Lesniewska, A.: Clustering XML Documents by Structure. In: Advances in Databases and Information Systems - Associated Workshops and Doctoral Consortium of the 13th East European Conference, ADBIS 2009, pp. 238–246 (2009)
Gan, G., Wu, J., Yang, Z.: The XML web: a first study. In: Proceedings of the 12th International Conference on World Wide Web, WWW 2003, pp. 500–510 (2003)
Hwang, J.H., Ryu, K.H.: A weighted common structure based clustering technique for XML documents. Journal of Systems and Software, 1267–1274 (2010)
Tekli, J., Chbeir, R., Yetongnon, K.: An overview on XML similarity: Background, current trends and future directions. Computer Science Review, 151–173 (2009)
Kutty, S., Nayak, R., Li, Y.: HCX: An Efficient Hybrid Clustering Approach for XML Documents. In: Proceedings of the 2009 ACM Symposium on Document Engineering, DocEng 2009, pp. 94–97 (2009)
Zhang, L., Li, Z., Chen, Q., Li, N.: Structure and Content Similarity for Clustering XML Documents. In: Shen, H.T., Pei, J., Özsu, M.T., Zou, L., Lu, J., Ling, T.-W., Yu, G., Zhuang, Y., Shao, J. (eds.) WAIM 2010. LNCS, vol. 6185, pp. 116–124. Springer, Heidelberg (2010)
Domeniconi, C., Papadopoulos, D., Gunopulos, D., Ma, S.: Subspace clustering of high dimensional data. In: Proceedings of the SIAM International Conference on Data Mining (2004)
Abel, J., Teahan, W.: Universal Text Preprocessing for Data Compression. IEEE Transactions on Computers, 497–507 (2005)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management, 513–523 (1988)
Dalamagas, T., Cheng, T., Winkel, K.-J., Sellis, T.K.: Clustering XML Documents Using Structural Summaries. In: Lindner, W., Fischer, F., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 547–556. Springer, Heidelberg (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ji, T., Bao, X., Yang, D. (2011). FXProj – A Fuzzy XML Documents Projected Clustering Based on Structure and Content. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science(), vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_31
Download citation
DOI: https://doi.org/10.1007/978-3-642-25853-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer ScienceComputer Science (R0)