Abstract
This article presents an original supervised classification technique for XML documents that relies on structure alone. Each XML document is viewed as an ordered labeled tree, represented by its tags only. The method has three steps: after a cleaning step, we characterize each predefined cluster in terms of frequent structural subsequences, and then we classify the XML documents based on the patterns mined for each cluster.
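The pipeline described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the function names, the preorder tag flattening, the `min_support` and `max_len` parameters, and the coverage-based score are all assumptions made for illustration, and the cleaning step is omitted because the abstract does not say what it involves. The brute-force subsequence enumeration also stands in for a real sequential pattern mining algorithm and only suits very small documents.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def tag_sequence(xml_text):
    """Flatten a document into its tags, in document (preorder) order."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


def subsequences(seq, max_len):
    """Distinct ordered (not necessarily contiguous) subsequences
    of seq containing 2 to max_len tags."""
    found = set()
    for n in range(2, max_len + 1):
        found.update(combinations(seq, n))
    return found


def mine_patterns(docs, min_support=0.8, max_len=3):
    """Keep subsequences that occur in at least min_support of the docs.
    min_support and max_len are illustrative values, not from the paper."""
    counts = Counter()
    for doc in docs:
        # Each document contributes at most once per pattern.
        counts.update(subsequences(tag_sequence(doc), max_len))
    threshold = min_support * len(docs)
    return {pattern for pattern, c in counts.items() if c >= threshold}


def classify(xml_text, patterns_by_cluster, max_len=3):
    """Return the cluster whose mined patterns are best covered by
    the document (hypothetical scoring rule)."""
    present = subsequences(tag_sequence(xml_text), max_len)

    def score(cluster):
        patterns = patterns_by_cluster[cluster]
        return len(patterns & present) / len(patterns) if patterns else 0.0

    return max(patterns_by_cluster, key=score)
```

A possible use: mine one pattern set per predefined cluster from a few labeled documents with `mine_patterns`, collect them in a dictionary keyed by cluster name, and pass that dictionary to `classify` together with an unseen document.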
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garboni, C., Masseglia, F., Trousse, B. (2006). Sequential Pattern Mining for Structure-Based XML Document Classification. In: Fuhr, N., Lalmas, M., Malik, S., Kazai, G. (eds) Advances in XML Information Retrieval and Evaluation. INEX 2005. Lecture Notes in Computer Science, vol 3977. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-34963-1_35
DOI: https://doi.org/10.1007/978-3-540-34963-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34962-4
Online ISBN: 978-3-540-34963-1
eBook Packages: Computer Science (R0)