Abstract
This paper describes our participation in the INEX 2010 XML Mining track. Our classification and clustering solutions for XML documents use both structure and content information: frequent subtrees serve as the structural units from which content is extracted. In addition, we use WordNet and link information to improve performance, and apply the structured link vector model for classification.
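As a rough illustration of the idea of weighting terms per structural unit, the following is a minimal sketch and not the authors' implementation. It assumes each document has already been split into labelled structural units; the labels `title` and `body` stand in for mined frequent subtrees, and the function name `slvm_vectors` and the smoothed TF-IDF weighting are hypothetical choices made for this example.

```python
import math
from collections import Counter


def slvm_vectors(docs):
    """Build one (units x terms) weight matrix per document.

    docs: list of dicts mapping a structural-unit label to its text.
    Returns (units, terms, matrices), where matrices[d][u][t] is the
    TF-IDF weight of term t inside structural unit u of document d.
    """
    units = sorted({u for doc in docs for u in doc})
    terms = sorted({w for doc in docs
                    for text in doc.values()
                    for w in text.lower().split()})

    # Document frequency: documents that contain the term in any unit.
    df = Counter()
    for doc in docs:
        df.update({w for text in doc.values() for w in text.lower().split()})

    # Assumed weighting: log(N / df) + 1, so no term gets weight zero
    # merely because it occurs in every document.
    n_docs = len(docs)
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in terms}

    matrices = []
    for doc in docs:
        rows = []
        for u in units:
            tf = Counter(doc.get(u, "").lower().split())
            rows.append([tf[t] * idf[t] for t in terms])
        matrices.append(rows)
    return units, terms, matrices


if __name__ == "__main__":
    # Hypothetical toy documents, not data from the track.
    sample = [
        {"title": "xml mining", "body": "frequent subtrees in xml"},
        {"title": "text clustering", "body": "clustering text documents"},
    ]
    units, terms, matrices = slvm_vectors(sample)
    print(units)
    print(terms)
```

Keeping one row per structural unit means the same term can carry different weight depending on where in the document it occurs, which is what distinguishes this representation from a flat bag of words.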
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, S., Liang, F., Yang, J. (2011). PKU at INEX 2010 XML Mining Track. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_37
DOI: https://doi.org/10.1007/978-3-642-23577-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23576-4
Online ISBN: 978-3-642-23577-1
eBook Packages: Computer Science, Computer Science (R0)