PKU at INEX 2010 XML Mining Track

Wang, Songlin; Liang, Feng; Yang, Jianwu

doi:10.1007/978-3-642-23577-1_37


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6932)

Included in the following conference series:

International Workshop of the Initiative for the Evaluation of XML Retrieval

Abstract

This paper presents our participation in the INEX 2010 XML Mining track. Our classification and clustering solutions for XML documents use both structure and content information: frequent subtrees serve as the structural units from which content is extracted from each XML document. In addition, we use WordNet and link information to improve performance, and apply the structured link vector model for classification.
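
As a rough illustration of combining structure and content, the sketch below builds one bag-of-words vector per structural unit and compares two documents unit by unit. It rests on assumptions that are not taken from the paper: each distinct root-to-element tag path stands in for a structural unit (the paper mines frequent subtrees instead), tokenization is a plain lowercase regex split, text that follows a child element (tail text) is ignored, and the helper names `unit_vectors`, `cosine`, and `similarity` are hypothetical. It is not the structured link vector model itself, only a simplified stand-in for the idea of keeping content separated by structure.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def unit_vectors(xml_text):
    """Map each tag path (an assumed stand-in for a structural unit) to term counts."""
    root = ET.fromstring(xml_text)
    vectors = defaultdict(Counter)

    def walk(node, path):
        here = f"{path}/{node.tag}"
        # Count only the text directly inside this element; tail text is ignored.
        tokens = re.findall(r"[a-z0-9]+", (node.text or "").lower())
        if tokens:
            vectors[here].update(tokens)
        for child in node:
            walk(child, here)

    walk(root, "")
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two term-count Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(doc_a, doc_b):
    """Average the per-unit cosine over all units seen in either document."""
    units = doc_a.keys() | doc_b.keys()
    if not units:
        return 0.0
    total = sum(cosine(doc_a.get(u, Counter()), doc_b.get(u, Counter())) for u in units)
    return total / len(units)
```

Two documents with identical titles but unrelated body text would score about halfway between 0 and 1 under this scheme, because the title unit matches fully and the body unit not at all. A flat bag of words would blur that distinction, which is the motivation for keeping one vector per structural unit.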

Author information

Authors and Affiliations

Institute of Computer Sci. & Tech., Peking University, Beijing, 100871, China
Songlin Wang, Feng Liang & Jianwu Yang

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, Qld 4001, Brisbane, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012XT, Amsterdam, The Netherlands
Jaap Kamps
Multimodal Computing and Interaction, Saarland University, 66123, Saarbrücken, Germany
Ralf Schenkel
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

Cite this paper

Wang, S., Liang, F., Yang, J. (2011). PKU at INEX 2010 XML Mining Track. In: Geva, S., Kamps, J., Schenkel, R., Trotman, A. (eds) Comparative Evaluation of Focused Retrieval. INEX 2010. Lecture Notes in Computer Science, vol 6932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23577-1_37

DOI: https://doi.org/10.1007/978-3-642-23577-1_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23576-4
Online ISBN: 978-3-642-23577-1
eBook Packages: Computer Science, Computer Science (R0)
