Adaptive subspace learning: an iterative approach for document clustering

Wu, Xian; Chen, Xiaoming; Li, Xiang; Zhou, Lingli; Lai, Jianhuang

doi:10.1007/s00521-013-1486-8

Original Article
Published: 05 October 2013

Volume 25, pages 333–342, (2014)
Neural Computing and Applications

Abstract

Clustering performance in document space can be degraded by the high dimensionality of the vectors: high-dimensional vectors carry a great deal of redundant information, which can make the similarity between vectors inaccurate. It is therefore desirable to derive a low-dimensional subspace that contains less redundant information, so that document vectors can be grouped more reasonably. In general, subspace learning and clustering are treated as two independent steps, so there is no way to assess whether the subspace suits the clustering method or vice versa. To overcome this drawback, this paper combines subspace learning and clustering into an iterative procedure named adaptive subspace learning (ASL). First, the subspace learning step uses the initial cluster indicators to increase the intracluster similarity and the intercluster separability of the vectors. Affinity propagation is then adopted to partition the vectors into a specified number of clusters, and the resulting partition updates the cluster indicators for the next round of subspace learning. Through this iterative optimization, the learned subspace becomes progressively better suited to the clustering. The proposed method is evaluated on the NG20, Classic3 and K1b datasets, and the results are shown to be superior to those of conventional document clustering methods.
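
The alternation described in the abstract (learn a subspace from the current cluster indicators, re-cluster in that subspace, repeat) can be illustrated with a minimal sketch. This is not the authors' implementation. As stand-ins, it assumes a simple scatter-difference criterion (between-cluster scatter minus within-cluster scatter) for the subspace step and plain k-means in place of affinity propagation. The function names `learn_subspace`, `kmeans` and `adaptive_subspace_clustering`, and the parameters `dim` and `n_rounds`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def learn_subspace(X, labels, dim):
    """Return a d x dim projection that favors tight, well-separated clusters.

    Assumption: a scatter-difference criterion stands in for the paper's
    subspace learning step, which the abstract does not specify in detail.
    """
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-cluster scatter
    Sb = np.zeros((d, d))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - overall_mean, mc - overall_mean)
    # Sb - Sw is symmetric, so eigh applies; keep the top `dim` eigenvectors.
    vals, vecs = np.linalg.eigh(Sb - Sw)
    order = np.argsort(vals)[::-1][:dim]
    return vecs[:, order]


def kmeans(Z, k, rng, n_iter=50):
    """Plain k-means, used here as a stand-in for affinity propagation."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's center as is
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def adaptive_subspace_clustering(X, k, dim, n_rounds=10, seed=0):
    """Alternate subspace learning and clustering for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, rng)  # initial cluster indicators
    for _ in range(n_rounds):
        W = learn_subspace(X, labels, dim)  # subspace from current indicators
        labels = kmeans(X @ W, k, rng)      # re-cluster in the subspace
    return labels, W
```

Calling `adaptive_subspace_clustering(X, k=3, dim=2)` on an `n x d` matrix of document vectors returns one cluster label per row and a `d x 2` projection matrix. A fixed round count is used instead of a convergence test because cluster labels can be permuted between rounds, which makes a direct label comparison unreliable.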

Acknowledgments

The project was supported by the National Science Foundation of China (61173084) and the China Postdoctoral Science Foundation (2011M-501360). The authors also thank Mr. Wu Jiansheng for contributing the preprocessed document corpora.

Author information

Authors and Affiliations

School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510006, People’s Republic of China
Xian Wu & Jianhuang Lai
Nanfang Media Group, Guangzhou, 510601, People’s Republic of China
Xian Wu
Software Development, Digital Technology International (DTI) Group Ltd, Perth, Western Australia, 6105, Australia
Xiaoming Chen
School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, People’s Republic of China
Xiang Li
School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou, 510275, People’s Republic of China
Lingli Zhou

Corresponding author

Correspondence to Xiaoming Chen.

About this article

Cite this article

Wu, X., Chen, X., Li, X. et al. Adaptive subspace learning: an iterative approach for document clustering. Neural Comput & Applic 25, 333–342 (2014). https://doi.org/10.1007/s00521-013-1486-8

Received: 07 January 2013
Accepted: 10 September 2013
Published: 05 October 2013
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00521-013-1486-8
