Abstract
Clustering performance in document space can be degraded by the high dimensionality of the vectors: high-dimensional vectors contain a great deal of redundant information, which can make the similarities between them inaccurate. Hence, it is worthwhile to derive a low-dimensional subspace that contains less redundant information, so that document vectors can be grouped more reasonably. Subspace learning and clustering are usually treated as two independent steps, in which case we cannot assess whether the subspace suits the clustering method or vice versa. To overcome this drawback, this paper combines subspace learning and clustering into an iterative procedure named adaptive subspace learning (ASL). First, in the subspace-learning step, the initial cluster indicators are used to increase the intracluster similarity and the intercluster separability of the vectors. Then affinity propagation partitions the vectors into a specified number of clusters, which updates the cluster indicators, and subspace learning is repeated. Through this iterative optimization, the subspace obtained by ASL becomes increasingly suitable for clustering. The proposed method is evaluated on the NG20, Classic3 and K1b datasets, and the results show that it outperforms conventional document-clustering methods.
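The alternating loop described in the abstract can be illustrated with a small NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the subspace objective, so the hypothetical `learn_subspace` uses a margin-style stand-in: the top eigenvectors of the between-cluster scatter minus the within-cluster scatter. Similarities are taken as negative squared Euclidean distances. A specified number of clusters is obtained by bisecting affinity propagation's shared preference (an assumed heuristic). All function names are hypothetical, and dense, small inputs are assumed.

```python
import numpy as np


def affinity_propagation(S, damping=0.9, max_iter=200, conv_iter=15):
    """Affinity propagation (Frey & Dueck, 2007); diag(S) holds the preferences."""
    n = S.shape[0]
    idx = np.arange(n)
    R, A = np.zeros((n, n)), np.zeros((n, n))
    last, stable = None, 0
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = AS.argmax(axis=1)
        max1 = AS[idx, best]
        AS[idx, best] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[idx, best] = S[idx, best] - max2
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[idx, idx].copy()  # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = self_avail
        A = damping * A + (1 - damping) * A_new
        # Stop once the exemplar set is unchanged for conv_iter sweeps.
        ex = tuple(np.flatnonzero(np.diag(A + R) > 0))
        stable = stable + 1 if ex == last else 0
        last = ex
        if ex and stable >= conv_iter:
            break
    ex = np.array(last, dtype=int)
    if ex.size == 0:  # no exemplar emerged: treat everything as one cluster
        return np.zeros(n, dtype=int)
    labels = S[:, ex].argmax(axis=1)  # assign each point to its most similar exemplar
    labels[ex] = np.arange(ex.size)   # exemplars label themselves
    return labels


def ap_with_k(S, k, steps=15):
    """Assumed heuristic: bisect a shared preference until AP yields k clusters."""
    off = S[~np.eye(len(S), dtype=bool)]
    lo, hi = off.min() - (off.max() - off.min()), off.max()
    best = None
    for _ in range(steps):
        p = 0.5 * (lo + hi)
        S_p = S.copy()
        np.fill_diagonal(S_p, p)
        labels = affinity_propagation(S_p)
        c = labels.max() + 1
        if best is None or abs(c - k) < abs(best.max() + 1 - k):
            best = labels  # keep the labeling whose cluster count is closest to k
        if c == k:
            break
        lo, hi = (lo, p) if c > k else (p, hi)  # lower preference -> fewer clusters
    return best


def neg_sq_dist(Y):
    """Similarity = negative squared Euclidean distance (an assumption)."""
    sq = (Y ** 2).sum(axis=1)
    return -(sq[:, None] + sq[None, :] - 2 * Y @ Y.T)


def learn_subspace(X, labels, dim):
    """Hypothetical stand-in objective: top eigenvectors of S_b - S_w."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        diff = Xc.mean(axis=0) - mu
        Sb += len(Xc) * np.outer(diff, diff)  # between-cluster scatter
        D = Xc - Xc.mean(axis=0)
        Sw += D.T @ D                         # within-cluster scatter
    _, vecs = np.linalg.eigh(Sb - Sw)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :dim]      # orthonormal projection, largest first


def adaptive_subspace_learning(X, k, dim, rounds=3):
    """Alternate subspace learning and AP clustering until the labels stop changing."""
    labels = ap_with_k(neg_sq_dist(X), k)  # initial cluster indicators
    for _ in range(rounds):
        W = learn_subspace(X, labels, dim)
        new = ap_with_k(neg_sq_dist(X @ W), k)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels, W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
    X = np.vstack([c + rng.normal(scale=0.5, size=(10, 5)) for c in centers])
    labels, W = adaptive_subspace_learning(X, k=3, dim=2)
    print(labels, W.shape)
```

Bisecting the preference is only one way to make affinity propagation return a specified number of clusters; the paper's own mechanism is not stated in the abstract. For real document collections the vocabulary dimension is large, so a practical version would work on sparse or pre-reduced vectors rather than forming dense d-by-d scatter matrices.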
Acknowledgments
The project was supported by the National Science Foundation of China (61173084) and the China Postdoctoral Science Foundation (2011M-501360). We also thank Mr. Wu Jiansheng for providing the preprocessed document corpora.
About this article
Wu, X., Chen, X., Li, X. et al. Adaptive subspace learning: an iterative approach for document clustering. Neural Comput & Applic 25, 333–342 (2014). https://doi.org/10.1007/s00521-013-1486-8