Adaptive subspace learning: an iterative approach for document clustering

  • Original Article
Neural Computing and Applications

Abstract

Clustering performance in document space can be degraded by the high dimensionality of the vectors: high-dimensional vectors carry a great deal of redundant information, which can make the similarity between vectors inaccurate. Hence, it is highly desirable to derive a low-dimensional subspace that contains less redundant information, so that document vectors can be grouped more reasonably. In general, subspace learning and clustering are treated as two independent steps; in that case, we cannot assess whether the subspace suits the clustering method or vice versa. To overcome this drawback, this paper combines subspace learning and clustering into an iterative procedure named adaptive subspace learning (ASL). First, in the subspace learning step, the initial cluster indicators are used to increase the intracluster similarity and the intercluster separability of the vectors; then affinity propagation is adopted to partition the vectors into a specified number of clusters, which updates the cluster indicators, and subspace learning is repeated. Through this iterative optimization, the subspace obtained by ASL becomes more suitable for the clustering. The proposed method is evaluated on the NG20, Classic3 and K1b datasets, and the results are shown to be superior to those of conventional document clustering methods.
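The alternating scheme described in the abstract can be illustrated with a short sketch. The abstract gives no formulas, so everything concrete here is an assumption rather than the authors' method: the subspace step uses a simple scatter-difference criterion (between-cluster scatter minus within-cluster scatter), plain k-means is swapped in for affinity propagation, and the function names `learn_subspace`, `kmeans` and `adaptive_subspace_clustering` are hypothetical.

```python
import numpy as np


def learn_subspace(X, labels, dim):
    """Return a (features x dim) orthonormal projection that favours
    compact, well-separated clusters under the current labels.
    Assumed criterion: maximise trace(W^T (S_b - S_w) W)."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))  # within-cluster scatter
    S_b = np.zeros((n_features, n_features))  # between-cluster scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu = Xc.mean(axis=0)
        S_w += (Xc - mu).T @ (Xc - mu)
        diff = (mu - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # The matrix is symmetric, so eigh returns orthonormal eigenvectors.
    vals, vecs = np.linalg.eigh(S_b - S_w)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top]


def kmeans(Z, k, rng, iters=25):
    """Plain k-means; a stand-in for affinity propagation (assumption)."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old centre if a cluster empties
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


def adaptive_subspace_clustering(X, k, dim, rounds=5, seed=0):
    """Alternate subspace learning and clustering for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X, k, rng)  # initial cluster indicators
    W = None
    for _ in range(rounds):
        W = learn_subspace(X, labels, dim)  # subspace from current indicators
        labels = kmeans(X @ W, k, rng)      # re-cluster in the subspace
    return labels, W
```

Calling `adaptive_subspace_clustering(X, k=3, dim=2)` on a documents-by-features matrix `X` returns the final cluster labels and the learned projection. A fixed number of rounds is used here only for simplicity; a stopping rule based on label stability would be a natural refinement.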

Notes

  1. http://people.csail.mit.edu/jrennie/20Newsgroups/.

  2. ftp://ftp.cs.cornell.edu/pub/smart/.

Acknowledgments

This work was supported by the National Science Foundation of China (61173084) and the China Postdoctoral Science Foundation (2011M-501360). The authors also thank Mr. Wu Jiansheng for providing the preprocessed document corpora.

Author information

Corresponding author

Correspondence to Xiaoming Chen.

About this article

Cite this article

Wu, X., Chen, X., Li, X. et al. Adaptive subspace learning: an iterative approach for document clustering. Neural Comput & Applic 25, 333–342 (2014). https://doi.org/10.1007/s00521-013-1486-8

  • DOI: https://doi.org/10.1007/s00521-013-1486-8
