Abstract
In this paper, we propose a novel non-negative matrix factorization (NMF) of the affinity matrix for document clustering that enforces non-negativity and orthogonality constraints simultaneously. Thanks to the orthogonality constraints, this NMF provides a solution to spectral clustering: it inherits the advantages of spectral clustering and offers a more reasonable clustering interpretation than previous NMF-based clustering methods. Thanks to the non-negativity constraints, the proposed method also improves on traditional eigenvector-based spectral clustering, because it inherits the benefit of NMF-based methods that the non-negative solution is intuitive and the final clusters can be derived from it directly. The proposed method thus combines the advantages of spectral clustering and NMF-based methods, and experimental results on the TDT2 and Reuters-21578 corpora show that it outperforms both.
This research was supported by the National Basic Research Program of China (973 Program, 2007CB311100), the National High Technology and Research Development Program of China (863 Program, 2007AA01Z416), and the Beijing New Star Project on Science & Technology (2007B071).
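The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rules, so the code below is not the authors' algorithm: it assumes a cosine-similarity affinity matrix and uses a well-known multiplicative update for the symmetric factorization W ≈ HHᵀ with H ≥ 0 and HᵀH ≈ I. The function names (`cosine_affinity`, `orthogonal_symmetric_nmf`, `cluster_labels`) and all parameters are hypothetical and chosen only for this example.

```python
import numpy as np


def cosine_affinity(X):
    """Cosine-similarity affinity matrix for documents stored as rows of X.

    Assumption: X is non-negative (e.g. term counts or TF-IDF), so the
    resulting affinity matrix is non-negative as well.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    return Xn @ Xn.T


def orthogonal_symmetric_nmf(W, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate W by H @ H.T with H >= 0 and H.T @ H close to identity.

    Uses the multiplicative update H <- H * sqrt(WH / (H H^T W H)),
    which keeps H non-negative because every factor is non-negative.
    This is an illustrative stand-in, not the method from the paper.
    """
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    H = rng.uniform(0.1, 1.0, size=(n, k))  # strictly positive start
    for _ in range(n_iter):
        WH = W @ H
        denom = H @ (H.T @ WH)  # H H^T W H, computed without an n x n product
        H *= np.sqrt(WH / (denom + eps))
    return H


def cluster_labels(H):
    """Read clusters directly off H: each document takes its largest column."""
    return np.argmax(H, axis=1)
```

Because H is non-negative and its columns are pushed toward orthogonality, each row tends to have one dominant entry, so the cluster assignment is a plain `argmax` rather than a separate k-means step on eigenvectors. For a document-term matrix `X`, a call would look like `cluster_labels(orthogonal_symmetric_nmf(cosine_affinity(X), k))`.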
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bao, L., Tang, S., Li, J., Zhang, Y., Ye, Wp. (2008). Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization. In: Nguyen, N.T., Borzemski, L., Grzech, A., Ali, M. (eds) New Frontiers in Applied Artificial Intelligence. IEA/AIE 2008. Lecture Notes in Computer Science, vol 5027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69052-8_16
DOI: https://doi.org/10.1007/978-3-540-69052-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69045-0
Online ISBN: 978-3-540-69052-8
eBook Packages: Computer Science, Computer Science (R0)