Abstract
Topic modeling has become a popular method for data analysis in various domains, including text documents. Previous topic model approaches, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structure for modeling text documents. These approaches, however, do not take into account the manifold structure of the data, which is generally informative for nonlinear dimensionality reduction. More recent topic model approaches, Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown resulting benefits. But they fall short of the full discriminating power of manifold learning, because they only enhance the proximity between the low-rank representations of neighboring pairs, without any consideration of non-neighboring pairs. In this article, we propose a new approach, Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency. We also present a novel model-fitting algorithm based on the generalized EM algorithm and the concept of Pareto improvement. We empirically demonstrate the success of DTM in terms of unsupervised clustering and semisupervised classification accuracy on text corpora, as well as robustness to parameter settings, compared with state-of-the-art techniques.
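The abstract's core idea — pull the low-rank representations of neighboring documents together while pushing non-neighboring pairs apart — can be sketched numerically. The following toy code is an illustration, not the authors' algorithm: the function names, the cosine k-NN graph construction, and the ratio form of the regularizer are assumptions made for this sketch (DTM's actual objective and its generalized-EM/Pareto-improvement fitting procedure are defined in the paper itself).

```python
import numpy as np

def knn_graph(X, k=3):
    """Symmetric k-nearest-neighbor adjacency from cosine similarity."""
    sim = X @ X.T
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    sim = sim / (norms * norms.T + 1e-12)
    np.fill_diagonal(sim, -np.inf)           # a point is not its own neighbor
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(sim[i])[-k:]:    # indices of the k most similar points
            W[i, j] = W[j, i] = 1.0
    return W

def manifold_terms(Z, W):
    """Local term: squared distances between neighboring low-rank
    representations. Discriminative term: the same sum over non-neighbors."""
    n = Z.shape[0]
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    mask = ~np.eye(n, dtype=bool)                         # exclude the diagonal
    local = (W * D2).sum() / 2.0                          # each pair counted once
    nonneighbor = ((mask & (W == 0)) * D2).sum() / 2.0
    return local, nonneighbor

def dtm_style_regularizer(Z, W, eps=1e-12):
    """Smaller is better: neighbors close AND non-neighbors far apart.
    A plain local-consistency penalty would use only the numerator."""
    local, nonneighbor = manifold_terms(Z, W)
    return local / (nonneighbor + eps)
```

An embedding that collapses each neighborhood while separating the rest of the data scores lower under this regularizer than one that merely shrinks all distances, which is the discriminating effect the abstract contrasts against LapPLSI and LTM.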
References
- Barr, N. 2004. Economics of the Welfare State. Oxford University Press, Oxford, UK.
- Belkin, M. and Niyogi, P. 2001. Laplacian eigenmaps and spectral techniques for embedding and clustering. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 585--591.
- Belkin, M., Niyogi, P., and Sindhwani, V. 2006. Manifold regularization: A geometric framework for learning from examples. J. Mach. Learn. Res. 7, 2399--2434.
- Bengio, Y., Paiement, J., Vincent, P., Delalleau, O., Le Roux, N., and Ouimet, M. 2004. Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS).
- Blei, D. and McAuliffe, J. 2008. Supervised topic models. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 121--128.
- Blei, D. M., Ng, A. Y., and Jordan, M. I. 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993--1022.
- Boley, D. L. 1998. Principal direction divisive partitioning. Data Mining Knowl. Discov. 2, 4, 325--344.
- Cai, D., Mei, Q., Han, J., and Zhai, C. 2008. Modeling hidden topics on document manifold. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM). 911--920.
- Cai, D., Wang, X., and He, X. 2009. Probabilistic dyadic data analysis with local and global consistency. In Proceedings of the International Conference on Machine Learning (ICML). 105--112.
- Chung, F. R. K. 1997. Spectral Graph Theory. American Mathematical Society.
- Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., and Harshman, R. A. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 391--407.
- Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc. Series B (Methodological) 39, 1--38.
- Hinton, G. and Roweis, S. 2002. Stochastic neighbor embedding. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 833--840.
- Hofmann, T. 1999. Probabilistic latent semantic indexing. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 50--57.
- Huh, S. and Fienberg, S. E. 2010. Discriminative topic modeling based on manifold learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 653--661.
- Jolliffe, I. T. 2002. Principal Component Analysis, 2nd Ed. Springer Series in Statistics, Vol. 29, Springer, NY.
- Kuhn, H. W. 1955. The Hungarian method for the assignment problem. Naval Res. Logist. Quarterly 2, 83--97.
- Lacoste-Julien, S., Sha, F., and Jordan, M. I. 2008. DiscLDA: Discriminative learning for dimensionality reduction and classification. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 897--904.
- Lee, D. D. and Seung, H. S. 2000. Algorithms for non-negative matrix factorization. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 556--562.
- Roweis, S. and Saul, L. K. 2000. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323--2326.
- Salton, G. and Buckley, C. 1988. Term-weighting approaches in automatic text retrieval. Inform. Proc. Manage. 24, 513--523.
- Sha, F., Saul, L. K., and Lee, D. D. 2003. Multiplicative updates for nonnegative quadratic programming in support vector machines. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 1041--1048.
- Tenenbaum, J. B., de Silva, V., and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319--2323.
- Trosset, M. W., Priebe, C. E., Park, Y., and Miller, M. I. 2008. Semisupervised learning from dissimilarity data. Comput. Statist. Data Anal. 52, 10, 4643--4657.
- van der Maaten, L. and Hinton, G. 2008. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579--2605.
- Xu, W., Liu, X., and Gong, Y. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). 267--273.
- Yan, S., Xu, D., Zhang, B., Zhang, H.-J., Yang, Q., and Lin, S. 2007. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29, 1, 40--51.
- Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. 2003. Learning with local and global consistency. In Proceedings of the Conference on Advances in Neural Information Processing Systems (NIPS). 321--328.
- Zhu, J., Ahmed, A., and Xing, E. P. 2009. MedLDA: Maximum margin supervised topic models for regression and classification. In Proceedings of the International Conference on Machine Learning (ICML).
- Zhu, X., Ghahramani, Z., and Lafferty, J. D. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the International Conference on Machine Learning (ICML). 912--919.