Discriminative Topic Modeling Based on Manifold Learning

Published: 01 February 2012

Abstract

Topic modeling has become a popular method for data analysis in various domains, including text documents. Previous topic model approaches, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These approaches, however, do not take into account the manifold structure of the data, which is generally informative for nonlinear dimensionality reduction mapping. More recent topic model approaches, Laplacian PLSI (LapPLSI) and Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown resulting benefits. But they fall short of achieving the full discriminating power of manifold learning, as they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this article, we propose a new approach, Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency. We also present a novel model-fitting algorithm based on the generalized EM algorithm and the concept of Pareto improvement. We empirically demonstrate the success of DTM in terms of unsupervised clustering and semisupervised classification accuracy on text corpora, as well as its robustness to parameters, compared to state-of-the-art techniques.
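The abstract's core idea — pulling the low-rank representations of neighboring documents together while pushing non-neighboring pairs apart — can be illustrated as a regularizer over a neighbor graph. The sketch below is purely illustrative and not the paper's exact objective: the function name, the binary weight matrices `W_near`/`W_far`, and the trade-off coefficients `lam`/`mu` are our assumptions.

```python
import numpy as np

def manifold_regularizer(Z, W_near, W_far, lam=1.0, mu=1.0):
    """Discriminative manifold term in the spirit of DTM (illustrative sketch).

    Z      : (n, k) array of low-rank (topic-space) document representations.
    W_near : (n, n) weights for neighboring pairs on the manifold graph.
    W_far  : (n, n) weights for non-neighboring pairs.
    Larger values are better: neighbors sit close, non-neighbors sit apart.
    """
    n = Z.shape[0]
    pull, push = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            d = np.sum((Z[i] - Z[j]) ** 2)   # squared distance in topic space
            pull += W_near[i, j] * d          # local consistency (as in LapPLSI/LTM)
            push += W_far[i, j] * d           # separation of non-neighbors (DTM's addition)
    return -lam * pull + mu * push
```

An embedding that keeps same-cluster documents close and different-cluster documents apart scores higher under this term than one that does the opposite, which is the property the model-fitting algorithm would trade off against data likelihood.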



      • Published in

        ACM Transactions on Knowledge Discovery from Data  Volume 5, Issue 4
        February 2012
        176 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/2086737

        Copyright © 2012 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 February 2012
        • Accepted: 1 November 2011
        • Revised: 1 August 2011
        • Received: 1 March 2011
Published in TKDD Volume 5, Issue 4


        Qualifiers

        • research-article
        • Research
        • Refereed
