DOI: 10.1145/2566486.2567980

The dual-sparse topic model: mining focused topics and focused terms in short text

Published: 07 April 2014


Topic modeling has proven to be an effective method for exploratory text mining. Most topic models assume that a document is generated from a mixture of topics. In real-world scenarios, however, individual documents usually concentrate on a few salient topics instead of covering a wide variety of topics. Likewise, a real topic uses a narrow range of terms instead of covering the whole vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are characterized by extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses the sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and individual topics to select a few focused terms. Experiments on different genres of large corpora demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. This improvement is especially notable on collections of short documents.
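
The abstract does not give the model's equations, so the following is only a minimal sketch of the general spike-and-slab idea it describes, not the paper's specification. It assumes that binary Bernoulli selectors (the "spike") pick which entries are focused, and that a Dirichlet draw (the "slab") spreads probability over them, with a weak smoothing term keeping unselected entries from being exactly zero. The function names and the hyperparameters select_prob, slab, and smoothing, along with their values, are hypothetical and chosen for illustration.

```python
import random


def sample_dirichlet(concentrations, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    while True:
        draws = [rng.gammavariate(c, 1.0) for c in concentrations]
        total = sum(draws)
        if total > 0.0:  # tiny concentrations can underflow to 0.0; redraw
            return [d / total for d in draws]


def spike_and_slab_vector(num_items, select_prob, slab, smoothing, rng):
    """Sample a sparse probability vector (illustrative assumption only).

    Spike: each entry is switched on with probability `select_prob`.
    Slab: switched-on entries get concentration `slab + smoothing`,
    switched-off entries keep only the much weaker `smoothing` term.
    """
    selectors = [1 if rng.random() < select_prob else 0 for _ in range(num_items)]
    concentrations = [slab * s + smoothing for s in selectors]
    return selectors, sample_dirichlet(concentrations, rng)


rng = random.Random(0)
num_topics, vocab_size = 10, 50  # hypothetical sizes

# Sparse topic mixture for one short document.
doc_selectors, theta = spike_and_slab_vector(num_topics, 0.2, 1.0, 0.01, rng)

# Sparse word distribution for one topic.
term_selectors, phi = spike_and_slab_vector(vocab_size, 0.1, 1.0, 0.01, rng)

print("focused topics:", [k for k, s in enumerate(doc_selectors) if s])
print("focused terms:", [w for w, s in enumerate(term_selectors) if s])
```

Applying the same helper to both the document-topic and the topic-word vector mirrors the "dual" sparsity in the title. Separating the selectors from the Dirichlet concentration is what lets sparsity (how many entries are on) be controlled independently of smoothness (how evenly mass is spread over them).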



Cited By

  • (2024) A Study of Discriminatory Speech Classification Based on Improved Smote and SVM-RF. Applied Sciences 14:15 (6468). DOI: 10.3390/app14156468. Online publication date: 24-Jul-2024
  • (2024) Dynamic Dual Sparse Topic Model: Integrating Temporal Dynamics and Sparsity with Spike and Slab Priors into Topic Model. 2024 16th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI) (299-304). DOI: 10.1109/IIAI-AAI63651.2024.00063. Online publication date: 6-Jul-2024
  • (2024) An Approach for Evaluating Topic Models for Knowledge Management. 2024 15th International Conference on Mechanical and Intelligent Manufacturing Technologies (ICMIMT) (46-51). DOI: 10.1109/ICMIMT61937.2024.10585675. Online publication date: 17-May-2024





    Published In

    WWW '14: Proceedings of the 23rd international conference on World wide web
    April 2014
    926 pages


    • IW3C2: International World Wide Web Conference Committee



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. sparse representation
    2. spike and slab
    3. topic modeling
    4. user-generated content


    • Research-article


    Acceptance Rates

    WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%




    Article Metrics

    • Downloads (Last 12 months): 41
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 15 Feb 2025

    Cited By

    • (2024) Applying short text topic models to instant messaging communication of software developers. Journal of Systems and Software 216:C. DOI: 10.1016/j.jss.2024.112111. Online publication date: 1-Oct-2024
    • (2023) WES-BTM: A Short Text-Based Topic Clustering Model. Symmetry 15:10 (1889). DOI: 10.3390/sym15101889. Online publication date: 9-Oct-2023
    • (2023) Now It Makes More Sense: How Narratives Can Help Atypical Actors Increase Market Appeal. Journal of Management 50:5 (1599-1642). DOI: 10.1177/01492063231151637. Online publication date: 6-Feb-2023
    • (2023) Online Confirmation-Augmented Probabilistic Topic Modeling in Cyber-Physical Social Infrastructure Systems. Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (390-397). DOI: 10.1145/3600100.3626341. Online publication date: 15-Nov-2023
    • (2023) Incorporating Embedding to Topic Modeling for More Effective Short Text Analysis. Companion Proceedings of the ACM Web Conference 2023 (73-76). DOI: 10.1145/3543873.3587316. Online publication date: 30-Apr-2023
    • (2023) Topic Model Based on Co-Occurrence Word Networks for Unbalanced Short Text Datasets. 2023 5th International Conference on Data-driven Optimization of Complex Systems (DOCS) (1-7). DOI: 10.1109/DOCS60977.2023.10294993. Online publication date: 22-Sep-2023
    • (2023) A data-driven approach for understanding invalid bug reports: An industrial case study. Information and Software Technology 164 (107305). DOI: 10.1016/j.infsof.2023.107305. Online publication date: Dec-2023
