DOI: 10.1145/2566486.2567980

The dual-sparse topic model: mining focused topics and focused terms in short text

Published: 07 April 2014

Abstract

Topic modeling has proven to be an effective method for exploratory text mining. Most topic models share the common assumption that a document is generated from a mixture of topics. In real-world scenarios, however, individual documents usually concentrate on a few salient topics instead of covering a wide variety of topics, and a real topic likewise adopts a narrow range of terms instead of a wide coverage of the vocabulary. Understanding this sparsity of information is especially important for analyzing user-generated Web content and social media, which are characterized by extremely short posts and condensed discussions. In this paper, we propose a dual-sparse topic model that addresses sparsity in both the topic mixtures and the word usage. By applying a "Spike and Slab" prior to decouple the sparsity and smoothness of the document-topic and topic-word distributions, we allow individual documents to select a few focused topics and individual topics to select a few focused terms. Experiments on large corpora of different genres demonstrate that the dual-sparse topic model outperforms both classical topic models and existing sparsity-enhanced topic models. The improvement is especially notable on collections of short documents.
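
To make the spike-and-slab idea in the abstract concrete, the sketch below shows one standard way such a prior decouples topic selection (the spike) from smoothing (the slab) on the document side, with a mirrored construction on the topic-word side. The notation (selectors b, selection probabilities pi, Dirichlet hyperparameters alpha and beta, and weak smoothing terms alpha-bar and beta-bar) is illustrative and not necessarily the paper's own formulation.

% Illustrative generative sketch (assumed notation; the paper's exact model may differ)
\begin{align*}
b_{d,k} &\sim \mathrm{Bernoulli}(\pi_d)
  && \text{spike: does document } d \text{ use topic } k\text{?}\\
\theta_d &\sim \mathrm{Dirichlet}\!\left(\alpha\, b_{d,1} + \bar{\alpha}, \dots, \alpha\, b_{d,K} + \bar{\alpha}\right)
  && \text{slab: smooth proportions over the selected topics}\\
\tilde{b}_{k,v} &\sim \mathrm{Bernoulli}(\pi_k)
  && \text{spike: does topic } k \text{ use term } v\text{?}\\
\phi_k &\sim \mathrm{Dirichlet}\!\left(\beta\, \tilde{b}_{k,1} + \bar{\beta}, \dots, \beta\, \tilde{b}_{k,V} + \bar{\beta}\right)
  && \text{slab: smooth term distribution over the selected terms}\\
z_{d,n} &\sim \mathrm{Multinomial}(\theta_d), \qquad
w_{d,n} \sim \mathrm{Multinomial}(\phi_{z_{d,n}})
  && \text{draw a topic, then a word, as in LDA}
\end{align*}

In this sketch the weak smoothing terms \bar{\alpha} and \bar{\beta} keep every entry strictly positive (smoothness), while the Bernoulli selectors concentrate most of the mass on a few topics per document and a few terms per topic (sparsity); setting all selectors to 1 and \bar{\alpha} = \bar{\beta} = 0 recovers standard LDA with symmetric priors.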



    Published In

WWW '14: Proceedings of the 23rd International Conference on World Wide Web
    April 2014
    926 pages
    ISBN:9781450327442
    DOI:10.1145/2566486

    Sponsors

    • IW3C2: International World Wide Web Conference Committee


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. sparse representation
    2. spike and slab
    3. topic modeling
    4. user-generated content

    Qualifiers

    • Research-article



    Acceptance Rates

WWW '14 Paper Acceptance Rate: 84 of 645 submissions, 13%.
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%.

