Abstract:
We propose a sparse Bayesian topic model, based on parameter sharing, for modeling text corpora. In Latent Dirichlet Allocation (LDA), each topic models all words, even though many words are not topic-specific, i.e., they have similar occurrence frequencies across different topics. We propose a sparser approach by introducing a universal shared model, used by each topic to model the subset of words that are not topic-specific. A Bernoulli random variable is associated with each word under every topic, determining whether that word is modeled topic-specifically, with a free parameter, or by the shared model, with a common parameter. Our experiments show that the model achieves sparser topic presence in documents and higher test likelihood than LDA.
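The switching mechanism described above can be illustrated with a toy sketch. Everything here is an assumption for illustration only: the variable names (`pi`, `shared`, `b`), the Dirichlet priors, and the renormalization step are not taken from the paper, which specifies its own priors and inference procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K = 10, 3   # vocabulary size and number of topics (toy values)
pi = 0.4       # assumed prior prob. that a word is topic-specific

# Universal shared model: one word distribution used by every topic
shared = rng.dirichlet(np.ones(V))

# Per-topic Bernoulli switches: 1 -> free topic-specific parameter,
# 0 -> the common shared parameter
b = rng.binomial(1, pi, size=(K, V))

# Assemble each topic's word distribution from the two parameter sets
topics = np.empty((K, V))
for k in range(K):
    specific = rng.dirichlet(np.ones(V))          # free parameters for topic k
    mixed = np.where(b[k] == 1, specific, shared)  # pick per-word source
    topics[k] = mixed / mixed.sum()                # renormalize to a distribution
```

Words with `b[k, w] == 0` share one probability across all topics, which is the source of the model's sparsity relative to LDA's fully free topic-word matrix.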
Date of Conference: 21-24 September 2014
Date Added to IEEE Xplore: 20 November 2014
Electronic ISBN: 978-1-4799-3694-6