Abstract:
Probabilistic Topic Models are widely applied in many NLP-related tasks due to their effective use of unlabeled data to capture variable dependencies. Analytical solutions for Bayesian inference of such models, however, are usually intractable, hindering the development of highly expressive text models. In this scenario, Variational Auto-Encoders (VAEs), in which an inference network (the encoder) approximates the posterior distribution, have become a promising alternative for inferring the latent topic distributions of text documents. These models, however, also pose new challenges, such as the requirement of continuous and reparameterizable distributions, which may not fit the true latent topic distributions well. Moreover, inference networks are prone to component collapsing, impairing the extraction of coherent topics. To overcome these problems, we propose two new text topic models, one based on the Gumbel-Softmax categorical distribution (GSDTM) and one based on mixtures of Logistic-Normal distributions (LMDTM). We also provide a study of the impact of different modeling choices on the generated topics, observing a trade-off between topic coherence and document reconstruction. Through experiments on two reference datasets, we show that GSDTM largely outperforms previous state-of-the-art baselines on three different evaluation metrics.
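The following is a minimal sketch (not the paper's implementation) of the Gumbel-Softmax reparameterization mentioned above, which lets an encoder draw differentiable, approximately categorical samples over latent topics; the function name, temperature value, and topic count are illustrative assumptions.

import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Draw a differentiable, approximately one-hot sample from a categorical
    distribution parameterized by `logits` (the Gumbel-Softmax trick)."""
    # Sample standard Gumbel noise: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    # Perturb the logits and anneal with temperature tau;
    # as tau -> 0 the sample approaches a one-hot (discrete) topic assignment.
    return F.softmax((logits + gumbel) / tau, dim=-1)

# Illustrative usage: encoder logits over 50 hypothetical topics for a batch of 8 documents.
logits = torch.randn(8, 50)
topic_sample = gumbel_softmax_sample(logits, tau=0.5)  # shape (8, 50), rows sum to 1

Because the sample is a deterministic, differentiable function of the logits and the Gumbel noise, gradients can flow through it during VAE training, which is what makes a (relaxed) categorical topic distribution usable inside an inference network.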
Date of Conference: 08-13 July 2018
Date Added to IEEE Xplore: 14 October 2018
Electronic ISSN: 2161-4407