
Short text topic modeling by exploring original documents

Regular Paper · Knowledge and Information Systems

Abstract

Topic modeling for short texts faces a tough challenge owing to the sparsity problem. An effective solution is to aggregate short texts into long pseudo-documents before training a standard topic model. The main concern with this solution is how the short texts are aggregated. A recently developed self-aggregation-based topic model (SATM) can adaptively aggregate short texts without using heuristic information. However, the model definition of SATM is somewhat rigid, and more importantly, it tends to overfit and is time-consuming on large-scale corpora. To improve on SATM, we propose a generalized topic model for short texts, namely the latent topic model (LTM). In LTM, we assume that the observable short texts are snippets of normal long texts (namely original documents) generated by a given standard topic model, but that their original document memberships are unknown. With Gibbs sampling, LTM drives an adaptive aggregation process for short texts and simultaneously estimates the other latent variables of interest. Additionally, we propose a mini-batch scheme for fast inference. Experimental results indicate that LTM is competitive with state-of-the-art baseline models on short text topic modeling.


Notes

  1. http://jwebpro.sourceforge.net/data-web-snippets.tar.gz.

  2. http://web.ist.utl.pt/~acardoso/datasets/.

  3. http://papers.nips.cc/.

  4. http://code.google.com/p/btm/.

  5. http://www.csie.ntu.edu.tw/~cjlin/libsvm/.


Download references

Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Numbers 61602204 and 61472157.

Author information

Corresponding author

Correspondence to Jihong Ouyang.

Appendix

A. Derivation of Gibbs sampling for LTM

We derive the Gibbs sampling equations for LTM. In LTM, there are four latent variables of interest: the topic-word distributions \(\phi \), the original document-topic distributions \(\theta \), the original document assignments for short texts \({\hat{z}}\) and the topic assignments for word tokens z. Thanks to the conjugate Dirichlet-multinomial design for \(\phi \) and \(\theta \), we can marginalize out these two variables and work with the joint distribution of the observations W and the two assignment variables \({\hat{z}}\) and z as follows:

$$\begin{aligned}&p\left( {W,{\hat{z}},z|\beta ,\alpha } \right) = \int {\int {p\left( {W,\phi ,\theta ,{\hat{z}},z|\beta ,\alpha } \right) \hbox {d}\phi } \hbox {d}\theta } \nonumber \\&\quad = \int {\int {\prod \limits _{k = 1}^K {Dir(\phi _k|\beta )} \prod \limits _{d = 1}^D {Dir(\theta _d|\alpha )} \prod \limits _{k = 1}^K {\prod \limits _{v = 1}^V {\phi _{kv}^{{N_{kv}}}} } \prod \limits _{d = 1}^D {\prod \limits _{k = 1}^K {\theta _{dk}^{{N_{dk}}}} } \hbox {d}\phi } \hbox {d}\theta } \nonumber \\&\quad = \left( {\prod \limits _{k = 1}^K {\frac{{\prod \nolimits _{v = 1}^V {\varGamma \left( {{N_{kv}} + \beta } \right) } }}{{\varGamma \left( {{N_k} + V\beta } \right) }}\frac{{\varGamma \left( {V\beta } \right) }}{{\prod \nolimits _{v = 1}^V {\varGamma \left( \beta \right) } }}} } \right) \left( {\prod \limits _{d = 1}^D {\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{dk}} + \alpha } \right) } }}{{\varGamma \left( {{N_d} + K\alpha } \right) }}\frac{{\varGamma \left( {K\alpha } \right) }}{{\prod \nolimits _{k = 1}^K {\varGamma \left( \alpha \right) } }}} } \right) \nonumber \\&\qquad \propto \left( {\prod \limits _{k = 1}^K {\frac{{\prod \nolimits _{v = 1}^V {\varGamma \left( {{N_{kv}} + \beta } \right) } }}{{\varGamma \left( {{N_k} + V\beta } \right) }}} } \right) \left( {\prod \limits _{d = 1}^D {\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{dk}} + \alpha } \right) } }}{{\varGamma \left( {{N_d} + K\alpha } \right) }}} } \right) \nonumber \\&\quad \buildrel \varDelta \over = B\left( {{N_{kv}},{N_k},\beta } \right) B\left( {{N_{dk}},{N_d},\alpha } \right) \end{aligned}$$
(11)

where the fourth line in Eq. (11) follows from the fact that \(\beta \) and \(\alpha \) are constants; the notations \(B\left( {{N_{kv}},{N_k},\beta } \right) \) and \(B\left( {{N_{dk}},{N_d},\alpha } \right) \) are used to denote \(\prod \nolimits _{k = 1}^K {\frac{{\prod \nolimits _{v = 1}^V {\varGamma \left( {{N_{kv}} + \beta } \right) } }}{{\varGamma \left( {{N_k} + V\beta } \right) }}} \) and \({\prod \nolimits _{d = 1}^D {\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{dk}} + \alpha } \right) } }}{{\varGamma \left( {{N_d} + K\alpha } \right) }}} }\), respectively, for convenience.
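For concreteness, the collapsed joint in Eq. (11) can be evaluated in log space directly from the count statistics. The following is a minimal Python sketch under the assumption that the topic-word and original document-topic counts are held in NumPy arrays N_kv and N_dk; the function name and array layout are illustrative, not part of the paper's implementation.

import numpy as np
from scipy.special import gammaln

def log_collapsed_joint(N_kv, N_dk, alpha, beta):
    # log B(N_kv, N_k, beta) + log B(N_dk, N_d, alpha), i.e. Eq. (11) up to an additive constant
    V = N_kv.shape[1]                 # vocabulary size
    K = N_dk.shape[1]                 # number of topics
    N_k = N_kv.sum(axis=1)            # tokens assigned to each topic
    N_d = N_dk.sum(axis=1)            # tokens assigned to each original document
    log_B_topic = (gammaln(N_kv + beta).sum(axis=1) - gammaln(N_k + V * beta)).sum()
    log_B_doc = (gammaln(N_dk + alpha).sum(axis=1) - gammaln(N_d + K * alpha)).sum()
    return log_B_topic + log_B_doc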

We employ a blocked Gibbs sampling framework, in which \({\hat{z}}\) and z are alternately sampled in each iteration, each conditioned on the other. We derive the Gibbs sampling equations for \({\hat{z}}\) and z in turn.
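The alternation can be pictured with the short Python skeleton below. This is a sketch only: the two callables are assumed to implement the per-variable updates of Eqs. (14) and (16) derived next (including the bookkeeping that removes and re-adds the affected counts around each draw), and all names are illustrative rather than the authors' code.

def blocked_gibbs(z_hat, z, n_iters, resample_doc_assignment, resample_topic_assignment):
    # z_hat[s] : original document assignment of short text s
    # z[s][n]  : topic assignment of the n-th token of short text s
    for _ in range(n_iters):
        for s in range(len(z_hat)):            # Step 1: original document assignments (Eq. 14)
            z_hat[s] = resample_doc_assignment(s, z_hat, z)
        for s in range(len(z)):                # Step 2: topic assignments, as in collapsed LDA (Eq. 16)
            for n in range(len(z[s])):
                z[s][n] = resample_topic_assignment(s, n, z_hat, z)
    return z_hat, z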

Sampling equation of \({\hat{z}}\) given the current z. To draw a sample of \({\hat{z}}\), following the Gibbs sampling idea, we consecutively draw each single original document assignment \({\hat{z}}_s\) from its posterior conditioned on all other original document assignments \({\hat{z}}^{-s}\):

$$\begin{aligned}&p\left( {{{{\hat{z}}}_s}|{{{\hat{z}}}^{ - s}},z,W, \beta ,\alpha } \right) = \frac{{p\left( {W,{\hat{z}}|z,\beta ,\alpha } \right) }}{{p\left( {W,{{{\hat{z}}}^{ - s}}|z,\beta ,\alpha } \right) }} \propto \frac{{p\left( {W,{\hat{z}}|z,\beta ,\alpha } \right) }}{{p\left( {{W^{ - s}},{{{\hat{z}}}^{ - s}}|z,\beta ,\alpha } \right) }} \nonumber \\&\quad = \frac{{p\left( {W,{\hat{z}},z|\beta ,\alpha } \right) }}{{p\left( {{W^{ - s}},{{{\hat{z}}}^{ - s}},z|\beta ,\alpha } \right) }} \propto \frac{{p\left( {W,{\hat{z}},z|\beta ,\alpha } \right) }}{{p\left( {{W^{ - s}},{{{\hat{z}}}^{ - s}},{z^{ - s}}|\beta ,\alpha } \right) }} \end{aligned}$$
(12)

By combining Eqs. (11) and (12), we have:

$$\begin{aligned}&p\left( {{{{\hat{z}}}_s} = d|{{{\hat{z}}}^{ - s}},z,W,\beta ,\alpha } \right) \propto \frac{{B\left( {{N_{kv}},{N_k},\beta } \right) B\left( {{N_{dk}},{N_d},\alpha } \right) }}{{B\left( {N_{kv}^{ - s},N_k^{ - s},\beta } \right) B\left( {N_{dk}^{ - s},N_d^{ - s},\alpha } \right) }} \nonumber \\&\quad \propto \frac{{B\left( {{N_{dk}},{N_d},\alpha } \right) }}{{B\left( {N_{dk}^{ - s},N_d^{ - s},\alpha } \right) }} \end{aligned}$$
(13)

The second line in Eq. (13) follows from the fact that the terms involving topic-word counts are independent of the current assignment \({\hat{z}}_s\). We then expand Eq. (13) and obtain the final Gibbs sampling equation for \({\hat{z}}\) (i.e., Eq. 1):

$$\begin{aligned}&p\left( {{{{\hat{z}}}_s} = d|{{{\hat{z}}}^{ - s}},z,W,\alpha } \right) \propto \frac{{\prod \nolimits _{j = 1}^D {\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{jk}} + \alpha } \right) } }}{{\varGamma \left( {{N_j} + K\alpha } \right) }}} }}{{\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {N_{dk}^{ - s} + \alpha } \right) } }}{{\varGamma \left( {N_d^{ - s} + K\alpha } \right) }}\prod \nolimits _{j \ne d} {\frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{jk}} + \alpha } \right) } }}{{\varGamma \left( {{N_j} + K\alpha } \right) }}} }} \nonumber \\&\quad = \frac{{\prod \nolimits _{k = 1}^K {\varGamma \left( {{N_{dk}} + \alpha } \right) } }}{{\prod \nolimits _{k = 1}^K {\varGamma \left( {N_{dk}^{ - s} + \alpha } \right) } }}\frac{{\varGamma \left( {N_d^{ - s} + K\alpha } \right) }}{{\varGamma \left( {{N_d} + K\alpha } \right) }} \nonumber \\&\quad = \frac{{\prod \nolimits _{k = 1}^K {\prod \nolimits _{n = 1}^{{N_{sk}}} {\left( {N_{dk}^{ - s} + n - 1 + \alpha } \right) } } }}{{\prod \nolimits _{n = 1}^{{N_s}} {\left( {N_d^{ - s} + n - 1 + K\alpha } \right) } }} \end{aligned}$$
(14)

The third line in Eq. (14) follows from the fact that (for \(m>n\)):

$$\begin{aligned} \frac{{\varGamma \left( n \right) }}{{\varGamma \left( m \right) }} = \frac{{\varGamma \left( n \right) }}{{\varGamma \left( {n + 1} \right) }}\frac{{\varGamma \left( {n + 1} \right) }}{{\varGamma \left( {n + 2} \right) }}\frac{{\varGamma \left( {n + 2} \right) }}{{\varGamma \left( {n + 3} \right) }} \cdots \frac{{\varGamma \left( {m - 1} \right) }}{{\varGamma \left( m \right) }} = \frac{1}{{\prod \nolimits _{i = 1}^{m - n} {\left( {n + i - 1} \right) } }} \end{aligned}$$
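A count-based Python sketch of drawing \({\hat{z}}_s\) according to Eq. (14) is given below. It is illustrative only: the count arrays are assumed to already exclude short text s (i.e., they are the \(^{-s}\) statistics), and the products of Eq. (14) are accumulated in log space to avoid numerical underflow for long pseudo-documents.

import numpy as np

def sample_doc_assignment_counts(N_sk, N_dk_minus_s, N_d_minus_s, alpha, rng):
    # N_sk         : (K,)   topic counts of the words in short text s
    # N_dk_minus_s : (D, K) topic counts of each original document, excluding short text s
    # N_d_minus_s  : (D,)   token counts of each original document, excluding short text s
    D, K = N_dk_minus_s.shape
    N_s = int(N_sk.sum())
    log_p = np.zeros(D)
    for d in range(D):
        for k in range(K):
            for n in range(1, int(N_sk[k]) + 1):       # numerator of Eq. (14)
                log_p[d] += np.log(N_dk_minus_s[d, k] + n - 1 + alpha)
        for n in range(1, N_s + 1):                    # denominator of Eq. (14)
            log_p[d] -= np.log(N_d_minus_s[d] + n - 1 + K * alpha)
    p = np.exp(log_p - log_p.max())                    # exponentiate stably and normalize
    return rng.choice(D, p=p / p.sum())

After drawing the new original document for short text s, its topic counts N_sk are added back to the counts of the chosen document before moving on to the next short text.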

Sampling equation of z given the current \({\hat{z}}\). We now turn to the sampling of z. Note that given \({\hat{z}}\), the situation is exactly equivalent to Gibbs sampling in standard LDA. Similar to the derivations of Eqs. (12) and (13), the posterior of a single topic assignment \(z_{dn}\) conditioned on all other topic assignments \(z^{-dn}\) is given by:

$$\begin{aligned}&p\left( {{z_{dn}}|{\hat{z}},{z^{ - dn}},W,\beta ,\alpha } \right) \propto \frac{{p\left( {W,{\hat{z}},z|\beta ,\alpha } \right) }}{{p\left( {{W^{ - dn}},{\hat{z}},{z^{ - dn}}|\beta ,\alpha } \right) }} \nonumber \\&\quad = \frac{{B\left( {{N_{kv}},{N_k},\beta } \right) B\left( {{N_{dk}},{N_d},\alpha } \right) }}{{B\left( {N_{kv}^{ - dn},N_k^{ - dn},\beta } \right) B\left( {N_{dk}^{ - dn},N_d^{ - dn},\alpha } \right) }} \end{aligned}$$
(15)

By expanding Eq. (15), we derive the final Gibbs sampling equation for z (i.e., Eq. 2):

$$\begin{aligned}&p\left( {{z_{dn}} = k|{\hat{z}},{z^{ - dn}},W,\beta ,\alpha } \right) \propto \frac{{\frac{{\varGamma \left( {{N_{k{w_{dn}}}} + \beta } \right) }}{{\varGamma \left( {{N_k} + V\beta } \right) }}\frac{{\varGamma \left( {{N_{dk}} + \alpha } \right) }}{{\varGamma \left( {{N_d} + K\alpha } \right) }}}}{{\frac{{\varGamma \left( {N_{k{w_{dn}}}^{ - dn} + \beta } \right) }}{{\varGamma \left( {N_k^{ - dn} + V\beta } \right) }}\frac{{\varGamma \left( {N_{dk}^{ - dn} + \alpha } \right) }}{{\varGamma \left( {N_d^{ - dn} + K\alpha } \right) }}}} \nonumber \\&\quad = \frac{{N_{k{w_{dn}}}^{ - dn} + \beta }}{{N_k^{ - dn} + V\beta }}\frac{{N_{dk}^{ - dn} + \alpha }}{{N_d^{ - dn} + K\alpha }} \nonumber \\&\qquad \propto \frac{{N_{k{w_{dn}}}^{ - dn} + \beta }}{{N_k^{ - dn} + V\beta }}\left( {N_{dk}^{ - dn} + \alpha } \right) \end{aligned}$$
(16)

The third line in Eq. (16) follows from the fact that the term \((N_d^{ - dn} + K\alpha )\) is independent of the current topic assignment \(z_{dn}\).
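This per-token step is the familiar collapsed LDA update. A minimal Python sketch follows; as before, the count arrays are assumed to already exclude the current token (i.e., they are the \(^{-dn}\) statistics), and the names are illustrative.

import numpy as np

def sample_topic_assignment_counts(w, d, N_kw, N_k, N_dk, alpha, beta, rng):
    # w    : word id of the current token       d   : its original document
    # N_kw : (K, V) topic-word counts           N_k : (K,) topic totals
    # N_dk : (D, K) original document-topic counts
    K, V = N_kw.shape
    p = (N_kw[:, w] + beta) / (N_k + V * beta) * (N_dk[d] + alpha)   # Eq. (16), unnormalized
    return rng.choice(K, p=p / p.sum())

Alternating these two per-variable updates, together with the bookkeeping that removes and re-adds the affected counts around each draw, yields the full blocked sampler sketched earlier.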

Cite this article

Li, X., Li, C., Chi, J. et al. Short text topic modeling by exploring original documents. Knowl Inf Syst 56, 443–462 (2018). https://doi.org/10.1007/s10115-017-1099-0
