Targeted aspects oriented topic modeling for short texts

Applied Intelligence

Abstract

Topic modeling has demonstrated its value in short text topic discovery. For this task, many topic models perform a full analysis to find all possible topics. However, these models overlook the importance of deeper topics, so the topics they discover can be confusing. In practice, people often want more focused topics on particular aspects (or events), rather than a set of coarse topics. Therefore, in this paper, we propose a novel method, Targeted Aspects Oriented Topic Modeling (TATM), to discover more focused topics on specific aspects in short texts. Specifically, each short text is assigned to exactly one targeted aspect derived from an enhanced Dirichlet Multinomial Mixture process (E-DMM). This process groups as many similar words as possible, which achieves topic homogeneity. In addition, TATM discovers the topics for each targeted aspect from as many angles as possible by performing target-level modeling, which achieves topic completeness. Thus, TATM strikes a balance between these two conflicting properties without employing any additional information or pre-trained knowledge. Extensive experiments conducted on five real-world datasets demonstrate that the proposed model can effectively discover more focused and complete topics, and that it outperforms the state-of-the-art baselines.
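To make the "one aspect per short text" idea concrete, the following is a minimal sketch of the standard collapsed Gibbs sampler for a plain Dirichlet Multinomial Mixture model. It is not the paper's E-DMM, whose enhancements are not described in the abstract. The function name `dmm_gibbs`, the hyperparameter values `alpha` and `beta`, and the toy documents are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter


def dmm_gibbs(docs, num_clusters, alpha=0.1, beta=0.1, iters=30, seed=0):
    """Assign each tokenized document to exactly one cluster (plain DMM).

    Hypothetical helper for illustration; not the authors' E-DMM.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_counts = [Counter(doc) for doc in docs]

    m = [0] * num_clusters                          # documents per cluster
    n = [0] * num_clusters                          # tokens per cluster
    nw = [Counter() for _ in range(num_clusters)]   # word counts per cluster
    z = []                                          # cluster of each document

    def update(d, k, sign):
        # Add (sign=+1) or remove (sign=-1) document d from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[d])
        for w, c in doc_counts[d].items():
            nw[k][w] += sign * c

    # Random initial assignment.
    for d in range(len(docs)):
        k = rng.randrange(num_clusters)
        z.append(k)
        update(d, k, +1)

    for _ in range(iters):
        for d in range(len(docs)):
            update(d, z[d], -1)  # exclude document d from the counts
            log_p = []
            for k in range(num_clusters):
                # Cluster popularity term (constant denominator omitted).
                lp = math.log(m[k] + alpha)
                # Word-match term: product over repeated occurrences.
                for w, c in doc_counts[d].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                # Normalizer over the document length.
                for i in range(len(docs[d])):
                    lp -= math.log(n[k] + vocab_size * beta + i)
                log_p.append(lp)
            # Sample a new cluster from the normalized conditional.
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            z[d] = rng.choices(range(num_clusters), weights=weights)[0]
            update(d, z[d], +1)
    return z


if __name__ == "__main__":
    toy_docs = [
        ["apple", "banana", "fruit"],
        ["fruit", "apple", "juice"],
        ["goal", "match", "team"],
        ["team", "match", "league"],
    ]
    print(dmm_gibbs(toy_docs, num_clusters=3))
```

Because every document carries a single cluster label, all of its words contribute to the same cluster's word counts, which is the mechanism behind grouping similar words together. The sampler is stochastic, so the labels depend on the seed and cluster indices carry no meaning of their own.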

Notes

  1. https://data.world/uci/news-aggregator

  2. http://trec.nist.gov/data/microblog.html

  3. https://github.com/ptnplanet/Java-Naive-Bayes-Classifier

Acknowledgements

This work has been supported by the National Key Research and Development Program of China under grant 2016YFB1000901, the National Natural Science Foundation of China under grant 91746209 and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education of China under grant IRT17R32.

Author information

Corresponding author

Correspondence to Lei Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, J., Li, L., Wang, Y. et al. Targeted aspects oriented topic modeling for short texts. Appl Intell 50, 2384–2399 (2020). https://doi.org/10.1007/s10489-020-01672-w
