Topic Modeling for Short Texts via Adaptive Pólya Urn Dirichlet Multinomial Mixture

Li, Mark Junjie; Wang, Rui; Li, Jun; Bao, Xianyu; He, Jueying; Chen, Jiayao; He, Lijuan

doi:10.1007/978-981-99-8181-6_28

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1968)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Inferring coherent and diverse latent topics from short texts is crucial in topic modeling. Existing approaches leverage the Generalized Pólya Urn (GPU) model to incorporate external knowledge and improve topic modeling performance. While the GPU scheme successfully promotes similarity among words within the same topic, it has two major limitations. First, it assumes that similar words contribute equally to the same topic, disregarding the distinctiveness of different words. Second, it assumes that a specific word should receive the same promotion across all topics, overlooking the variation in word importance across topics. To address these limitations, we propose a novel Adaptive Pólya Urn (APU) scheme, which builds topic-word correlations from both external and local knowledge, and the Adaptive Pólya Urn Dirichlet Multinomial Mixture (APU-DMM) model, which uses the topic-word correlation as an adaptive weight to guide the topic inference process. Our extensive experimental study on three benchmark datasets shows the superiority of our model over eight baseline methods in terms of topic coherence and topic diversity. The code is available at https://github.com/ddwangr/APUDMM.
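
The contrast between a fixed and an adaptive promotion can be illustrated with a toy count update. The sketch below is not taken from the paper or its repository: the function names, the similar-word table, the promotion constant 0.3, and the per-topic weights are all hypothetical, chosen only to show how a single global promotion differs from one that depends on the (topic, word) pair.

```python
from collections import defaultdict


def new_counts(num_topics):
    """One topic-word count table per topic (fractional counts allowed)."""
    return [defaultdict(float) for _ in range(num_topics)]


def gpu_update(counts, topic, word, similar, promotion=0.3):
    """Fixed-promotion step: assigning `word` to `topic` also adds the same
    constant `promotion` to every similar word, whatever the topic is."""
    counts[topic][word] += 1.0
    for other in similar.get(word, ()):
        counts[topic][other] += promotion


def adaptive_update(counts, topic, word, similar, weight):
    """Hypothetical adaptive step: the promotion is looked up per
    (topic, similar word) pair instead of being one global constant."""
    counts[topic][word] += 1.0
    for other in similar.get(word, ()):
        counts[topic][other] += weight.get((topic, other), 0.0)


# Toy data (assumed): topic 0 is about sports, topic 1 about business.
similar = {"goal": ["striker", "target"]}
weight = {
    (0, "striker"): 0.8, (0, "target"): 0.1,
    (1, "striker"): 0.05, (1, "target"): 0.7,
}

gpu = new_counts(2)
apu = new_counts(2)
for k in (0, 1):
    gpu_update(gpu, k, "goal", similar)
    adaptive_update(apu, k, "goal", similar, weight)

for k in (0, 1):
    print(k, dict(gpu[k]), dict(apu[k]))
```

With the fixed scheme, "striker" and "target" receive identical boosts in both topics; with the adaptive lookup, each topic boosts the similar word that matters to it. How APU-DMM actually derives these weights from external and local knowledge is not specified in the abstract.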

Acknowledgement

This work was supported by the National Key R&D Program of China: Research on the applicability of port food risk traceability, early warning and emergency assessment models (No. 2019YFC1605504).

Author information

Authors and Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Mark Junjie Li, Rui Wang, Jueying He & Jiayao Chen
Shenzhen Academy of Inspection and Quarantine, Shenzhen, China
Jun Li, Xianyu Bao & Lijuan He

Corresponding author

Correspondence to Jiayao Chen.

Editor information

Editors and Affiliations

School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

Cite this paper

Li, M.J. et al. (2024). Topic Modeling for Short Texts via Adaptive Pólya Urn Dirichlet Multinomial Mixture. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1968. Springer, Singapore. https://doi.org/10.1007/978-981-99-8181-6_28

DOI: https://doi.org/10.1007/978-981-99-8181-6_28
Published: 27 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8180-9
Online ISBN: 978-981-99-8181-6
eBook Packages: Computer Science, Computer Science (R0)
