Abstract
Topic models are widely used as intermediate components for text mining and semantic analysis in natural language processing, with a broad range of applications. However, most existing improvements to topic models rely on word embeddings to increase modeling accuracy while ignoring information external to the text. This paper proposes BCTM (Bi-Concept Topic Model), a topic model that exploits both word feature information and concept information. Building on the BTM topic model, BCTM introduces word feature information through word-vector techniques and concept information drawn from the ConceptNet semantic network to optimize topic modeling. A construction method for Bi-Concept pairs based on ConceptNet is proposed, and the text content is enriched with this concept information. The improved model yields a more accurate topic distribution, and, thanks to the richer feature information, it also outperforms baseline models on short texts. Experiments show that the proposed Bi-Concept topic model achieves good modeling accuracy.
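The abstract describes two building blocks: the biterm set of BTM (all unordered word pairs in a short text) and Bi-Concept pairs that enrich it with concepts retrieved from ConceptNet. A minimal sketch of these two steps is shown below; the `CONCEPTS` dictionary is a hypothetical stand-in for ConceptNet lookups (a real system would query the ConceptNet API or a local dump), and the function names are illustrative, not the paper's actual implementation.

```python
from itertools import combinations

# Hypothetical concept lookup standing in for ConceptNet; entries are
# illustrative only. A real system would retrieve related concepts per word.
CONCEPTS = {
    "apple": ["fruit", "company"],
    "banana": ["fruit"],
    "market": ["place", "economy"],
}

def biterms(words):
    """All unordered word pairs in a short text: the biterm set used by BTM."""
    return list(combinations(words, 2))

def bi_concept_pairs(words, concepts=CONCEPTS):
    """Sketch of Bi-Concept pair construction: pair each word of a biterm
    with the concepts of its co-occurring word, enriching the short text
    with external concept information."""
    pairs = []
    for w1, w2 in combinations(words, 2):
        for c in concepts.get(w2, []):
            pairs.append((w1, c))
        for c in concepts.get(w1, []):
            pairs.append((w2, c))
    return pairs

doc = ["apple", "banana", "market"]
print(biterms(doc))           # word-word biterms, as in BTM
print(bi_concept_pairs(doc))  # word-concept pairs added by the Bi-Concept step
```

Both pair sets would then feed the Gibbs-sampling inference of a BTM-style model; the sketch only illustrates how concept information expands the co-occurrence evidence available for a short text.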
Acknowledgments
This work is supported by Key Research and Development Projects of Heilongjiang Province under grant number GA21C020, and Natural Science Foundation of Heilongjiang Province under grant number LH2021F015.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Liu, G., Wan, T., Yu, J., Zhan, K., Wang, W. (2023). BCTM: A Topic Modeling Method Based on External Information. In: Wang, W., Wu, J. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-40467-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40466-5
Online ISBN: 978-3-031-40467-2