Abstract
Topic models are widely used as intermediate components for text mining and semantic analysis in natural language processing, with a broad range of applications. However, most existing improvements to topic models rely on word embeddings to increase modeling accuracy while ignoring information external to the text. This paper proposes BCTM (Bi-Concept Topic Model), a topic model that exploits both word feature information and concept information. Building on the BTM topic model, BCTM introduces word feature information through word-vector techniques and concept information drawn from the ConceptNet semantic network to optimize topic modeling. A construction method for Bi-Concept pairs based on ConceptNet is proposed, and the text content is enriched with this concept information. The improved model yields a more accurate topic distribution, and, thanks to the richer feature information, it also outperforms baseline models on short texts. Experiments show that the proposed Bi-Concept topic model achieves good modeling accuracy.
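The abstract describes two building blocks: the biterm set of BTM (all unordered word pairs in a short text) and Bi-Concept pairs that enrich it with concepts retrieved from ConceptNet. A minimal sketch of these two steps is shown below; the `CONCEPTS` dictionary is a hypothetical stand-in for ConceptNet lookups (a real system would query the ConceptNet API or a local dump), and the function names are illustrative, not the paper's actual implementation.

```python
from itertools import combinations

# Hypothetical concept lookup standing in for ConceptNet; entries are
# illustrative only. A real system would retrieve related concepts per word.
CONCEPTS = {
    "apple": ["fruit", "company"],
    "banana": ["fruit"],
    "market": ["place", "economy"],
}

def biterms(words):
    """All unordered word pairs in a short text: the biterm set used by BTM."""
    return list(combinations(words, 2))

def bi_concept_pairs(words, concepts=CONCEPTS):
    """Sketch of Bi-Concept pair construction: pair each word of a biterm
    with the concepts of its co-occurring word, enriching the short text
    with external concept information."""
    pairs = []
    for w1, w2 in combinations(words, 2):
        for c in concepts.get(w2, []):
            pairs.append((w1, c))
        for c in concepts.get(w1, []):
            pairs.append((w2, c))
    return pairs

doc = ["apple", "banana", "market"]
print(biterms(doc))           # word-word biterms, as in BTM
print(bi_concept_pairs(doc))  # word-concept pairs added by the Bi-Concept step
```

Both pair sets would then feed the Gibbs-sampling inference of a BTM-style model; the sketch only illustrates how concept information expands the co-occurrence evidence available for a short text.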
Acknowledgments
This work is supported by Key Research and Development Projects of Heilongjiang Province under grant number GA21C020, and Natural Science Foundation of Heilongjiang Province under grant number LH2021F015.
Copyright information
© 2023 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Liu, G., Wan, T., Yu, J., Zhan, K., Wang, W. (2023). BCTM: A Topic Modeling Method Based on External Information. In: Wang, W., Wu, J. (eds) Broadband Communications, Networks, and Systems. BROADNETS 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 511. Springer, Cham. https://doi.org/10.1007/978-3-031-40467-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40466-5
Online ISBN: 978-3-031-40467-2