Abstract
Existing topic modelling methods rely primarily on text features to discover topics, without considering other data modalities such as images. Recent advances in multi-modal representation learning show that multi-modal features can enrich the semantic information within text data for downstream tasks. This paper proposes a novel Neural Topic Model framework in a multi-modal setting, where visual and textual information are jointly utilized to derive text-based topic models. The framework includes a Gated Data Fusion module that learns text-specific visual representations to generate contextualized multi-modal features. These features are then mapped into a joint latent space by a Neural Topic Model to learn topic distributions. Experiments on diverse datasets show that the proposed framework significantly improves topic quality.
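The gated fusion idea described above can be sketched as follows. This is a minimal NumPy illustration of a gated multimodal unit, not the authors' implementation: the embedding dimensions, weight initialization, and function names are illustrative assumptions. Each modality is projected into a shared space, and a learned sigmoid gate decides, per dimension, how much the visual signal contributes relative to the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_fusion(x_text, x_img, W_t, W_v, W_z):
    """Fuse a text vector and an image vector with a learned gate.

    h_t, h_v: modality-specific hidden representations.
    z:        per-dimension gate weighting the two modalities.
    """
    h_t = np.tanh(x_text @ W_t)   # text projection into the fused space
    h_v = np.tanh(x_img @ W_v)    # visual projection into the fused space
    # Sigmoid gate conditioned on both modalities.
    z = 1.0 / (1.0 + np.exp(-(np.concatenate([x_text, x_img]) @ W_z)))
    return z * h_t + (1.0 - z) * h_v  # gated convex combination

# Toy dimensions: 768-d text embedding, 512-d image embedding, 256-d fused space.
d_t, d_v, d_f = 768, 512, 256
W_t = rng.normal(scale=0.02, size=(d_t, d_f))
W_v = rng.normal(scale=0.02, size=(d_v, d_f))
W_z = rng.normal(scale=0.02, size=(d_t + d_v, d_f))

fused = gated_fusion(rng.normal(size=d_t), rng.normal(size=d_v), W_t, W_v, W_z)
print(fused.shape)  # (256,)
```

In the paper's pipeline, a vector like `fused` would then serve as the input to the variational Neural Topic Model, which maps it into the joint latent topic space.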
D. Zhang and Y. Wang—Equal contribution.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, D., Wang, Y., Bashar, M.A., Nayak, R. (2023). Enhanced Topic Modeling with Multi-modal Representation Learning. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science(), vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_31
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3