Abstract
Existing topic modelling methods rely primarily on text features to discover topics, without considering other data modalities such as images. Recent advances in multi-modal representation learning show that multi-modal features can enrich the semantic information within text data for downstream tasks. This paper proposes a novel Neural Topic Model framework in a multi-modal setting, where visual and textual information are jointly utilized to derive text-based topic models. The framework includes a Gated Data Fusion module that learns text-specific visual representations to generate contextualized multi-modal features. These features are then mapped into a joint latent space by a Neural Topic Model to learn topic distributions. Experiments on diverse datasets show that the proposed framework significantly improves topic quality.
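The gated fusion idea described above can be sketched as follows. This is a minimal NumPy illustration of a gated multimodal unit, not the authors' implementation: the embedding dimensions, weight initialization, and function names are illustrative assumptions. Each modality is projected into a shared space, and a learned sigmoid gate decides, per dimension, how much the visual signal contributes relative to the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def gated_fusion(x_text, x_img, W_t, W_v, W_z):
    """Fuse a text vector and an image vector with a learned gate.

    h_t, h_v: modality-specific hidden representations.
    z:        per-dimension gate weighting the two modalities.
    """
    h_t = np.tanh(x_text @ W_t)   # text projection into the fused space
    h_v = np.tanh(x_img @ W_v)    # visual projection into the fused space
    # Sigmoid gate conditioned on both modalities.
    z = 1.0 / (1.0 + np.exp(-(np.concatenate([x_text, x_img]) @ W_z)))
    return z * h_t + (1.0 - z) * h_v  # gated convex combination

# Toy dimensions: 768-d text embedding, 512-d image embedding, 256-d fused space.
d_t, d_v, d_f = 768, 512, 256
W_t = rng.normal(scale=0.02, size=(d_t, d_f))
W_v = rng.normal(scale=0.02, size=(d_v, d_f))
W_z = rng.normal(scale=0.02, size=(d_t + d_v, d_f))

fused = gated_fusion(rng.normal(size=d_t), rng.normal(size=d_v), W_t, W_v, W_z)
print(fused.shape)  # (256,)
```

In the paper's pipeline, a vector like `fused` would then serve as the input to the variational Neural Topic Model, which maps it into the joint latent topic space.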
D. Zhang and Y. Wang—Equal contribution.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, D., Wang, Y., Bashar, M.A., Nayak, R. (2023). Enhanced Topic Modeling with Multi-modal Representation Learning. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science(), vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_31
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3