Abstract
This paper proposes GUMI-AE, a new image caption generative model for memes. In this paper, a meme denotes a short humorous sentence suited to a given image. An image caption generative model usually consists of an image encoder and a sentence decoder, and most conventional models use a pre-trained neural network as the image encoder, e.g., ResNet152 trained on ImageNet. However, a pre-trained ResNet152 may not be an effective feature extractor for arbitrary images. The training samples for the meme generative model are obtained from “Bokete”, a Japanese website where people post images together with short humorous sentences about them. The images posted on Bokete are highly varied and include illustrations and text-only images, which may fall outside the range of the ImageNet training images. This paper therefore proposes an image caption generative model that uses an AutoEncoder (AE) as the image encoder. The AE can be trained on the samples obtained from Bokete without any image annotation, which enables the proposed method to generate short humorous sentences for memes. Finally, the proposed model is compared with the conventional one, and the evaluation of GUMI-AE is discussed.
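As a rough illustration of why an autoencoder needs no image annotation, the sketch below trains a tiny linear autoencoder on unlabeled vectors in plain Python; the reconstruction target is the input itself. This is not the GUMI-AE architecture: the class name `TinyAutoEncoder`, the helper `mean_loss`, the dimensions, the synthetic data, and the learning rate are all assumptions made for illustration, and the sentence decoder that would consume the code vector is omitted.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class TinyAutoEncoder:
    """Hypothetical linear autoencoder: x -> code z -> reconstruction x_hat."""

    def __init__(self, input_dim, code_dim, seed=0):
        rng = random.Random(seed)
        self.enc = [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
                    for _ in range(code_dim)]
        self.dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_dim)]
                    for _ in range(input_dim)]

    def encode(self, x):
        return matvec(self.enc, x)

    def decode(self, z):
        return matvec(self.dec, z)

    def loss(self, x):
        # Mean squared reconstruction error; the target is the input itself,
        # so no label or caption is needed.
        x_hat = self.decode(self.encode(x))
        return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)

    def train_step(self, x, lr=0.05):
        z = self.encode(x)
        x_hat = self.decode(z)
        n, k = len(x), len(z)
        # Gradient of the loss with respect to the reconstruction.
        g_out = [2.0 * (a - b) / n for a, b in zip(x_hat, x)]
        # Backpropagate to the code before the decoder weights change.
        g_z = [sum(self.dec[i][j] * g_out[i] for i in range(n))
               for j in range(k)]
        for i in range(n):
            for j in range(k):
                self.dec[i][j] -= lr * g_out[i] * z[j]
        for j in range(k):
            for i in range(n):
                self.enc[j][i] -= lr * g_z[j] * x[i]


def mean_loss(model, data):
    return sum(model.loss(x) for x in data) / len(data)


# Synthetic stand-in for flattened, unannotated images: 8-dimensional
# vectors that lie in a 2-dimensional subspace.
rng = random.Random(1)
u = [1, 1, 1, 1, 0, 0, 0, 0]
v = [0, 0, 0, 0, 1, 1, 1, 1]
data = []
for _ in range(20):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a * ui + b * vi for ui, vi in zip(u, v)])

model = TinyAutoEncoder(input_dim=8, code_dim=2)
initial = mean_loss(model, data)
for _ in range(200):
    for x in data:
        model.train_step(x)
final = mean_loss(model, data)

# The code vector is what an image encoder would hand to a sentence decoder.
code = model.encode(data[0])
```

After training, `final` should be lower than `initial`, and `code` is a 2-dimensional feature vector obtained without any labels. A practical image encoder would more likely be a convolutional autoencoder trained on real images.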
Acknowledgements
This work is supported by the Japan Society for the Promotion of Science (JSPS), KAKENHI (23K11267).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yamatomi, R., Mahboubi, S., Ninomiya, H. (2024). Generative Model of Suitable Meme Sentences for Images Using AutoEncoder. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_23
DOI: https://doi.org/10.1007/978-981-99-7019-3_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3
eBook Packages: Computer Science, Computer Science (R0)