DOI: 10.1145/3581783.3611996
research-article

BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment

Published: 27 October 2023

Abstract

Image aesthetic assessment (IAA) has drawn wide attention in recent years as more and more users post images and texts on the Internet to share their views. The strong subjectivity and complexity of IAA make it extremely challenging. Text triggers the subjective expression of human aesthetic experience, which draws on implicit memory, so incorporating textual information and modeling its relationship with the image are of great importance for IAA. However, IAA methods that take only the image as input fail to fully account for subjectivity, while existing multimodal IAA methods ignore the interrelationships among modalities. To this end, we propose a brain-inspired multimodal interaction network (BMI-Net) that simulates how the association areas of the cerebral cortex process sensory stimuli. In particular, a knowledge integration LSTM (KI-LSTM) is proposed to learn the image-text interaction relation. The proposed scalable multimodal fusion (SMF), based on low-rank decomposition, fuses the image, text, and interaction modalities to predict the aesthetic distribution. Extensive experiments show that the proposed BMI-Net outperforms existing state-of-the-art methods on three IAA tasks.
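The abstract names the fusion step but does not give its equations. As a minimal sketch, assume SMF follows the low-rank multimodal fusion (LMF) recipe: each modality vector z_m is extended to [z_m; 1], the weight tensor of a full tensor-fusion layer is factorised into R rank-1 terms, and the fused output is the sum over ranks of the elementwise product of the per-modality projections. The function names, feature sizes, rank, the ten-bin softmax head (an AVA-style 1 to 10 score distribution) and the earth mover's distance (EMD) loss are all illustrative assumptions, not the authors' code; full_tensor_fusion is included only to check that the factorised computation matches the explicit outer-product form.

```python
import numpy as np


def append_one(z):
    # Append a constant 1 so the fused product also keeps unimodal and
    # lower-order cross-modal terms (the tensor-fusion trick).
    return np.concatenate([z, np.ones(1)])


def low_rank_fusion(zs, factors):
    """LMF-style fusion of M modality vectors (hypothetical stand-in for SMF).

    zs[m]      : shape (d_m,)            feature vector of modality m
    factors[m] : shape (R, d_m + 1, d_h) rank-R factors for modality m
    returns    : shape (d_h,) = sum_r prod_m (W_m^(r))^T [z_m; 1]
    """
    fused = None
    for z, W in zip(zs, factors):
        proj = np.einsum("d,rdh->rh", append_one(z), W)  # (R, d_h) per modality
        fused = proj if fused is None else fused * proj  # elementwise product across modalities
    return fused.sum(axis=0)                              # sum over the R rank-1 terms


def full_tensor_fusion(zs, factors):
    # Reference for exactly three modalities: materialise the outer-product
    # tensor and the full weight tensor rebuilt from the same rank-R factors.
    za, zb, zc = (append_one(z) for z in zs)
    Z = np.einsum("i,j,k->ijk", za, zb, zc)
    Wa, Wb, Wc = factors
    W = np.einsum("rih,rjh,rkh->ijkh", Wa, Wb, Wc)
    return np.einsum("ijk,ijkh->h", Z, W)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def emd_loss(p, q, r=2):
    # Normalised earth mover's distance between two histograms over ordered
    # score bins (NIMA-style); an assumed loss, not confirmed for BMI-Net.
    cdf_diff = np.cumsum(p) - np.cumsum(q)
    return np.mean(np.abs(cdf_diff) ** r) ** (1.0 / r)


# Toy run with hypothetical sizes for the image, text and interaction features.
rng = np.random.default_rng(0)
dims = [8, 6, 4]
R, n_bins = 3, 10  # rank and number of score bins (scores 1..10)
zs = [rng.standard_normal(d) for d in dims]
factors = [0.1 * rng.standard_normal((R, d + 1, n_bins)) for d in dims]

logits = low_rank_fusion(zs, factors)
p = softmax(logits)  # predicted aesthetic distribution over the 10 bins
mean_score = float(np.dot(np.arange(1, n_bins + 1), p))
```

With these toy sizes a full weight tensor would hold 9·7·5·10 = 3,150 entries, whereas the rank-3 factors hold 3·(9+7+5)·10 = 630, and the factorised cost grows additively rather than multiplicatively as modalities are added. That is presumably what makes such a fusion "scalable", although the paper's exact formulation may differ.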




    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. bi-directional long short-term memory
    2. image aesthetic assessment
    3. image-text interaction
    4. implicit memory


    Funding Sources

    • Creative Research Groups of Chongqing Municipal Education Commission
    • National Natural Science Foundation of China
    • Chongqing Excellent Scientist Project
    • Chongqing University of Posts and Telecommunications Ph.D. Innovative Talents Project

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 158
    • Downloads (last 6 weeks): 15
    Reflects downloads up to 05 Mar 2025

    Cited By
    • (2025) Omnidirectional Image Quality Captioning: A Large-Scale Database and a New Model. IEEE Transactions on Image Processing, Vol. 34, 1326-1339. DOI: 10.1109/TIP.2025.3539468. Online publication date: 2025.
    • (2024) "Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. Proceedings of the 32nd ACM International Conference on Multimedia, 2554-2563. DOI: 10.1145/3664647.3681172. Online publication date: 28-Oct-2024.
    • (2024) AesMamba: Universal Image Aesthetic Assessment with State Space Models. Proceedings of the 32nd ACM International Conference on Multimedia, 7444-7453. DOI: 10.1145/3664647.3681011. Online publication date: 28-Oct-2024.
    • (2024) Beyond pixels: text-guided deep insights into graphic design image aesthetics. Journal of Electronic Imaging, Vol. 33, 05. DOI: 10.1117/1.JEI.33.5.053059. Online publication date: 1-Sep-2024.
    • (2024) Semantic and style based multiple reference learning for artistic and general image aesthetic assessment. Neurocomputing, Vol. 582, C. DOI: 10.1016/j.neucom.2024.127434. Online publication date: 9-Jul-2024.
    • (2024) View adjustment: helping users improve photographic composition. Multimedia Systems, Vol. 30, 5. DOI: 10.1007/s00530-024-01490-x. Online publication date: 26-Sep-2024.
    • (2024) Advancing neural aesthetic assessment of artistic images based on bundle features integration. The Visual Computer. DOI: 10.1007/s00371-024-03732-5. Online publication date: 10-Dec-2024.
    • (2024) A Computational Aesthetic Design Science Study on Online Video Based on Triple-Dimensional Multimodal Analysis. HCI International 2024 – Late Breaking Papers, 68-79. DOI: 10.1007/978-3-031-76821-7_6. Online publication date: 17-Dec-2024.
