BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment

Authors:

Bin Xiao

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 5514 - 5522

https://doi.org/10.1145/3581783.3611996

Published: 27 October 2023

Abstract

Image aesthetic assessment (IAA) has drawn wide attention in recent years as more and more users post images and texts on the Internet to share their views. The intense subjectivity and complexity of IAA make it extremely challenging. Text triggers the subjective expression of human aesthetic experience based on human implicit memory, so incorporating the textual information and identifying the relationship with the image is of great importance for IAA. However, IAA with the image as input fails to fully consider subjectivity, while existing multimodal IAA ignores the interrelationship among modalities. To this end, we propose a brain-inspired multimodal interaction network (BMI-Net) that simulates how the association area of the cerebral cortex processes sensory stimuli. In particular, the knowledge integration LSTM (KI-LSTM) is proposed to learn the image-text interaction relation. The proposed scalable multimodal fusion (SMF) based on low-rank decomposition fuses image, text and interaction modalities to predict the aesthetic distribution. Extensive experiments show that the proposed BMI-Net outperforms existing state-of-the-art methods on three IAA tasks.
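
The abstract does not spell out how SMF is formulated, so the following is only a minimal sketch of what fusion "based on low-rank decomposition" can look like in general, not the authors' implementation. It assumes a generic rank-r factorization in which each modality is projected by its own factor matrices and the projections are multiplied elementwise, which avoids building the full image × text × interaction outer-product tensor. The function names (`low_rank_fusion`, `random_factors`), the feature sizes, the rank, the 10 score bins, and the random untrained weights are all hypothetical choices made for illustration.

```python
import math
import random


def project(z, factor):
    """Multiply row vector z (length d) by a d x k matrix given as a list of rows."""
    k = len(factor[0])
    return [sum(z[j] * factor[j][c] for j in range(len(z))) for c in range(k)]


def low_rank_fusion(features, factors):
    """Fuse modality vectors without forming the full outer-product tensor.

    features: list of M vectors, one per modality.
    factors:  factors[m][i] is a (len(features[m]) + 1) x k matrix for
              modality m and rank index i (hypothetical layout).
    Returns a length-k vector: the sum over ranks of the elementwise
    product of the per-modality projections.
    """
    rank = len(factors[0])
    k = len(factors[0][0][0])
    # Append a constant 1 so single-modality terms survive the product.
    padded = [list(z) + [1.0] for z in features]
    fused = [0.0] * k
    for i in range(rank):
        term = [1.0] * k
        for m, z in enumerate(padded):
            proj = project(z, factors[m][i])
            term = [t * p for t, p in zip(term, proj)]
        fused = [f + t for f, t in zip(fused, term)]
    return fused


def softmax(logits):
    """Turn fused scores into a probability distribution over score bins."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_factors(dims, rank, k, rng):
    """Random (untrained) factors, indexed as factors[m][i][row][col]."""
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(k)]
              for _ in range(d + 1)]
             for _ in range(rank)]
            for d in dims]


# Usage with made-up sizes: image, text, and interaction feature vectors.
rng = random.Random(0)
dims = [4, 3, 2]
factors = random_factors(dims, rank=2, k=10, rng=rng)
image, text, interaction = ([rng.random() for _ in range(d)] for d in dims)
distribution = softmax(low_rank_fusion([image, text, interaction], factors))
```

With this layout the cost grows with the sum of the modality dimensions times the rank, rather than with their product, which is the usual motivation for factorized fusion when a third (interaction) modality is added.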

Index Terms

BMI-Net: A Brain-inspired Multimodal Interaction Network for Image Aesthetic Assessment
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN: 9798400701085

DOI: 10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik, University of Ottawa, Canada & MBZUAI, UAE
Tao Mei, HiDream.ai, China
Rita Cucchiara, University of Modena and Reggio Emilia, Italy

Program Chairs:
Marco Bertini, University of Florence, Italy
Diana Patricia Tobon Vallejo, Universidad de Medellin, Colombia
Pradeep K. Atrey, University at Albany, State University of New York, USA
M. Shamim Hossain, King Saud University, KSA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

the Creative Research Groups of Chongqing Municipal Education Commission
the National Natural Science Foundation of China
Chongqing Excellent Scientist Project
Chongqing University of Posts and Telecommunications Ph.D. Innovative Talents Project

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics & Citations

Article Metrics

Total Citations: 8
Total Downloads: 306
Downloads (Last 12 months): 158
Downloads (Last 6 weeks): 15

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Yan J, Tan Z, Fang Y, Chen J, Jiang W, Wang Z (2025). Omnidirectional Image Quality Captioning: A Large-Scale Database and a New Model. IEEE Transactions on Image Processing, Vol. 34, 1326-1339. Online publication date: 2025.
https://doi.org/10.1109/TIP.2025.3539468
Xie R, Ming A, He S, Xiao Y, Ma H, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). "Special Relativity" of Image Aesthetics Assessment: a Preliminary Empirical Perspective. Proceedings of the 32nd ACM International Conference on Multimedia, 2554-2563. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681172
Gao F, Lin Y, Shi J, Qiao M, Wang N, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). AesMamba: Universal Image Aesthetic Assessment with State Space Models. Proceedings of the 32nd ACM International Conference on Multimedia, 7444-7453. Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681011
Shi G, Li L, Song M (2024). Beyond pixels: text-guided deep insights into graphic design image aesthetics. Journal of Electronic Imaging, 33:05. Online publication date: 1-Sep-2024.
https://doi.org/10.1117/1.JEI.33.5.053059
Shi T, Chen C, Li X, Hao A (2024). Semantic and style based multiple reference learning for artistic and general image aesthetic assessment. Neurocomputing, 582:C. Online publication date: 9-Jul-2024.
https://dl.acm.org/doi/10.1016/j.neucom.2024.127434
Sheng N, Ke Y, Yang S, Yang Y, Chen L (2024). View adjustment: helping users improve photographic composition. Multimedia Systems, 30:5. Online publication date: 26-Sep-2024.
https://dl.acm.org/doi/10.1007/s00530-024-01490-x
Yan S, Xu S, Lei A, Zhang S (2024). Advancing neural aesthetic assessment of artistic images based on bundle features integration. The Visual Computer. Online publication date: 10-Dec-2024.
https://doi.org/10.1007/s00371-024-03732-5
Kang Z, Nah F, Siau K (2024). A Computational Aesthetic Design Science Study on Online Video Based on Triple-Dimensional Multimodal Analysis. HCI International 2024 – Late Breaking Papers, 68-79. Online publication date: 17-Dec-2024.
https://doi.org/10.1007/978-3-031-76821-7_6
