Abstract
Image aesthetic assessment, a popular research problem in computational aesthetics, has many important applications in image editing, retrieval, and recommendation. However, mainstream CNN-based methods struggle to capture the global aesthetic attributes of images. To this end, we propose a two-stream image aesthetic assessment model that couples Transformer and CNN features. In the first stream, a conventional CNN extracts the image's local aesthetic features; in the second, a superpixel algorithm segments the image and the resulting regions are fed into a Transformer network to learn the image's global aesthetic features. Finally, the features learned by the Transformer and the CNN are fused to produce the aesthetic assessment. Experimental results on the AVA dataset show that the proposed method captures both local and global aesthetic information, which enables the model to learn richer aesthetic cues; this combination of whole and part also aligns better with how humans judge aesthetics. The proposed model achieves an accuracy of 84.5% on the classification task, the best performance among compared methods, and performs well on the other two tasks (score regression and distribution prediction).
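The two-stream pipeline described in the abstract can be sketched in miniature with NumPy. This is an illustrative toy, not the paper's implementation: a regular grid partition stands in for the SLIC superpixel segmentation, patch mean-pooling stands in for the CNN local stream, and a single random-projection self-attention pass stands in for the Transformer global stream; all function names and dimensions are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_stream(image, k=3):
    """Toy 'local' stream: mean-pool k x k patches as a stand-in for CNN features."""
    h, w, _ = image.shape
    feats = [image[i:i + k, j:j + k].mean(axis=(0, 1))
             for i in range(0, h - k + 1, k)
             for j in range(0, w - k + 1, k)]
    return np.concatenate(feats)  # flat local feature vector

def grid_superpixels(image, n=4):
    """Stand-in for superpixel segmentation: split the image into an n x n grid."""
    h, w, _ = image.shape
    return [image[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n]
            for i in range(n) for j in range(n)]

def transformer_stream(regions, d=8):
    """Toy 'global' stream: one self-attention pass over region embeddings."""
    tokens = np.stack([r.mean(axis=(0, 1)) for r in regions])  # (n_regions, channels)
    W = rng.standard_normal((tokens.shape[1], d)) * 0.1        # random projection
    x = tokens @ W                                             # (n_regions, d)
    attn = np.exp(x @ x.T)
    attn /= attn.sum(axis=1, keepdims=True)                    # softmax attention weights
    return (attn @ x).mean(axis=0)                             # pooled global feature

# Fuse the two streams and map to a scalar aesthetic score in (0, 1).
image = rng.random((24, 24, 3))
fused = np.concatenate([cnn_stream(image),
                        transformer_stream(grid_superpixels(image))])
score = 1.0 / (1.0 + np.exp(-(fused @ (rng.standard_normal(fused.size) * 0.01))))
```

In the actual model, the fused feature would instead feed a learned head trained for classification, score regression, or distribution prediction on AVA; the sketch only shows how local patch features and attention-pooled region features can be combined into one representation.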







Author information
Contributions
Yongzhen Ke: Conceptualization, Methodology, Supervision, Project administration. Yin Wang: Methodology, Software, Writing - Original Draft, Writing - Review & Editing. Kai Wang: Methodology, Software, Writing - Original Draft. Fan Qin: Validation, Writing - Review & Editing. Jing Guo: Writing - Review & Editing, Formal analysis, Visualization. Shuai Yang: Resources, Validation, Data Curation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Conflict of interest
The authors have no financial or proprietary interests in any material discussed in this article.
Additional information
Communicated by B. Bao.
About this article
Cite this article
Ke, Y., Wang, Y., Wang, K. et al. Image aesthetics assessment using composite features from transformer and CNN. Multimedia Systems 29, 2483–2494 (2023). https://doi.org/10.1007/s00530-023-01141-7