DOI: 10.1145/3664647.3681511

Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance

Published: 28 October 2024

Abstract

Aligning Image Quality Assessment (IQA) models with diverse human preferences remains challenging, because preferences vary across types of visual content such as user-generated content and AI-Generated Content (AIGC). Although existing IQA methods have achieved significant success on specific visual content by leveraging knowledge from pre-trained models, the intricate factors that shape final ratings and the specially designed network architectures of these methods leave gaps in their ability to accurately capture human preferences for novel visual content. To address this issue, we propose Align-IQA, a novel framework that generates visual quality scores aligned with diverse human preferences for various types of visual content. Align-IQA contains two key designs. (1) A customizable quality-aware guidance injection module: by injecting specializable quality-aware prior knowledge into general-purpose pre-trained models, this module guides the acquisition of quality-aware features and allows the features to be adjusted to match diverse human preferences for different types of visual content. (2) A multi-scale feature aggregation module: by simulating the multi-scale mechanism of the human visual system, this module extracts a more comprehensive representation of quality-aware features from the perspective of human perception. Extensive experimental results demonstrate that Align-IQA achieves performance better than or comparable to State-Of-The-Art (SOTA) methods. Notably, Align-IQA outperforms the previous best results on AIGC datasets, achieving Pearson's Linear Correlation Coefficients (PLCCs) of 0.890 (+3.73%) on AGIQA-1K and 0.924 (+1.99%) on AGIQA-3K. Additionally, Align-IQA reduces training parameters by 72.26% and inference overhead by 78.12% while maintaining SOTA performance.
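The abstract sketches two architectural ideas: a trainable guidance-injection adapter that feeds quality-aware priors into a frozen general-purpose backbone, and a multi-scale aggregator over features from several backbone stages. The PyTorch sketch below is a minimal, hypothetical rendering of those two ideas for readers who want a concrete picture; all module names, tensor shapes, and the gated-additive fusion scheme are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch; names, shapes, and the fusion scheme are
    # illustrative assumptions, not the authors' implementation.
    import torch
    import torch.nn as nn

    class GuidanceInjection(nn.Module):
        """Injects quality-aware prior features into frozen backbone tokens.

        The backbone stays frozen; only this lightweight adapter (and the
        regression head below) would be trained.
        """
        def __init__(self, dim: int, guide_dim: int):
            super().__init__()
            self.proj = nn.Linear(guide_dim, dim)     # map prior into backbone space
            self.gate = nn.Parameter(torch.zeros(1))  # zero-init: starts as identity

        def forward(self, feats: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
            # feats: (B, N, dim) backbone tokens; prior: (B, N, guide_dim)
            return feats + self.gate * self.proj(prior)

    class MultiScaleAggregation(nn.Module):
        """Pools features from several backbone stages into one representation,
        loosely mimicking the multi-scale mechanism of the human visual system."""
        def __init__(self, dims, out_dim: int):
            super().__init__()
            self.heads = nn.ModuleList(nn.Linear(d, out_dim) for d in dims)

        def forward(self, stage_feats):
            # stage_feats[i]: (B, N_i, dims[i]); pool tokens per stage, project
            # to a shared width, then average across scales.
            pooled = [h(f.mean(dim=1)) for h, f in zip(self.heads, stage_feats)]
            return torch.stack(pooled).mean(dim=0)

    class QualityHead(nn.Module):
        """Regresses the aggregated representation to a scalar quality score."""
        def __init__(self, dim: int):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.mlp(x).squeeze(-1)            # (B,) predicted scores

    # Toy usage with random tensors standing in for backbone and prior features.
    inject = GuidanceInjection(dim=768, guide_dim=512)
    aggregate = MultiScaleAggregation(dims=[768, 768], out_dim=256)
    head = QualityHead(dim=256)

    tokens = torch.randn(4, 196, 768)  # e.g. frozen ViT tokens for a batch of 4
    prior = torch.randn(4, 196, 512)   # specializable quality-aware prior
    guided = inject(tokens, prior)
    scores = head(aggregate([guided, torch.randn(4, 196, 768)]))
    print(scores.shape)                # torch.Size([4])

Freezing the backbone and training only the adapter and head is one plausible way to realize the reported reduction in training parameters; the paper's actual injection and aggregation schemes may differ.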




    Published In

    MM '24: Proceedings of the 32nd ACM International Conference on Multimedia
    October 2024, 11719 pages
    ISBN: 9798400706868
    DOI: 10.1145/3664647

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. ai-generated content
    2. customizable guidance
    3. human preference
    4. image quality assessment

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • Research Foundation of Education Bureau of Hunan Province
    • Major Project of Xiangjiang Laboratory
    • Natural Science Foundation of Hunan Province
    • NSFC

    Conference

    MM '24: The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne, VIC, Australia

    Acceptance Rates

    MM '24 paper acceptance rate: 1,150 of 4,385 submissions (26%)
    Overall acceptance rate: 2,145 of 8,556 submissions (25%)

