Blind image quality assessment with channel attention based deep residual network and extended LargeVis dimensionality reduction

https://doi.org/10.1016/j.jvcir.2021.103296

Abstract

Image Quality Assessment (IQA) is one of the fundamental problems in image processing, image/video coding and transmission, and related fields. In this paper, a Blind Image Quality Assessment (BIQA) approach with a channel attention based deep Residual Network (ResNet) and extended LargeVis dimensionality reduction is proposed. First, ResNet50 with a channel attention mechanism is used as the backbone network to extract deep features from the image. To reduce the dimensionality of the deep features, LargeVis, which was originally designed for the visualization of large-scale high-dimensional data, is extended by using Support Vector Regression (SVR) so that it can operate on a single feature vector. The extended LargeVis removes redundant information from the deep features to obtain a low-dimensional yet discriminative feature representation. Finally, the quality prediction model is established by using SVR as the fitting method: the low-dimensional feature representations and the quality scores of the images form pair-wise data samples to train the fitting model. Experimental results on both authentic and synthetic distortion datasets show that the proposed method achieves superior performance compared with state-of-the-art methods.

Introduction

Image Quality Assessment (IQA) is one of the fundamental problems in the image processing field. In recent years, many achievements have been made [1], [2], [3], among which No-Reference (NR) or Blind IQA (BIQA) has become a hotspot, since reference information is often unavailable (or may not exist) in many practical applications. IQA has developed from specific to non-specific distortion, and from single to hybrid distortions. IQA for specific distortion is designed for certain distortion types such as color shift, blur, noise, and JPEG compression artifacts [4], [5]; it therefore needs to know the distortion type of the image in advance. In contrast, IQA for non-specific distortion does not need to identify the type or cause of the distortion [6], [7], [8], [9], which gives it a wider range of applications than IQA for specific distortion.

Since 2012, deep learning has made tremendous breakthroughs in image classification [10], natural language processing [11], speech recognition [12], and other fields. The most representative deep learning algorithm is the Convolutional Neural Network (CNN), which learns implicit relationships directly from big data to obtain a hierarchical feature representation, from low-level visual features to high-level semantic features, by simulating the multi-layered cognitive mechanism of the human brain. Compared with traditional handcrafted features, deep features have a prominent advantage in extracting multi-level features and context information from the image, and have stronger representational and discriminative capability [13].

Recently, researchers have applied deep learning to BIQA and proposed many models [1], [2]; deep learning based BIQA has become the mainstream. The existing models can be divided into two categories. The first uses end-to-end training to learn the mapping between the image and the quality score directly [14], [15]. The second adopts a framework of “feature extraction + regression” [16], [17]: the deep features of the image are extracted first, and a regression method is then used to establish a mapping model between the deep features and the quality score. The key to both categories is using large-scale pair-wise data to train deep CNNs for extracting the deep features of the image.

In this paper, a BIQA method is proposed under the framework of “feature extraction + regression”. First, channel attention based ResNet50 [10] is used as the backbone network to extract the deep features of the image [18]. By assigning different weights to the feature channels, the representative capability of the deep features is improved significantly. To reduce the dimensionality of the deep features, the LargeVis [19] dimensionality reduction method, originally designed for the visualization of big data, is extended so that it can operate on a single feature vector. Next, the dimensionality-reduced features and the quality score of the image form pair-wise data samples, which are used to train an SVR model mapping the low-dimensional feature representation to the quality score. Extensive experiments demonstrate that the proposed method achieves superior performance on both synthetic and authentic distortion datasets compared with state-of-the-art methods.
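The channel attention idea described above (reweighting feature channels before they are used downstream) can be illustrated with a minimal squeeze-and-excitation style sketch. This is a hypothetical NumPy illustration, not the paper's CA-ResNet50 implementation; the weight matrices `w1` and `w2` and the reduction ratio are made-up toy values.

```python
import numpy as np

def channel_attention(feature_map, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    feature_map: (C, H, W) activations from a convolutional block.
    w1: (C // r, C) and w2: (C, C // r) form a bottleneck MLP with ratio r.
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feature_map.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid -> per-channel weights in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Re-scale: multiply each channel of the original feature map by its weight
    return feature_map * weights[:, None, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 7, 7))        # toy (C, H, W) block output
w1 = rng.standard_normal((16, 64)) * 0.1   # toy weights, reduction ratio r = 4
w2 = rng.standard_normal((64, 16)) * 0.1
y = channel_attention(x, w1, w2)
print(y.shape)  # (64, 7, 7)
```

Each channel is scaled by a single learned weight, so informative channels can be emphasized and redundant ones suppressed without changing the spatial layout of the feature map.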

Section snippets

Related work

BIQA methods can be roughly divided into traditional methods and deep learning based methods. Traditional methods generally adopt a typical framework of “feature extraction + modeling”: the features of the image are extracted first, and a mathematical method is then used to establish a mapping model between the features and the quality score of the image [20]. One of the most representative works is Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) [21], proposed by

Proposed BIQA method

Based on the framework of “feature extraction + regression”, a BIQA method is proposed in this paper, as shown in Fig. 1. First, a channel attention mechanism is integrated into ResNet50 [10] to enhance the representative capability of the deep features by assigning different weights to the feature channels. The constructed network, denoted CA-ResNet50, is used as the backbone network to extract the features of the image. The fully connected layer features of CA-ResNet50 are extracted,
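The overall “feature extraction + regression” pipeline can be sketched end to end. The snippet below is a minimal stand-in rather than the paper's method: random vectors substitute for CA-ResNet50 deep features, an SVD projection substitutes for the extended LargeVis reduction, and a self-contained RBF kernel ridge regressor substitutes for SVR; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in data: 200 images with 2048-D "deep features" and subjective
# quality scores (MOS). Real features would come from the backbone network.
X = rng.standard_normal((200, 2048))
mos = X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(200)

# Dimensionality reduction stand-in: project onto the top-k principal
# directions via SVD (the paper's extended LargeVis is not reproduced here).
k = 32
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ vt[:k].T                       # low-dimensional representation

# Regression stand-in: RBF kernel ridge regression in place of SVR.
def rbf(A, B, gamma=1e-2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(Z, Z)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(Z)), mos)  # ridge fit
pred = rbf(Z, Z) @ alpha                # predicted quality scores
```

In practice the regressor would be trained on one split of the dataset and evaluated on a held-out split; the sketch only shows how the three stages of the framework connect.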

Experimental results and analysis

To verify the effectiveness of the proposed BIQA method, extensive experiments are conducted on both authentic and synthetic distortion datasets, and the method is compared with state-of-the-art BIQA methods.
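Comparisons of this kind are conventionally reported with rank and linear correlation between predicted scores and subjective MOS values. The section does not list its exact metrics, but the standard SROCC and PLCC can be computed as below (a plain NumPy sketch; the tie-free rank computation is an assumption of the toy data).

```python
import numpy as np

def plcc(pred, mos):
    """Pearson linear correlation coefficient between predictions and MOS."""
    return np.corrcoef(pred, mos)[0, 1]

def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks.

    Assumes no ties; with ties, average ranks should be used instead.
    """
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(rank(pred), rank(mos))[0, 1]

mos = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # toy subjective scores
pred = np.array([1.2, 1.9, 3.3, 3.8, 5.1])   # toy model outputs
print(round(srocc(pred, mos), 3))  # 1.0 -- the rank orders agree exactly
```

SROCC rewards monotonic agreement (ranking images correctly), while PLCC measures linear agreement of the actual score values; both are typically reported per dataset.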

Conclusions

Based on the framework of “feature extraction + regression”, we proposed a BIQA method built on a channel attention mechanism and extended LargeVis dimensionality reduction. The channel attention mechanism is embedded into ResNet50 to extract deep features with strong representative capability. Then, the extended LargeVis is used to reduce the dimensionality of the deep features. The quality assessment model is finally established by training SVR with the dimensionality reduced

Funding

This work is supported by the National Natural Science Foundation of China (Grant Numbers 61871006 and 61971016) and the Beijing Municipal Education Commission Cooperation Beijing Natural Science Foundation (No. KZ201810005002, No. KZ201910005007).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (52)

  • L. Liu et al., No-reference image quality assessment in curvelet domain, Signal Process. Image Commun. (2014)
  • J. Hu et al., Squeeze-and-excitation networks, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
  • L. Kang, P. Ye, Y. Li, D. Doermann, Convolutional Neural Networks for No-Reference Image Quality Assessment, ...
  • L. Kang et al., Simultaneous estimation of image quality and distortion via multi-task convolutional neural networks
  • Y. Liang et al., Image quality assessment using similar scene as reference, European Conference on Computer Vision (2016)
  • Q. Yan et al., No-reference image blur assessment based on gradient profile sharpness
  • S.A. Golestaneh et al., No-reference quality assessment of JPEG images via a quality relevance map, IEEE Signal Process. Lett. (2014)
  • A. Mittal et al., No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. (2012)
  • A.K. Moorthy et al., A two-step framework for constructing blind image quality indices, IEEE Signal Process. Lett. (2010)
  • M.A. Saad et al., Blind image quality assessment: A natural scene statistics approach in the DCT domain, IEEE Trans. Image Process. (2012)
  • K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition, arXiv e-prints, 2015, pp. ...
  • S. Ji, S.V.N. Vishwanathan, N. Satish, M.J. Anderson, P. Dubey, BlackOut: Speeding up Recurrent Neural Network Language...
  • H. Sak et al., Fast and accurate recurrent neural network acoustic models for speech recognition, Comput. Sci. (2015)
  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition
  • K. Ma et al., End-to-end blind image quality assessment using deep neural networks, IEEE Trans. Image Process. (2018)
  • S. Bosse et al., Deep neural networks for no-reference and full-reference image quality assessment, IEEE Trans. Image Process. (2018)
  • S. Bianco et al., On the use of deep learning for blind image quality assessment, Signal Image Video Process. (2018)
  • D. Varga et al., DeepRN: A content preserving deep architecture for blind image quality assessment
  • J. Tang et al., Visualizing large-scale and high-dimensional data
  • A.C. Bovik, Automatic prediction of perceptual image and video quality, Proc. IEEE (2013)
  • A.K. Moorthy et al., Blind image quality assessment: from natural scene statistics to perceptual quality, IEEE Trans. Image Process. (2011)
  • X. Gao et al., Universal blind image quality assessment metrics via natural scene statistics and multiple kernel learning, IEEE Trans. Neural Networks Learn. Syst. (2013)
  • Q. Wu et al., Blind image quality assessment based on multichannel feature fusion and label transfer, IEEE Trans. Circuits Syst. Video Technol. (2016)
  • S. Wang et al., NMF-based image quality assessment using extreme learning machine, IEEE Trans. Cybern. (2017)
  • D. Ghadiyaram et al., Perceptual quality prediction on authentically distorted images using a bag of features approach, J. Vision (2017)
  • W. Zhang et al., Blind image quality assessment using a deep bilinear convolutional neural network, IEEE Trans. Circuits Syst. Video Technol. (2018)
This paper has been recommended for acceptance by Zicheng Liu.