Elsevier

Displays

Volume 69, September 2021, 102058
No-reference stereoscopic image quality assessment using quaternion wavelet transform and heterogeneous ensemble learning

https://doi.org/10.1016/j.displa.2021.102058

Highlights

  • A number of ‘quality-aware’ features are discovered in the QWT domain, including the entropy of the chromaticity map, texture features, energy, energy differences and MSCN coefficients of the high-frequency sub-bands from monocular and binocular views.

  • We combine monocular and binocular images to assess image quality.

  • A heterogeneous ensemble learning model via SVR&ELM&RF is proposed for 3D quality assessment, and bootstrap sampling and rotated feature space are used to increase the diversity of data distribution.

Abstract

As the demand for high-quality stereo images has grown in recent years, stereoscopic image quality assessment (SIQA) has become an important research area in modern image processing technology.

In this paper, we propose a no-reference stereoscopic image quality assessment (NR-SIQA) model that applies heterogeneous ensemble learning to ‘quality-aware’ features extracted from luminance, chrominance, disparity and cyclopean images via the quaternion wavelet transform (QWT). Firstly, the luminance and chrominance images are generated in the CIELAB color space to represent monocular perception, and novel disparity and cyclopean images are used to complement the monocular information. Then, a number of ‘quality-aware’ features are discovered in the quaternion wavelet domain, including entropy, texture features, energy features, energy difference features and MSCN coefficients of the high-frequency sub-bands. Finally, a heterogeneous ensemble model combining support vector regression (SVR), extreme learning machine (ELM) and random forest (RF) is proposed to predict the quality score, with bootstrap sampling and a rotated feature space used to increase the diversity of the data distribution. Compared with state-of-the-art NR-SIQA models, experimental results on four public databases demonstrate the accuracy and robustness of the proposed model.

Introduction

With the rapid development of science and technology, 3D technology has been widely used in entertainment, medical treatment, smart homes and other fields [1]. However, during the acquisition, compression, transmission and storage of 3D images or video, distortions of different types and degrees inevitably occur. Quantitatively evaluating the quality of stereo images has therefore become an indispensable research direction in the 3D field. Like 2D image quality assessment (IQA), stereoscopic image quality assessment (SIQA) methods can be divided into subjective and objective methods. Subjective SIQA is intuitive but time-consuming and costly, so researchers have paid more attention to objective SIQA, which is convenient and predicts the image quality score automatically. According to the amount of original image information taken as the reference, SIQA methods fall into three categories: full-reference (FR), reduced-reference (RR) and no-reference (NR), or blind, methods.

For FR-SIQA, early methods averaged the scores of the left and right views obtained by 2D-IQA methods to produce the final quality score [2]. Because of binocular visual characteristics, however, SIQA is more complicated than its 2D counterpart. Binocular disparity between the left and right eyes, which reflects depth information, was therefore integrated into SIQA models. You et al. [3] added the disparity map to the image quality assessment process, and their results show that a proper combination of disparity information and the original images gives better results. Benoit et al. [4] directly fused 2D quality metrics and depth information to evaluate the 3D image quality score. According to recent research, binocular rivalry, binocular suppression and binocular fusion also occur in the primary visual cortex (V1), which leaves the above models considerable room for improvement. Chen et al. [5] introduced an intermediate image, called the cyclopean image, based on the binocular rivalry mechanism into the FR-SIQA task. Shao et al. [6] applied dictionary learning to learn features of the left and right views to assess the quality of 3D images. Liu et al. [7] considered binocular interaction and depth perception at the same time for FR-SIQA. To reduce the dependence on complete reference information, RR-SIQA models based on part of the reference information have also been studied [8], [9], [10]. Because the original image is often difficult to obtain, NR-SIQA models become necessary.

The essential idea of NR-SIQA is to capture effective features that reflect quality deterioration, so most methods adopt handcrafted quality-aware features to quantify distorted images. For instance, Liu et al. [11] extracted univariate and bivariate statistical features from the two monocular views, the synthesized cyclopean image and the binocular product image for quality prediction. In [12], a series of perceptual features in the spatial and frequency domains were extracted from summation, difference and cyclopean images, and then transformed into deep features by a stacked auto-encoder (SAE) to conduct quality regression. Chen et al. [13] extracted content-relevant and depth perception features that complement each other. Liu et al. [14] exploited local monocular super-pixel spatial entropy and natural scene statistics (NSS) features to represent high-level semantic perception mechanisms and image naturalness, respectively.

To tackle the problem of asymmetric distortion in stereo image pairs, Li et al. [15] proposed a new color cyclopean image that considers the characteristics of binocular fusion, rivalry and suppression, and extracted several NSS features from it. Yang et al. [16] extracted gradient magnitude and gradient orientation maps from the color components and the difference map of distorted stereopairs, and then used sparse representation to extract quality-aware features from these feature maps.

Besides the above-mentioned perceptual features in the spatial domain, several studies have focused on feature analysis in the frequency domain. Li et al. [17] proposed a completely blind image quality assessment method based on contourlet energy, which indicated that quality-aware features extracted in the frequency domain correlate highly with human visual perception. In addition, the authors of [18] utilized joint wavelet decomposition to extract texture features from the left and right views, as well as depth features from the disparity image, to predict image quality scores.

With the advancement of deep learning, researchers have concentrated on using deep neural networks (DNNs) to learn high-level semantic features that improve prediction accuracy. For example, Shao et al. [19] trained two separate 2D DNNs on monocular and cyclopean views to evaluate the quality of stereoscopic images. The authors of [20] proposed an NR-SIQA model based on a multi-level feature fusion DNN, in which feature maps of the left and right views at different levels are deeply fused to obtain abstract quality-aware features. In [21], an end-to-end deep fusion network (DFNet), trained under a unified framework, is proposed to extract high-level features. However, the limited number of training images in current SIQA databases hinders the generalization of deep learning-based SIQA methods.

In addition to extracting effective perceptual features, another key step of the NR-SIQA task is to establish a regression model that maps image features to quality scores. In the previous literature, researchers mainly used a single learner for quality prediction. Yang et al. [22] utilized SVR to learn features extracted from sum and difference maps, cyclopean maps and color maps. Although SVR achieves reasonable performance, a single learner is prone to falling into a local optimum.

To overcome this difficulty, ensemble learning has gradually attracted attention, since combining multiple learners can often achieve significantly better generalization performance than a single learner [23]. In the early stage, researchers tended to simply use Decision Tree (DT), RF and SVR as base learners. In [24], the authors used RF to map features of image histogram shape information to quality scores. Ma et al. [25] proposed a novel ensemble learning model, AdaBoost-RF, to assess image quality. In [26], the final quality score of stereoscopic images is obtained by using SVR as the base learner of an ensemble learning method. Subsequently, an AdaBoost neural network model was proposed to learn the extracted features [27]. Liu et al. [28] trained an ensemble of BP neural networks to learn image features, which improved the accuracy of image quality assessment. The authors of [29] developed a weighted ensemble learning network to learn the energy distribution of the left and right views, which performs well in predicting the quality of symmetrically distorted stereoscopic images.

Although the above ensemble models perform well, they are homogeneous, which limits their predictive ability to a large extent. Previous studies [30] have shown the superiority of heterogeneous ensembles, which can boost predictive ability by letting different models compensate for each other's weaknesses. For example, in [31], Zhang et al. exploited SVR and K-Nearest Neighbor (KNN) as the base regressors of a stacking model to predict super-resolution image quality.

Inspired by the above works, we propose a novel NR-SIQA model using the quaternion wavelet transform (QWT) and heterogeneous ensemble learning. Given that the wavelet transform conforms to the characteristics of the human visual cortex, QWT is used to extract a series of perceptual features from monocular and binocular views simultaneously. Furthermore, the features of the left and right chromaticity maps, which have been ignored in most of the literature, are taken into account. Finally, a heterogeneous ensemble learning model is applied to predict the quality score of stereo images. In brief, our contributions are as follows.

  • (1)

    A number of ‘quality-aware’ features are discovered in the QWT domain, including the entropy of the chromaticity map, texture features of the third phase of the high-frequency sub-band, energy of the high-frequency sub-band, energy differences of the high-frequency sub-band and MSCN coefficients of the high-frequency sub-band from monocular and binocular views.

  • (2)

    We combine monocular and binocular images to assess image quality. For the monocular view, the luminance and chroma maps of the CIELAB color space corresponding to the stereopairs are used to capture visual distortion. For the binocular view, a new cyclopean image is constructed using binocular visual characteristics and visual saliency, and the disparity map is generated by Global Error Energy Minimization to express depth perception.

  • (3)

    A heterogeneous ensemble learning model via SVR&ELM&RF is proposed for 3D image quality assessment, and bootstrap sampling and rotated feature space are used to increase the diversity of data distribution.

The rest of this paper is organized as follows. Section 2 presents the proposed no-reference SIQA method. Monocular and binocular images are introduced in Section 3. Section 4 elaborates on the discovery of quality-aware features in the quaternion wavelet domain. The proposed heterogeneous ensemble learning for no-reference SIQA is described in Section 5. In Section 6, the experimental results are shown, analyzed and discussed. Section 7 concludes the paper.

Section snippets

Proposed no-reference SIQA method

Fig. 1 shows the flow chart of our proposed SIQA method. Firstly, the two monocular views are converted to luminance and chroma images in the CIELAB color space, and the cyclopean and disparity images are generated based on binocular visual characteristics and depth perception characteristics respectively. Subsequently, quality-aware features are extracted from the chroma, luminance, cyclopean and disparity images via QWT. Finally, these features are fed to

Chroma map of CIELAB color space

The CIELAB color space is uniform, which compensates for the uneven color distribution of RGB, CMYK and other color spaces. It consists of one brightness channel and two chromaticity channels: channel L expresses the brightness of pixels, while channels a and b represent the ranges from red to green and from yellow to blue, respectively. To transform the RGB color space into CIELAB, the original RGB values must first be transformed into the CIEXYZ color space, and then from
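The two-step conversion described above (RGB to CIEXYZ, then CIEXYZ to CIELAB) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes 8-bit sRGB input and the D65 reference white, neither of which is stated in the text, and the function names are hypothetical.

```python
# Sketch of RGB -> CIEXYZ -> CIELAB for a single pixel.
# Assumptions (not from the paper): sRGB primaries, 8-bit input, D65 white point.

D65_WHITE = (0.95047, 1.0, 1.08883)  # assumed reference white (X, Y, Z)


def srgb_to_linear(channel):
    """Undo the sRGB gamma curve for one 8-bit channel value."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def rgb_to_xyz(r, g, b):
    """Linear-light sRGB to CIEXYZ using the standard sRGB/D65 matrix."""
    rl, gl, bl = (srgb_to_linear(v) for v in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    return x, y, z


def _lab_f(t):
    """Piecewise cube-root function used by the CIELAB definition."""
    delta = 6.0 / 29.0
    return t ** (1.0 / 3.0) if t > delta ** 3 else t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(x, y, z, white=D65_WHITE):
    """CIEXYZ to CIELAB: L is lightness, a and b are the chromaticity channels."""
    fx, fy, fz = _lab_f(x / white[0]), _lab_f(y / white[1]), _lab_f(z / white[2])
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


def rgb_to_lab(r, g, b):
    """Convenience wrapper chaining the two steps."""
    return xyz_to_lab(*rgb_to_xyz(r, g, b))
```

Applying `rgb_to_lab` to every pixel of the left and right views would give the luminance map (channel L) and the two chroma maps (channels a and b) from which the monocular features are then extracted.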

Quality-aware features in the quaternion wavelet domain

QWT uses four orthonormal bases and forms a tight frame with 4× redundancy, with horizontal, vertical and diagonal sub-bands [40]. At each scale, the QWT components are given by the following matrix:

$$
G=\begin{pmatrix}
\psi_h(c)\psi_h(d) & \psi_h(c)\phi_h(d) & \phi_h(c)\psi_h(d)\\
\psi_g(c)\psi_h(d) & \psi_g(c)\phi_h(d) & \phi_g(c)\psi_h(d)\\
\psi_h(c)\psi_g(d) & \psi_h(c)\phi_g(d) & \phi_h(c)\psi_g(d)\\
\psi_g(c)\psi_g(d) & \psi_g(c)\phi_g(d) & \phi_g(c)\psi_g(d)
\end{pmatrix}
$$

where ψh(c) and ϕh(c) represent the wavelet function and the scale function respectively. Each column of matrix G contains 4 components of the quaternion wavelet corresponding to the three sub-bands of the QWT, which
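Several of the features named in the abstract (entropy, sub-band energy and MSCN coefficients) are simple statistics once a sub-band or chroma map is available as a 2D array. The sketch below illustrates them under stated assumptions: the exact definitions used in the paper are not given in this excerpt, so the MSCN (mean subtracted contrast normalized) computation follows the common form with a 7×7 Gaussian window and stabilizing constant C = 1, the energy is taken as the mean squared coefficient, and the entropy is a histogram-based Shannon entropy. The QWT decomposition itself is not implemented here, and all function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(radius=3, sigma=7.0 / 6.0):
    """1D Gaussian weights that sum to 1 (window size and sigma are assumptions)."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(img, kernel):
    """Separable Gaussian filtering with reflective padding; output keeps the input shape."""
    r = len(kernel) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def mscn(img, c=1.0):
    """MSCN coefficients: (I - local mean) / (local std + C)."""
    img = np.asarray(img, dtype=float)
    k = gaussian_kernel()
    mu = smooth(img, k)
    var = smooth(img * img, k) - mu * mu
    sigma = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative values from rounding
    return (img - mu) / (sigma + c)


def subband_energy(coeffs):
    """Energy of a sub-band, taken here as the mean squared coefficient (assumed definition)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return float(np.mean(coeffs ** 2))


def shannon_entropy(img, bins=256):
    """Histogram-based Shannon entropy in bits (assumed definition)."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```

In a full pipeline these functions would be applied to the high-frequency QWT sub-bands of the luminance, chroma, cyclopean and disparity images, and the resulting values concatenated into one feature vector per stereopair.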

Proposed heterogeneous ensemble learning for no-reference SIQA

Ensemble learning can improve the predictive ability of a model by combining multiple individual learners, achieving higher accuracy and more reliable estimates than a single model. Heterogeneous ensemble learning uses several different kinds of algorithms to generate its individual learners, which complement each other to form a more comprehensive regression model when dealing with different types of features. For example, there are three types of individual learners which are independent each
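The idea of a heterogeneous ensemble trained on bootstrap samples in a rotated feature space can be sketched as follows. This is an illustration under loud assumptions, not the proposed model: only the ELM is implemented as such (in a regularized form), while a ridge regressor and a k-nearest-neighbour regressor are swapped in as stand-ins for SVR and RF to keep the example self-contained in NumPy. The rotation is taken to be a PCA rotation of each bootstrap sample and the fusion rule is a plain average; the excerpt does not specify either. All class and function names are hypothetical.

```python
import numpy as np


class ELMRegressor:
    """Extreme learning machine: random hidden layer, output weights by regularized least squares."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def _hidden(self, X):
        h = np.tanh(X @ self.W + self.b)
        return np.hstack([h, np.ones((len(X), 1))])  # constant column acts as output bias

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # fixed random input weights
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(H.shape[1]), H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


class RidgeRegressor:
    """Stand-in for SVR (assumption): linear ridge regression with an intercept column."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        Xb = np.hstack([X, np.ones((len(X), 1))])
        self.w = np.linalg.solve(Xb.T @ Xb + self.alpha * np.eye(Xb.shape[1]), Xb.T @ y)
        return self

    def predict(self, X):
        return np.hstack([X, np.ones((len(X), 1))]) @ self.w


class KNNRegressor:
    """Stand-in for RF (assumption): mean target of the k nearest training samples."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=-1)
        idx = np.argsort(d, axis=1)[:, : self.k]
        return self.y[idx].mean(axis=1)


def pca_rotation(X):
    """Matrix of principal axes of X, used here as the feature-space rotation (assumption)."""
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return vt.T


def fit_ensemble(X, y, n_rounds=5, seed=0):
    """Train three different learner types on each bootstrap sample in its own rotated space."""
    rng = np.random.default_rng(seed)
    members = []
    for r in range(n_rounds):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample with replacement
        Xb, yb = X[idx], y[idx]
        R = pca_rotation(Xb)
        for learner in (ELMRegressor(seed=r), RidgeRegressor(), KNNRegressor()):
            members.append((R, learner.fit(Xb @ R, yb)))
    return members


def predict_ensemble(members, X):
    """Average the predictions of all members (simple unweighted fusion, an assumption)."""
    return np.mean([m.predict(X @ R) for R, m in members], axis=0)
```

Here `X` would hold one feature vector per stereopair and `y` the corresponding subjective scores. Because each round draws a different bootstrap sample and therefore a different rotation, the members see differently distributed training data, which is the source of diversity the text refers to.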

SIQA databases

In this paper, four publicly available 3D IQA databases are utilized to verify the performance of the proposed method: LIVE 3D Phase I [51], LIVE 3D Phase II [52], IVC 3D Phase I [4] and IVC 3D Phase II [4]. These databases all contain reference images and various types of distorted images with their corresponding subjective scores.

  • (1)

    LIVE 3D Phase I: It consists of 365 distorted stereopairs, each with a co-registered difference mean opinion score (DMOS), created from 20 original stereopairs.

Conclusion

In this paper, an NR-SIQA model based on QWT and heterogeneous ensemble learning is proposed. In this model, the luminance and chroma images corresponding to the original image are used to mimic monocular perception, and disparity and cyclopean images based on depth perception and the binocular visual mechanism are utilized to complement the monocular perception. Then, a series of quality-aware features are extracted in the QWT domain, including entropy of chromaticity map, texture features of the third phase,

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (53)

  • P. Gorley et al.

    Stereoscopic image quality metrics and compression, Proceedings of SPIE - The International Society for

    Opt. Eng.

    (2008)
  • J. You et al.

    Perceptual quality assessment for stereoscopic images based on 2D image quality metrics and disparity analysis

  • A. Benoit et al.

    Quality assessment of stereoscopic images

    EURASIP J. Image Video Process.

    (2008)
  • F. Shao et al.

    Full-reference quality assessment of stereoscopic images by learning binocular receptive field properties

    IEEE Trans. Image Process.

    (2015)
  • Y. Liu et al.

    Toward a quality predictor for stereoscopic images via analysis of human binocular visual perception

    IEEE Access

    (2019)
  • J. Ma et al.

    Reduced-reference stereoscopic image quality assessment using natural scene statistics and structural degradation

    IEEE Access

    (2018)
  • F. Qi et al.

    Reduced reference stereoscopic image quality assessment based on binocular perceptual information

    IEEE Trans. Multimedia.

    (2015)
  • Z. Wan et al.

    Reduced reference stereoscopic image quality assessment using sparse representation and natural scene statistics

    IEEE Trans. Multimedia.

    (2020)
  • J. Yang et al.

    Predicting stereoscopic image quality via stacked auto-encoders based on stereopsis formation

    IEEE Trans. Multimedia.

    (2019)
  • Y. Chen et al.

    Blind stereo image quality assessment based on binocular visual characteristics and depth perception

    IEEE Access

    (2020)
  • S. Li et al.

    No-reference stereoscopic image quality assessment based on cyclopean image and enhanced image

    SIViP

    (2020)
  • J. Yang, P. An, J. Ma, K. Li, L. Shen, No-reference stereo image quality assessment by learning gradient...
  • Chaofeng Li et al.

    Completely blind image quality assessment via contourlet energy statistics

    IET Image Proc.

    (2021)
  • Feng Shao et al.

    Toward a blind deep quality evaluator for stereoscopic images based on monocular and binocular interactions

    IEEE Trans. Image Process.

    (2016)
  • J. Yan, Y. Fang, L. Huang, X. Min, Y. Yao, G. Zhai, Blind stereoscopic image quality assessment by deep neural network...
  • P. Zhao, S. Li, Y. Chang, No-reference stereoscopic image quality assessment based on dilation convolution, in: 2019...
    This paper was recommended for publication by Prof G Guangtao Zhai.
