Blind quality assessment for screen content images by orientation selectivity mechanism
Introduction
With the rapid development of information technology, the number of multimedia interactive devices has increased dramatically in recent years. Screen content images (SCIs), which contain both pictorial and textual parts, are often used as the medium of information transmission (some samples are shown in Fig. 1). In recent years, various SCI processing methods have been proposed for different SCI applications, such as SCI segmentation [1], [2], SCI compression [3], [4], SCI sampling [5], SCI coding [6], and visual quality assessment of SCIs [7]. During SCI processing, many visual distortions might be introduced, including motion blur, noise, and contrast change. The degradation of the visual quality of SCIs strongly affects viewers' quality of experience. Visual quality assessment of SCIs is therefore important for optimizing SCI processing algorithms and monitoring the performance of SCI processing systems. Thus, it is highly desirable to design effective image quality assessment (IQA) metrics for SCIs.
In the past decades, many IQA methods have been proposed for images and video. Among these methods, peak signal-to-noise ratio (PSNR) and mean square error (MSE) are well known for their low complexity and simple implementation. However, they cannot achieve promising quality prediction performance, since they do not consider the visual characteristics of the human visual system (HVS). Apart from PSNR and MSE, many other IQA metrics have been designed by considering the properties of the HVS, such as structural similarity (SSIM) [8], information content weighted SSIM (IW-SSIM) [9], feature similarity (FSIM) [10], gradient similarity [11], gradient magnitude similarity deviation (GMSD) [12], visual information fidelity (VIF) [13], information fidelity criterion (IFC) [14], visual saliency-based index (VSI) [15], and internal generative mechanism (IGM) [16]. These methods require the complete reference information and are regarded as full-reference (FR) approaches. Some studies also introduce reduced-reference (RR) IQA methods [17], [18], which require only part of the reference information for quality prediction. Besides, some no-reference (NR) methods have been designed to predict visual quality without any reference information [19], [20], [21], [22], [23], [24], [25], [26]. Among these NR IQA methods, natural scene statistic (NSS) features are used in [19], [20], [22], [24], [25]. In [23], Gu et al. designed a blind sharpness assessment metric in the autoregressive parameter space. In [21], Xue et al. proposed a quality-aware clustering (QAC) method to learn a set of centroids at each quality level, which are further used as a codebook to infer the visual quality of each image patch in a given image. Min et al. built a pseudo structural similarity (PSS) model, which calculates the similarity between the pseudo structures of the distorted image and those of the most distorted image (MDI) [26].
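As a point of reference for the two pixel-wise baselines named above, the following is a minimal sketch of MSE and PSNR for grayscale images stored as nested lists. The function names and the default peak value of 255 (8-bit images) are assumptions made for illustration; they are not taken from the paper.

```python
import math


def mse(reference, distorted):
    """Mean square error between two equally sized grayscale images."""
    total = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for ref_px, dist_px in zip(ref_row, dist_row):
            total += (ref_px - dist_px) ** 2
            count += 1
    return total / count


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB; infinite when the two images are identical."""
    error = mse(reference, distorted)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / error)
```

Both measures depend only on pixel-wise differences, which is why they are cheap to compute and also why they ignore the perceptual properties of the HVS discussed in this paragraph.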
The aforementioned methods are designed for IQA of natural images. As shown in the initial studies on IQA of SCIs [27], [28], the properties of SCIs differ from those of natural images in terms of naturalness. SCIs can be divided into textual and pictorial regions, and textual regions may contain many words. Directly applying IQA methods designed for natural images to SCIs does not yield predictions that are consistent with subjective scores. In addition, the sensitivity of the HVS to visual distortion differs between textual and pictorial regions. According to the different perceptual properties of textual and pictorial regions, Yang et al. adopted two different approaches to calculate the visual quality of textual and pictorial regions, and a weighted activity map is employed to combine the quality scores of the two kinds of regions to predict the final visual quality of an SCI [28]. Gu et al. proposed a structure-induced quality metric (SIQM) by weighting SSIM with a structural degradation model [29] and a saliency-guided quality measure of SCIs [30]. In [4], Wang et al. designed an IQA method for SCIs by incorporating viewing field adaptation, based on the observation that the extent of the visual field used to extract useful information in pictorial regions is much larger than that in textual regions [31], [32]. Fang et al. developed an FR IQA metric for SCIs by uncertainty weighting, with the consideration that the HVS is more sensitive to high-frequency information (such as edge information) than to smooth regions in SCIs [33], [34]. Ni et al. designed IQA models to evaluate the perceptual quality of SCIs based on gradient direction [35] and edge information [36]. Wang et al. proposed an RR IQA method from the perspective of SCI visual perception, in which both primary visual information and unpredictable uncertainty are taken into account [37]. All these methods require reference information, which is often unavailable in practical applications.
In [38], Gu et al. established a blind quality evaluation engine for SCIs based on the free-energy brain theory and a structural degradation model. Meanwhile, a large-scale database without human ratings was used to train the IQA model in order to avoid the over-fitting problem [38]. However, the distorted SCIs in the training database are labeled by an objective FR IQA method [29], and thus the resulting IQA model may deviate from one trained with labels obtained from subjective experiments. Shao et al. designed an NR IQA model for SCIs from the perspective of sparse representation, in which both local and global properties are taken into account. However, these two methods cannot achieve high accuracy in predicting the visual quality of distorted SCIs. Thus, designing an effective NR IQA method for SCIs is still challenging.
In this paper, we propose a novel blind IQA model based on the orientation selectivity mechanism [39], with which the primary visual cortex performs visual information extraction for scene understanding. In the local receptive field, pixels with similar gradient directions are regarded as excitatory and those with dissimilar directions as inhibitory, and the pattern calculated from local orientation is degraded by distortion [39]. Orientation information is also used for IQA in other existing studies [35], [40], [41], [42]. As shown in existing studies [8], [43], the HVS is sensitive to structure information, and many existing IQA studies focus on structure degradation. In [44], Gu et al. designed a structural similarity weighted SSIM metric by locally weighting the SSIM map with local structure similarities computed by SSIM. In [45], Wang et al. proposed a patch-based IQA model for contrast-changed images using an adaptive representation of local patch structure. Ding et al. introduced a directional anisotropic structure measurement to represent the dominant structures, and the visual quality of images is predicted by measuring their degradation [46]. In this work, we employ structure features to capture the distortion of SCIs, which can be regarded as complementary information to the orientation features. We calculate the gradient magnitudes from adjacent pixels of the neighboring pixels and the center pixel along eight directions. Besides, another gradient map is computed by a convolution operation as one part of the structure feature. As shown in existing studies [47], the histograms of second order gradients are powerful in capturing the curvature related geometric properties of the neural landscape. There are also some studies applying local binary pattern (LBP) [48], [49] or high order gradients in IQA tasks [50], [51], [52]. The proposed method differs considerably from these related methods.
In [50], the authors calculated the image structure features along only two directions. In [51], the authors decomposed the image into four scales with a Laplacian of Gaussian filter and extracted features from the four scales. In [52], the first order derivatives are extracted by local contrast normalization. Different from these existing studies, we calculate the first order derivatives along eight orientations and over the local region. The proposed method is motivated by the consideration that textual regions contain many words, that pictorial regions can also exhibit high variations, and that both contain structure information with different orientations. Thus, we calculate the magnitudes from adjacent pixels of the neighboring pixels and the center pixel along eight directions. Besides, we employ the Laplacian filter to calculate the deviation of all pixels in the local region. This gives nine gradient maps, which are denoted as the first order derivatives. These maps are used to calculate the second order derivatives based on LBP [53], and the structure features are extracted from the second order derivatives in the form of histograms. Finally, support vector regression (SVR) is adopted as the mapping function from the extracted features to subjective scores. Experiments are conducted on the large-scale database to demonstrate the promising performance of the proposed method.
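The structure-feature pipeline described above (nine first order maps, LBP on each map, histograms as features) can be illustrated with a simplified sketch. This is not the authors' implementation: the exact neighborhood used for the eight directional gradients, the LBP variant, and the number of histogram bins are not given in this section, so the code assumes plain absolute differences between the center pixel and each of its eight neighbors, an 8-neighbor Laplacian, and a basic 256-bin LBP. All function names are hypothetical.

```python
# Offsets (row, col) of the eight neighbours, clockwise from the right.
# Assumption: one directional map per neighbour offset.
NEIGHBOURS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def pixel(img, r, c):
    """Read img[r][c], clamping coordinates to the image border."""
    r = min(max(r, 0), len(img) - 1)
    c = min(max(c, 0), len(img[0]) - 1)
    return img[r][c]


def first_order_maps(img):
    """Return nine maps: eight directional differences plus a Laplacian response."""
    h, w = len(img), len(img[0])
    maps = []
    for dr, dc in NEIGHBOURS:
        maps.append([[abs(pixel(img, r + dr, c + dc) - img[r][c])
                      for c in range(w)] for r in range(h)])
    # 8-neighbour Laplacian: neighbour sum minus eight times the centre.
    maps.append([[abs(sum(pixel(img, r + dr, c + dc) for dr, dc in NEIGHBOURS)
                      - 8 * img[r][c])
                  for c in range(w)] for r in range(h)])
    return maps


def lbp_histogram(grad_map):
    """Normalised 256-bin histogram of basic 8-neighbour LBP codes."""
    h, w = len(grad_map), len(grad_map[0])
    hist = [0] * 256
    for r in range(h):
        for c in range(w):
            code = 0
            for bit, (dr, dc) in enumerate(NEIGHBOURS):
                # A bit is set when the neighbour is at least as large as the centre.
                if pixel(grad_map, r + dr, c + dc) >= grad_map[r][c]:
                    code |= 1 << bit
            hist[code] += 1
    total = h * w
    return [count / total for count in hist]


def structure_features(img):
    """Concatenate the LBP histograms of all nine first-order maps."""
    features = []
    for grad_map in first_order_maps(img):
        features.extend(lbp_histogram(grad_map))
    return features
```

Under these assumptions each image yields a 9 x 256 = 2304-dimensional feature vector. In the pipeline described in the text, such a vector, together with the orientation features, would then be passed to an SVR trained on subjective scores.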
The remainder of this paper is organized as follows. In Section 2, we introduce the proposed method step by step. In Section 3, we provide the experimental results to demonstrate the advantages of the proposed method. The final section summarizes the paper.
Proposed method
Inspired by the orientation selectivity mechanism [39], we extract orientation features as an indicator of the quality degradation of SCIs. Meanwhile, structure features are used as complementary information to predict the visual quality of SCIs. As shown in Fig. 2, we first train the quality prediction model for SCIs and then use this model to predict the visual quality of SCIs. In the training phase, we select the training samples from the database randomly and adopt SVR to train the model. In the
Database description
The comparison experiments are conducted on two SCI databases, SIQAD [27], [28] and SCD [56]. In SIQAD, twenty reference SCIs are collected from webpages, slides, PDF files and digital magazines, and seven degradation types are introduced to generate distorted images: Gaussian Noise (GN), Gaussian Blur, Motion Blur, Contrast Change, JPEG, JPEG 2000, and Layer Segmentation Based Coding, each of which includes seven levels. There are 980 distorted SCIs in total in this large-scale database;
Conclusion
In this paper, we propose a novel blind image quality assessment method for SCIs based on the orientation selectivity mechanism, with which the primary visual cortex performs visual information extraction for scene understanding. The proposed method extracts orientation information based on the orientation selectivity mechanism, and the structure feature is also extracted as another quality indicator of SCIs. After the features are extracted, SVR is adopted as the mapping function from the feature space
References (59)
- et al., Overview of the emerging HEVC screen content coding extension, IEEE Trans. Circuits Syst. Video Technol. (2016)
- et al., Visual orientation selectivity based structure description, IEEE Trans. Image Process. (2015)
- Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. (2004)
- et al., Structural similarity weighting for image quality assessment, in: IEEE International Conference on Multimedia and Expo Workshops (2013)
- et al., BSD: blind image quality assessment based on structural degradation, Neurocomputing (2017)
- et al., Scale and orientation invariant text segmentation for born-digital compound images, IEEE Trans. Cybern. (2015)
- et al., Screen content image segmentation using robust regression and sparse decomposition, IEEE J. Emerg. Sel. Top. Circuits Syst. (2016)
- et al., Objective quality assessment and perceptual compression of screen content images, IEEE Comput. Graph. Appl. (2016)
- et al., Joint chroma downsampling and upsampling for screen content image, IEEE Trans. Circuits Syst. Video Technol. (2016)
- et al., Adaptive guided image filtering for screen content coding, in: IEEE International Conference on Image Processing (2014)