
Digital Signal Processing

Volume 91, August 2019, Pages 66-76

Prediction of mobile image saliency and quality under cloud computing environment

https://doi.org/10.1016/j.dsp.2018.12.006

Abstract

Recent years have witnessed the explosive growth of multimedia applications over networks and increasingly high consumer requirements for multimedia signals in terms of quality of experience (QoE). Effective, efficient and energy-saving saliency detection models and quality prediction methods are eagerly desired, since they play critical roles in raising users' QoE and promoting green multimedia communication. Current studies of saliency detection and quality evaluation remain far from ideal. In this paper we investigate the influence of complexity on visual saliency and quality. Complexity is an essential concept in human perception of visual stimuli, but it is highly abstract and hard to define precisely. We suppose that the brain systematically combines global and local features throughout the process of human perception. Global features dominate the search for salient areas when image complexity is high, namely when there is no obviously isolated foreground object, whereas local features play the key role in the opposite situation. With this consideration, this paper establishes a novel framework for detecting visual saliency that first estimates image complexity and then merges global and local features in a complexity-adaptive manner. Furthermore, the concept of complexity is deployed for blind photographic image quality assessment (IQA) by means of saliency-based weighting. Features related to contrast, artifacts, brightness and natural scene statistics (NSS) are modified and integrated to derive a blind IQA model for predicting the quality of photos. Building on these two technologies, this paper introduces smart phones as mobile terminals, cloud platforms for speed-up and energy saving, and wireless networks for transmission, and provides a practical mobile multimedia application. Comparative experiments validate that, within this application system, our proposed saliency detection model and blind photographic IQA method outperform existing competitors in both effectiveness and efficiency.

Introduction

To meet the continually increasing human demand for visual enjoyment, multimedia communication (particularly green multimedia communication) over networks has been growing rapidly and steadily. Generally speaking, multimedia technology development aims to deliver greater enjoyment and flexibility at lower cost. Recently, the popularity of cloud computing has provided a new way to greatly promote the growth of multimedia applications, especially mobile multimedia communications. Via ubiquitous wireless networks (e.g. WiFi or 3G), common mobile terminals (e.g. smart phones, Pads and CarPCs) can be simply and smoothly connected to the cloud platform and draw on its resources. On this basis, we can build an application framework between mobile terminals and cloud platforms to facilitate studies of mobile image processing over networks, as shown in Fig. 1. Through the network connections, this framework combines the advantages of mobile terminals and cloud platforms, i.e. the former's convenience and the latter's real-time computational and processing power. The framework supports a broad range of mobile multimedia applications. Visual saliency detection and image quality assessment (IQA) are two major multimedia technologies with a high level of influence on our daily lives [1], [2]. In this study we focus on solving the problems of image saliency detection and quality evaluation in the wireless network and cloud computing environment.

Visual saliency detection has great merit and application value for removing redundant information in models of human perception; e.g. in IQA, artifacts appearing in salient areas attract human attention more easily and accordingly degrade the perceptual quality of an image more severely than those in non-salient parts. Over the last three decades, thousands of saliency detection methods have been developed, and this number keeps growing. According to the underlying attentional mechanism, current approaches can be roughly separated into two groups: top-down, task-dependent methods that require prior knowledge about the visual content, and bottom-up, stimulus-driven methods that rely only on the information in the input visual stimulus itself. The latter group has recently seen broad and deep development owing to its greater utility and wider range of application scenarios.

In this work, we concentrate on the design of bottom-up models. Numerous algorithms of this type seek particular areas by maximizing local saliency on the basis of biologically inspired regional features [3], [4], [5], [6], [7], [8], [9], [10]. These features are primarily motivated by important findings about neural responses in the V1 cortex and the lateral geniculate nucleus; edge, luminance intensity, color, orientation and texture are typical examples. In [3], Itti et al. provided a unified architecture for detecting visual saliency. Given an image, this framework first downsamples it into a pyramid with Gaussian filtering. Each level of the pyramid is then decomposed into three channels: intensity, orientation and color. Finally, salient areas are derived by pooling the normalized maps of each channel across all scales.
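The following minimal sketch illustrates this pyramid-and-pooling pipeline on the intensity channel only; the color and orientation channels and Itti et al.'s exact normalization operator are omitted for brevity, and the chosen scale pairs are illustrative assumptions.

```python
import cv2
import numpy as np

def intensity_saliency(gray, levels=6):
    """Center-surround saliency from the intensity channel only, in the
    spirit of Itti et al. [3]: Gaussian pyramid -> across-scale
    center-surround differences -> normalization -> pooling."""
    gray = gray.astype(np.float32)
    h, w = gray.shape
    # 1. Gaussian pyramid: blur and halve repeatedly.
    pyr = [gray]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    # 2. Center-surround: compare fine scale c with coarser scales c+2, c+3.
    sal = np.zeros((h, w), np.float32)
    for c in range(levels - 3):
        for s in (c + 2, c + 3):
            if s >= levels:
                continue
            surround = cv2.resize(pyr[s], (pyr[c].shape[1], pyr[c].shape[0]))
            fmap = np.abs(pyr[c] - surround)
            # 3. Normalize each feature map to [0, 1] before pooling.
            if fmap.max() > 0:
                fmap /= fmap.max()
            sal += cv2.resize(fmap, (w, h))
    return sal / sal.max()

# Usage: sal = intensity_saliency(cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE))
```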

Several other algorithms depend on global features [11], [12], [13], [14], [15], predicting the salient parts of an image signal in a transform domain. Thanks to their global view, such models are good at precisely and promptly detecting visual "pop-outs" by locating potential salient targets. For example, in [11], the classical spectral residual (SR) model deployed the remaining Fourier amplitude spectrum to construct a saliency map, under the assumption that high-frequency bands carry more information than low-frequency ones. In contrast, the recently devised image signature (IS) model [13] preserves only the sign of each DCT component as the saliency map, discarding all amplitude information. The IS technique takes merely one bit per component and is thus very compact.
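Both models are compact enough to sketch in a few lines. The version below follows the published recipes at a reduced 64 × 64 working resolution; the smoothing kernel sizes are illustrative assumptions.

```python
import cv2
import numpy as np

def spectral_residual(gray, size=64):
    """SR model [11]: the log-amplitude spectrum minus its local average
    (the 'residual') marks the unexpected, i.e. salient, frequencies."""
    img = cv2.resize(gray.astype(np.float32), (size, size))
    spec = np.fft.fft2(img)
    log_amp, phase = np.log(np.abs(spec) + 1e-8), np.angle(spec)
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    # Reconstruct with the residual amplitude and the original phase.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = cv2.GaussianBlur(sal.astype(np.float32), (9, 9), 2.5)
    return sal / sal.max()

def image_signature(gray, size=64):
    """IS model [13]: keep only the sign of each DCT coefficient, i.e. one
    bit per component, then reconstruct and smooth."""
    img = cv2.resize(gray.astype(np.float32), (size, size))
    recon = cv2.idct(np.sign(cv2.dct(img)))
    sal = cv2.GaussianBlur(recon * recon, (9, 9), 2.5)
    return sal / sal.max()
```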

However, using only local or only global features still has limitations in distinguishing salient areas. Therefore, many recent works have turned to combining the two types of features [16], [17], [18]. The majority of these methods attain significant performance improvements owing to this complementary strategy.

Despite the many approaches to saliency detection explored during the past few decades, little work has studied the impact of image complexity. In reality, image complexity strongly influences the performance of salient region detection. We compare the classical SR model [11], the state-of-the-art Boolean map based saliency (BMS) model [15] and free energy-based saliency (FES) model [8], and our proposed complexity-adaptive saliency detection (CASD) model. For low-complexity images, defined as having obviously isolated foreground objects and background regions, as shown in Figs. 2(a)–(d), saliency maps can be predicted effortlessly by suitable downsampling prior to local comparisons. Conversely, for high-complexity images, defined as lacking definite foreground and background layers, as given in Figs. 2(f)–(i), saliency detection methods based on global considerations are better at finding salient regions. Enlightened by this, we put forward the new CASD framework, which first extracts the complexity feature of a given image and then deploys this feature to adaptively combine global and local features, yielding a more reliable saliency map, as shown in Figs. 2(e) and (j).
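The paper's exact complexity feature is detailed in Section 2; as a conceptual sketch, the fusion step can be written as below, where the edge-density complexity measure and the logistic weighting (with slope k and threshold t) are stand-in assumptions, not the paper's definitions.

```python
import cv2
import numpy as np

def complexity_estimate(gray):
    """Stand-in complexity measure (NOT the paper's exact feature):
    Canny edge density in [0, 1]; cluttered scenes score high."""
    edges = cv2.Canny(gray.astype(np.uint8), 100, 200)
    return float(edges.mean()) / 255.0

def complexity_adaptive_fusion(gray, local_map, global_map, k=10.0, t=0.2):
    """Fuse the two maps as CASD does conceptually: the weight of the
    global map grows with image complexity. k and t are illustrative."""
    c = complexity_estimate(gray)
    w = 1.0 / (1.0 + np.exp(-k * (c - t)))   # logistic weight in (0, 1)
    return w * global_map + (1.0 - w) * local_map
```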

Apart from visual saliency, quality assessment is one of the most important multimedia applications. As a matter of fact, the two research topics are closely related: for instance, salient areas are emphasized in pooling [19], [20], [21], and the difference between the saliency maps of the reference and distorted images serves as a measure of distortion [22], [23], [24].

Like visual saliency, IQA has been studied for more than 20 years. IQA is generally classified into two categories, i.e. subjective assessment and objective assessment. Subjective assessment offers benchmark results [25], [26], which can tell whether a metric is good by examining the correlation between the objective metric's predictions and subjective opinion scores. Such assessment is, however, expensive in time, labor and money, and hence unavailable in real-time and in-service application scenarios.

Objective assessment models, in view of the amount of reference information compared against during quality evaluation, can be further categorized into three classes, namely full-reference (FR) IQA, reduced-reference (RR) IQA and no-reference (NR) IQA; NR-IQA is often called blind IQA. Among the three, FR-IQA approaches have been explored most deeply; typical FR-IQA methods include the feature similarity index (FSIM) [22] and the visual saliency-induced index (VSI) [23]. In comparison, RR algorithms can work with very limited reference information, as in [27], [28]. Yet in many scenarios no reference information is accessible at all, so blind/NR-IQA techniques are highly desired, especially at the present time. Classical blind quality metrics are opinion-aware (OA) [29], [30], [31], [32]: they adopt recorded human mean opinion scores (MOSs) to learn a regression module that converts HVS- or natural scene statistics (NSS)-based features into an overall index for traditional distortions, e.g. noise and blur. Recently, blind metrics have been explored along two main research lines: one is opinion-unaware (OU), e.g. [33], [34], which reduces the time-consuming and labor-intensive human scoring tests as much as possible; the other is to develop more effective features for new distortion types, e.g. contrast distortion [35], [36], [37], tone mapping [38], [39], [40], DIBR [41], [42], compression [43], fog [44], [45], etc.

Despite these numerous elegant works, little attention has been paid to photographic IQA. Setting aside the masking effects caused by different image content (e.g. texture and edges), traditional distortions such as noise and blur are distributed uniformly. As a result, NR methods using simple NSS models alone can deliver high performance [29], [33], [34]. In comparison, photographic images are usually contaminated by several distortions simultaneously, which can be treated as a multiply-distorted IQA problem. As illustrated in Fig. 3 for example photographic images from [46], the distortion types primarily involve brightness, contrast, noise and blur. Facing this practical problem, current blind IQA models fail because they do not comprehensively take all the above-mentioned distortions into consideration.
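For reference, the simple spatial-NSS front end shared by metrics such as [29] and [33] reduces to a few lines; the Gaussian filter scale and the stabilizer below are commonly used values, stated here as assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(gray, sigma=7.0 / 6.0, eps=1.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients. For
    pristine natural images the MSCN histogram is near-Gaussian;
    distortion bends it, and that deviation is what NSS features
    quantify in blind metrics such as [29], [33]."""
    img = gray.astype(np.float64)
    mu = gaussian_filter(img, sigma)                      # local mean
    var = gaussian_filter(img * img, sigma) - mu * mu     # local variance
    return (img - mu) / (np.sqrt(np.abs(var)) + eps)
```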

As for photographic IQA, in [47], Nuutinen et al. attempted to combine classical feature groups to infer the visual quality of real photographs. In [48], Saad et al. proposed an approach devoted to defining a consumer-relevant and tractable space for quality evaluation and then blindly assessing photographic images based on specific interpretable imaging characteristics. In [49], Zhu et al. developed an NR algorithm by integrating a color-based local sharpness measure and NSS-inspired noise estimation with a linear additive model. Unfortunately, these quality methods are still far from ideal, achieving no more than 0.8 in prediction monotonicity, i.e. the Spearman rank-order correlation coefficient (SROCC) [49].
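SROCC, the monotonicity criterion quoted above, compares only the rank orders of the predictions and the mean opinion scores, and is a one-liner with SciPy:

```python
from scipy.stats import spearmanr

def srocc(predictions, mos):
    """SROCC between objective predictions and subjective MOS values;
    1.0 means the metric ranks the images exactly as humans do."""
    rho, _ = spearmanr(predictions, mos)
    return rho

# Three photos whose predicted ranking matches the human ranking -> 1.0
print(srocc([0.9, 0.4, 0.7], [4.5, 2.0, 3.1]))
```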

To address this problem, we propose a novel saliency-guided blind IQA metric, the Blind Photographic Quality Index (BPQI), which comprises four feature sets devised on top of the proposed CASD saliency detection technique. Specifically, the first (or second) set contains one feature regarded as an index of contrast distortion (or of sharpness and noise). The third set measures brightness distortion by characterizing variations of luminance information when the original luminance intensity of a given photo is preserved, increased or decreased. The fourth set measures the naturalness of a photo from two points of view. Subsequently, the well-known support vector regressor (SVR) is leveraged to intelligently integrate the above 14 features into an overall quality prediction.
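The regression stage is standard; a minimal sketch with scikit-learn follows, in which the random feature matrix merely stands in for the paper's 14 BPQI features and the SVR hyperparameters are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Stand-in data: one row of 14 BPQI-style features per training photo
# (contrast, sharpness/noise, brightness, naturalness) and its MOS.
# The actual feature extraction is the paper's contribution and is not
# reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
y = rng.uniform(1.0, 5.0, size=200)

# Epsilon-SVR with an RBF kernel, the usual regressor in blind IQA.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
print(model.predict(X[:1]))   # predicted quality score for one photo
```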

Humans constantly pursue simpler, handier and more powerful devices. This has made the smart phone a new public favorite, leading a trend of replacing other display equipment, such as televisions and computers, and playing an ever more important role in multimedia applications. The two most frequently used functions of smart phones are browsing webpages and taking photos. A natural problem is that we cannot guarantee that the photos captured by smart phones are of high quality. One possible solution is to take more photos of a specific scene as candidates and pick the best one manually later. But this has serious drawbacks: first, limited storage capacity does not permit a large body of photos, especially high-resolution (HD) photos, to be stored simultaneously; second, due to the user's lack of professional skill, poor lighting and improper settings, we still cannot be sure a good-looking photo exists even after numerous photos have been taken. Another solution is to install apps that compute visual saliency and the corresponding quality score. As far as we know, present high-accuracy methods for saliency detection or quality evaluation require either a massive computational load, which shortens the standby time of smart phones, or large memory and expensive components, which raise their cost. This solution is therefore unsatisfactory as well.

In this paper, we put forward a new heuristic solution. As analyzed earlier, massive real-time calculation can hardly be conducted on most common smart phones so far. However, with the popularity of cloud platforms and the extensive coverage of WiFi and 3G networks, we can upload photos to the cloud platform via wireless networks and then receive quality prediction scores derived on powerful cloud computing servers. This solution brings several merits: 1) it noticeably reduces the computational time, memory space and device cost on smart phones; 2) it offers real-time photo quality scores based on highly efficient parallel computing; 3) it can even provide analyses and suggestions on how to improve photo quality, such as adjusting aperture size and shutter speed.
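From the phone's point of view, the whole workflow reduces to an upload and a response. The sketch below shows a hypothetical client side in Python; the endpoint URL and the JSON schema are illustrative assumptions, not part of the paper.

```python
import requests

# Hypothetical client side of the proposed workflow; the endpoint URL and
# the response fields are illustrative, not part of the paper.
CLOUD_ENDPOINT = "https://example-cloud-server/api/photo-quality"

def score_photo_on_cloud(path):
    """Upload a photo over the wireless link and fetch the score computed
    by the cloud servers, plus any shooting advice they return."""
    with open(path, "rb") as f:
        resp = requests.post(CLOUD_ENDPOINT, files={"photo": f}, timeout=30)
    resp.raise_for_status()
    result = resp.json()   # e.g. {"quality": 3.7, "advice": "widen aperture"}
    return result["quality"], result.get("advice")
```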

Our main contributions are summarized as follows. First, aiming at photographic IQA on smart phones, we provide a new practical solution based on the interaction between smart phones and the cloud platform via ubiquitous wireless networks for green communication. Second, on the basis of this interaction, we build a saliency detection model that merges complexity features. Third, our model achieves superior performance in predicting fixations, and a saliency-guided IQA metric with the best performance is developed to blindly evaluate the quality of photos.

The remainder of this paper is organized as follows. Section 2 describes in turn the proposed CASD saliency detection algorithm, the BPQI quality assessment metric, and an associated practical application in the cloud computing environment. Section 3, via thorough experiments, demonstrates that our CASD approach is more accurate than classical and modern methods on three typical datasets [5], [6], [16], validates that our BPQI technique outperforms state-of-the-art competitors on the dedicated database for photographic quality evaluation [46], and analyzes the speed-up obtained with cloud computing platforms. Section 4 gives concluding remarks.


Methodology

In this section, we begin by presenting how to extract the complexity feature and use it to detect visual saliency. Next, on the basis of the detected salient regions, a blind IQA algorithm is developed to predict the quality of photos. Lastly, we introduce cloud computing to make the proposed saliency detection and quality assessment technologies highly efficient, and we provide a relevant practical application.

Experiments

In this part, we conduct two experiments to compare with recently published competitors. The first experiment examines the performance of the proposed CASD saliency detection model in predicting fixations on three popular datasets, i.e. the FIFA dataset [5], the Toronto dataset [6], and the MIT dataset [16]. We select eleven relevant algorithms, including the classical Itti [3], AIM [6] and Judd [16], and the modern QDCT [12], SigSal [13], HFT [14], LG [17], CAS [18], AWS [7], FES [8] and BMS [15].


Conclusion

This paper has studied mobile multimedia applications over wireless networks. We have developed novel visual saliency detection and quality assessment methods for the cloud computing environment. By introducing a new and important image complexity feature, we have proposed a unified framework that systematically fuses local and global features to seek salient regions. Results of trials on popular databases confirm the effectiveness of the CASD model for predicting fixations.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61862029 and in part by the China Postdoctoral Science Foundation under Grant 2018M642325.


References (64)

  • X. Min et al., Fixation prediction through multimodal analysis, ACM Trans. Multimed. Comput. Commun. Appl. (2016)
  • X. Hou et al., Saliency detection: a spectral residual approach
  • B. Schauerte et al., Quaternion-based spectral saliency detection for eye fixation prediction
  • X. Hou et al., Image signature: highlighting sparse salient regions, IEEE Trans. Pattern Anal. Mach. Intell. (Jan. 2012)
  • J. Li et al., Visual saliency based on scale-space analysis in the frequency domain, IEEE Trans. Pattern Anal. Mach. Intell. (Apr. 2013)
  • J. Zhang et al., Exploiting surroundedness for saliency detection: a Boolean map approach, IEEE Trans. Pattern Anal. Mach. Intell. (May 2016)
  • T. Judd et al., Learning to predict where humans look
  • A. Borji et al., Exploiting local and global patch rarities for saliency detection
  • S. Goferman et al., Context-aware saliency detection, IEEE Trans. Pattern Anal. Mach. Intell. (Oct. 2012)
  • K. Gu et al., Saliency-guided quality assessment of screen content images, IEEE Trans. Multimed. (Jun. 2016)
  • K. Gu et al., Model-based referenceless quality metric of 3D synthesized images using local image description, IEEE Trans. Image Process. (Jan. 2018)
  • V. Jakhetiya et al., A prediction backed model for quality assessment of screen content and 3D synthesized images, IEEE Trans. Ind. Inform. (Feb. 2018)
  • L. Zhang et al., FSIM: a feature similarity index for image quality assessment, IEEE Trans. Image Process. (Aug. 2011)
  • L. Zhang et al., VSI: a visual saliency induced index for perceptual image quality assessment, IEEE Trans. Image Process. (Oct. 2014)
  • E.C. Larson et al., Most apparent distortion: full-reference image quality assessment and the role of strategy, J. Electron. Imaging (Mar. 2010)
  • A. Ninassi et al., Subjective quality assessment-IVC database
  • M. Narwaria et al., Fourier transform-based scalable image quality measure, IEEE Trans. Image Process. (Aug. 2012)
  • K. Gu et al., The analysis of image contrast: from quality assessment to automatic enhancement, IEEE Trans. Cybern. (Jan. 2016)
  • A. Mittal et al., No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. (Dec. 2012)
  • K. Gu et al., Using free energy principle for blind image quality assessment, IEEE Trans. Multimed. (Jan. 2015)
  • X. Min et al., Blind image quality estimation via distortion aggravation, IEEE Trans. Broadcast. (2018)
  • K. Gu et al., No-reference quality assessment of screen content pictures, IEEE Trans. Image Process. (2017)

    Zhifang Xia received the B.S. degree in measuring and control instrument from Anhui University, Hefei, China in 2008 and received the Master degree in control science and engineering from Tsinghua University, Beijing, China in 2012. She is currently an engineer and a registered consultant (investment) with State Information Center, Beijing, China. Her interests include image processing, quality assessment, machine learning and e-government. She won the second prize of National excellent engineering consultation award in 2016.

    Weiling Chen received the B.S. and Ph.D. degrees in communication engineering from Xiamen University, Xiamen, China, in 2013 and 2018, respectively. She is currently a lecturer at Fuzhou University, Fuzhou, China. From Sep. 2016 to Dec. 2016, she was a visiting student at the School of Computer Science and Engineering, Nanyang Technological University, Singapore. She has served as a reviewer for ICIP 2016, ICIP 2017, IEEE Access and T-IP. Her current research interests include image quality assessment, image compression, and underwater acoustic communication.

    Qiaohong Li received the B.E. and M.E. degrees from the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China, in 2009 and 2012, respectively, and the Ph.D. degree from Nanyang Technological University, Singapore, in 2017. She is currently a Research Fellow with the Department of Industrial Systems Engineering & Management, National University of Singapore, Singapore. Her research interests include multimedia quality assessment, visual perceptual modeling, and machine learning.

    Feiniu Yuan received his B.Eng. and M.E. degrees in mechanical engineering from Hefei University of Technology, Hefei, China, in 1998 and 2001, respectively, and his Ph.D. degree in pattern recognition and intelligence system from University of Science and Technology of China (USTC), Hefei, in 2004. From 2004 to 2006, he worked as a post-doctoral researcher with State Key Lab of Fire Science, USTC. From 2010 to 2012, he was a Senior Research Fellow with Singapore Bioimaging Consortium, Agency for Science, Technology and Research, Singapore. From 2006 to 2018, he worked as a professor with School of Information Technology, Jiangxi University of Finance and Economics. He is currently a professor with College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 201418, China. His research interests include 3D modeling, image processing and pattern recognition.

    Chuangeng Tian received the B.S. degree from Hangzhou Dianzi University in 2005, and the M.S. and Ph.D. degrees from China University of Mining and Technology, Xuzhou, China, in 2009 and 2016, respectively. He is currently a teacher at Xuzhou Medical University, Xuzhou, China. His research interests include signal processing and image-based biomedical applications.

    Lu Tang received the B.S. and M.S. degrees from China University of Mining and Technology, Xuzhou, China, in 2006 and 2009, respectively. Since July 2009, she has been a teacher in the School of Medical Imaging, Xuzhou Medical University, Xuzhou, China, and she is currently a Ph.D. student in the School of Information and Control Engineering, China University of Mining and Technology. Her research interests include image processing, signal processing and image-based biomedical applications.
