Hierarchical visual comfort assessment for stereoscopic image retargeting

https://doi.org/10.1016/j.image.2021.116236

Highlights

  • The first visual comfort assessment metric for 3D image retargeting is proposed.

  • Our model, named Hi-VCA, considers both direct and indirect factors affecting comfort.

  • An HVS-based binocular incongruity measurement is carefully designed in Hi-VCA.

  • Hi-VCA can accurately assess both retargeted and non-retargeted 3D images.

  • We design a breakdown test for robustness validation to overcome the data bottleneck.

Abstract

Stereoscopic Image Retargeting (SIR) has contributed to the popularity of 3D applications. However, the adjustments it makes to images may affect visual comfort when users enjoy 3D services. Previous Visual Comfort Assessment (VCA) methods often perform poorly on SIR, because they only analyze the influence of disparity on discomfort and ignore the effects of the unique and complex distortions introduced by SIR. In this paper, we propose a Hierarchical Visual Comfort Assessment (Hi-VCA) scheme for SIR that considers hybrid distortions, including the structural, information, and semantic distortions that usually occur in retargeting, as well as the binocular incongruity that exists in stereoscopic multimedia. Specifically, we first propose Local-SSIM and Dual Natural Scene Statistics (D-NSS) features to measure structural distortion and information loss. Considering the disparity adjustments that SIR may introduce, we design a binocular incongruity measurement by analyzing various binocular anomaly perception mechanisms of the HVS. Finally, a CNN-based feature is used to ensure the correct delivery of semantic information. The measurements are complementary in describing visual comfort degradation and are further aggregated. Extensive experimental results on the published SIR database SIRD and two ordinary databases, IEEE-SA and NBU 3D-VCA, demonstrate that Hi-VCA outperforms state-of-the-art schemes by better handling hybrid distortions.
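The abstract states only that the four measurements are aggregated; the regressor used is not specified in this excerpt. As a minimal sketch under that gap, the code below concatenates hypothetical per-image feature groups (structure, information, binocular, semantic) and maps them to MOS with closed-form ridge regression. Ridge regression is a stand-in chosen for brevity, not necessarily the authors' aggregation method, and all function and variable names are illustrative.

```python
import numpy as np


def concat_feature_groups(groups):
    """Stack per-image feature groups column-wise.

    Each group is an array of shape (n_images,) or (n_images, d_k); the result
    has shape (n_images, sum of d_k). Group names and contents are hypothetical.
    """
    cols = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        cols.append(g[:, None] if g.ndim == 1 else g)
    return np.concatenate(cols, axis=1)


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean  # centering lets the intercept be recovered afterwards
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def predict(X, w, b):
    """Predicted visual comfort score for each row of X."""
    return np.asarray(X, dtype=float) @ w + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the four feature groups (not real Hi-VCA features).
    structure = rng.normal(size=(40, 3))
    information = rng.normal(size=(40, 2))
    binocular = rng.normal(size=(40, 4))
    semantic = rng.normal(size=40)
    X = concat_feature_groups([structure, information, binocular, semantic])
    mos = X @ rng.normal(size=X.shape[1]) + 3.0  # synthetic target scores
    w, b = fit_ridge(X, mos, lam=1e-6)
    print(np.max(np.abs(predict(X, w, b) - mos)))
```

In practice the regressor would be trained and tested on disjoint images; the toy run above only checks that the pieces fit together.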

Introduction

With the popularity of various stereoscopic display technologies and applications, such as three-dimensional (3D) television, 3D Virtual Reality (VR), and Augmented Reality (AR), stereoscopic content quality assessment has attracted extensive attention. Unlike two-dimensional (2D) images, stereoscopic images contain additional disparity information and deliver a more immersive experience. Stereoscopic image assessment therefore has to deal with multi-dimensional quality factors, including image quality, depth quality, visual comfort, naturalness, and sense of presence, which makes it more complicated than 2D image assessment [1]. Among these factors, visual comfort has attracted increasing research attention recently. The term visual comfort refers to the subjective sensation of comfort associated with viewing stereoscopic images [1]. It directly reflects the physiological experience of the Human Visual System (HVS) and further affects people’s desire for immersive multimedia content [2]. Visual comfort has therefore become an issue that cannot be ignored. Traditional Visual Comfort Assessment (VCA) metrics only assess captured stereo images without any post-processing or distortion [3], [4], [5]. In practice, however, media content often has to be processed for different purposes before being delivered to heterogeneous application scenarios, a procedure usually called repurposing in the industry. In this process, multimedia retargeting techniques, which adapt images/videos to heterogeneous display devices with different screen resolutions and aspect ratios while preserving their important content, have been extensively studied. Therefore, what the user actually perceives is retargeted content rather than directly captured content.

Recently, [6] published the first subjective Stereoscopic Image Retargeting Database (SIRD), in which stereo retargeted images were rated by human subjects in terms of image quality, visual comfort, depth quality, and overall quality. Analyzing the Mean Opinion Scores (MOSs) yields Table 1, which shows that the four assessment aspects are strongly correlated with each other. Therefore, when evaluating visual comfort, the influences of image quality and depth quality should also be considered, which is consistent with conventional wisdom [7]. When image quality is good, visual comfort can be measured by directly analyzing the effect of disparity/depth, as in most previous works [5], [8], [9], [10], [11], [12], [13]. But when image quality is impaired, for example by image deformation or discontinuous content, visual comfort is naturally affected as well.
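Table 1 itself is not reproduced in this excerpt. To illustrate the kind of analysis behind it, the sketch below computes pairwise Pearson (PLCC) or Spearman (SROCC) correlations between the MOS vectors of different assessment aspects. Which coefficient Table 1 reports is not stated here, the function names are hypothetical, and the toy scores are synthetic, not SIRD data.

```python
import numpy as np


def plcc(a, b):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])


def srocc(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks.

    Tied values are not averaged here (a simplification).
    """
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return plcc(ra, rb)


def correlation_matrix(mos_by_aspect, corr=plcc):
    """Pairwise correlation between the MOS vectors of each assessment aspect."""
    names = list(mos_by_aspect)
    m = np.eye(len(names))
    for i, p in enumerate(names):
        for j, q in enumerate(names):
            if i < j:
                m[i, j] = m[j, i] = corr(mos_by_aspect[p], mos_by_aspect[q])
    return names, m


if __name__ == "__main__":
    toy = {  # synthetic MOS values for five images, for illustration only
        "image_quality": [3.1, 2.4, 4.0, 1.8, 3.6],
        "visual_comfort": [3.3, 2.6, 3.9, 2.0, 3.4],
        "depth_quality": [3.0, 2.9, 3.8, 2.2, 3.5],
        "overall": [3.2, 2.5, 4.1, 1.9, 3.5],
    }
    names, m = correlation_matrix(toy, corr=srocc)
    print(names)
    print(np.round(m, 3))
```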

Our proposed Hierarchical Visual Comfort Assessment (Hi-VCA) metric can evaluate not only captured stereo images but also retargeted stereo images. The latter task is more complex: in addition to disparity, it must account for the influences of depth adjustment, structure modification, and information loss on visual comfort. Structure modification and information loss reflect image quality to some degree, while disparity can be converted to depth, which is related to depth quality. As mentioned above, in the context of Stereoscopic Image Retargeting (SIR), VCA is inseparable from image quality and depth quality. Therefore, our Hi-VCA metric comprehensively considers the impact on visual comfort of the distortions that may occur in SIR.
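The excerpt does not define Local-SSIM or D-NSS precisely. As a rough illustration of the two ingredients their names point to, the sketch below computes (a) SSIM over non-overlapping local blocks of two grayscale images that are assumed to be already pixel-aligned (retargeted images change size, so a dense correspondence step such as SIFT flow, listed in the references, would be needed first), and (b) BRISQUE-style mean-subtracted contrast-normalized (MSCN) coefficients, the usual starting point for natural scene statistics features (see "No-reference image quality assessment in the spatial domain" in the references). The block size, Gaussian width, and constants are assumptions taken from common SSIM/BRISQUE defaults, not from Hi-VCA, and the function names are hypothetical.

```python
import numpy as np


def block_ssim(x, y, block=8, data_range=255.0):
    """SSIM computed independently on non-overlapping block x block tiles.

    x and y must be aligned grayscale images of equal shape; edge pixels that
    do not fill a whole tile are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    rows, cols = x.shape[0] // block, x.shape[1] // block
    scores = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            a = x[r * block:(r + 1) * block, c * block:(c + 1) * block]
            b = y[r * block:(r + 1) * block, c * block:(c + 1) * block]
            mu_a, mu_b = a.mean(), b.mean()
            var_a, var_b = a.var(), b.var()
            cov = ((a - mu_a) * (b - mu_b)).mean()
            scores[r, c] = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
                (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
            )
    return scores


def _gaussian_blur(img, sigma=7 / 6, radius=3):
    """Separable Gaussian blur (7 taps by default) with edge padding."""
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()

    def conv(v):
        return np.convolve(v, k, mode="valid")

    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; "valid" mode undoes the padding.
    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, padded))


def mscn(img, c=1.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    img = np.asarray(img, dtype=float)
    mu = _gaussian_blur(img)
    var = np.maximum(_gaussian_blur(img * img) - mu * mu, 0.0)
    return (img - mu) / (np.sqrt(var) + c)
```

Statistics of the block SSIM map (for example its mean or low percentile) and of the MSCN coefficients could then serve as feature inputs; how Hi-VCA actually pools them is not described in this excerpt.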

Traditional 2D image retargeting techniques have been described in [14], [15]. Although SIR methods are fewer, a growing body of research is now devoted to developing perceptual SIR algorithms, such as stereo warping guided by the user’s quality of experience [16], stereo seam carving [17], and stereo multi-operator retargeting [18]. How to evaluate stereoscopic retargeting techniques has thus become an important issue. For an SIR method, it is important to measure the visual discomfort it generates, since this directly affects users’ acceptance of stereo technology. From this point of view, an objective VCA metric for SIR can not only guide the improvement of SIR methods but also help select the most appropriate among several retargeting methods in real applications.
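As a small illustration of the method-selection use case, the sketch below assumes that an objective VCA metric has already produced a predicted comfort score for every (source image, retargeting method) pair, and picks the highest-scoring method for each source. The score convention (higher means more comfortable) and the function names are assumptions; the four method names mirror the stereo retargeting operators used in SIRD.

```python
import numpy as np


def best_method_per_source(scores, methods):
    """Pick, for each source image, the method with the highest predicted comfort.

    scores: array of shape (n_sources, n_methods); higher = more comfortable
    (assumed convention). methods: list of n_methods method names.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] != len(methods):
        raise ValueError("expected one score column per method")
    best = np.argmax(scores, axis=1)
    return [methods[i] for i in best]


def win_counts(scores, methods):
    """How often each method is ranked best across all source images."""
    picks = best_method_per_source(scores, methods)
    return {m: picks.count(m) for m in methods}


if __name__ == "__main__":
    methods = ["cropping", "seam_carving", "scaling", "multi_operator"]
    predicted = [  # hypothetical predicted comfort scores, one row per source
        [3.0, 2.0, 3.5, 4.0],
        [4.2, 1.0, 2.0, 3.0],
        [2.0, 2.5, 3.9, 3.8],
    ]
    print(best_method_per_source(predicted, methods))
    print(win_counts(predicted, methods))
```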

The proposed Hi-VCA metric for SIR is inspired by the two-stream hierarchical perception hypothesis of the HVS [19], as shown in Fig. 1. In detail, when the two eyes receive stereoscopic visual signals, the photoreceptors on the retina transform the optical signal into an electrical signal, which is then transmitted to the retinal ganglion cells, where the first receptive field of the HVS is formed [4]. The outputs of the retinal ganglion cells are relayed via the lateral geniculate nucleus to the primary visual cortex, termed V1 [20]. After the binocular visual signals are first combined in V1, they are then segregated [21]. Two separate neural pathways, termed the ventral and dorsal streams, diverge from V1. Neurons along the ventral stream are mainly implicated in shape recognition and object representation [22], whereas neurons along the dorsal stream are predominantly involved in processing horizontal disparity primitives and in motion computations such as optical flow [23]. Finally, semantic perception is formed in the visual center of the cerebral cortex. Since visual comfort is a high-level subjective experience, our metric models each stage of this perception process correspondingly, in order to simulate the HVS throughout. We first propose Local Structural SIMilarity (Local-SSIM) and Dual Natural Scene Statistics (D-NSS) features, corresponding to the structural arrangements and object information in the ventral stream [19], [22], respectively. Then, a binocular incongruity measurement is designed to correspond to the perception of horizontal disparity in the dorsal stream; motion is not considered, since only still images are discussed here. Finally, we introduce semantic distortion to reflect the higher-level stereoscopic perception and understanding process of the HVS. The main contributions of this work are as follows:

  • The first VCA metric for SIR. Traditional VCA methods are designed for captured stereo images without retargeting or other distortions, so directly applying them to stereo retargeted images is insufficient. Against this background, we propose the first VCA metric suitable for SIR scenarios.

  • A comprehensive Hi-VCA metric. In addition to the influence of disparity considered by traditional VCA methods, our Hi-VCA metric also measures many factors unique to SIR, such as structural distortion, information loss, and semantic distortion. These factors have a non-negligible impact on visual comfort.

  • An HVS-based binocular incongruity measurement. In addition to the commonly analyzed visual comfort zone, we focus on estimating the likelihood of window violation and binocular rivalry, and on expressing the adjustment intensity of the Accommodation-Vergence (A/V) process in the HVS [24]. Because SIR may adjust disparity, these binocular perception mechanisms are easily triggered, and these visual phenomena, especially the A/V process, are important and direct causes of visual discomfort.

  • Strong universality. Although Hi-VCA is designed for stereo retargeted images, it can also assess the visual comfort of general stereo images with performance comparable to the state of the art, because it takes more factors into account than traditional methods without compromising the traditional VCA components.
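The excerpt does not give the formulas behind the binocular incongruity measurement. For orientation only, the sketch below computes two quantities of the kind mentioned in the contributions above from a pixel disparity map: the share of pixels outside a visual comfort zone, using the common rule of thumb of roughly ±1° of angular disparity (an assumption, not Hi-VCA's threshold), and a crude window-violation indicator that counts crossed (in-front-of-screen) disparities in narrow bands at the left and right image borders. The pixel pitch, viewing distance, 65 mm eye separation, sign convention, border width, and function names are all illustrative assumptions. The geometry follows from similar triangles: a screen parallax s places a point at depth Z = eV/(e - s), so its vergence angle is 2·atan((e - s)/(2V)).

```python
import numpy as np


def angular_disparity_deg(disp_px, pixel_pitch_m, view_dist_m, eye_sep_m=0.065):
    """Angular disparity (degrees) of each pixel relative to the screen plane.

    Assumed sign convention: positive screen parallax = uncrossed (behind the
    screen), negative = crossed (in front). Parallax must stay below eye_sep_m.
    """
    s = np.asarray(disp_px, dtype=float) * pixel_pitch_m  # screen parallax (m)
    if np.any(s >= eye_sep_m):
        raise ValueError("uncrossed parallax must be smaller than the eye separation")
    vergence_screen = 2.0 * np.arctan(eye_sep_m / (2.0 * view_dist_m))
    vergence_point = 2.0 * np.arctan((eye_sep_m - s) / (2.0 * view_dist_m))
    return np.degrees(vergence_screen - vergence_point)


def outside_comfort_zone_ratio(disp_px, pixel_pitch_m, view_dist_m, limit_deg=1.0):
    """Fraction of pixels whose |angular disparity| exceeds limit_deg."""
    ang = angular_disparity_deg(disp_px, pixel_pitch_m, view_dist_m)
    return float(np.mean(np.abs(ang) > limit_deg))


def window_violation_ratio(disp_px, border_px=16):
    """Fraction of left/right border-band pixels with crossed (negative) disparity."""
    d = np.asarray(disp_px, dtype=float)
    bands = np.concatenate([d[:, :border_px], d[:, -border_px:]], axis=1)
    return float(np.mean(bands < 0))


if __name__ == "__main__":
    # Hypothetical setup: 0.5 mm pixel pitch viewed from 3 m.
    print(angular_disparity_deg(100, 0.0005, 3.0))   # about +0.95 deg (uncrossed)
    print(angular_disparity_deg(-300, 0.0005, 3.0))  # about -2.9 deg (crossed)
    disp = np.zeros((4, 40))
    disp[:, 0] = -5  # crossed disparity touching the left image border
    print(window_violation_ratio(disp))
```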

The rest of this paper is organized as follows. Related work is introduced in Section 2. In Section 3, we introduce the proposed Hi-VCA metric for SIR. The experimental setting, results, and analysis are presented in Section 4, and Section 5 concludes the paper.

Section snippets

Related work

In this section, we mainly introduce relevant quality assessment studies from 2D and 3D perspectives. In the 2D field, we first analyze general 2D Image Quality Assessment (IQA) methods and then discuss existing 2D retargeted IQA studies. In the 3D field, in addition to general 3D IQA and 3D retargeted IQA works, we also review the state of research on 3D VCA, which is unique to 3D multimedia processing.

During the last two decades, many successful 2D IQA metrics have been developed, such

Proposed Hi-VCA metric

The block diagram of the Hi-VCA metric for SIR is shown in Fig. 2. It consists of four kinds of measurements. According to [39], the major factors determining human visual perception of 2D retargeted images are global structural distortion, local region distortion, and loss of salient information. As mentioned earlier, visual comfort is affected by image quality. Therefore, in VCA for SIR, we also analyze the impact of these factors on visual comfort. First,

Performance evaluation

Due to the lack of public subjective stereoscopic image retargeting databases, we established a Stereoscopic Image Retargeting Database (SIRD) in our previous work [6], which is publicly available online for research use [62]. The SIRD contains 100 source images and 400 stereoscopic retargeted images generated by four typical stereoscopic retargeting methods, i.e., stereo cropping, stereo seam carving, stereo scaling, and stereo multi-operator. The resolution of source images

Conclusion

In this paper, we have presented a Hierarchical Visual Comfort Assessment (Hi-VCA) metric for stereoscopic image retargeting, which is also the first visual comfort assessment metric for this task. Hybrid distortion types, namely structural distortion, information loss, binocular incongruity, and semantic distortion, are introduced and aggregated to produce a comprehensive visual comfort assessment. Extensive experimental results demonstrate that our proposed Hi-VCA metric

CRediT authorship contribution statement

Ya Zhou: Conceptualization, Investigation, Methodology, Software, Visualization, Writing - original draft. Zhibo Chen: Data Curation, Formal analysis, Funding acquisition, Project administration, Writing - original draft. Weiping Li: Supervision, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ya Zhou received the B.S. degree in communication engineering from Shanghai University in 2017, and the M.S. degree in information and communication engineering from the University of Science and Technology of China (USTC) in 2020. She was a trainee with KDDI Research, Inc. in 2019. Her current research interests include visual quality of experience assessment, computer vision, and machine learning.

References (66)

  • J. Ma et al., Joint binocular energy-contrast perception for quality assessment of stereoscopic images, Signal Process., Image Commun. (2018).
  • International Telecommunication Union (Radiocommunication Sector), Methodology for the subjective assessment of the quality of television pictures (2019).
  • M. Lambooij et al., Visual discomfort and visual fatigue in stereoscopic displays: a review, J. Imaging Sci. Technol. (2009).
  • J. Park et al., 3D visual discomfort prediction: Vergence, foveation, and the physiological optics of accommodation, IEEE J. Sel. Top. Sign. Proces. (2014).
  • J. Park et al., 3D visual discomfort predictor: Analysis of disparity and neural activity statistics, IEEE Trans. Image Process. (2015).
  • H.G. Kim et al., Binocular fusion net: Deep learning visual comfort assessment for stereoscopic 3D, IEEE Trans. Circuits Syst. Video Technol. (2019).
  • Y. Zhou et al., Visual comfort assessment for stereoscopic image retargeting.
  • M. Urvoy et al., How visual fatigue and discomfort impact 3D-TV quality of experience: a comprehensive review of technological, psychophysical, and psychological factors, Ann. Telecommun.-Ann. Des. Télécommun. (2013).
  • Y. Nojiri et al., Measurement of parallax distribution and its application to the analysis of visual comfort for stereoscopic HDTV.
  • D. Kim et al., Visual fatigue prediction for stereoscopic image, IEEE Trans. Circuits Syst. Video Technol. (2011).
  • J. Choi et al., Visual fatigue modeling and analysis for stereoscopic video, Opt. Eng. (2012).
  • Q. Jiang et al., Three-dimensional visual comfort assessment via preference learning, J. Electron. Imaging (2015).
  • Y. Zhang et al., Aspect ratio similarity (ARS) for image retargeting quality assessment.
  • H. Sohn et al., Predicting visual discomfort using object size and disparity information in stereoscopic images, IEEE Trans. Broadcast. (2013).
  • M. Rubinstein et al., A comparative study of image retargeting, ACM Trans. Graph. (2010).
  • F. Shao et al., QoE-guided warping for stereoscopic image retargeting, IEEE Trans. Image Process. (2017).
  • T. Basha et al., Geometrically consistent stereo seam carving.
  • L. Zhu et al., Multi-operator stereoscopic image retargeting based on human visual comfort.
  • L.M. Martinez et al., Complex receptive fields in primary visual cortex, Neuroscientist (2003).
  • K. Seshadrinathan et al., Motion tuned spatio-temporal quality assessment of natural videos, IEEE Trans. Image Process. (2010).
  • W. Tam et al., Stereoscopic 3D-TV: Visual comfort, IEEE Trans. Broadcast. (2011).
  • A. Mittal et al., No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. (2012).
  • C. Liu et al., SIFT flow: Dense correspondence across scenes and its applications, IEEE Trans. Pattern Anal. Mach. Intell. (2011).

    Zhibo Chen (M’01–SM’11) received the B.Sc. and Ph.D. degrees from the Department of Electrical Engineering, Tsinghua University, in 1998 and 2003, respectively. He is now a professor at the University of Science and Technology of China. His research interests include image and video compression, visual quality of experience assessment, immersive media computing, and intelligent media computing. He has more than 100 publications and more than 50 granted EU and US patent applications. He is an IEEE Senior Member, a member of the IEEE Visual Signal Processing and Communications Committee, and a member of the IEEE Multimedia Systems and Applications Committee. He was the TPC Chair of IEEE PCS 2019 and an organizing committee member of ICIP 2017 and ICME 2013, and has served as a TPC member of IEEE ISCAS and IEEE VCIP.

    Weiping Li (S’84–M’87–SM’97–F’00) received the B.S. degree in electrical engineering from University of Science and Technology of China (USTC), Hefei, China, in 1982, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1983 and 1988, respectively. In 1987, he joined Lehigh University, Bethlehem, PA, USA, as an Assistant Professor with the Department of Electrical Engineering and Computer Science. In 1993, he was promoted to Associate Professor with tenure. In 1998, he was promoted to Full Professor. From 1998 to 2010, he worked in several high-tech companies in the Silicon Valley (1998–2000, Optivision, Palo Alto; 2000–2002, Webcast Technologies, Mountain View; 2002–2008, Amity Systems, Milpitas; 2008–2010, Bada Networks, Santa Clara; all in California, USA). In March 2010, he returned to USTC to serve as the Dean of the School of Information Science and Technology until July 2014, and he is currently a Professor with the School of Information Science and Technology.

    Dr. Li served as the Editor-in-Chief of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and Guest Editor of the PROCEEDINGS OF THE IEEE. He was the Chair of several Technical Committees in the IEEE Circuits and Systems Society and IEEE International Conferences, and the Chair of the Best Student Paper Award Committee for SPIE Visual Communications and Image Processing Conference. He has made many contributions to international standards. His inventions on fine granularity scalable video coding and shape adaptive wavelet coding have been included in the MPEG-4 international standard. He served as a Member of the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) and an Editor of the MPEG-4 international standard. He served as a Founding Member of the Board of Directors of the MPEG-4 Industry Forum. As a Technical Advisor, he also made contributions to the Chinese audio video coding standard and its applications. He was the recipient of the Certificate of Appreciation from ISO/IEC as a Project Editor in development of an international standard in 2004, the Spira Award for Excellence in Teaching in 1992 at Lehigh University, and the first Guo Mo-Ruo Prize for Outstanding Student in 1980 at USTC.

    This work was supported in part by NSFC, China, under Grants U1908209 and 61632001, and by the National Key Research and Development Program of China under Grant 2018AAA0101400.
