Joint disparity and variable size-block optimization algorithm for stereoscopic image compression

doi:10.1016/j.image.2017.10.008

Signal Processing: Image Communication

Volume 61, February 2018, Pages 1-8

https://doi.org/10.1016/j.image.2017.10.008

Highlights

• Disparity map estimation for stereoscopic image compression.
• Bitrate-efficient representation of an irregular-block image decomposition.
• Efficient rate-distortion suboptimal solution to a discrete optimization problem.
• Rate-distortion-optimization simplified algorithm using priority queues.
• Performance tests on stereoscopic images from the CMU/VASC and Middlebury datasets.

Abstract

This paper addresses the disparity map estimation problem in the context of stereoscopic image coding. Using variable-size blocks makes it possible to describe the predicted view more precisely, but at the expense of a high bitrate if the estimation algorithm does not take this cost into account. Indeed, more information, related to the block layout and considered here as a block-length map, is required at the prediction step. This paper presents an algorithm that jointly optimizes the block-length map and the disparity map so as to ensure a good reconstruction of the predicted view while minimizing the bitrate. This is done using a joint metric that accounts for both the reconstruction quality and the bitrate needed to encode the maps. Moreover, the developed algorithm iteratively improves its performance by refining the estimated maps. Simulation results on several stereoscopic images from the CMU-VASC and Middlebury datasets confirm the benefits of this approach compared to competing block matching algorithms.

Introduction

Applications using stereoscopic videos, such as stereo videoconferencing or 3D-TV [[1], [2]], give the viewer a greater sense of immersion in the scene. Stereoscopic images are composed of two views of the same scene acquired by two cameras from two slightly different viewpoints. The viewer feels immersed in the scene because the left eye looks at the left view and the right eye at the right view. In stereoscopic images, objects in the scene are shifted from one view to the other. This displacement is called disparity, and it conveys depth information to the viewer. 3D immersion bears some technical resemblance to rendering motion in video: objects shifted from one frame to the next convey speed information. This resemblance motivates the use of motion-based algorithms for stereoscopic images. Note that, as mentioned in Section III.C of [3] and in Section II.A of [4], disparities have statistical properties different from those of motion vectors: they can be represented by nearly horizontal vectors with a very broad range of magnitudes. The collection of all disparities is called the disparity map, and there is an extensive literature on how to estimate the true disparity map; some of these techniques are listed and evaluated in [5].

In this paper we are concerned with compressing stereoscopic images. Because such images are composed of two views, storing or transmitting them requires twice the amount of information needed for traditional 2D images. A simple method would encode the two views separately, but more efficient methods have been developed that exploit the redundancies between the two views. Some methods perform joint coding without using disparity maps: see [6] for the use of lifting schemes and [7] for an adaptation of Least Squares Prediction. Most compression techniques designed for stereoscopic images follow this scheme: (a) first, one of the two views, say the left one, is taken as the reference and encoded independently; (b) then the second view (the right one) is predicted from the reference view by estimating a disparity map; (c) a residual image is computed between the original right view and its prediction. The disparity map is usually encoded using an entropy coder, while the left view and the residual image are encoded using transforms such as the DCT (Discrete Cosine Transform) [[8], [9], [10]]. All three are sent to the decoder, which predicts the right view from the left view and the disparity map and then compensates this prediction with the residual image. This coding scheme is known as the disparity compensated scheme because of its resemblance to the motion compensated scheme used in video codecs [11]. Our proposal concerns step (b) of this coding scheme and consists of a new disparity map estimation technique aimed at improving the visual quality of the compressed stereoscopic image for a given bitrate.
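Steps (a) to (c) can be illustrated with a minimal Python sketch. It assumes grayscale views stored as lists of rows, one integer horizontal disparity per fixed-size square block, the convention that pixel (y, x) of the right view is predicted from pixel (y, x + d) of the left view, and clamping at the image border. These conventions and the names `predict_right`, `residual` and `reconstruct` are illustrative assumptions, not taken from the paper. The residual is kept lossless here so the decoder recovers the right view exactly; a real codec would transform-code it.

```python
# Sketch of disparity-compensated prediction (steps (b) and (c)) and the
# matching decoder. Hypothetical conventions: views are lists of rows of ints,
# one integer disparity per block x block square, right[y][x] is predicted
# from left[y][x + d], and column indices are clamped to the image border.

def predict_right(left, disparity, block):
    """Predict the right view from the reference (left) view."""
    h, w = len(left), len(left[0])
    pred = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disparity[y // block][x // block]  # one disparity per block
            xs = min(max(x + d, 0), w - 1)         # clamp to a valid column
            pred[y][x] = left[y][xs]
    return pred


def residual(right, pred):
    """Residual image between the original right view and its prediction."""
    return [[r - p for r, p in zip(rr, pr)] for rr, pr in zip(right, pred)]


def reconstruct(left, disparity, block, res):
    """Decoder side: disparity-compensated prediction plus the residual."""
    pred = predict_right(left, disparity, block)
    return [[p + e for p, e in zip(pr, er)] for pr, er in zip(pred, res)]


# Usage: a 2 x 4 pair where the right view is roughly the left view shifted by 1.
left = [[10, 20, 30, 40], [50, 60, 70, 80]]
right = [[21, 30, 40, 40], [60, 70, 80, 80]]
disp = [[1, 1]]  # 2 x 2 blocks -> one row of two block disparities
res = residual(right, predict_right(left, disp, 2))
```

Only the disparity map and the residual need to be transmitted alongside the coded left view; the better the prediction, the closer the residual is to zero.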

When designing or assessing a stereoscopic image coding scheme, taking the end-user experience into account is important but also very challenging. Studying binocular rivalry, [12] shows that the viewer perceives information primarily from whichever of the two views contains the highest frequencies. Differences in frequency content affect the ability to perceive depth precisely. [13] adds that the human visual system has a comfort zone for identifying depth: the accommodation cue caused by the distance of the display should not contradict the disparity cue too strongly. According to [14], stronger JPEG compression has a negative effect on image quality and sharpness but not on depth perception; in some cases, the image quality ratings of a symmetrically coded pair can be higher than those of an asymmetrically coded pair. Quality metrics for 3D stereoscopic images do exist. In [13], quality is modeled as the information contained in the disparity map, and the quality metric is computed from a saliency map, a segmented disparity map and an entropy computation. Quality metrics for stereoscopic images fall into two main categories: metrics based on those used for traditional 2D images (such as PSNR or SSIM), computed as a global distortion of the left and right views of the stereo image [[15], [16]]; and metrics that include depth/disparity information. The second category evaluates the quality of the textured views as in the first category and also computes the distortion of the disparity map with respect to the true disparities of the scene [[17], [18]]. This second kind of metric, which is appropriate for 3D reconstruction, is not suited to the coding context: it is sufficient for the viewer to have a good reconstruction of one or both views of the stereo image to have an immersive experience, even if the estimated disparities differ from the true disparities.
Moreover, no study has yet shown that a quality metric can measure the immersive experience of a compressed stereoscopic image accurately enough to compare coding schemes. In our work, we have chosen to evaluate our coding scheme by its ability to yield a prediction of the right view with the least distortion with respect to the true right view. This distortion is measured with the Peak Signal-to-Noise Ratio (PSNR), which is a logarithmic function of the mean squared error.
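For reference, the PSNR between a view and its prediction is 10·log10(peak² / MSE). The sketch below assumes 8-bit images (peak value 255) stored as lists of rows; the function name `psnr` is illustrative.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows.

    Assumes 8-bit samples by default (peak = 255). Identical images have
    zero MSE, for which the PSNR is infinite.
    """
    sq_errors = [(a - b) ** 2
                 for row_a, row_b in zip(reference, test)
                 for a, b in zip(row_a, row_b)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


# Usage: one pixel off by 2 in a two-pixel image gives MSE = 2 (about 45.12 dB).
value = psnr([[0, 0]], [[0, 2]])
```

Because PSNR decreases as the MSE grows, minimizing the prediction MSE and maximizing the PSNR are equivalent objectives.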

In a compression context, estimating a disparity map is a trade-off between how much the map reduces the distortion and how well it can be compressed within the available bitrate budget. The most widely used algorithm for estimating the disparity map is the Block Matching Algorithm (BMA), recalled in [19]. This algorithm assumes that both views are divided into blocks of the same size and, for each block, selects the disparity that minimizes the distortion. It is an efficient way of reducing the bitrate, as only one disparity value is coded for each block. Some research focuses on reducing the bitrate by smoothing the disparity map, thereby introducing correlations that reduce the bitrate: for example, [20] uses overlapped blocks. A more recent way of reducing the bitrate, presented in [21], is to model the bitrate–distortion trade-off as an optimization problem: the global distortion is the sum of local distortions computed at each block and, as the disparity map is encoded with an entropy coder, the bitrate is approximated, up to a factor, by the entropy of the categorical distribution computed from the disparity map. As pointed out in [22], finding the optimal solution is challenging. The suboptimal solution to this optimization problem presented in [21] consists of processing each block and modifying its disparity whenever this modification improves a cost function computed on the whole image. Unfortunately, both the model and the suboptimal solution work only if all blocks are of the same size.
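The fixed-size BMA and the entropy-based rate proxy described above can be sketched as follows. This is a minimal illustration under assumed conventions (sum of squared differences as the block distortion, a symmetric integer search range, right[y][x] matched against left[y][x + d] with border clamping); `block_ssd`, `bma` and `entropy_bits` are hypothetical names, and neither [19] nor [21] necessarily makes these exact choices.

```python
import math
from collections import Counter


def block_ssd(left, right, y0, x0, size, d):
    """SSD between a right-view block and the left-view block shifted by d."""
    h, w = len(right), len(right[0])
    total = 0
    for y in range(y0, min(y0 + size, h)):
        for x in range(x0, min(x0 + size, w)):
            xs = min(max(x + d, 0), w - 1)  # clamp at the image border
            total += (right[y][x] - left[y][xs]) ** 2
    return total


def bma(left, right, size, search):
    """Fixed-size BMA: for each block, the disparity in [-search, search]
    with the smallest SSD (distortion only, no rate term; ties go to the
    smallest disparity)."""
    h, w = len(right), len(right[0])
    return [[min(range(-search, search + 1),
                 key=lambda d: block_ssd(left, right, by, bx, size, d))
             for bx in range(0, w, size)]
            for by in range(0, h, size)]


def entropy_bits(disparity):
    """Empirical entropy, in bits per block, of the disparity values;
    multiplied by the number of blocks, it approximates the size of the
    entropy-coded disparity map."""
    values = [d for row in disparity for d in row]
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


# Usage: the right view is the left view shifted by two columns.
left = [[10, 20, 30, 40, 50, 60, 70, 80]] * 2
right = [[30, 40, 50, 60, 70, 80, 80, 80]] * 2
disp = bma(left, right, 4, 2)
```

Plain BMA ignores `entropy_bits` entirely; the approach of [21] adds a λ-weighted version of this rate term to the summed block distortions.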

An important body of literature on stereo image compression, see [[23], [24]], focuses on using blocks of different sizes, as this can further reduce the bitrate of the disparity map. It makes it possible to transmit only one disparity value for a large block and many disparity values for another block divided into many subblocks. The former could, for example, be relevant for a uniform region or an object located at roughly the same depth in the true 3D scene. The latter could be appropriate when predicting small textured objects. Using blocks of variable sizes comes at a cost: sufficient information must be transmitted to the decoder to describe how the blocks of variable sizes are laid out.
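One common way to convey such a layout, assumed here purely for illustration (the excerpt does not specify the paper's own representation of the block-length map), is a quadtree of split flags: one bit per block larger than the minimum size, 1 meaning "split into four subblocks" and 0 meaning "keep". The sketch stores the block-length map at minimum-block granularity, each cell holding the side length of the block that covers it; `quadtree_flags` and `decode_quadtree` are hypothetical names.

```python
def quadtree_flags(block_len, y, x, size, min_size):
    """Depth-first split flags for the block of side `size` at pixel (y, x):
    1 = split into four subblocks, 0 = keep. Minimum-size blocks need no flag."""
    if size == min_size:
        return []
    if block_len[y // min_size][x // min_size] == size:
        return [0]
    half = size // 2
    flags = [1]
    for dy, dx in ((0, 0), (0, half), (half, 0), (half, half)):
        flags += quadtree_flags(block_len, y + dy, x + dx, half, min_size)
    return flags


def decode_quadtree(flags, size, min_size):
    """Rebuild the block-length map (one entry per min_size cell) of a root block."""
    n = size // min_size
    out = [[0] * n for _ in range(n)]
    it = iter(flags)

    def visit(cy, cx, s):
        # A minimum-size block consumes no flag; otherwise read one.
        if s == min_size or next(it) == 0:
            k = s // min_size
            for r in range(cy, cy + k):
                for c in range(cx, cx + k):
                    out[r][c] = s
            return
        h = s // 2 // min_size  # half block size, in cells
        for dy, dx in ((0, 0), (0, h), (h, 0), (h, h)):
            visit(cy + dy, cx + dx, s // 2)

    visit(0, 0, size)
    return out


# Usage: a 16 x 16 root block, 4 x 4 minimum blocks; only the top-right
# 8 x 8 quadrant is split further.
block_len = [[8, 8, 4, 4], [8, 8, 4, 4], [8, 8, 8, 8], [8, 8, 8, 8]]
flags = quadtree_flags(block_len, 0, 0, 16, 4)
```

Here the whole layout costs five bits before any entropy coding, whereas signaling a size for each of the seven blocks individually would typically cost more.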

This paper extends [21] to blocks of variable sizes. It presents an algorithm that jointly optimizes the disparity and block-length maps. The developed method relies on an entropy-distortion metric that takes into account not only the reduction of the distortion of the predicted view but also an approximation of the bitrate needed to encode both the disparity and block-length maps. At each step of the algorithm, a decision is made as to whether a given block should be predicted using a single disparity value or divided into four subblocks predicted using four different disparity values. This choice is coupled with a refinement of the disparity map.
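The split-or-keep decision can be illustrated with a Lagrangian cost J = D + λR evaluated for one block and for its four subblocks. This is a simplified, hypothetical version, not the paper's entropy-distortion metric: the rate R is modeled as one split-flag bit plus a fixed number of bits per transmitted disparity (`bits_per_disp`), and the subblocks are not split further. The helper names are illustrative.

```python
def block_ssd(left, right, y0, x0, size, d):
    """SSD between a right-view block and the left-view block shifted by d."""
    w = len(right[0])
    return sum((right[y][x] - left[y][min(max(x + d, 0), w - 1)]) ** 2
               for y in range(y0, min(y0 + size, len(right)))
               for x in range(x0, min(x0 + size, w)))


def best_ssd(left, right, y0, x0, size, search):
    """Smallest SSD over the disparities in [-search, search]."""
    return min(block_ssd(left, right, y0, x0, size, d)
               for d in range(-search, search + 1))


def split_decision(left, right, y0, x0, size, search, lam, bits_per_disp):
    """Compare J = D + lam * R for one block versus four subblocks.

    Hypothetical rate model: 1 split-flag bit plus bits_per_disp bits for
    each disparity sent. Returns (split, j_whole, j_split).
    """
    j_whole = (best_ssd(left, right, y0, x0, size, search)
               + lam * (1 + bits_per_disp))
    half = size // 2
    d_split = sum(best_ssd(left, right, y0 + dy, x0 + dx, half, search)
                  for dy in (0, half) for dx in (0, half))
    j_split = d_split + lam * (1 + 4 * bits_per_disp)
    return j_split < j_whole, j_whole, j_split


# Usage: the left half of the block has disparity 0, the right half disparity 1.
left = [[0, 10, 20, 30]] * 4
right = [[0, 10, 30, 30]] * 4
```

With λ = 10 and 2 bits per disparity, splitting wins (J = 90 versus 430); with λ = 100 the extra rate outweighs the distortion saved (J = 900 versus 700), so the block is kept whole. Increasing λ therefore favors larger blocks.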

The rest of the paper is organized as follows. Section 2 introduces the stereo-matching optimization problem. Section 3 proposes the suboptimal optimization algorithm. Section 4 provides simulation results evaluating the performance of the developed algorithm. Finally, Section 5 concludes the paper.

Section snippets

Stereo-matching optimization problem

This section deals with the problem of estimating the disparity map achieving the best compromise between yielding a good prediction and encoding the disparity map at a low bitrate.

The reference view and the view to be predicted are assumed to be rectified so that the disparities are searched within the same scan lines. The stereo-matching optimization algorithm is intended to yield a good estimate of the predicted view from a given reference view while requiring a low bitrate to encode the

Joint disparity and block-length maps optimization algorithm (JDBLMO)

This section focuses on the minimization of the Lagrangian cost function $J(\lambda, d, l, S)$ introduced in the previous section, for a given value of $\lambda$. Given the complexity of this optimization problem, a suboptimal algorithm is developed. In the rest of the paper, this algorithm is called JDBLMO, for Joint Disparity and Block-Length Maps Optimization. It is briefly described in Fig. 3.
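The excerpt does not detail the stages of JDBLMO, but the highlights mention a simplified rate-distortion optimization using priority queues. As a loose, hypothetical illustration of that idea (not the authors' algorithm, and without the block-length map), the sketch below refines a fixed-size disparity map by greedy single-block moves taken from a max-priority queue (Python's `heapq` with negated gains). It assumes precomputed per-block distortions `dist[b][d]` and the cost J = ΣD + λ·N·H(d), with N blocks and H the empirical entropy of the disparities, in the spirit of [21]. Because the entropy term couples all blocks, queued gains go stale, so each move is re-checked before it is applied. `total_cost` and `pq_refine` are hypothetical names.

```python
import heapq
import math
from collections import Counter


def total_cost(dist, disp, lam):
    """J = sum of block distortions + lam * N * empirical entropy of disp."""
    n = len(disp)
    counts = Counter(disp)
    rate = -n * sum(c / n * math.log2(c / n) for c in counts.values())
    return sum(dist[b][d] for b, d in enumerate(disp)) + lam * rate


def pq_refine(dist, disp, lam):
    """Greedy refinement driven by a max-priority queue of single-block moves.

    Each pass queues every move with a positive gain, then applies them in
    order of decreasing gain, re-checking each one against the current map.
    Passes repeat until none is applied; J strictly decreases, so it stops.
    """
    disp = list(disp)
    improved = True
    while improved:
        improved = False
        base = total_cost(dist, disp, lam)
        heap = []
        for b in range(len(disp)):
            for d in dist[b]:
                if d != disp[b]:
                    trial = disp[:b] + [d] + disp[b + 1:]
                    gain = base - total_cost(dist, trial, lam)
                    if gain > 0:
                        heapq.heappush(heap, (-gain, b, d))
        while heap:
            _, b, d = heapq.heappop(heap)
            trial = disp[:b] + [d] + disp[b + 1:]
            if total_cost(dist, trial, lam) < total_cost(dist, disp, lam):
                disp = trial
                improved = True
    return disp


# Usage: three blocks, candidate disparities 0 and 1, start from the
# distortion-only (BMA) choice [0, 1, 0].
dist = [{0: 0, 1: 5}, {0: 3, 1: 0}, {0: 0, 1: 1}]
```

With λ = 0 the BMA map is kept, while with λ = 100 the rate term pushes the map to the uniform, zero-entropy solution [0, 0, 0] at the price of a small distortion increase.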

The proposed suboptimal algorithm proceeds in three main stages. The first

Performance evaluation and discussions

This section analyzes the performance of the proposed JDBLMO algorithm. We discuss its ability to achieve a good prediction of the right view, given the exact left view, while requiring the least bitrate. The rationale is that such an algorithm should also perform well when integrated into a disparity compensated coding scheme.

The Peak Signal-to-Noise Ratio (PSNR) is adopted to measure the prediction quality computed between the original and the predicted right

Conclusion

This paper proposed a new stereo-matching algorithm using variable block sizes. The algorithm jointly optimizes the disparity and block-length maps so as to ensure a good reconstruction of the predicted view while minimizing the bitrate. Simulation results have shown that this suboptimal stereo-matching algorithm achieves better prediction performance than the competing variable-size-block BMA. In future work, the proposed algorithm will be integrated

References (32)

A.K. Moorthy et al., Subjective evaluation of stereoscopic image quality, Signal Process., Image Commun. (2013).

A. Kadaikar et al., Sequential block-based disparity map estimation algorithm for stereoscopic image coding, Signal Process., Image Commun. (2015).

P. Kauff et al., An immersive 3D video-conferencing system using shared virtual team user environments.

W. Tam et al., Stereoscopic 3D-TV: visual comfort, IEEE Trans. Broadcast. (2011).

H. Aydinoglu et al., Stereo image coding: a projection approach, IEEE Trans. Image Process. (1998).

A. Smolic et al., Coding algorithms for 3D-TV — a survey, IEEE Trans. Circuits Syst. Video Technol. (2007).

D. Scharstein et al., A taxonomy and evaluation of dense two-frame stereo correspondence algorithms, Int. J. Comput. Vis. (2002).

M. Kaaniche et al., Vector lifting schemes for stereo image coding, IEEE Trans. Image Process. (2009).

L. Lucas, N. Rodrigues, E. da Silva, S. de Faria, Adaptive least squares prediction for stereo image coding, in: 18th...

S. Li et al., Approaches to H.264-based stereoscopic video coding.

W. Woo et al., Stereo image compression with disparity compensation using the MRF model.

U. Ahlvers et al., FFT-based disparity estimation for stereo image coding.

K. Ugur et al., Motion compensated prediction and interpolation filter design in H.265/HEVC, IEEE J. Sel. Top. Sign. Proces. (2013).

M.G. Perkins, Data compression of stereopairs, IEEE Trans. Commun. (1992).

B. Teréki, P. Oittinen, L. Szirmay-Kalos, Informational aesthetic measure for 3D stereoscopic imaging, in: Conference...

P. Seuntiens et al., Perceived quality of compressed stereoscopic images: Effects of symmetric and asymmetric JPEG coding and camera separation, ACM Trans. Appl. Percept. (2006).

