
1 Introduction

With the rapid development of 3D video services, the efficient compression of 3D video data has become a popular research topic in recent years. 3D-HEVC is an extension of the High Efficiency Video Coding (HEVC) standard and has a more complex and complete structure than HEVC and MV-HEVC. Both MV-HEVC and 3D-HEVC use a multi-view coding structure, but only 3D-HEVC also encodes the depth sequences of the corresponding viewpoints.

Conventional HEVC intra prediction modes work well on the mostly smooth regions of depth maps, but they produce ringing artifacts at sharp edges, so the synthesized intermediate views fail to reach the expected video quality. JCT-3V therefore developed two intra partition modes for depth maps, called depth modeling modes (DMMs): DMM1 (Wedgelets) and DMM4 (Contour) [1]. In Wedgelets, the prediction block (PB) is divided into two sub-block partitions (SBPs) by a straight line. In Contour, the separation line between the two regions cannot be easily described by a geometrical function.

However, DMMs introduce a huge computational load into the 3D-HEVC mode decision process. There have been many previous works on depth intra coding in 3D-HEVC [2,3,4,5,6,7,8,9,10]. Gu et al. [2, 3] terminated unnecessary prediction modes early in the full RD cost calculation of 3D-HEVC. Park et al. [4] omitted unnecessary DMMs in the mode decision process based on edge classification results. Peng [5] proposed two techniques, fast intra mode decision and fast Coding Unit (CU) size decision, to speed up the encoding of depth video. In [6], Sanchez et al. applied a filter to the borders of the encoded block and determined the best positions at which to evaluate DMM 1, reducing the computational effort of the DMM 1 process. Zhang et al. [7] simplified the intra mode decision in 3D-HEVC depth map coding by inferring the picture texture from the mode obtained with the Sum of Absolute Transform Difference (SATD) in rough mode decision. Ruhan [8] put forward an early Skip/DIS mode decision for 3D-HEVC depth encoding that aims to reduce the complexity of this process. That solution is based on an adaptive threshold model, which takes into consideration the occurrence rate of both Skip and DIS modes. Zhang [9] applied a method for early determination of the segment-wise DC coding (SDC) decision based on the hierarchical coding structure. The algorithm in [10] exploits the edge orientation of the depth blocks to reduce the number of modes evaluated in the intra mode decision. In addition, it exploits the correlation between the Planar mode choice and the selected most probable modes (MPMs) to accelerate depth intra coding.

This paper proposes two techniques to speed up the encoding of depth video: a DCT decision and a fast CU split decision. Our analysis shows that CU blocks in smooth regions usually do not select a DMM, so DMMs are not added to the candidate mode list if the DCT coefficients in the lower right part of the current CU are all zero. The experimental results show that the proposed decision reduces the encoding runtime by 52.45% on average while maintaining almost the same coding performance as the original 3D-HEVC encoder.

2 DCT in Depth

Depth maps contain distance information. Most depth maps are composed of large areas of nearly constant or slowly varying sample values (which represent object areas) and sharp edges (which represent object borders). This is what distinguishes a depth map from a texture map. For depth map coding, each CU has 37 intra prediction modes: 35 conventional intra prediction modes and 2 DMMs. The DMMs use two different types of partition patterns, called Wedgelets and Contour. Table 1 lists the optimal intra prediction modes of CUs. On average, 98.21% of CUs select conventional modes and 1.79% select DMMs, which means that evaluating DMMs is unnecessary for most CUs in depth coding [1]. Wedgelets and Contour are mainly selected at sharp edges. If the CUs that contain edges can be identified in advance, we can decide whether to add DMMs to the candidate mode list before mode evaluation, which significantly reduces the computational time.

Table 1. The optimal intra prediction modes of CUs

The discrete cosine transform (DCT) is closely related to the discrete Fourier transform and can be computed with fast, FFT-like algorithms. The 2D DCT is widely used in signal and image processing, especially in lossy compression, because it concentrates signal energy strongly into a few coefficients. The DCT is also commonly used to identify smooth regions in images.

Fig. 1. DCT coefficient matrix in depth

In Fig. 1, panels (a)–(c) show depth maps (4 \(\times \) 4) and panels (d)–(f) show the corresponding DCT coefficient matrices. We use \(DCT_{lowerright}\) to denote the coefficients in the lower right part of the matrix, which is marked by the red triangle. In Fig. 1(d), \(DCT_{lowerright}\) are all zero because the depth map in Fig. 1(a) is smooth. The depth map in Fig. 1(b) changes slowly, and \(DCT_{lowerright}\) in Fig. 1(e) are nearly zero. In Fig. 1(f), \(DCT_{lowerright}\) are not zero because the depth map in Fig. 1(c) contains an obvious sharp edge. This indicates that for CUs with slowly varying gray values, most of the energy after the DCT lies in the upper left part, which is called the low-frequency region. Conversely, if a CU contains more detailed texture information, more energy is scattered into the lower right part, which is called the high-frequency region.
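The behaviour described for Fig. 1 can be reproduced with a small numerical sketch. The code below is only an illustration, not the encoder's transform: it uses a floating-point orthonormal DCT-II, and it assumes that \(DCT_{lowerright}\) means the coefficients strictly below the anti-diagonal (positions with \(i + j \ge N\)), since the exact extent of the marked triangle is not given in the text. The sample blocks and the names `dct2` and `lower_right` are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # the DC row has its own scale factor
    return c

def dct2(block):
    """2D DCT of a square block via separable matrix products."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def lower_right(coeffs):
    """Coefficients strictly below the anti-diagonal (assumed region)."""
    n = coeffs.shape[0]
    i, j = np.indices((n, n))
    return coeffs[i + j >= n]

# Hypothetical 4x4 depth blocks: a flat area and a diagonal object border.
smooth = np.full((4, 4), 120.0)
edge = np.array([[50, 50, 50, 200],
                 [50, 50, 200, 200],
                 [50, 200, 200, 200],
                 [200, 200, 200, 200]], dtype=float)

for name, block in (("smooth", smooth), ("edge", edge)):
    hf = lower_right(dct2(block))
    print(name, "lower-right all zero:", bool(np.allclose(hf, 0.0)))
```

For the flat block all energy sits in the DC coefficient, while the diagonal edge spreads energy into the high-frequency corner.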

Based on Table 1 and the observation that only a few CUs, those with edges, select a DMM as the best intra prediction mode, we conjecture that \(DCT_{lowerright}\) being all zero can serve as a criterion for identifying smooth regions. We collected statistics on more than 3.4 billion CUs from eight depth sequences released by JCT-3V, and the results are shown in Table 2. The table presents the hit rate, that is, how often a depth CU chooses a conventional HEVC intra mode as its best prediction mode when \(DCT_{lowerright}\) are all zero. About 99% of these CUs select conventional modes, and less than 1% select DMMs as the best intra mode. Thus, the DCT can be used to distinguish smooth regions from sharp edges and so to decide whether DMMs are added to the candidate mode list. When \(DCT_{lowerright}\) are all zero, the current CU evaluates only the conventional modes with SATD and does not add DMMs to the candidate mode list.

Table 2. Statistical analysis for conventional modes hit rate in 3D-HEVC intra coding

3 Proposed Decision

Based on the observations in Sect. 2, we propose two fast coding techniques and describe them in detail below.

3.1 DCT Decision

We compute the DCT coefficient matrix of the current CU and examine \(DCT_{lowerright}\). If these coefficients are not all zero, we consider that the current CU contains sharp edges, and DMMs are added to the candidate mode list for intra mode prediction.

Fig. 2. The processing flow of DCT decision

The flowchart of the proposed DCT decision is shown in Fig. 2. If \(DCT_{lowerright}\) are all zero, DMMs are not added to the candidate mode list. Otherwise, all modes in the candidate mode list, including DMMs, are evaluated. Because a conventional floating-point DCT has high computational complexity, we use the integer DCT of H.265/HEVC, which adopts a fast butterfly algorithm [11].
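The flow in Fig. 2 can be sketched as follows for a 4 \(\times \) 4 CU. This is a simplified illustration under several assumptions, not the HTM implementation: it applies the 4 \(\times \) 4 HEVC core transform matrix as plain integer matrix products (omitting the intermediate rounding shifts and the butterfly structure of the real transform), it takes \(DCT_{lowerright}\) to be the coefficients strictly below the anti-diagonal, and it represents the candidate list with hypothetical placeholders (`CONVENTIONAL_MODES`, `DMM_MODES`) rather than the result of the rough mode decision.

```python
import numpy as np

# 4x4 core transform matrix of HEVC (integer approximation of the DCT).
HEVC_DCT4 = np.array([[64,  64,  64,  64],
                      [83,  36, -36, -83],
                      [64, -64, -64,  64],
                      [36, -83,  83, -36]], dtype=np.int64)

# Hypothetical stand-ins for the encoder's mode identifiers.
CONVENTIONAL_MODES = list(range(35))   # Planar, DC and 33 angular modes
DMM_MODES = ["DMM1", "DMM4"]           # Wedgelets and Contour

def lower_right_all_zero(block):
    """True if all coefficients below the anti-diagonal are exactly zero."""
    coeffs = HEVC_DCT4 @ block.astype(np.int64) @ HEVC_DCT4.T
    n = coeffs.shape[0]
    i, j = np.indices((n, n))
    return not np.any(coeffs[i + j >= n])

def candidate_modes(block):
    """Skip the DMMs when the high-frequency corner is empty."""
    modes = list(CONVENTIONAL_MODES)
    if not lower_right_all_zero(block):
        modes += DMM_MODES             # likely sharp edge: keep the DMMs
    return modes

flat = np.full((4, 4), 120)
diagonal_edge = np.array([[50, 50, 50, 200],
                          [50, 50, 200, 200],
                          [50, 200, 200, 200],
                          [200, 200, 200, 200]])

print(len(candidate_modes(flat)))           # conventional modes only
print(len(candidate_modes(diagonal_edge)))  # conventional modes plus DMMs
```

Working in exact integer arithmetic keeps the all-zero test free of floating-point tolerance choices; an implementation that includes the standard's rounding shifts could classify more blocks as all zero.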

Table 3. The proportion of all zero blocks (QP42)

However, as shown in Table 3, the proportion of blocks whose \(DCT_{lowerright}\) are all zero decreases as the CU size increases. Balloons and Kendo reach 69.76% and 75.44% on average. For big CUs (16 \(\times \) 16, 32 \(\times \) 32), GTFly reaches only 30.97% and 17.33%, and PoznanStreet only 14.58% and 5.77%. For small CUs (4 \(\times \) 4, 8 \(\times \) 8), GTFly reaches 86.88% and 60.05%, and PoznanStreet reaches 64.70% and 34.67%. Small CUs whose \(DCT_{lowerright}\) are all zero are therefore far more common than big ones.

Meanwhile, computing the DCT is more expensive for big CUs than for small ones, and the computation is wasted whenever \(DCT_{lowerright}\) turn out not to be all zero. Based on this analysis, we consider it too expensive to compute the DCT coefficient matrices of big CUs.

3.2 Fast CU Split Decision

Depth maps have large smooth and uniform areas. Hence, in the CU split decision, the runtime of RD-cost computation can be reduced for smooth areas, while sharp areas should be divided more carefully. Since the DCT decision is not suitable for big CUs, an early CU splitting termination algorithm is proposed.

In 2014, the variance of a CU, compared against a threshold, was first used to describe whether the CU is smooth [3]. The algorithms of Park [4] and Peng [5] also use variance as a condition. Park modified the threshold that determines whether DMMs should be added to the candidate mode list and performed better than Gu. Peng applied the variance and threshold to CU splitting, which shows that the variance and threshold decision is a good method for judging whether a depth map is smooth.

Therefore, we choose the variance and threshold decision as our fast CU split decision, as shown in Fig. 3, with the threshold \(Th_{CU}=\{(max(QP\gg 3-1,3))^2-8\}\ll 2\). If the variance of the current CU, \(Var_{CU}\), is larger than \(Th_{CU}\), the current CU is divided into four sub-CUs. Otherwise, intra prediction of the current CU is considered to perform better than that of its sub-CUs, and the CU is not split.
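As a worked illustration of the threshold, the sketch below evaluates \(Th_{CU}\) and the split test. Two points are assumptions rather than statements from the text: the term \(QP\gg 3-1\) is read as \((QP\gg 3)-1\), and \(Var_{CU}\) is taken as the population variance of the CU samples. The function names `cu_threshold` and `should_split` and the sample blocks are hypothetical.

```python
import numpy as np

def cu_threshold(qp):
    """Th_CU = {(max((QP >> 3) - 1, 3))^2 - 8} << 2, assumed precedence."""
    base = max((qp >> 3) - 1, 3)
    return (base * base - 8) << 2

def should_split(cu, qp):
    """Split the CU into four sub-CUs when its variance exceeds Th_CU."""
    var_cu = float(np.var(cu))   # population variance (assumption)
    return var_cu > cu_threshold(qp)

# Threshold at the depth QPs used in the experiments.
for qp in (34, 39, 42, 45):
    print(qp, cu_threshold(qp))

# Hypothetical 16x16 CUs: a flat area and a vertical object border.
flat_cu = np.full((16, 16), 120)
edge_cu = np.hstack([np.full((16, 8), 50), np.full((16, 8), 200)])
print(should_split(flat_cu, 42), should_split(edge_cu, 42))
```

Under this reading the threshold is 4 for depth QPs 34 and 39 and 32 for QPs 42 and 45, so higher QPs tolerate more variation before a CU is split.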

Fig. 3. The processing flow of Fast CU split decision

4 Experimental Results

In the experiments, we test eight sequences with 300 frames each to verify the coding efficiency of the proposed decision. All experiments are implemented on the 3D-HEVC Test Model (HTM 13.0) under the all-intra configuration. The encoder configuration is as follows: 3-view case, a fixed coding tree block size of 64 \(\times \) 64 pixels, and a CU depth range from 0 to 3. The texture maps use QPs of 25, 30, 35 and 40, and the depth maps use QPs of 34, 39, 42 and 45. The proposed algorithm is evaluated with Bjontegaard Delta bitrate (BD-rate) and Bjontegaard Delta PSNR (BD-PSNR) [12]. BD-rate represents the difference in total bitrate, and BD-PSNR represents the change in rendered PSNR. We define Time Saving (TS) in Eq. (1), which represents the reduction of total encoding time, including both texture video coding and depth video coding, under the all-intra configuration.

$$\begin{aligned} \text{Time Saving} = 1-\frac{\text{runtime of proposed algorithm}}{\text{runtime of original encoder (HTM 13.0)}} \end{aligned} \tag{1}$$
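Eq. (1) amounts to a one-line computation, sketched below. The function name `time_saving` and the runtimes in the example are hypothetical and are not measurements from the experiments.

```python
def time_saving(runtime_proposed, runtime_original):
    """Time Saving per Eq. (1): 1 - proposed runtime / original runtime."""
    if runtime_original <= 0:
        raise ValueError("original runtime must be positive")
    return 1.0 - runtime_proposed / runtime_original

# Hypothetical runtimes in seconds, for illustration only.
ts = time_saving(runtime_proposed=430.0, runtime_original=1000.0)
print(f"Time Saving = {100 * ts:.1f}%")
```

A positive value means the proposed encoder is faster than the reference; a value of zero means no change in runtime.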

Table 4 shows the performance of the DCT decision alone compared with the original encoder (HTM 13.0) on four sequences. The DCT decision reduces computational complexity by only 5.9% on average, with a BD-rate increase of 1.0 in depth coding. This is not surprising: computing the DCT coefficient matrices of big CUs, whose \(DCT_{lowerright}\) are rarely all zero, wastes time.

Table 4. Performance of DCT decision

Table 5 shows the performance of the fast CU split decision on four video sequences. It achieves up to 40.2% time saving. On average, the time saving is 29.9% at the cost of a 0.5% bitrate increase.

Table 5. Performance of fast CU split decision

Table 6 presents the time saving of the proposed decision, which combines the DCT decision and the fast CU split decision, under different QPs for four sequences. When QP is 25, the average time saving of the proposed decision is almost the same as that of the fast CU split decision alone. As the QP increases, the proposed decision achieves a larger average complexity reduction.

Table 6. The detail of Time Saving (%) of proposed decision under different QPs for four sequences

Table 7 shows the coding performance and complexity reduction of the proposed decision compared with HTM 13.0. Compared with Table 5, GTFly achieves up to 40.2% time saving with the fast CU split decision alone and 57.0% with the proposed decision, so the DCT decision saves a further 16.8% of runtime. Similarly, Kendo achieves 46.0% time reduction with the proposed decision rather than 22.7% with the fast CU split decision alone. This shows that the DCT decision saves time by deciding whether to add DMMs to the candidate mode list, and that it performs well in distinguishing smooth maps from maps with sharp edges. On average, the proposed decision leads to a 0.03 BD-rate increase for video and a 2.71 decrease for depth. We observe that the fast CU split decision mainly affects time reduction rather than video quality, while the DCT decision plays an important role in the quality of the reconstructed videos. Overall, the proposed decision achieves 52.45% complexity reduction on average, with time savings ranging from 37.30% to 68.60% without significant performance loss.

Table 7. Experimental results compared with original encoder
Table 8. Comparison result

Table 8 compares the proposed algorithm with state-of-the-art methods for intra coding. The BD-rate is measured on the synthesized views. Most studies on intra prediction mode decision achieve 27.8%–37.65% time reduction with negligible loss. Our decision saves 52.45% of the coding runtime while maintaining almost the same RD performance as the original 3D-HEVC encoder.

5 Conclusion

In this paper, we propose a fast intra mode decision algorithm based on the DCT to reduce the computational complexity of the 3D-HEVC encoder. Although the DCT decision works well for small CUs, the proportion of big CUs whose \(DCT_{lowerright}\) are all zero is very small, so computing the DCT for big CUs adds complexity with little benefit. We therefore add an existing fast CU split decision to the proposed decision to handle big CUs. The 3D-HEVC test model (HTM 13.0) is used to evaluate the proposed decision. The experimental results show that the proposed decision significantly reduces the encoding time while maintaining nearly the same RD performance as the original 3D-HEVC encoder. It also performs well in comparison with state-of-the-art fast algorithms for 3D-HEVC.