Video intra prediction using convolutional encoder decoder network☆
Introduction
Intra prediction methods play an important role in current state-of-the-art video coding standards [1], as they provide an efficient way to reduce signal energy by predicting from spatially neighboring encoded pixels. In order to capture the finer edge directions present in natural images, High Efficiency Video Coding (HEVC) employs 35 intra prediction modes, which include planar mode, DC mode, and 33 angular prediction modes [2]. Furthermore, in the Joint Exploration Model (JEM) [3], which is still under development, the number of angular prediction modes has been extended to 65. These fine-grained modes provide more accurate prediction than the intra prediction in H.264/AVC, which has only 9 modes [4].
Video intra prediction is a well-studied and challenging task. The classical method creates a prediction block by extrapolating the reference pixels surrounding the target block, as shown in Fig. 1. For angular prediction, each pixel in the current block is projected onto the nearest reference line along the angular direction, and the projected pixel is used as the prediction. A linear interpolation filter with 1/32 pixel accuracy generates the value at fractional positions on the reference line; each filter coefficient is inversely proportional to the distance between the projected fractional position and the corresponding adjacent integer position. In essence, angular prediction in HEVC is a copying-based process that assumes image content propagates along a single pure direction. For DC mode, the prediction is the average of all the reference pixels. For planar mode, bi-linear interpolation is used to create the prediction block. However, all these modes together are still too simple to fully characterize the complex non-linear relationship between the reference pixels and the target block.
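The three classical predictors described above can be sketched in a few lines of Python. This is a simplified illustration, not the HEVC reference implementation: the function names are hypothetical, reference smoothing and boundary filters are omitted, and the angular routine only handles vertical-class modes with a non-negative angle (given in 1/32 pixel per row). The reference arrays `top` and `left` are assumed to hold 2N reconstructed pixels each.

```python
def dc_predict(top, left, n):
    """DC mode: fill the block with the rounded mean of the N top and N left pixels."""
    refs = top[:n] + left[:n]
    dc = (sum(refs) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def planar_predict(top, left, n):
    """Planar mode: average of a horizontal and a vertical linear interpolation."""
    shift = n.bit_length()  # log2(n) + 1 when n is a power of two
    top_right, bottom_left = top[n], left[n]
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            horiz = (n - 1 - x) * left[y] + (x + 1) * top_right
            vert = (n - 1 - y) * top[x] + (y + 1) * bottom_left
            pred[y][x] = (horiz + vert + n) >> shift
    return pred


def angular_predict_vertical(corner, top, n, angle):
    """Angular mode (assumption: vertical class, 0 <= angle <= 32).

    Each pixel is projected onto the top reference line; the fractional
    part of the projection (in 1/32 pixel units) weights the two
    neighboring integer reference pixels.
    """
    ref = [corner] + top  # ref[0] is the top-left corner, ref[1..2n] the top row
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        pos = (y + 1) * angle
        idx, frac = pos >> 5, pos & 31  # integer and 1/32 fractional offset
        for x in range(n):
            a = ref[x + idx + 1]
            if frac == 0:
                pred[y][x] = a  # projection hits an integer position: pure copy
            else:
                b = ref[x + idx + 2]
                pred[y][x] = ((32 - frac) * a + frac * b + 16) >> 5
    return pred
```

With `angle = 0` the angular routine reduces to copying the top row downwards, which makes the "copying-based" nature of the mode explicit; a non-zero angle only shifts and blends the same reference pixels.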
Many works have sought to further improve intra prediction efficiency. Kamisli et al. [5], [6] model the correlation between adjacent pixels as a first-order 2D Markov process, where each pixel is predicted by linearly weighting several adjacent pixels. Lai et al. [7] propose an error diffused intra prediction algorithm for HEVC. In addition, Chen et al. [8] incorporate an ordered dither technique into intra prediction instead of error diffusion, to reduce computational complexity. Chen et al. [9] propose a copying-based method for improving intra prediction. Lucas et al. [10] propose an intra prediction framework based on adaptive linear filters with sparsity constraints. Dias et al. [11] propose an improved combined intra prediction (CIP) method, which uses both the reference pixels and the prediction pixels generated by the intra prediction modes. Li et al. [12] propose a piece-wise linear projection method based on canonical correlation analysis (CCA), to better exploit local spatial correlations. However, all of these works are single line-based schemes, where only the nearest neighboring reference line is used for predicting a block. This causes inaccurate prediction, especially when the spatial correlation between the target block and the reference line is weak.
The use of farther reference lines has been investigated in [13], where non-adjacent lines as well as adjacent lines are used for generating a prediction. Chang et al. [14] propose an arbitrary reference tier coding (ARTC) scheme, which allows the intra prediction modes to exploit four farther reference lines. In [15], an intra block copy (IBC) method is proposed that searches for similar reconstructed blocks; it is commonly used in screen content coding. Recently, Li et al. [16] propose an efficient multiple line-based intra prediction (MLIP) scheme to improve coding efficiency. In addition, residue compensation is introduced to calibrate the prediction of boundary regions in a block when farther reference lines are used. Experimental results show that the MLIP method achieves 2.4% Bjøntegaard-Delta bitrate (BD-rate) savings on average. However, MLIP is limited by the effectiveness of hand-designed features: in general, it can only predict simple textures and is not effective at predicting complicated structures.
In recent years, deep learning methods have been successfully applied to various computer vision tasks, such as image denoising [17], [18], video frame interpolation [19], [20], and novel view prediction [21]. Inspired by these successes, Cui et al. [22] apply a convolutional neural network to intra prediction (IPCNN) and achieve preliminary success. IPCNN takes a 16 × 16 block as input, which contains the best 8 × 8 intra prediction block and its three nearest 8 × 8 reference blocks. IPCNN then outputs residue blocks, which are used to refine the current prediction block and its three reference blocks. We regard IPCNN as an intra prediction refinement method, which is much closer in essence to image denoising. Unlike [22], Li et al. [23], [24] propose a fully connected network for intra prediction, IPFCN, where the inputs are multiple reference lines of the current block and the output is the intra prediction for the block. IPFCN does not need the HEVC intra prediction as network input, and the encoder uses rate distortion optimization (RDO) to choose the better of the IPFCN and HEVC intra predictions.
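The RDO-based choice between a learned predictor and the HEVC modes can be illustrated with the usual Lagrangian cost J = D + λR. The sketch below is a simplified assumption, not the encoder logic of [23], [24]: the helper names are hypothetical, the distortion is approximated by the sum of squared differences between the original block and the prediction (a real encoder measures it on the reconstructed block), and the bit counts are supplied by the caller.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized 2D blocks."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def choose_mode(original, candidates, lam):
    """Return (name, cost) of the candidate with the lowest cost J = D + lam * R.

    candidates maps a mode name to (prediction_block, estimated_bits).
    """
    best_name, best_cost = None, float("inf")
    for name, (pred, bits) in candidates.items():
        cost = ssd(original, pred) + lam * bits
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

A small λ favors the candidate with the lower distortion even if it costs more bits, while a large λ favors the cheaper candidate; this is why a more accurate but more expensive network mode is not selected for every block.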
In this paper, we propose a novel intra prediction method for HEVC using a convolutional encoder-decoder network. The main contributions of this paper can be summarized as follows:
- (1) We propose a novel intra prediction mode, called IPCED, using a convolutional encoder-decoder network. A GAN-based framework is integrated into the training pipeline and jointly optimized with the IPCED network.
- (2) The IPCED encoder network builds a novel multi-scale skip architecture that combines deep global information with shallow local information. The IPCED decoder network employs multi-level branches to generate the prediction at different levels, and synthesizes the intra prediction result in a coarse-to-fine fashion.
- (3) Experiments demonstrate that the proposed IPCED method generates higher-quality intra prediction results than existing state-of-the-art methods [7], [8], [9], [10], [11], [12], [16], [22], [24], in terms of both objective and subjective visual quality.
The rest of the paper is organized as follows. A brief review of related work is given in Section 2. Section 3 presents the proposed IPCED architecture, with details of the implementation. In Section 4, extensive experiments are conducted to evaluate the IPCED method. Finally, Section 5 concludes this work and outlines some future directions.
Related work
Our work is related to several topics, such as intra prediction and image inpainting, that stem from very different motivations and arguments. In this section, we briefly review the challenges of the video intra prediction task and image inpainting methods using generative adversarial networks (GANs) [25], [26].
Framework of IPCED
In this section, we first give an overview of the IPCED architecture, then describe the loss functions devised to optimize the network, and finally provide details of the training procedure.
We propose taking IPCED as a novel intra prediction mode added for HEVC 32 × 32, 16 × 16, 8 × 8 and 4 × 4 intra prediction units (PUs). Fig. 3 illustrates the overall framework of our proposed method for various intra prediction units in HEVC. Our framework consists of a convolutional encoder-decoder
Experimental results
In this section, experimental results are presented to validate the effectiveness of our IPCED approach for the video intra prediction task and to compare it with the latest state-of-the-art approaches.
Conclusions
In this paper, we combine intra prediction with texture synthesis techniques and propose a novel intra prediction method using a convolutional encoder-decoder network (IPCED). We redesign the structure of the convolutional encoder-decoder network, which boosts both subjective and objective prediction quality. Experimental results demonstrate that the proposed IPCED method can perform more complicated intra prediction, and generate higher-quality intra prediction results than existing
Declaration of interest
None.
Acknowledgment
The authors would like to thank the anonymous reviewers for their feedback and helpful suggestions.
Zhipeng Jin received the B.S. degree in electrical engineering from the University of Science and Technology of China (USTC), Hefei, China, in 2004, and the M.S. degree in electrical engineering from Ningbo University, Ningbo, China, in 2007. He is currently pursuing the Ph.D. degree at Shanghai University. His research interests include image/video coding, video codec optimization, and deep learning.
References (45)
- et al., Non-CE2: Intra block copy and inter signaling unification, JCTVC-T0227 (2015)
- I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, et al., Generative adversarial networks, arXiv...
- et al., Context encoders: Feature learning by inpainting, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- et al., Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. (2013)
- et al., Intra coding of the HEVC standard, IEEE Trans. Circuits Syst. Video Technol. (2013)
- et al., Algorithm description of joint exploration test model 5, Proceedings of the JVET-E1001 (2017)
- et al., Video quality evaluation methodology and verification testing of HEVC compression performance, IEEE Trans. Circuits Syst. Video Technol. (2016)
- Intra prediction based on Markov process modeling of images, IEEE Trans. Image Process. (2013)
- Block-based spatial prediction and transforms based on 2-D Markov processes for image and video compression, IEEE Trans. Image Process. (2015)
- et al., Error diffused intra prediction for HEVC, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015)
- Enhanced HEVC intra prediction with ordered dither technique, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Improving intra prediction in high-efficiency video coding, IEEE Trans. Image Process.
- Image coding using generalized predictors based on sparsity and geometric transformations, IEEE Trans. Image Process.
- Improved combined intra prediction for higher video compression efficiency, Proceedings of the Conference on Picture Coding Symposium (PCS)
- Hierarchical piece-wise linear projections for efficient intra-prediction coding, Proceedings of the IEEE Visual Communications and Image Processing (VCIP)
- Intra prediction with spatial gradients and multiple reference lines, Proceedings of the Conference on Picture Coding Symposium (PCS)
- Improved intra prediction method based on arbitrary reference tier coding schemes, Proceedings of the Conference on Picture Coding Symposium (PCS)
- Efficient multiple line-based intra prediction for HEVC, IEEE Trans. Circuits Syst. Video Technol.
- Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising, IEEE Trans. Image Process.
- Deep multi-scale convolutional neural network for dynamic scene deblurring, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deep predictive coding networks for video prediction and unsupervised learning, Proceedings of the International Conference on Learning Representations (ICLR)
- Video frame interpolation via adaptive separable convolution, Proceedings of the IEEE International Conference on Computer Vision (ICCV)
Ping An is a professor in the video processing group at the School of Communication and Information Engineering, Shanghai University, China. She received the B.S. and M.S. degrees from Hefei University of Technology in 1990 and 1993, respectively, and the Ph.D. degree from Shanghai University in 2002. In 1993, she joined Shanghai University. Between 2011 and 2012, she joined the Communication Systems Group at Technische Universität Berlin, Germany, as a visiting professor. Her research interests are image and video processing, with a focus on 3D video processing in recent years. She has completed more than 10 projects supported by the National Natural Science Foundation of China, the National Science and Technology Ministry, the Science & Technology Commission of Shanghai Municipality, and others. She was awarded the Second Prize of the Shanghai Municipal Science & Technology Progress Award in 2011 and the Second Prize in Natural Sciences of the Ministry of Education in 2016.
Liquan Shen received the B.S. degree in automation control from Henan Polytechnic University, Jiaozuo, China, and the M.S. and Ph.D. degrees in communication and information systems from Shanghai University, Shanghai, China, in 2001, 2005, and 2008, respectively. He was with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA, as a visiting professor from 2013 to 2014. He has been with the Faculty of the School of Communication and Information Engineering, Shanghai University, since 2008, where he is currently a Professor. He has authored and co-authored more than 80 refereed technical papers in international journals and conferences in the field of video coding and image processing. He holds 10 patents in the areas of image/video coding and communications. His research interests include High Efficiency Video Coding, perceptual coding, video codec optimization, 3DTV, and 3D image/video quality assessment.
☆ This work was supported in part by the National Natural Science Foundation of China under Grants 61571285 and 61801006, by the Shanghai Science and Technology Commission under Grants 17DZ2292400 and 18XD1423900, and by the Zhejiang Provincial Natural Science Foundation of China under Grant No. LGF20F020003.