Abstract:
With the growing popularity of high-resolution video, demand for higher coding efficiency is increasing to cope with the challenges of multimedia transmission over communication networks. Because conventional linearly weighted bi-prediction does not handle inhomogeneous motion inside a block well, recent works have explored Convolutional Neural Networks (CNNs) that tackle inhomogeneous motion by using patch-level information to predict each individual pixel. However, because they take only the two reference blocks as input, those works ignore the variation in pixel values between the reference blocks and the current block, as well as the differences between extrapolation and interpolation. This work uses both spatial neighboring pixels and temporal display orders as extra inputs to the CNN models to further improve the prediction accuracy of a bi-predictor. The extra input information has the following advantages. First, variations among the spatial neighboring pixels of the reference blocks and the current block reflect the variations between the current block and the reference blocks. Together with temporal distance, spatial neighboring pixels allow extrapolation and interpolation to be handled uniformly. Second, the spatial neighboring pixels of the current block are highly correlated with the current predicted signal, which helps to reduce prediction residuals around the block boundary and alleviate blocking artifacts. Last, temporal distances help to improve the accuracy of the prediction signal because they reflect the correlation between video frames. Experimental results show that the proposed network achieves average bit-rate savings of 2.92% and 5.09% compared with HEVC under the Low-Delay B (LDB) and Random-Access (RA) configurations, respectively. Because temporal information is used in the network, the LDB and RA configurations share the same networks in this work.
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 30, Issue: 7, July 2020)
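The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the five-plane input layout, and the use of signed display-order (POC) differences as temporal distances are all assumptions made here for illustration. It shows the conventional block-wide weighted average, a test for whether a reference pair is an interpolation or an extrapolation case, and one hypothetical way to stack the extra spatial and temporal inputs for a CNN.

```python
import numpy as np

def linear_bi_prediction(ref0, ref1, w0=0.5):
    """Conventional bi-prediction: one weight pair applied to the whole block."""
    ref0 = np.asarray(ref0, dtype=np.float64)
    ref1 = np.asarray(ref1, dtype=np.float64)
    return w0 * ref0 + (1.0 - w0) * ref1

def temporal_distances(poc_cur, poc_ref0, poc_ref1):
    """Signed display-order distances from the current frame to each reference."""
    return poc_cur - poc_ref0, poc_cur - poc_ref1

def is_interpolation(poc_cur, poc_ref0, poc_ref1):
    """True when the current frame lies between the two references in display order."""
    d0, d1 = temporal_distances(poc_cur, poc_ref0, poc_ref1)
    return d0 * d1 < 0

def stack_inputs(ref0_patch, ref1_patch, cur_template, poc_cur, poc_ref0, poc_ref1):
    """Hypothetical CNN input layout (an assumption, not taken from the paper).

    ref0_patch, ref1_patch: reference blocks including their neighboring pixels.
    cur_template: same size, holding the current block's reconstructed
                  neighbors, with the not-yet-coded block area set to zero.
    Returns an array of shape (5, H, W): three pixel planes plus two constant
    planes carrying the temporal distances.
    """
    ref0_patch = np.asarray(ref0_patch, dtype=np.float64)
    ref1_patch = np.asarray(ref1_patch, dtype=np.float64)
    cur_template = np.asarray(cur_template, dtype=np.float64)
    d0, d1 = temporal_distances(poc_cur, poc_ref0, poc_ref1)
    plane0 = np.full(ref0_patch.shape, d0, dtype=np.float64)
    plane1 = np.full(ref0_patch.shape, d1, dtype=np.float64)
    return np.stack([ref0_patch, ref1_patch, cur_template, plane0, plane1])
```

Because the temporal distances are passed in as inputs rather than fixed by the coding configuration, a layout of this kind could serve both interpolation (RA-style) and extrapolation (LDB-style) reference pairs with a single model, which is the property the abstract highlights.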