Convolutional Neural Network Based Bi-Prediction Utilizing Spatial and Temporal Information in Video Coding

Abstract:

With the growing popularity of high-resolution videos, the demand for higher coding efficiency is increasing to cope with multimedia transmission challenges on communication networks. Conventional linearly weighted bi-prediction does not handle inhomogeneous motion inside one block well, so recent works have explored Convolutional Neural Networks (CNNs) to tackle inhomogeneous motion by using patch-level information to predict each individual pixel. However, because only two reference blocks are used as input, those works ignore the variation of pixel values between the reference blocks and the current block, as well as the differences between extrapolation and interpolation. This work uses both spatial neighboring pixels and temporal display orders as extra inputs to the CNN models to further improve the prediction accuracy of a bi-predictor. The extra input information has the following advantages. First, variations among the spatial neighboring pixels of both reference blocks and the current block reflect variations between the current block and the reference blocks. Together with temporal distance, spatial neighboring pixels make it possible to address extrapolation and interpolation uniformly. Second, spatial neighboring pixels of the current block are highly correlated with the current predicted signals, which helps to reduce prediction residuals around the block boundary and alleviate block artifacts. Last, temporal distances reflect the correlation between video frames, which helps to improve the accuracy of the prediction signals. Experimental results show that the proposed network achieves 2.92% and 5.09% bit-rate savings on average compared with HEVC, under the Low-Delay B (LDB) and Random-Access (RA) configurations, respectively. Because temporal information is used in the network, the LDB and RA configurations share the same networks in this work.
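
To make two terms in the abstract concrete, the following is a minimal illustrative sketch and not the paper's network. It shows a conventional linearly weighted bi-predictor, which applies one weight pair to every pixel of the block, and a check based on display order that separates interpolation (references on opposite sides of the current frame, as is typical under RA) from extrapolation (both references on the same side, as under LDB). The function names, the default weights of 0.5, and the sample values are assumptions chosen for illustration.

```python
def linear_bi_prediction(ref0, ref1, w0=0.5, w1=0.5):
    """Blend two equally sized reference blocks using one weight pair for all pixels."""
    if len(ref0) != len(ref1) or any(len(a) != len(b) for a, b in zip(ref0, ref1)):
        raise ValueError("reference blocks must have the same shape")
    return [[w0 * a + w1 * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(ref0, ref1)]


def temporal_distances(cur_order, ref0_order, ref1_order):
    """Signed display-order distances from the current frame to each reference."""
    return ref0_order - cur_order, ref1_order - cur_order


def is_interpolation(cur_order, ref0_order, ref1_order):
    """True when the two references lie on opposite sides of the current frame."""
    d0, d1 = temporal_distances(cur_order, ref0_order, ref1_order)
    return d0 * d1 < 0


# Hypothetical 2x2 reference blocks (sample values are made up).
ref0 = [[100, 100], [100, 100]]
ref1 = [[120, 120], [80, 80]]
pred = linear_bi_prediction(ref0, ref1)

# Past and future references: interpolation.
interp = is_interpolation(cur_order=4, ref0_order=2, ref1_order=6)
# Two past references: extrapolation.
extrap = is_interpolation(cur_order=4, ref0_order=2, ref1_order=3)
```

Because the weights are fixed for the whole block, this baseline cannot adapt when different regions of the block move differently, which is the limitation that the CNN-based predictors described in the abstract aim to address.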

Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 30, Issue: 7, July 2020)

Page(s): 1856 - 1870

Date of Publication: 21 November 2019

DOI: 10.1109/TCSVT.2019.2954853
