Neural-Network-Based Cross-Channel Intra Prediction

Published: 22 July 2021

Abstract

To reduce the redundancy among color channels, e.g., Y, U, and V, previous methods usually adopt a linear model, which tends to be too simple for complex image content. We propose a neural-network-based method for cross-channel prediction in intra-frame coding. The proposed network exploits two cues: the neighboring reconstructed samples of all channels, and the co-located reconstructed samples of a subset of the channels. Specifically, for YUV video coding, the neighboring Y, U, and V samples are processed by several fully connected layers, the co-located Y samples are processed by convolutional layers, and the network fuses the two cues. We observe that integrating both cues is crucial to the performance of chroma intra prediction. We have designed the network architecture to balance compression performance against computational efficiency. Moreover, we propose a transform domain loss for training the network. The transform domain loss helps obtain more compact representations of the residuals in the transform domain, leading to higher compression efficiency. The proposed method is integrated into the HEVC and VVC test models to evaluate its effectiveness. Experimental results show that our method provides more accurate cross-channel intra prediction than previous methods. On top of HEVC, our method achieves average BD-rate reductions of 1.3%, 5.4%, and 3.8% for Y, Cb, and Cr on common test sequences, and of 3.8%, 11.3%, and 9.0% for Y, Cb, and Cr on ultra-high-definition test sequences. On top of VVC, our method achieves average BD-rate reductions of 0.5%, 1.7%, and 1.3% for Y, Cb, and Cr on common test sequences.
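
The abstract gives no formula for the linear-model baseline or for the transform domain loss, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the baseline fits chroma ≈ alpha · luma + beta by least squares on neighboring reconstructed samples, and it assumes a transform-domain cost defined as the mean absolute value of the 2-D DCT-II coefficients of the prediction residual. All function names are hypothetical, and luma is taken to be already at chroma resolution.

```python
import math


def fit_linear_model(neigh_luma, neigh_chroma):
    """Least-squares fit of chroma ~= alpha * luma + beta on neighboring samples."""
    n = len(neigh_luma)
    mean_x = sum(neigh_luma) / n
    mean_y = sum(neigh_chroma) / n
    var_x = sum((x - mean_x) ** 2 for x in neigh_luma)
    if var_x == 0.0:
        # Flat luma neighborhood: fall back to predicting the mean chroma value.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(neigh_luma, neigh_chroma))
    alpha = cov_xy / var_x
    return alpha, mean_y - alpha * mean_x


def dct_1d(vec):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(vec)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(vec)))
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def transform_domain_l1(pred, target):
    """Assumed loss: mean absolute DCT coefficient of the residual (target - pred)."""
    residual = [[t - p for p, t in zip(p_row, t_row)]
                for p_row, t_row in zip(pred, target)]
    coeffs = dct_2d(residual)
    count = len(coeffs) * len(coeffs[0])
    return sum(abs(c) for row in coeffs for c in row) / count


# Toy data in which chroma is an exact linear function of luma.
neigh_luma = [40.0, 80.0, 120.0, 160.0]
neigh_chroma = [0.5 * v + 20.0 for v in neigh_luma]
alpha, beta = fit_linear_model(neigh_luma, neigh_chroma)

luma_block = [[10.0 * (r + c) for c in range(4)] for r in range(4)]
chroma_block = [[0.5 * v + 20.0 for v in row] for row in luma_block]
pred_block = [[alpha * v + beta for v in row] for row in luma_block]
loss = transform_domain_l1(pred_block, chroma_block)
```

On this toy data the linear model recovers the chroma block, so the assumed loss is essentially zero. Real content rarely follows a single linear relation across a block, which is the limitation the abstract attributes to linear models. Measuring the residual after the transform, rather than per sample, rewards predictions whose errors concentrate in few coefficients.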

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 17, Issue 3
    August 2021
    443 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3476118
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 July 2021
    Accepted: 01 November 2020
    Revised: 01 September 2020
    Received: 01 May 2020
    Published in TOMM Volume 17, Issue 3

    Author Tags

    1. Convolutional neural network
    2. cross-channel prediction
    3. fully connected network
    4. high-efficiency video coding (HEVC)
    5. transform domain loss
    6. versatile video coding (VVC)

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Natural Science Foundation of China

    Cited By

    • (2024) Boosting Semi-Supervised Learning with Dual-Threshold Screening and Similarity Learning. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3672563. Online publication date: 12-Jun-2024
    • (2024) Towards Hybrid-Optimization Video Coding. ACM Computing Surveys 56:9 (1-36). DOI: 10.1145/3652148. Online publication date: 24-Apr-2024
    • (2024) MF2ShrT: Multimodal Feature Fusion Using Shared Layered Transformer for Face Anti-spoofing. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-21). DOI: 10.1145/3640817. Online publication date: 8-Mar-2024
    • (2024) Learning Offset Probability Distribution for Accurate Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:5 (1-24). DOI: 10.1145/3637214. Online publication date: 22-Jan-2024
    • (2024) Head3D: Complete 3D Head Generation via Tri-plane Feature Distillation. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-20). DOI: 10.1145/3635717. Online publication date: 8-Mar-2024
    • (2024) LiteWiSys: A Lightweight System for WiFi-based Dual-task Action Perception. ACM Transactions on Sensor Networks 20:4 (1-19). DOI: 10.1145/3632177. Online publication date: 11-May-2024
    • (2024) In-Loop Filtering via Trained Look-Up Tables. 2024 IEEE International Conference on Visual Communications and Image Processing (VCIP) (1-5). DOI: 10.1109/VCIP63160.2024.10849824. Online publication date: 8-Dec-2024
    • (2024) Gait Attribute Recognition: A New Benchmark for Learning Richer Attributes From Human Gait Patterns. IEEE Transactions on Information Forensics and Security 19 (1-14). DOI: 10.1109/TIFS.2023.3318934. Online publication date: 1-Jan-2024
    • (2024) Chroma Intra Prediction With Lightweight Attention-Based Neural Networks. IEEE Transactions on Circuits and Systems for Video Technology 34:1 (549-560). DOI: 10.1109/TCSVT.2023.3282980. Online publication date: Jan-2024
    • (2024) Neural network-based cross-channel chroma prediction for versatile video coding. The Journal of Supercomputing 80:9 (12166-12185). DOI: 10.1007/s11227-023-05868-y. Online publication date: 8-Feb-2024
