Abstract
With the rapid growth of the Internet, video compression and reconstruction have attracted increasing attention as the volume and transmission frequency of video data have risen dramatically. Traditional codecs rely on hand-crafted modules for intra-frame and inter-frame coding, but they often fail to fully exploit the redundancy in video frames. To address this problem, this paper proposes a deep-learning video compression method that combines conditional context information with residual information to compress both intra-frame and inter-frame redundancy. Specifically, the proposed algorithm uses conditional coding to provide rich context information for the residual branch, while residual coding in turn helps the conditional branch handle redundant information. By fusing the video frames reconstructed by the two branches, the method achieves information complementarity. Experimental results on two benchmark datasets show that our method effectively removes redundancy between video frames and reconstructs them with low distortion, surpassing state-of-the-art (SOTA) performance.
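The complementary two-branch idea described above can be sketched in a few lines. This is a toy illustration only, not the paper's architecture: `residual_recon`, `conditional_recon`, `fuse_frames`, the linear conditional mapping, and the soft fusion mask are all hypothetical stand-ins for what would be learned networks in the actual method.

```python
import numpy as np

def residual_recon(pred, residual):
    """Residual branch: the decoder adds the transmitted residual
    to the motion-compensated prediction."""
    return pred + residual

def conditional_recon(context, latent, weight):
    """Conditional branch (linear toy stand-in for a learned decoder):
    maps a latent code conditioned on temporal context features."""
    return weight * latent + (1.0 - weight) * context

def fuse_frames(x_res, x_cond, mask):
    """Pixel-wise soft fusion of the two reconstructions; in a learned
    codec the mask itself would be predicted by a network."""
    return mask * x_res + (1.0 - mask) * x_cond

# Toy 4x4 "frames" with constant values for clarity.
pred = np.full((4, 4), 0.5)       # motion-compensated prediction
residual = np.full((4, 4), 0.1)   # transmitted residual
context = np.full((4, 4), 0.4)    # temporal context feature
latent = np.full((4, 4), 0.8)     # decoded conditional latent

x_res = residual_recon(pred, residual)
x_cond = conditional_recon(context, latent, weight=0.5)
fused = fuse_frames(x_res, x_cond, mask=np.full((4, 4), 0.5))
```

With a mask of all ones the fusion falls back to the pure residual reconstruction, and with all zeros to the pure conditional one; a learned mask lets each branch dominate wherever it reconstructs better.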
Acknowledgment
This work is supported by the Oversea Innovation Team Project of the "20 Regulations for New Universities" funding program of Jinan (Grant no. 2021GXRC073), the Excellent Youth Scholars Program of Shandong Province (Grant no. 2022HWYQ-048), and the TaiShan Scholars Program (Grant no. tsqn202211289).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, R., Qi, Z., Meng, X., Meng, L. (2023). Learning to Fuse Residual and Conditional Information for Video Compression and Reconstruction. In: Lu, H., et al. (eds.) Image and Graphics. ICIG 2023. Lecture Notes in Computer Science, vol. 14358. Springer, Cham. https://doi.org/10.1007/978-3-031-46314-3_29
Print ISBN: 978-3-031-46313-6
Online ISBN: 978-3-031-46314-3