3D Tensor Auto-encoder with Application to Video Compression

Authors:

Shengyong Chen

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 2

Article No.: 48, Pages 1 - 18

https://doi.org/10.1145/3431768

Published: 11 May 2021

Abstract

Auto-encoders are widely used to compress high-dimensional data such as images and videos. However, a traditional auto-encoder network needs to store a large number of parameters: when the input data is of dimension n, the number of parameters in an auto-encoder is in general O(n). In this article, we introduce a network structure called the 3D Tensor Auto-Encoder (3DTAE). Unlike the traditional auto-encoder, in which a video is represented as a vector, 3DTAE treats videos as 3D tensors and passes tensor objects directly through the network. The weights of each layer are represented by three small matrices, and thus the number of parameters in 3DTAE is just O(n^{1/3}). This compactness suits the needs of video compression well. Given an ensemble of high-dimensional videos, we represent them as 3DTAE networks plus some small core tensors, and we further quantize the network parameters and the core tensors to obtain the final compressed data. Experimental results verify the efficiency of 3DTAE.
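
The abstract's idea of a layer whose weights are three small matrices can be illustrated with a short sketch. This is not the authors' implementation: it assumes each matrix acts along one mode of the video tensor (a Tucker-style mode product), and the names `tensor_layer`, `dense_param_count`, and `tensor_param_count`, the tanh activation, the absence of a bias term, and the example shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def tensor_layer(x, u1, u2, u3, activation=np.tanh):
    """Apply one small matrix along each mode of a 3D tensor, then a nonlinearity.

    x  : array of shape (n1, n2, n3), e.g. height x width x frames
    u1 : (m1, n1), u2 : (m2, n2), u3 : (m3, n3) factor matrices
    Returns an array of shape (m1, m2, m3).
    The activation and the lack of a bias are assumptions of this sketch.
    """
    y = np.einsum("ai,bj,ck,ijk->abc", u1, u2, u3, x)
    return activation(y)


def dense_param_count(in_shape, out_shape):
    """Weights of a fully connected layer between the flattened shapes."""
    return int(np.prod(in_shape)) * int(np.prod(out_shape))


def tensor_param_count(in_shape, out_shape):
    """Weights of the three factor matrices, one per tensor mode."""
    return sum(n * m for n, m in zip(in_shape, out_shape))


rng = np.random.default_rng(0)
in_shape, out_shape = (32, 32, 16), (8, 8, 4)  # hypothetical sizes

video = rng.standard_normal(in_shape)
u1, u2, u3 = (0.1 * rng.standard_normal((m, n))
              for n, m in zip(in_shape, out_shape))

core = tensor_layer(video, u1, u2, u3)
print(core.shape)                                # (8, 8, 4)
print(dense_param_count(in_shape, out_shape))    # 4194304
print(tensor_param_count(in_shape, out_shape))   # 576
```

With these shapes the three factor matrices hold 576 weights, whereas a dense layer between the same flattened shapes would hold 4,194,304. The factor count grows with the side lengths n1, n2, n3 rather than with their product, which matches the cube-root scaling described in the abstract when the three sides are of similar size.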

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 2

May 2021

410 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3461621

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2021

Accepted: 01 October 2020

Revised: 01 September 2020

Received: 01 November 2019

Published in TOMM Volume 17, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China (NSFC)
Higher Vocational Education Teaching Fusion Production Integration Platform Construction Projects of Jiangsu Province
Research Project of Jiangsu Vocational College of Information Technology
New Generation AI Major Project of Ministry of Science and Technology of China
High Level of Jiangsu Province Key Construction Project Fund
“Qing Lan Project” Teaching Team in Colleges and Universities of Jiangsu Province
