ABSTRACT
Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to their strong performance in representation learning. However, a DNN must be trained for many epochs to reach a high inference accuracy, which requires storing sequential versions of the network and releasing each updated version to users. The resulting storage and network overheads significantly hamper DNN deployment on resource-constrained platforms (e.g., IoT devices, mobile phones).
In this paper, we present a novel delta compression framework called Delta-DNN, which efficiently compresses the floating-point numbers in DNNs by exploiting the similarity of floats across versions produced during training. Specifically, (1) we observe high similarity between the floating-point parameters of neighboring versions of a neural network during training; (2) inspired by delta compression techniques, we record only the delta (i.e., the differences) between two neighboring versions instead of storing each full new version; (3) we compress the delta data with error-bounded lossy compression for a high compression ratio, where the error bound is strictly constrained by an acceptable loss of the DNN's inference accuracy; (4) we evaluate Delta-DNN's performance in two scenarios: reducing the network traffic of releasing DNNs and saving the storage space occupied by multiple versions of DNNs.
Experimental results on six popular DNNs show that Delta-DNN achieves a compression ratio 2×–10× higher than state-of-the-art methods, without sacrificing inference accuracy or changing the neural network structure.
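The core idea of steps (2) and (3) above can be sketched as follows. This is a minimal illustration, not the paper's implementation (which builds on SZ-style prediction and entropy coding): it computes the element-wise delta between two weight snapshots and quantizes it with a user-chosen error bound, so that every reconstructed weight deviates from the original by at most that bound. All function and variable names are illustrative.

```python
import numpy as np

def compress_delta(new_w, old_w, error_bound):
    """Quantize the element-wise delta between two weight snapshots.

    Each integer code represents a bin of width 2*error_bound, so the
    dequantization error is at most error_bound per element. Because
    neighboring training snapshots are similar, the codes are small
    (and often zero), which makes them highly compressible by a
    general-purpose entropy coder.
    """
    delta = new_w - old_w
    codes = np.round(delta / (2 * error_bound)).astype(np.int32)
    return codes

def decompress_delta(codes, old_w, error_bound):
    """Reconstruct the new snapshot from the old one plus the codes."""
    return old_w + codes * (2 * error_bound)

# Two "neighboring" snapshots: the new one differs by small updates.
rng = np.random.default_rng(0)
old_w = rng.standard_normal(1000).astype(np.float32)
new_w = old_w + 0.01 * rng.standard_normal(1000).astype(np.float32)

eb = 1e-3  # acceptable per-weight error bound
codes = compress_delta(new_w, old_w, eb)
rec = decompress_delta(codes, old_w, eb)
```

In practice the error bound would be chosen by checking that inference accuracy on a validation set stays within an acceptable loss, and the integer codes would be passed to a lossless compressor (e.g., zstd) to realize the space savings.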