DOI: 10.1145/3404397.3404408

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity

Published: 17 August 2020

ABSTRACT

Deep neural networks (DNNs) have gained considerable attention in various real-world applications due to their strong performance in representation learning. However, a DNN is typically trained for many epochs to pursue higher inference accuracy, which entails storing a sequence of model versions and releasing updated versions to users. As a result, large amounts of storage and network resources are consumed, significantly hampering DNN deployment on resource-constrained platforms (e.g., IoT devices and mobile phones).

In this paper, we present a novel delta compression framework called Delta-DNN, which efficiently compresses the floating-point numbers in DNNs by exploiting the similarity of these values across versions produced during training. Specifically, (1) we observe that floating-point parameters are highly similar between neighboring versions of a neural network during training; (2) inspired by delta compression techniques, we record only the delta (i.e., the differences) between two neighboring versions, instead of storing the full new version of the DNN; (3) we apply error-bounded lossy compression to the delta data to achieve a high compression ratio, where the error bound is strictly constrained by an acceptable loss in the DNN's inference accuracy; and (4) we evaluate Delta-DNN's performance in two scenarios: reducing the network traffic of releasing DNNs and saving the storage space occupied by multiple versions of DNNs.

According to experimental results on six popular DNNs, Delta-DNN achieves a compression ratio 2× to 10× higher than state-of-the-art methods, without sacrificing inference accuracy or changing the neural network structure.
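The abstract gives no implementation detail, but the core of steps (2) and (3) can be sketched concretely. Below is a minimal, hypothetical NumPy illustration of delta encoding between two model snapshots combined with linear-scaling quantization, the mechanism underlying error-bounded lossy compressors in the SZ family. The function names, error bound, and toy data are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def encode_delta(prev: np.ndarray, new: np.ndarray, eb: float) -> np.ndarray:
    """Quantize the element-wise delta between two model snapshots.

    Each delta is rounded to the nearest multiple of 2*eb, so the
    per-parameter reconstruction error is bounded by +/- eb. The small
    integer codes compress well with a lossless pass (e.g., zstd),
    omitted here for brevity.
    """
    delta = new - prev
    return np.round(delta / (2.0 * eb)).astype(np.int32)

def decode_delta(prev: np.ndarray, codes: np.ndarray, eb: float) -> np.ndarray:
    """Rebuild the new snapshot from the previous one plus the codes."""
    return (prev + codes.astype(prev.dtype) * (2.0 * eb)).astype(prev.dtype)

# Toy usage: two "neighboring versions" of a flattened weight tensor.
rng = np.random.default_rng(seed=42)
v1 = rng.standard_normal(10_000).astype(np.float32)
v2 = v1 + 0.01 * rng.standard_normal(10_000).astype(np.float32)

eb = 1e-3  # illustrative bound; Delta-DNN ties it to acceptable accuracy loss
codes = encode_delta(v1, v2, eb)
v2_hat = decode_delta(v1, codes, eb)
print("max abs error:", np.max(np.abs(v2_hat - v2)))  # <= eb (plus float eps)
```

Because neighboring snapshots are highly similar, most codes are zero or near zero, which is what makes the subsequent lossless stage effective; per the abstract, Delta-DNN additionally selects the error bound so that inference accuracy stays within an acceptable loss.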


Published in

ICPP '20: Proceedings of the 49th International Conference on Parallel Processing
August 2020, 844 pages
ISBN: 9781450388160
DOI: 10.1145/3404397
Copyright © 2020 Association for Computing Machinery, New York, NY, United States
Acceptance rate: 91 of 313 submissions, 29%
