ABSTRACT
Efficient on-device training of Convolutional Neural Networks (CNNs) in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach, but it is GPU memory intensive: strong inter-layer dependencies require the intermediate activations of the entire CNN to be retained in GPU memory. Training within a limited GPU memory budget therefore requires smaller batch sizes, which in turn results in impractically long training times. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. NeuroFlux exploits two novel opportunities: first, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and second, block-specific adaptive batch sizes that both respect the GPU memory budget and accelerate training. NeuroFlux segments a CNN into blocks based on GPU memory usage and attaches an auxiliary network to each layer in these blocks, breaking the usual inter-layer dependencies under a new training paradigm, 'adaptive local learning'. Moreover, NeuroFlux caches intermediate activations, eliminating redundant forward passes over previously trained blocks and further accelerating training. Compared to Backpropagation, the results are twofold: across various hardware platforms, NeuroFlux achieves training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and it generates streamlined models with 10.9× to 29.4× fewer parameters.
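The two mechanisms the abstract names, local learning via auxiliary heads and activation caching across trained blocks, can be illustrated with a deliberately minimal sketch. The code below is not NeuroFlux's implementation: it uses toy linear "blocks" with linear auxiliary heads on a synthetic regression task, and all names (`train_block`, per-block widths) are illustrative assumptions. What it does show faithfully is the control flow: each block is trained against its own local loss so gradients never cross block boundaries, and each block's output is cached so later blocks never repeat a forward pass over earlier, already-trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a real vision workload
X = rng.normal(size=(64, 8))
y = rng.normal(size=(64, 1))

def train_block(inputs, targets, width, steps=200, lr=0.05):
    """Train one linear 'block' together with its own auxiliary head.

    The auxiliary head supplies a *local* loss, so no gradient ever
    flows back into earlier blocks -- the essence of local learning.
    Returns the trained block weights and the (first, last) loss.
    """
    n = len(inputs)
    W = rng.normal(scale=0.1, size=(inputs.shape[1], width))       # block
    A = rng.normal(scale=0.1, size=(width, targets.shape[1]))      # aux head
    first = last = None
    for _ in range(steps):
        h = inputs @ W                       # block forward pass
        err = h @ A - targets                # auxiliary prediction error
        loss = float((err ** 2).mean())
        first = loss if first is None else first
        last = loss
        grad_A = 2 * h.T @ err / n           # MSE gradient w.r.t. aux head
        grad_W = 2 * inputs.T @ (err @ A.T) / n  # local gradient only
        A -= lr * grad_A
        W -= lr * grad_W
    return W, first, last

# Greedy block-by-block training with activation caching: each block
# trains on the *cached* outputs of the previous block, so no forward
# pass over already-trained blocks is ever repeated.
acts = X
losses = []
for width in (16, 8):            # per-block widths (variable in NeuroFlux)
    W, first, last = train_block(acts, y, width)
    losses.append((first, last))
    acts = acts @ W              # cache activations for the next block

print(losses)                    # each block's local loss should fall
```

In the real system the blocks are groups of convolutional layers sized to the GPU memory budget, the auxiliary networks use a variable number of filters, and each block additionally gets its own batch size; the sketch keeps only the dependency structure that makes those choices possible.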