ABSTRACT
Efficient on-device training of Convolutional Neural Networks (CNNs) in resource-constrained mobile and edge environments is an open challenge. Backpropagation is the standard approach, but it is GPU memory intensive: strong inter-layer dependencies require the intermediate activations of the entire CNN to be retained in GPU memory. Training within a limited GPU memory budget therefore requires smaller batch sizes, which in turn results in impractically long training times. We introduce NeuroFlux, a novel CNN training system tailored for memory-constrained scenarios. NeuroFlux exploits two novel opportunities: first, adaptive auxiliary networks that employ a variable number of filters to reduce GPU memory usage, and second, block-specific adaptive batch sizes that both respect the GPU memory budget and accelerate training. NeuroFlux segments a CNN into blocks based on GPU memory usage and attaches an auxiliary network to each layer in these blocks, breaking the usual inter-layer dependencies under a new training paradigm, 'adaptive local learning'. Moreover, NeuroFlux caches intermediate activations, eliminating redundant forward passes over previously trained blocks and further accelerating training. Compared to Backpropagation, the results are twofold: across various hardware platforms, NeuroFlux achieves training speed-ups of 2.3× to 6.1× under stringent GPU memory budgets, and it generates streamlined models with 10.9× to 29.4× fewer parameters.
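The two mechanisms the abstract names, local learning via auxiliary heads and activation caching across trained blocks, can be illustrated with a deliberately minimal sketch. The code below is not NeuroFlux's implementation: it uses toy linear "blocks" with linear auxiliary heads on a synthetic regression task, and all names (`train_block`, per-block widths) are illustrative assumptions. What it does show faithfully is the control flow: each block is trained against its own local loss so gradients never cross block boundaries, and each block's output is cached so later blocks never repeat a forward pass over earlier, already-trained ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for a real vision workload
X = rng.normal(size=(64, 8))
y = rng.normal(size=(64, 1))

def train_block(inputs, targets, width, steps=200, lr=0.05):
    """Train one linear 'block' together with its own auxiliary head.

    The auxiliary head supplies a *local* loss, so no gradient ever
    flows back into earlier blocks -- the essence of local learning.
    Returns the trained block weights and the (first, last) loss.
    """
    n = len(inputs)
    W = rng.normal(scale=0.1, size=(inputs.shape[1], width))       # block
    A = rng.normal(scale=0.1, size=(width, targets.shape[1]))      # aux head
    first = last = None
    for _ in range(steps):
        h = inputs @ W                       # block forward pass
        err = h @ A - targets                # auxiliary prediction error
        loss = float((err ** 2).mean())
        first = loss if first is None else first
        last = loss
        grad_A = 2 * h.T @ err / n           # MSE gradient w.r.t. aux head
        grad_W = 2 * inputs.T @ (err @ A.T) / n  # local gradient only
        A -= lr * grad_A
        W -= lr * grad_W
    return W, first, last

# Greedy block-by-block training with activation caching: each block
# trains on the *cached* outputs of the previous block, so no forward
# pass over already-trained blocks is ever repeated.
acts = X
losses = []
for width in (16, 8):            # per-block widths (variable in NeuroFlux)
    W, first, last = train_block(acts, y, width)
    losses.append((first, last))
    acts = acts @ W              # cache activations for the next block

print(losses)                    # each block's local loss should fall
```

In the real system the blocks are groups of convolutional layers sized to the GPU memory budget, the auxiliary networks use a variable number of filters, and each block additionally gets its own batch size; the sketch keeps only the dependency structure that makes those choices possible.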