DOI: 10.1145/3673038.3673107
Research article
Open access

Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction

Published: 12 August 2024

Abstract

Given the computational complexity of deep neural networks (DNNs), accurate prediction of their training and inference time through performance modeling is crucial for efficient infrastructure planning and DNN development. However, existing methods often predict only the inference time and rely on exhaustive benchmarking and fine-tuning, making them time-consuming and restricted in scope. As a remedy, we propose ConvMeter, a novel yet simple performance model that considers the inherent characteristics of DNNs, such as architecture, dataset, and target hardware, which strongly affect their runtime and scalability. Our performance model, thoroughly tested on convolutional neural networks (ConvNets), a class of DNNs widely used for image analysis, predicts both inference and training time, the latter on one or more compute nodes. Experiments with various ConvNets show that our runtime predictions for the inference and training phases achieve average error rates below 20% and 18%, respectively, making it straightforward to assess ConvNets with regard to efficiency and scalability.
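
The ConvMeter model itself is described in the full paper and is not reproduced on this page. As a rough, illustrative sketch of the kind of layer-wise analytical estimate such a performance model builds on, the snippet below derives inference time from convolution FLOP counts and an assumed sustained hardware throughput, then extrapolates multi-node training time with an assumed parallel efficiency. The class and constants here (ConvLayer, effective_tflops, scaling_eff) are hypothetical placeholders, not the authors' formulation.

```python
# Illustrative sketch only -- NOT the ConvMeter model from the paper.
# All constants (effective throughput, parallel efficiency) are hypothetical.
from dataclasses import dataclass

@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel_size: int
    out_h: int          # output feature-map height
    out_w: int          # output feature-map width

def conv_flops(layer: ConvLayer) -> float:
    """FLOPs for one forward pass of a convolution layer (2 * MACs)."""
    return (2.0 * layer.in_channels * layer.kernel_size ** 2
            * layer.out_channels * layer.out_h * layer.out_w)

def predict_inference_time(layers, batch_size, effective_tflops=10.0):
    """Estimate forward-pass time (s) from total FLOPs and an assumed
    sustained throughput in TFLOP/s (hypothetical default)."""
    total_flops = batch_size * sum(conv_flops(l) for l in layers)
    return total_flops / (effective_tflops * 1e12)

def predict_training_time(layers, batch_size, steps, nodes=1, scaling_eff=0.9):
    """Rough training-time estimate: forward + backward ~= 3x forward FLOPs,
    divided across nodes with an assumed data-parallel efficiency."""
    step_time = 3.0 * predict_inference_time(layers, batch_size)
    return steps * step_time / (nodes * scaling_eff)

# Toy three-layer ConvNet on 224x224 inputs.
net = [ConvLayer(3, 64, 7, 112, 112),
       ConvLayer(64, 128, 3, 56, 56),
       ConvLayer(128, 256, 3, 28, 28)]
print(f"inference: {predict_inference_time(net, batch_size=32):.4f} s/batch")
print(f"training : {predict_training_time(net, batch_size=32, steps=1000, nodes=4):.1f} s")
```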

Supplemental Material

PDF File - Appendix: Artifact Description/Artifact Evaluation


      Published In

      ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
      August 2024
      1279 pages
      ISBN: 9798400717932
      DOI: 10.1145/3673038
      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. artificial intelligence
      2. convolution
      3. deep neural networks
      4. distributed training
      5. performance modeling
      6. scalability

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • German Federal Ministry of Education and Research (BMBF)
      • Hessian Ministry of Science and Research, Art and Culture (HMWK)
      • German Research Foundation (DFG)
      • Gauss Centre for Supercomputing e.V.

      Conference

      ICPP '24

      Acceptance Rates

      Overall Acceptance Rate 91 of 313 submissions, 29%
