DOI: 10.1145/3673038.3673107
Open access

Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction

Published: 12 August 2024


Given the computational complexity of deep neural networks (DNNs), accurate prediction of their training and inference time using performance modeling is crucial for efficient infrastructure planning and DNN development. However, existing methods often predict only the inference time and rely on exhaustive benchmarking and fine-tuning, making them time-consuming and restricted in scope. As a remedy, we propose ConvMeter, a novel yet simple performance model that considers the inherent characteristics of DNNs, such as architecture, dataset, and target hardware, which strongly affect their runtime and scalability. Our performance model, which has been thoroughly tested on convolutional neural networks (ConvNets), a class of DNNs widely used for image analysis, predicts both inference and training time, the latter on one or more compute nodes. Experiments with various ConvNets demonstrate that our runtime predictions for the inference and training phases achieve average errors below 20% and 18%, respectively, making it straightforward to assess ConvNets for efficiency and scalability.

Supplemental Material

PDF File - Appendix: Artifact Description/Artifact Evaluation
Given the computational complexity of deep neural networks (DNNs), accurate prediction of their training and inference time using performance modeling is crucial for efficient infrastructure planning and DNN development. However, existing methods often rely on exhaustive inference-time benchmarking, making them time-consuming and restricted in scope. As a remedy, we propose a novel yet simple performance model that considers the inherent characteristics of DNNs, such as architecture, dataset, and target hardware, which strongly affect their runtime and scalability.



      Published In

      ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
      August 2024
      1279 pages
      This work is licensed under a Creative Commons Attribution International 4.0 License.


      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. Artificial intelligence
      2. convolution
      3. deep neural networks
      4. distributed training
      5. performance modeling
      6. scalability


      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • German Federal Ministry of Education and Research (BMBF)
      • Hessian Ministry of Science and Research, Art and Culture (HMWK)
      • German Research Foundation (DFG)
      • Gauss Centre for Supercomputing e.V.


      ICPP '24

      Acceptance Rates

      Overall Acceptance Rate 91 of 313 submissions, 29%


