
Augmenting Legacy Networks for Flexible Inference

  • Conference paper
  • Published in: Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13807)


Abstract

Once deployed in the field, Deep Neural Networks (DNNs) run on devices with widely different compute capabilities and whose computational load varies over time. Dynamic network architectures are one of the existing techniques developed to handle the varying computational load in real-time deployments. Here we introduce LeAF (Legacy Augmentation for Flexible inference), a novel paradigm that augments the key phases of a pre-trained DNN with alternative, trainable, shallow phases that can be executed in place of the original ones. At run time, LeAF allows the network architecture to be changed without any computational overhead, to effectively handle different loads. LeAF-ResNet50 has a storage overhead of less than 14% with respect to the legacy DNN; its accuracy ranges from the original 76.1% down to 64.8% while requiring from 4 down to 0.68 GFLOPs, in line with state-of-the-art results obtained with non-legacy and less flexible methods. We examine how LeAF's dynamic routing strategy affects accuracy and the use of the available computational resources as a function of the compute capability and load of the device, with particular attention to the case of an unpredictable batch size. We show that the optimal configurations for a given network can indeed vary with the system metric (such as latency or FLOPs), the batch size and the compute capability of the machine.
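
The page itself contains no code; the following is a minimal PyTorch sketch of the idea described in the abstract, in which a key phase of a pre-trained (legacy) network is paired with a trainable shallow alternative and a run-time flag selects which branch executes. The class and attribute names (`LeAFPhase`, `shallow`, `use_shallow`) and the design of the shallow branch are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torchvision


class LeAFPhase(nn.Module):
    """A legacy phase paired with a trainable shallow alternative.

    Illustrative sketch only: the shallow branch here is a single strided
    conv block; the actual augmentation phases in the paper may differ.
    """

    def __init__(self, legacy_phase: nn.Module, in_channels: int,
                 out_channels: int, stride: int = 2):
        super().__init__()
        self.legacy = legacy_phase
        for p in self.legacy.parameters():      # legacy weights stay frozen
            p.requires_grad = False
        self.shallow = nn.Sequential(           # trainable, much cheaper branch
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.use_shallow = False                # routing flag, set at run time

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the selected branch executes, so switching configurations
        # at run time adds no computational overhead.
        return self.shallow(x) if self.use_shallow else self.legacy(x)


# Augment the last two stages of a pre-trained ResNet-50.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.layer3 = LeAFPhase(model.layer3, in_channels=512, out_channels=1024)
model.layer4 = LeAFPhase(model.layer4, in_channels=1024, out_channels=2048)

# A "configuration" is just the set of routing flags; changing it is free.
model.layer3.use_shallow = True     # lighter configuration for a loaded device
model.layer4.use_shallow = False
```

During fine-tuning, only the shallow branches would be trained, for instance by sampling a configuration for each mini-batch (see Note 1 below) while the legacy weights remain frozen.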


Notes

  1. We note that if the number of configurations becomes large, it may be more efficient to randomly sample the configuration for each mini-batch.

  2. The compute cost can be derived analytically in the case of FLOPs, or experimentally in the case of latency, energy or power consumption; see the sketch after these notes.

  3. When the compute cost is measured in FLOPs, the batch size is normalized away. Other system cost metrics (e.g., latency) may depend on the batch size, as detailed in the Results Section.
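
As a concrete illustration of Notes 2 and 3, the sketch below measures per-batch latency experimentally, a cost that generally depends on batch size, in contrast with FLOPs, which are an analytic, per-sample property of a configuration. It assumes PyTorch on the target device; the helper name `measure_latency` and the measurement settings are arbitrary choices, not part of the paper.

```python
import time
import torch
import torchvision


@torch.no_grad()
def measure_latency(model: torch.nn.Module, batch_size: int,
                    warmup: int = 10, iters: int = 50) -> float:
    """Mean per-batch latency in milliseconds, measured experimentally."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    for _ in range(warmup):                  # warm up kernels / caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()             # GPU timings need a sync point
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3


# FLOPs are fixed per sample by the chosen configuration; latency is not,
# so each candidate batch size has to be profiled on the target device.
net = torchvision.models.resnet50()          # stand-in for one LeAF configuration
for bs in (1, 8, 32, 128):
    print(f"batch size {bs:4d}: {measure_latency(net, bs):7.2f} ms / batch")
```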


Author information

Corresponding author

Correspondence to Jason Clemons.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Clemons, J., Frosio, I., Shen, M., Alvarez, J.M., Keckler, S. (2023). Augmenting Legacy Networks for Flexible Inference. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_6


  • DOI: https://doi.org/10.1007/978-3-031-25082-8_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25081-1

  • Online ISBN: 978-3-031-25082-8

  • eBook Packages: Computer Science, Computer Science (R0)
