
MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits

  • Conference paper
  • Euro-Par 2023: Parallel Processing (Euro-Par 2023)

Abstract

Multi-modal DNNs have been demonstrated to outperform the best uni-modal DNNs by fusing information from different modalities. However, this performance improvement comes with a substantial increase in computational cost (e.g., network parameters, MAC operations, etc.) to handle the additional modalities, which ultimately makes multi-modal DNNs impractical for many real-world applications where computing capability is limited.

In this paper, we propose MMExit, a multi-modal exit architecture that selects the appropriate modalities and layers to compute when predicting results for different data samples. To this end, we define a novel metric called utility of exit (UoE), which quantifies the trade-off between performance and computational cost at each exit. We then use an equivalent modality serialization method to map the two-dimensional exit space into an equivalent linear space and rank the exits by their UoE, enabling fast and adaptive inference. To train the MMExit network, we devise a joint loss function that synthesizes the features of different modalities and layers. Our results show that MMExit reduces MAC operations by up to 48.72% while achieving the best performance among SOTA multi-modal architectures.
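The abstract sketches three mechanisms: a utility-of-exit (UoE) score, a serialization of the two-dimensional (modality, layer) exit grid into a ranked linear order, and a joint training loss over all exits. The PyTorch sketch below illustrates how these pieces could fit together; the UoE formula, the profiled accuracy/MAC numbers, and all class and function names are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExitHead(nn.Module):
    """Lightweight classifier attached to an intermediate feature."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, feat):
        return self.fc(feat)

class MMExitSketch(nn.Module):
    """Two modalities with two layers each -> a 2D grid of four exits."""
    def __init__(self, dim=64, num_classes=2):
        super().__init__()
        self.enc = nn.ModuleDict({
            m: nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
            for m in ("audio", "video")})
        self.exits = nn.ModuleDict({
            f"{m}{l}": ExitHead(dim, num_classes)
            for m in ("audio", "video") for l in range(2)})

    def features(self, inputs):
        """Run each modality encoder, keeping every intermediate feature."""
        feats = {}
        for m, x in inputs.items():
            for l, layer in enumerate(self.enc[m]):
                x = F.relu(layer(x))
                feats[f"{m}{l}"] = x
        return feats

def utility_of_exit(acc, macs):
    # One plausible UoE: performance gained per unit of compute spent.
    return acc / macs

# Hypothetical per-exit (validation accuracy, MAC cost) profile; in practice
# these would be measured offline. Sorting by UoE linearizes the 2D exit
# grid, standing in for the paper's equivalent modality serialization.
profile = {"audio0": (0.70, 1.0), "audio1": (0.74, 2.0),
           "video0": (0.78, 3.0), "video1": (0.83, 6.0)}
order = sorted(profile, key=lambda e: -utility_of_exit(*profile[e]))

@torch.no_grad()
def adaptive_infer(model, inputs, threshold=0.9):
    """Visit exits in UoE order; stop at the first confident prediction.
    (Single-sample inference; features are computed eagerly here, whereas
    a real system would materialize them lazily, modality by modality.)"""
    feats = model.features(inputs)
    for name in order:
        probs = F.softmax(model.exits[name](feats[name]), dim=-1)
        conf, pred = probs.max(dim=-1)
        if conf.item() >= threshold:
            return name, pred
    return order[-1], pred  # no exit was confident: use the last one

def joint_loss(model, inputs, target):
    """Sum of per-exit cross-entropy losses, so every exit learns to
    predict; the paper's joint loss presumably weights these terms."""
    feats = model.features(inputs)
    return sum(F.cross_entropy(model.exits[n](feats[n]), target)
               for n in model.exits)

# Example use:
# model = MMExitSketch()
# x = {"audio": torch.randn(1, 64), "video": torch.randn(1, 64)}
# exit_name, pred = adaptive_infer(model, x)
```

In a real deployment, the per-exit accuracies and MAC counts would be profiled offline on a validation set, and only the encoder layers needed to reach the current exit would be executed, which is where the compute savings come from.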



Acknowledgements

This work is supported in part by the National Key R&D Program of China under grant No. 2021ZD0110104 and the National Natural Science Foundation of China under grant No. 62122053. It was also partially supported by ACCESS (AI Chip Center for Emerging Smart Systems) under InnoHK funding, Hong Kong SAR. We thank all the anonymous reviewers for their valuable feedback.

Author information


Correspondence to Chao Li or Kwang-Ting Cheng.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hou, X. et al. (2023). MMExit: Enabling Fast and Efficient Multi-modal DNN Inference with Adaptive Network Exits. In: Cano, J., Dikaiakos, M.D., Papadopoulos, G.A., Pericàs, M., Sakellariou, R. (eds) Euro-Par 2023: Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, vol 14100. Springer, Cham. https://doi.org/10.1007/978-3-031-39698-4_29


  • DOI: https://doi.org/10.1007/978-3-031-39698-4_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39697-7

  • Online ISBN: 978-3-031-39698-4

  • eBook Packages: Computer Science, Computer Science (R0)
