
TinyM2Net-V2: A Compact Low-power Software Hardware Architecture for Multimodal Deep Neural Networks

Published: 11 May 2024 Publication History

Abstract

With the evolution of Artificial Intelligence (AI), there has been renewed interest in running AI algorithms on low-power embedded systems to broaden the potential use cases of the Internet of Things (IoT). To mimic multimodal human perception, multimodal deep neural networks (M-DNN) have recently become popular for classification tasks, owing to their impressive performance on computer vision and audio processing problems. This article presents TinyM2Net-V2, a compact, low-power software-hardware architecture for multimodal deep neural networks on resource-constrained tiny devices. To compress the models for deployment on tiny devices, cyclic sparsification and hybrid quantization (4-bit weights and 8-bit activations) are used. Although model compression is an active research area, we are the first to demonstrate the efficacy of cyclic sparsification and hybrid weight/activation quantization for multimodal deep neural networks. TinyM2Net-V2 shows that even a tiny multimodal deep neural network can achieve higher classification accuracy than any of its unimodal counterparts. A parameterized M-DNN architecture was designed and evaluated in two case studies: vehicle detection from multimodal images and audio, and COVID-19 detection from multimodal audio recordings. The most compressed TinyM2Net-V2 achieves 92.5% COVID-19 detection accuracy (a 6.8% improvement over the unimodal full-precision model) and 90.6% vehicle classification accuracy (a 7.7% improvement over the unimodal full-precision model). A parameterized and flexible FPGA hardware accelerator was also designed for TinyM2Net-V2 models. To the best of our knowledge, this is the first work to accelerate multimodal deep neural network models on low-power Artix-7 FPGA hardware. We achieved energy efficiencies of 9.04 GOP/s/W and 15.38 GOP/s/W for case study 1 and case study 2, respectively, which is comparable to state-of-the-art results. Finally, we compared our tiny FPGA hardware implementation with off-the-shelf resource-constrained devices and showed that it is faster and consumes less power.
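As a rough illustration of the hybrid quantization scheme mentioned in the abstract (4-bit weights, 8-bit activations), the sketch below applies uniform symmetric quantization to a toy layer and compares the integer matrix-vector product against the full-precision result. The helper names, the random data, and the symmetric rounding scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def quantize(x, num_bits):
    """Uniform symmetric quantization to signed num_bits integers."""
    qmax = 2 ** (num_bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    scale = max(np.abs(x).max() / qmax, 1e-12)  # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

# Hybrid scheme: 4-bit weights, 8-bit activations (toy data).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(8, 8))
acts = rng.uniform(0.0, 1.0, size=8)

qw, sw = quantize(weights, num_bits=4)
qa, sa = quantize(acts, num_bits=8)

# Integer matrix-vector product, rescaled back to floating point.
out = (qw @ qa) * (sw * sa)
ref = weights @ acts
err = np.max(np.abs(out - ref))  # small residual quantization error
```

In an actual fixed-point accelerator the rescale factor `sw * sa` would itself be folded into an integer multiply-and-shift; the float rescale here is only for readability.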



Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 23, Issue 3
May 2024
452 pages
EISSN:1558-3465
DOI:10.1145/3613579
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2024
Online AM: 03 May 2023
Accepted: 14 March 2023
Revised: 08 February 2023
Received: 31 October 2022
Published in TECS Volume 23, Issue 3

Author Tags

  1. tinyML
  2. multimodal deep neural networks
  3. FPGA
  4. model compression

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation CAREER Award
  • University of Maryland, Baltimore, Institute for Clinical & Translational Research (ICTR) and the National Center for Advancing Translational Sciences (NCATS) Clinical Translational Science Award (CTSA)

Contributors

Cited By

  • (2024) Reg-Tune: A Regression-Focused Fine-Tuning Approach for Profiling Low Energy Consumption and Latency. ACM Transactions on Embedded Computing Systems 23, 3, 1–28. DOI: 10.1145/3623380. Online publication date: 11-May-2024.
  • (2024) RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge. IEEE Internet of Things Journal 11, 14, 24831–24845. DOI: 10.1109/JIOT.2024.3386832. Online publication date: 15-Jul-2024.
  • (2024) Expanding Applications of TinyML in Versatile Assistive Devices: From Navigation Assistance to Health Monitoring System Using Optimized NASNet-XGBoost Transfer Learning. IEEE Access 12, 168328–168338. DOI: 10.1109/ACCESS.2024.3496791. Online publication date: 2024.
  • (2023) HAC-POCD: Hardware-Aware Compressed Activity Monitoring and Fall Detector Edge POC Devices. In 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS), 1–5. DOI: 10.1109/BioCAS58349.2023.10389023. Online publication date: 19-Oct-2023.
