
TinyM2Net-V2: A Compact Low-power Software Hardware Architecture for Multimodal Deep Neural Networks

Published: 11 May 2024 Publication History

Abstract

With the evolution of Artificial Intelligence (AI), there has been renewed interest in running AI algorithms on low-power embedded systems to broaden the potential use cases of the Internet of Things (IoT). To mimic multimodal human perception, multimodal deep neural networks (M-DNN) have recently become popular for classification tasks, owing to their impressive performance on computer vision and audio processing problems. This article presents TinyM2Net-V2, a compact, low-power software-hardware architecture for multimodal deep neural networks on resource-constrained tiny devices. To compress the models for deployment on tiny devices, cyclic sparsification and hybrid quantization (4-bit weights and 8-bit activations) are used. Although model compression is an active research area, we are the first to demonstrate the efficacy of cyclic sparsification and hybrid weight/activation quantization for multimodal deep neural networks. TinyM2Net-V2 shows that even a tiny multimodal deep neural network can achieve higher classification accuracy than any of its unimodal counterparts. A parameterized M-DNN architecture was designed and evaluated in two case studies: vehicle detection from multimodal images and audio, and COVID-19 detection from multimodal audio recordings. The most compressed TinyM2Net-V2 achieves 92.5% COVID-19 detection accuracy (a 6.8% improvement over the unimodal full-precision model) and 90.6% vehicle classification accuracy (a 7.7% improvement over the unimodal full-precision model). A parameterized and flexible FPGA hardware accelerator was also designed for TinyM2Net-V2 models. To the best of our knowledge, this is the first work to accelerate multimodal deep neural network models on low-power Artix-7 FPGA hardware. We achieved energy efficiencies of 9.04 GOP/s/W and 15.38 GOP/s/W for case study 1 and case study 2, respectively, which is comparable to state-of-the-art results. Finally, we compared our tiny FPGA hardware implementation with off-the-shelf resource-constrained devices and showed that it is faster and consumes less power.
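As a rough illustration of the hybrid quantization scheme mentioned in the abstract (4-bit weights, 8-bit activations), the sketch below applies uniform symmetric quantization to a toy layer and compares the integer matrix-vector product against the full-precision result. The helper names, the random data, and the symmetric rounding scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def quantize(x, num_bits):
    """Uniform symmetric quantization to signed num_bits integers."""
    qmax = 2 ** (num_bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    scale = max(np.abs(x).max() / qmax, 1e-12)  # avoid division by zero
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

# Hybrid scheme: 4-bit weights, 8-bit activations (toy data).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(8, 8))
acts = rng.uniform(0.0, 1.0, size=8)

qw, sw = quantize(weights, num_bits=4)
qa, sa = quantize(acts, num_bits=8)

# Integer matrix-vector product, rescaled back to floating point.
out = (qw @ qa) * (sw * sa)
ref = weights @ acts
err = np.max(np.abs(out - ref))  # small residual quantization error
```

In an actual fixed-point accelerator the rescale factor `sw * sa` would itself be folded into an integer multiply-and-shift; the float rescale here is only for readability.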



Published In

cover image ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems  Volume 23, Issue 3
May 2024
452 pages
EISSN:1558-3465
DOI:10.1145/3613579
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 May 2024
Online AM: 03 May 2023
Accepted: 14 March 2023
Revised: 08 February 2023
Received: 31 October 2022
Published in TECS Volume 23, Issue 3

Author Tags

  1. tinyML
  2. multimodal deep neural networks
  3. FPGA
  4. model compression

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation CAREER Award
  • University of Maryland, Baltimore, Institute for Clinical & Translational Research (ICTR) and the National Center for Advancing Translational Sciences (NCATS) Clinical Translational Science Award (CTSA)

Contributors

Cited By

  • (2024) Reg-Tune: A Regression-Focused Fine-Tuning Approach for Profiling Low Energy Consumption and Latency. ACM Transactions on Embedded Computing Systems 23, 3, 1–28. DOI: 10.1145/3623380. Online publication date: 11-May-2024.
  • (2024) RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge. IEEE Internet of Things Journal 11, 14, 24831–24845. DOI: 10.1109/JIOT.2024.3386832. Online publication date: 15-Jul-2024.
  • (2024) Expanding Applications of TinyML in Versatile Assistive Devices: From Navigation Assistance to Health Monitoring System Using Optimized NASNet-XGBoost Transfer Learning. IEEE Access 12, 168328–168338. DOI: 10.1109/ACCESS.2024.3496791. Online publication date: 2024.
  • (2023) HAC-POCD: Hardware-Aware Compressed Activity Monitoring and Fall Detector Edge POC Devices. In 2023 IEEE Biomedical Circuits and Systems Conference (BioCAS), 1–5. DOI: 10.1109/BioCAS58349.2023.10389023. Online publication date: 19-Oct-2023.
