
Adaptive design and implementation of automatic modulation recognition accelerator

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Automatic modulation recognition-oriented Deep Neural Networks (ADNNs) have achieved higher recognition accuracy than traditional methods with less labor overhead. However, their computational complexity usually far exceeds the capacity of communication devices built on Field Programmable Gate Array (FPGA) platforms. When resources are insufficient, an FPGA-based accelerator can still complete its computation by dividing the workload into several parts and processing them separately, but this incurs unacceptable latency. Against this backdrop, we develop a new ADNN model, named VT-CNN2+, to improve recognition accuracy. Then, after stating the resource and latency problems of implementing VT-CNN2+ on FPGA platforms, we propose an adaptive hardware accelerator. To implement the accelerator, Area Folding is introduced to optimize resource consumption. Moreover, Literacy Optimization, Parallelism Optimization, Inter-layer Cascading, Temporary Cache, and Data Loading Optimization are adopted to reduce latency. Afterwards, the two components of our accelerator are detailed, i.e., the Once-designed module and the Re-designed module. Finally, to evaluate the performance and adaptivity of our accelerator, a series of experiments is conducted on two different FPGA platforms, i.e., AX7350 and ZedBoard. Results show that our accelerator successfully adapts to different FPGA platforms and remarkably reduces processing latency. Moreover, at 0.066249 s per data sample, its processing speed is one order of magnitude faster than desktop-level Central Processing Units (CPUs) and two orders of magnitude faster than embedded CPUs, with much lower energy consumption.
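The area-folding idea sketched in the abstract (time-multiplexing one shared compute block across slices of a layer, rather than instantiating hardware for the whole layer at once) can be illustrated with a minimal software analogy. The sketch below is not the authors' implementation; the function name, fold scheme, and 1-D convolution example are hypothetical, chosen only to show that computing channel groups sequentially on a shared resource yields the same result as computing them all in parallel.

```python
def conv1d_folded(x, weights, n_folds):
    """Illustrative analogy of area folding: the output channels of a
    1-D convolution are computed in `n_folds` sequential passes, as if a
    single hardware block were reused for each channel group.
    x: input sequence, weights: list of per-channel kernels."""
    out_ch, k = len(weights), len(weights[0])
    assert out_ch % n_folds == 0, "channels must split evenly into folds"
    per_fold = out_ch // n_folds
    out_len = len(x) - k + 1
    y = [[0.0] * out_len for _ in range(out_ch)]
    for f in range(n_folds):                       # one reuse of the shared block
        for c in range(f * per_fold, (f + 1) * per_fold):
            for t in range(out_len):               # multiply-accumulate per output
                y[c][t] = sum(weights[c][i] * x[t + i] for i in range(k))
    return y
```

More folds mean less concurrent hardware (lower resource use) but more sequential passes (higher latency), which is exactly the trade-off the accelerator's latency optimizations then address.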




Data availability

The authors confirm that the data supporting the findings of this study are available within the article.


Author information


Corresponding author

Correspondence to Xianglin Wei.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, B., Wei, X., Wang, C. et al. Adaptive design and implementation of automatic modulation recognition accelerator. J Ambient Intell Human Comput 15, 779–795 (2024). https://doi.org/10.1007/s12652-023-04736-0

