Abstract
In the era of intelligence, processing large volumes of information and running a wide range of intelligent applications increasingly depends on embedded devices, and machine learning algorithms play an ever more important role as a result. High-performance embedded computing is an effective way to address the limited computing power of embedded devices. New intelligent embedded applications based on machine learning demand more computation than traditional embedded systems can readily supply. To address this problem, this paper studies parallel optimization and implementation techniques for convolutional neural networks on the Parallella platform. A parallel optimization strategy is given for convolutional neural networks on the clustering architecture processor of a heterogeneous multi-core system. The high-performance implementation of a convolutional neural network on the Parallella platform is then studied, and a functioning convolutional neural network system is implemented. A set of performance evaluation methods for embedded parallel processors is also proposed. From the application perspective of the S698P, the eCos operating system is selected as the platform; single-core and multi-core modes are compared on the GRSIM simulator, and a parallel performance evaluation is given. Experiments show that the efficiency of deep learning tasks is significantly improved compared with traditional parallel methods.
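The abstract does not state which metrics the proposed evaluation methods use. As a minimal sketch, assuming the single-core versus multi-core comparison is summarized with the standard speedup and parallel-efficiency ratios, the calculation might look like the following. The function names, the timings, and the core count of 4 are hypothetical and are not taken from the paper.

```python
def speedup(t_single: float, t_multi: float) -> float:
    """Ratio of single-core run time to multi-core run time."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_multi


def parallel_efficiency(t_single: float, t_multi: float, cores: int) -> float:
    """Speedup divided by the number of cores used (1.0 means ideal scaling)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup(t_single, t_multi) / cores


if __name__ == "__main__":
    # Hypothetical run times in seconds; NOT measurements from the paper.
    t_single_core = 8.0
    t_multi_core = 2.5
    cores = 4  # assumed core count, for illustration only

    s = speedup(t_single_core, t_multi_core)
    e = parallel_efficiency(t_single_core, t_multi_core, cores)
    print(f"speedup = {s:.2f}, efficiency = {e:.2f}")
```

An efficiency well below 1.0 would suggest that communication, synchronization, or serial sections are limiting the benefit of the additional cores.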
Cite this article
Zu, Y. Deep learning parallel computing and evaluation for embedded system clustering architecture processor. Des Autom Embed Syst 24, 145–159 (2020). https://doi.org/10.1007/s10617-020-09235-5