Abstract
This paper presents real-time image processing applications built on multicore and multiprocessing technologies. To this end, parallel image segmentation was performed on a large set of images covering the entire surface of identical metallic, cylindrical moving objects. Experiments on a multicore CPU with the OpenMP platform showed that increasing the chunk size reduced the execution time by a factor of approximately four compared with serial computing. The same experiments were implemented on GPGPU using four techniques: (1) single image transmission with single pixel processing; (2) single image transmission with multiple pixel processing; (3) multiple image transmission with single pixel processing; and (4) multiple image transmission with multiple pixel processing. All four techniques were implemented on GeForce, Tesla K20 and Tesla K40 GPUs. Experiments on the GPUs with the CUDA platform showed that speedup increased with the number of cores. The Tesla K40 gave the best results: compared with serial computing, it was 35 and 12 times faster for the first technique, 36 and 13 times for the second, 54 and 16 times for the third, and 71 and 17 times for the fourth, where the first figure in each pair excludes data transmission time and the second includes it. We therefore recommend the Tesla K40 GPU combined with multiple image transmission and multiple pixel processing for maximum performance.
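The gap between the speedups without and with data transmission time can be turned into a rough estimate of how much of the GPU run time goes to transfers. The sketch below is not taken from the paper; it assumes a simple model in which the speedup without transfer is T_serial / T_kernel and the speedup with transfer is T_serial / (T_kernel + T_transfer). Under that assumption the transfer share of total GPU time is 1 - S_with / S_without. The function and variable names are hypothetical, and the only data used are the Tesla K40 figures quoted in the abstract.

```python
# Tesla K40 speedups over serial code, as quoted in the abstract:
# technique -> (speedup without transfer time, speedup with transfer time)
K40_SPEEDUPS = {
    "single image, single pixel": (35, 12),
    "single image, multiple pixels": (36, 13),
    "multiple images, single pixel": (54, 16),
    "multiple images, multiple pixels": (71, 17),
}


def transfer_share(s_without, s_with):
    """Fraction of total GPU time spent on data transfer.

    Assumed model (not stated in the paper):
        s_without = T_serial / T_kernel
        s_with    = T_serial / (T_kernel + T_transfer)
    so s_with / s_without = T_kernel / (T_kernel + T_transfer).
    """
    return 1.0 - s_with / s_without


def transfer_to_kernel_ratio(s_without, s_with):
    """T_transfer / T_kernel under the same assumed model."""
    return s_without / s_with - 1.0


if __name__ == "__main__":
    for name, (s_without, s_with) in K40_SPEEDUPS.items():
        share = transfer_share(s_without, s_with)
        ratio = transfer_to_kernel_ratio(s_without, s_with)
        print(f"{name}: transfer share {share:.1%}, transfer/kernel {ratio:.2f}")
```

If the assumed model holds, data transfer accounts for roughly two thirds to three quarters of the total GPU time in all four techniques, which would explain why the speedups that include transmission time are so much lower than those that exclude it.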
Cite this article
Aydin, S., Samet, R. & Bay, O.F. Real-time parallel image processing applications on multicore CPUs with OpenMP and GPGPU with CUDA. J Supercomput 74, 2255–2275 (2018). https://doi.org/10.1007/s11227-017-2168-6
DOI: https://doi.org/10.1007/s11227-017-2168-6