Abstract
Low-cost, general-purpose single-board computing (SBC) devices are attracting increasing interest in research computing because of their very low cost/performance ratio and energy consumption. Among the SBCs available today, Raspberry Pi devices are perhaps the best known. Meanwhile, the wavelet transform plays an important role in contemporary standards for image compression (such as JPEG-2000) and video compression (MPEG-4). In this work, we present and evaluate three parallelization strategies for the 3D fast wavelet transform (3D-FWT) on a cluster of Raspberry Pi 2 SBCs. Each strategy has been implemented using both POSIX Threads (shared memory) and MPI (message passing). The POSIX Threads implementations are restricted to a single board, whereas the MPI versions can use multiple boards. We find that noticeable speed-ups can be obtained when all MPI processes or POSIX Threads run on the cores of a single Raspberry Pi 2 SBC. For the MPI versions, however, performance drops drastically when the MPI processes are spread across several boards. The reason is the limited bandwidth of the onboard LAN port, which proves insufficient for the fine-grained, high-volume communication requirements of the studied parallelization strategies. Finally, we have also run the POSIX Threads and MPI versions on a high-performance but power-hungry 4-core Intel Xeon E5606 CPU, and find that the Raspberry Pi 2 SBC completes the task with much lower total energy consumption (up to 4 times lower).
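To make the idea of parallelizing a 3D wavelet transform concrete, the following minimal sketch distributes the per-frame 2D filtering among worker threads and then filters along the temporal axis. It is illustrative only and does not reproduce any of the three strategies evaluated in this work: the Haar filter, the frame-wise partitioning, and the names `haar_1d`, `haar_2d`, and `haar_3d` are all assumptions chosen for brevity. Python threads are used only to show how the work can be divided, not to measure speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Normalization factor of the orthonormal Haar filter (assumed filter,
# chosen only because it is the shortest wavelet filter).
S = 1.0 / math.sqrt(2.0)


def haar_1d(signal):
    """One decomposition level: low-pass half followed by high-pass half.

    The input length is assumed to be even.
    """
    half = len(signal) // 2
    low = [(signal[2 * i] + signal[2 * i + 1]) * S for i in range(half)]
    high = [(signal[2 * i] - signal[2 * i + 1]) * S for i in range(half)]
    return low + high


def haar_2d(frame):
    """Apply haar_1d to every row, then to every column, of one frame."""
    rows = [haar_1d(row) for row in frame]
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows


def haar_3d(volume, workers=4):
    """One-level 3D transform of volume[t][y][x] (hypothetical layout)."""
    # Spatial step: frames are independent, so they are handed to workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        frames = list(pool.map(haar_2d, volume))

    # Temporal step: filter each pixel position along the time axis.
    t, h, w = len(frames), len(frames[0]), len(frames[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(t)]
    for y in range(h):
        for x in range(w):
            filtered = haar_1d([frames[k][y][x] for k in range(t)])
            for k in range(t):
                out[k][y][x] = filtered[k]
    return out


if __name__ == "__main__":
    # Tiny synthetic volume: 2 frames of 4x4 pixels.
    video = [[[float(x + 2 * y + 4 * t) for x in range(4)]
              for y in range(4)] for t in range(2)]
    coefficients = haar_3d(video)
    print(coefficients[0][0])
```

In this sketch the spatial step needs no communication between workers, whereas the temporal step reads from every frame. A message-passing version would have to exchange that data between processes, which gives a rough intuition for why interconnect bandwidth matters for this kind of workload.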
Notes
A Pi Zero, with a smaller footprint and limited I/O (GPIO) capabilities, was released in November 2015 for US$5.
Acknowledgements
This work was supported by the Spanish MINECO, as well as by European Commission FEDER funds, under Grant TIN2015-66972-C5-3-R.
Cite this article
Bernabé, G., Hernández, R. & Acacio, M.E. Parallel implementations of the 3D fast wavelet transform on a Raspberry Pi 2 cluster. J Supercomput 74, 1765–1778 (2018). https://doi.org/10.1007/s11227-016-1933-2
DOI: https://doi.org/10.1007/s11227-016-1933-2