Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology

Abstract

Video compression algorithms based on the 3D wavelet transform achieve excellent compression ratios at the expense of large memory requirements, which drastically affect the execution time of such applications. The objective of this work is to enable real-time video compression based on the 3D fast wavelet transform. We show the interaction between hardware and software for this multimedia application on a general-purpose processor. First, we mitigate the memory problem by exploiting the processor's memory hierarchy through several techniques. In particular, we implement and evaluate the blocking technique, presenting two blocking approaches, cube and rectangular, which differ in how the original working set is divided. We also propose reusing previous computations to decrease the number of memory accesses and floating-point operations. Next, we present several optimizations that the compiler cannot apply because of the characteristics of the algorithm. On the one hand, the Streaming SIMD Extensions (SSE) are used for two of the dimensions of the sequence (y and time) to reduce the number of floating-point instructions by exploiting data-level parallelism; loop unrolling and data prefetching are then applied to specific parts of the code. On the other hand, the algorithm is vectorized by columns, which allows SIMD instructions to be used for the y dimension. Results show a 5x speedup in execution time over a version compiled with the maximum optimizations of the Intel C/C++ compiler, while maintaining the compression ratio and video quality (PSNR) of the original encoder based on the 3D wavelet transform. Our experiments also show that allowing the compiler to perform some of these optimizations itself (i.e., automatic code vectorization) causes a performance slowdown, which demonstrates the effectiveness of our optimizations.
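The abstract describes the 3D fast wavelet transform only at a high level. As an unoptimized reference point for the optimizations it lists, here is a minimal sketch of one decomposition level applied as three separable 1D passes (x, then y, then time) over a frame-major video cube. It assumes a simple averaging Haar filter (scaling by 1/2 rather than 1/√2) and even dimensions; the filter, the data layout, and the names `haar_1d` and `fwt3d_level` are illustrative assumptions, not the article's implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* One averaging-Haar analysis step on n samples spaced `stride` floats
   apart: low-pass results go to the first n/2 positions, high-pass results
   to the last n/2. tmp must hold at least n floats; n must be even. */
static void haar_1d(float *base, size_t n, size_t stride, float *tmp)
{
    size_t half = n / 2;
    for (size_t i = 0; i < half; i++) {
        float a = base[(2 * i) * stride];
        float b = base[(2 * i + 1) * stride];
        tmp[i] = 0.5f * (a + b);        /* approximation */
        tmp[half + i] = 0.5f * (a - b); /* detail */
    }
    for (size_t i = 0; i < n; i++)
        base[i * stride] = tmp[i];
}

/* One decomposition level of a separable 3D transform on a video cube
   stored frame-major, v[t][y][x] at index (t*ny + y)*nx + x.
   Returns 0 on success, -1 on odd dimensions or allocation failure. */
int fwt3d_level(float *v, size_t nx, size_t ny, size_t nt)
{
    if (nx % 2 || ny % 2 || nt % 2)
        return -1;
    size_t maxn = nx > ny ? nx : ny;
    if (nt > maxn)
        maxn = nt;
    float *tmp = malloc(maxn * sizeof *tmp);
    if (!tmp)
        return -1;

    for (size_t t = 0; t < nt; t++)     /* x pass: unit stride */
        for (size_t y = 0; y < ny; y++)
            haar_1d(v + (t * ny + y) * nx, nx, 1, tmp);

    for (size_t t = 0; t < nt; t++)     /* y pass: stride nx */
        for (size_t x = 0; x < nx; x++)
            haar_1d(v + t * ny * nx + x, ny, nx, tmp);

    for (size_t y = 0; y < ny; y++)     /* time pass: stride nx*ny */
        for (size_t x = 0; x < nx; x++)
            haar_1d(v + y * nx + x, nt, nx * ny, tmp);

    free(tmp);
    return 0;
}
```

In this naive form the time pass reads samples nx*ny floats apart, so each access typically lands on a different cache line; that strided pattern over a large working set is the kind of memory behaviour blocking is meant to address.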
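The general idea of blocking can be illustrated on the time pass alone. This is a minimal sketch under stated assumptions, not the article's cube or rectangular scheme, whose exact partitions are not given on this page. It splits each frame into bands of `by` full-width rows and completes the time transform for one band across all frames before moving on, so reads are unit-stride and the scratch buffer holds only nt × by × nx floats. The averaging Haar filter, the band shape, and the names `time_pass_blocked` and `by` are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Time-direction Haar step on a frame-major cube v[t][y][x]
   (index (t*ny + y)*nx + x), processed in bands of `by` full-width rows.
   For each band, every frame pair is combined with x innermost, so reads
   are unit-stride and the scratch buffer holds just nt*by*nx floats.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
int time_pass_blocked(float *v, size_t nx, size_t ny, size_t nt, size_t by)
{
    if (by == 0 || nt % 2 != 0)
        return -1;
    size_t half = nt / 2, plane = nx * ny;
    float *out = malloc(nt * by * nx * sizeof *out); /* one band of results */
    if (!out)
        return -1;

    for (size_t y0 = 0; y0 < ny; y0 += by) {
        size_t rows = (ny - y0 < by) ? ny - y0 : by;
        size_t len = rows * nx;                 /* floats per frame in band */
        for (size_t i = 0; i < half; i++) {
            const float *a = v + (2 * i) * plane + y0 * nx; /* frame 2i   */
            const float *b = a + plane;                     /* frame 2i+1 */
            float *lo = out + i * len;
            float *hi = out + (half + i) * len;
            for (size_t k = 0; k < len; k++) {
                lo[k] = 0.5f * (a[k] + b[k]);
                hi[k] = 0.5f * (a[k] - b[k]);
            }
        }
        /* Copy the band back: low-pass frames first, then high-pass. */
        for (size_t t = 0; t < nt; t++)
            memcpy(v + t * plane + y0 * nx, out + t * len, len * sizeof *v);
    }
    free(out);
    return 0;
}
```

With a two-tap Haar filter each frame pair is read only once, so the main gain in this sketch is a bounded, cache-sized working set; longer filters and further decomposition levels reread data and would benefit more from keeping a tile resident.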
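Vectorizing by columns can be sketched with the SSE intrinsics from `<xmmintrin.h>` (x86 only). In the y pass, horizontally adjacent pixels are independent, so one 128-bit register can hold four columns of the same row, and a single `_mm_add_ps` or `_mm_sub_ps` filters all four at once. The sketch assumes an averaging Haar filter (0.5·(a+b) and 0.5·(a-b)), a row-major frame with an even number of rows, and a prefetch distance of one row pair issued with `_mm_prefetch`. The name `haar_y_sse` and the prefetch distance are hypothetical; the article's actual kernels, including its loop unrolling, are not shown on this page.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_loadu_ps, _mm_prefetch, ... */

/* y-direction Haar step on one frame of ny rows x nx columns (row-major).
   Each SSE register holds 4 horizontally adjacent pixels of one row, so a
   single instruction filters 4 columns. src and dst must not overlap and
   ny must be even. Low-pass rows go to the top half of dst, high-pass rows
   to the bottom half. */
void haar_y_sse(const float *src, float *dst, size_t nx, size_t ny)
{
    const __m128 half = _mm_set1_ps(0.5f);
    size_t h = ny / 2;
    for (size_t i = 0; i < h; i++) {
        const float *r0 = src + (2 * i) * nx; /* even input row */
        const float *r1 = r0 + nx;            /* odd input row */
        float *lo = dst + i * nx;             /* low-pass output row */
        float *hi = dst + (h + i) * nx;       /* high-pass output row */
        /* Next row pair to prefetch (hypothetical distance: one pair ahead). */
        const float *next = (i + 1 < h) ? r0 + 2 * nx : NULL;
        size_t x = 0;
        for (; x + 4 <= nx; x += 4) {         /* 4 columns per iteration */
            if (next) {
                _mm_prefetch((const char *)(next + x), _MM_HINT_T0);
                _mm_prefetch((const char *)(next + nx + x), _MM_HINT_T0);
            }
            __m128 a = _mm_loadu_ps(r0 + x);
            __m128 b = _mm_loadu_ps(r1 + x);
            _mm_storeu_ps(lo + x, _mm_mul_ps(half, _mm_add_ps(a, b)));
            _mm_storeu_ps(hi + x, _mm_mul_ps(half, _mm_sub_ps(a, b)));
        }
        for (; x < nx; x++) {                 /* scalar tail, nx % 4 columns */
            lo[x] = 0.5f * (r0[x] + r1[x]);
            hi[x] = 0.5f * (r0[x] - r1[x]);
        }
    }
}
```

Because the four lanes never interact, column vectorization needs no data shuffling, which is why the y dimension maps naturally onto SSE. The same pattern would apply to the time pass, where the two operands come from consecutive frames instead of consecutive rows.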

Author information

Correspondence to Gregorio Bernabé.

Additional information

Special Issue on Media and Communication Applications on General Purpose Processors: Hardware and Software Issues, Journal of VLSI Signal Processing Systems. Lead Guest Editor: Dr. Eric Debes. Contact author: Gregorio Bernabé.

Gregorio Bernabé was born in Antibes (Alpes-Maritimes, France) on 21 November 1974. He received an M.S. in Computer Science from the University of Murcia (Spain) in 1997. In 1998, he joined the Computer Engineering Department of the University of Murcia, where he is an Assistant Professor as well as a Ph.D. candidate. His current research interests include video compression using the wavelet transform and the development of optimizations to improve the performance of video compression algorithms based on the 3D wavelet transform.

Jose M. Garcia was born in Valencia, Spain, on 9 January 1962. He received the M.S. and Ph.D. degrees in electrical engineering from the Technical University of Valencia (Valencia, Spain) in 1987 and 1991, respectively. In 1987 he joined the Computer Science Department at the University of Castilla-La Mancha at the Campus of Albacete (Spain). From 1987 to 1993, he was an Assistant Professor of Computer Architecture. In 1994 he became an Associate Professor at the University of Murcia (Spain). From 1995 to 1997 he served as Vice-Dean of the School of Computer Science. At present, he is the Director of the Computer Engineering Department and the Head of the Research Group on Parallel Computing and Architecture. He has developed several courses on Computer Structure, Peripheral Devices, Computer Architecture and Multicomputer Design. His current research interests include multiprocessor systems, interconnection networks, file systems, and grid computing and its application in multimedia systems. He has published over 45 refereed papers in journals and conferences in these fields. Dr. Garcia is a member of several international associations, such as the IEEE Computer Society, ACM, and USENIX, and also a member of some European associations (Euromicro and ATI).

Pepe Gonzalez received the M.S. and Ph.D. degrees from the Universitat Politecnica de Catalunya (UPC). In January 2000, he joined the Computer Engineering Department of the University of Murcia, Spain, and became an Associate Professor in June 2001. In March 2002, he joined the Intel Barcelona Research Center, where he is a Senior Researcher. He is currently working on new paradigms for the IA-32 family, in particular thermal- and power-aware clustered microarchitectures. E-mail: pepe.gonzalez@intel.com

About this article

Cite this article

Bernabé, G., García, J.M. & González, J. Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions. J VLSI Sign Process Syst Sign Image Video Technol 41, 209–223 (2005). https://doi.org/10.1007/s11265-005-6651-6

  • DOI: https://doi.org/10.1007/s11265-005-6651-6