Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems

International Journal of Parallel Programming

Abstract

This paper presents a novel high-throughput, scalable unified architecture for computing the transform operations in video codecs for advanced standards. The structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 × 4 and 2 × 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of a given video coding application. Experimental results obtained using a Xilinx Virtex-5 FPGA demonstrate the superior performance and hardware efficiency of the proposed structure, which achieves a higher throughput per unit of area than other similar, recently published designs targeting the H.264/AVC standard. The results also show that, when integrated in a multi-core embedded system, the architecture provides speedup factors of about 120× relative to pure software implementations of the transform algorithms, thereby allowing the real-time computation of all the above-mentioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 × 7,680 @ 30 fps).
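For readers unfamiliar with the transforms named in the abstract, the following is a minimal software sketch of the arithmetic involved. It is not the architecture proposed in the paper. It assumes the standard H.264/AVC forward 4 × 4 core transform matrix and the 4 × 4 and 2 × 2 Hadamard kernels (used for DC coefficients), omits the scaling and rounding that the standard applies alongside quantization, and uses hypothetical helper names (`matmul`, `transpose`, `transform_2d`, `luma_samples_per_second`) chosen only for illustration.

```python
# Forward 4x4 integer core transform kernel of H.264/AVC.
# Scaling factors are not applied here; the standard folds them into quantization.
CF4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# 4x4 Hadamard kernel (luma DC coefficients), without the standard's scaling.
H4 = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

# 2x2 Hadamard kernel (chroma DC coefficients in 4:2:0 video).
H2 = [
    [1,  1],
    [1, -1],
]


def matmul(a, b):
    """Multiply two integer matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def transform_2d(kernel, block):
    """Apply a separable 2-D transform: Y = K * X * K^T."""
    return matmul(matmul(kernel, block), transpose(kernel))


def luma_samples_per_second(width, height, fps):
    """Luma samples that must be transformed each second for real-time operation."""
    return width * height * fps


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(transform_2d(CF4, flat))   # only the DC coefficient is non-zero
    print(transform_2d(H2, [[1, 2], [3, 4]]))
    print(luma_samples_per_second(7680, 4320, 30))
```

A constant 4 × 4 block yields a single non-zero (DC) coefficient, which is the property the Hadamard stages then exploit on the collected DC terms. The sample-rate helper gives a rough sense of the real-time workload implied by the UHDV figure quoted in the abstract, counting luma samples only.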

Author information

Corresponding author

Correspondence to Tiago Dias.

About this article

Cite this article

Dias, T., López, S., Roma, N. et al. Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems. Int J Parallel Prog 41, 236–260 (2013). https://doi.org/10.1007/s10766-012-0221-x

  • DOI: https://doi.org/10.1007/s10766-012-0221-x
