Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems

Dias, Tiago; López, Sebastián; Roma, Nuno; Sousa, Leonel

doi:10.1007/s10766-012-0221-x

Published: 02 October 2012

Volume 41, pages 236–260, (2013)

International Journal of Parallel Programming

Abstract

This paper presents a novel high-throughput, scalable unified architecture for computing the transform operations in video codecs for advanced standards. The structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 × 4 and 2 × 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in performance and hardware cost to meet the specific requirements of a given video coding application. Experimental results obtained on a Xilinx Virtex-5 FPGA demonstrate the performance and hardware efficiency of the proposed structure, which achieves a higher throughput per unit of area than other similar recently published designs targeting the H.264/AVC standard. The results also show that, when integrated in a multi-core embedded system, this architecture provides speedup factors of about 120× over pure software implementations of the transform algorithms, therefore allowing real-time computation of all the above-mentioned transforms for Ultra High Definition Video (UHDV) sequences (7,680 × 4,320 @ 30 fps).
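
For orientation, the two-dimensional 4 × 4 and 2 × 2 transforms mentioned in the abstract can be sketched in software as follows. This is only a plain reference sketch, not the architecture proposed in the paper: the kernels are the forward core matrices commonly given for H.264/AVC, the scaling and quantization stages that accompany them are omitted, and all function names (`transform_2d`, `luma_blocks_per_second`) are hypothetical. The last function gives a rough idea of the workload implied by the UHDV format quoted in the abstract, counting luma 4 × 4 blocks only.

```python
# Software reference sketch of the H.264/AVC 2-D transforms (assumption:
# commonly cited forward kernels; scaling and quantization are left out).
# It does not model the hardware structure described in the paper.

CF4 = [[1, 1, 1, 1],
       [2, 1, -1, -2],
       [1, -1, -1, 1],
       [1, -2, 2, -1]]   # 4x4 forward integer (DCT-like) core transform

H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]    # 4x4 Hadamard kernel (luma DC coefficients)

H2 = [[1, 1],
      [1, -1]]           # 2x2 Hadamard kernel (chroma DC coefficients)


def matmul(a, b):
    """Integer matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Matrix transpose."""
    return [list(row) for row in zip(*m)]


def transform_2d(kernel, block):
    """Row-column form of a separable 2-D transform: kernel * block * kernel^T."""
    return matmul(matmul(kernel, block), transpose(kernel))


def luma_blocks_per_second(width, height, fps):
    """Number of 4x4 luma blocks per second for a given frame format."""
    return (width * height // 16) * fps


if __name__ == "__main__":
    flat = [[1] * 4 for _ in range(4)]
    print(transform_2d(CF4, flat))              # only the DC term is non-zero
    print(transform_2d(H2, [[1, 2], [3, 4]]))
    print(luma_blocks_per_second(7680, 4320, 30))
```

Because all three kernels share the same row-column form and differ only in their coefficients, a single datapath with selectable coefficients is a natural fit, which is the kind of unification the abstract refers to. For 7,680 by 4,320 luma samples at 30 fps the last function returns 62,208,000 blocks per second, before chroma and the DC transforms are counted.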

Author information

Authors and Affiliations

ISEL-PI Lisbon/INESC-ID Lisbon/IST-TU Lisbon, Rua Conselheiro Emídio Navarro 1, 1959-007, Lisbon, Portugal
Tiago Dias
IUMA/University of Las Palmas GC, Campus Universitario de Tafira, 35017, Las Palmas de Gran Canaria, Spain
Sebastián López
INESC-ID Lisbon/IST-TU Lisbon, Rua Alves Redol 9, 1000-029, Lisbon, Portugal
Nuno Roma & Leonel Sousa

Corresponding author

Correspondence to Tiago Dias.

About this article

Cite this article

Dias, T., López, S., Roma, N. et al. Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems. Int J Parallel Prog 41, 236–260 (2013). https://doi.org/10.1007/s10766-012-0221-x

Received: 22 March 2012
Accepted: 06 September 2012
Published: 02 October 2012
Issue Date: April 2013
DOI: https://doi.org/10.1007/s10766-012-0221-x
