Abstract
This paper presents a novel high-throughput, scalable, unified architecture for computing the transform operations of video codecs that implement advanced coding standards. The structure can be used as a hardware accelerator in modern embedded systems to efficiently compute all the two-dimensional 4 × 4 and 2 × 2 transforms of the H.264/AVC standard. Moreover, its highly flexible design and hardware efficiency allow it to be easily scaled in terms of performance and hardware cost to meet the specific requirements of a given video coding application. Experimental results obtained on a Xilinx Virtex-5 FPGA demonstrate the superior performance and hardware efficiency of the proposed structure, which achieves a higher throughput per unit of area than other recently published designs targeting the H.264/AVC standard. These results also show that, when integrated in a multi-core embedded system, the architecture provides speedup factors of about 120× relative to pure software implementations of the transform algorithms, thereby enabling real-time computation of all the aforementioned transforms for Ultra High Definition Video (UHDV) sequences (4,320 × 7,680 @ 30 fps).
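To make the transforms named above concrete, the following Python sketch models the H.264/AVC 4 × 4 forward and inverse integer core transforms and the 4 × 4 and 2 × 2 Hadamard transforms as row-column (separable) applications of 1-D butterflies. It is a behavioral software reference under stated assumptions, not the authors' hardware architecture. All function names are hypothetical. Quantization and post-scaling are omitted, except for the final (x + 32) >> 6 rounding of the inverse core transform, whose input is assumed to be already-dequantized coefficients. The rate helper assumes 4:2:0 chroma sampling and counts only 4 × 4 core-transform blocks.

```python
from typing import Callable, List

Block = List[List[int]]
Kernel = Callable[[List[int]], List[int]]


def separable_2d(block: Block, kernel: Kernel) -> Block:
    """Row-column decomposition: apply a 1-D kernel to every row, then to every column."""
    rows = [kernel(list(r)) for r in block]
    n = len(rows)  # square blocks only (4x4 or 2x2)
    cols = [kernel([rows[i][j] for i in range(n)]) for j in range(n)]
    # cols[j][i] holds output element (i, j); transpose back to row-major order.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def fwd_core_1d(x: List[int]) -> List[int]:
    # Butterfly equivalent to multiplying by Cf = [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]].
    s0, s1 = x[0] + x[3], x[1] + x[2]
    s2, s3 = x[1] - x[2], x[0] - x[3]
    return [s0 + s1, 2 * s3 + s2, s0 - s1, s3 - 2 * s2]


def hadamard4_1d(x: List[int]) -> List[int]:
    # Same butterfly as fwd_core_1d, but without the factor-of-two weights.
    s0, s1 = x[0] + x[3], x[1] + x[2]
    s2, s3 = x[1] - x[2], x[0] - x[3]
    return [s0 + s1, s3 + s2, s0 - s1, s3 - s2]


def hadamard2_1d(x: List[int]) -> List[int]:
    return [x[0] + x[1], x[0] - x[1]]


def inv_core_1d(d: List[int]) -> List[int]:
    # Inverse core butterfly; >> is an arithmetic shift, as in the standard.
    e0, e1 = d[0] + d[2], d[0] - d[2]
    e2, e3 = (d[1] >> 1) - d[3], d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def forward_core_4x4(x: Block) -> Block:
    return separable_2d(x, fwd_core_1d)


def inverse_core_4x4(d: Block) -> Block:
    # Rows first, then columns, followed by the final rounding (h + 32) >> 6.
    h = separable_2d(d, inv_core_1d)
    return [[(v + 32) >> 6 for v in row] for row in h]


def hadamard_4x4(x: Block) -> Block:
    # Unnormalized; the encoder-side /2 scaling of the luma DC path is omitted.
    return separable_2d(x, hadamard4_1d)


def hadamard_2x2(x: Block) -> Block:
    return separable_2d(x, hadamard2_1d)


def blocks_per_second_420(width: int, height: int, fps: int) -> int:
    # Number of 4x4 luma + chroma blocks per second, assuming 4:2:0 sampling.
    luma = (width * height) // 16
    chroma = 2 * ((width // 2) * (height // 2)) // 16
    return (luma + chroma) * fps


if __name__ == "__main__":
    x = [[5, 11, 8, 10], [9, 8, 4, 12], [1, 10, 11, 4], [19, 6, 15, 7]]
    print(forward_core_4x4(x))
    print(blocks_per_second_420(7680, 4320, 30))  # → 93312000
```

The 4-point Hadamard butterfly differs from the forward core butterfly only in the factor-of-two weighting of its odd outputs, which suggests why a single shift-and-add datapath can plausibly be shared among these transforms. Under the stated assumptions, the 7,680 × 4,320 @ 30 fps case amounts to roughly 93.3 million 4 × 4 blocks per second.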
Cite this article
Dias, T., López, S., Roma, N. et al. Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems. Int J Parallel Prog 41, 236–260 (2013). https://doi.org/10.1007/s10766-012-0221-x
DOI: https://doi.org/10.1007/s10766-012-0221-x