Abstract
This paper uses joint algorithm and architecture design to enable high coding efficiency in conjunction with high processing speed and low area cost. Specifically, it presents several optimizations that can be performed on Context Adaptive Binary Arithmetic Coding (CABAC), a form of entropy coding used in H.264/AVC, to achieve the throughput necessary for real-time, low-power, high-definition video coding. The combination of syntax element partitions and interleaved entropy slices, referred to as Massively Parallel CABAC, increases the number of binary symbols (bins) that can be processed in a cycle. Subinterval reordering is used to reduce the cycle time required to process each binary symbol. Under common conditions using the JM12.0 software, Massively Parallel CABAC increases the bins per cycle by 2.7× to 32.8× at a cost of 0.25% to 6.84% coding loss compared with sequential single-slice H.264/AVC CABAC. It also provides a 2× reduction in area cost and reduces memory bandwidth. Subinterval reordering reduces the critical path delay by 14% to 22%, while modifications to context selection reduce the memory requirement by 67%. This work demonstrates that accounting for implementation cost during video coding algorithm design can enable higher processing speed and reduce hardware cost, while still delivering high coding efficiency in the next-generation video coding standard.
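To make the idea behind subinterval reordering concrete, the following is a minimal Python sketch of one decoding decision in an idealized binary arithmetic decoder. It is not taken from the paper. It assumes that reordering means placing the least probable symbol (LPS) subinterval at the bottom of the range rather than the most probable symbol (MPS) subinterval, and it omits renormalization, context modeling and the range lookup tables. The function names, the helper count_decisions and the numeric values are hypothetical and chosen only for illustration.

```python
def decode_bin_mps_low(rng, offset, r_lps):
    """Assumed H.264/AVC-style ordering: MPS subinterval at the bottom."""
    r_mps = rng - r_lps              # the subtraction must complete first...
    if offset >= r_mps:              # ...because the comparison uses its result
        return "LPS", r_lps, offset - r_mps
    return "MPS", r_mps, offset


def decode_bin_lps_low(rng, offset, r_lps):
    """Assumed reordered layout: LPS subinterval at the bottom."""
    if offset < r_lps:               # the comparison needs only r_lps
        return "LPS", r_lps, offset
    # rng - r_lps no longer feeds the comparison, so hardware could
    # evaluate it alongside the comparison rather than before it.
    return "MPS", rng - r_lps, offset - r_lps


def count_decisions(step, rng, r_lps):
    """Hypothetical helper: tally decisions over every offset in [0, rng)."""
    counts = {"MPS": 0, "LPS": 0}
    for offset in range(rng):
        symbol, new_rng, new_offset = step(rng, offset, r_lps)
        assert 0 <= new_offset < new_rng   # offset stays inside the new range
        counts[symbol] += 1
    return counts


if __name__ == "__main__":
    print(count_decisions(decode_bin_mps_low, 400, 100))
    print(count_decisions(decode_bin_lps_low, 400, 100))
```

Both orderings split the range into subintervals of the same sizes, so the probability assigned to each symbol is unchanged. What differs is which intermediate value the comparison depends on, which is one plausible reading of how the critical path could be shortened. An encoder would have to use the same ordering as the decoder for the bitstream to decode correctly.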
Acknowledgements
The authors would like to thank Madhukar Budagavi and Daniel Finchelstein for valuable feedback and discussions.
Additional information
This work was funded by Texas Instruments. The work of V. Sze was supported by the Texas Instruments Graduate Women’s Fellowship for Leadership in Microelectronics and NSERC.
Cite this article
Sze, V., Chandrakasan, A.P. Joint Algorithm-Architecture Optimization of CABAC. J Sign Process Syst 69, 239–252 (2012). https://doi.org/10.1007/s11265-012-0678-2
DOI: https://doi.org/10.1007/s11265-012-0678-2