Abstract
A hardware video encoder based on recent video coding standards such as HEVC and VVC must efficiently handle a massive number of memory accesses for motion vector search. This paper first presents a preliminary evaluation of the memory access behavior of a hardware video encoding pipeline. The evaluation suggests that the early stages of the pipeline, which access wide areas of the reference frames for the rough search, behave quite differently from the subsequent stages, which access small areas of the reference frames for the precise search. Based on this observation, this paper proposes a partitioned memory architecture for the hardware video encoding pipeline. The architecture adopts a split cache structure consisting of a front-end cache and a back-end cache. The front-end cache stores shrunk reference frames and provides them to the early stages for the rough search. Normal reference frames for the precise search are provided only to the subsequent stages, through the back-end cache. As a result, this structure can reduce the memory bandwidth requirement. On the other hand, with the split cache structure the subsequent stages cannot reuse the data loaded by the early stages. This increases cache misses in the subsequent stages and may cause memory accesses to miss their deadlines for real-time encoding. To solve this problem, this paper also designs and implements a coding tree unit (CTU) prefetcher for the back-end cache. The CTU prefetcher loads the data used by the subsequent stages without waiting for the results of the early stages. The evaluation results show that the proposed memory system successfully reduces the cache miss rate and the deadline miss rate in the subsequent stages. The proposed memory architecture can therefore help satisfy the demands of real-time encoding while reducing energy consumption.
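The effect of the CTU prefetcher on the back-end cache can be illustrated with a toy model. The sketch below is not the authors' implementation and does not reproduce the paper's evaluation. It assumes a one-dimensional row of CTU-sized reference blocks, an LRU back-end cache, and a precise search that touches the co-located block and its two horizontal neighbours. The names `LRUCache`, `encode_row`, and `BACK_END_CAPACITY`, as well as the capacity value, are hypothetical.

```python
from collections import OrderedDict

# Assumed back-end cache capacity in CTU-sized blocks (hypothetical value).
BACK_END_CAPACITY = 8


class LRUCache:
    """Tiny LRU cache keyed by block index; counts demand hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fill(self, key):
        """Make a block resident without counting a demand access (prefetch)."""
        self.blocks[key] = True
        self.blocks.move_to_end(key)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def access(self, key):
        """Demand access: count a hit or a miss, then make the block resident."""
        if key in self.blocks:
            self.hits += 1
        else:
            self.misses += 1
        self.fill(key)


def search_window(x, num_ctus):
    """Blocks assumed to be touched by the precise search for CTU x."""
    return [x + dx for dx in (-1, 0, 1) if 0 <= x + dx < num_ctus]


def encode_row(num_ctus, prefetch):
    """Simulate the precise-search stage for one CTU row; return the miss rate."""
    back_end = LRUCache(BACK_END_CAPACITY)
    for x in range(num_ctus):
        if prefetch:
            # The prefetcher loads the window around the co-located CTU
            # without waiting for the rough-search result.
            for block in search_window(x, num_ctus):
                back_end.fill(block)
        for block in search_window(x, num_ctus):
            back_end.access(block)
    return back_end.misses / (back_end.hits + back_end.misses)


if __name__ == "__main__":
    print("miss rate without prefetch:", encode_row(10, prefetch=False))
    print("miss rate with prefetch:   ", encode_row(10, prefetch=True))
```

In this idealized model every demand access hits when prefetching is enabled, while the baseline takes one demand miss per CTU in the row. A real encoder also has to account for prefetch latency, the actual search range, and memory bandwidth, none of which are modeled here.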
Acknowledgements
This work was partially supported by Grant-in-Aid for Scientific Research (B) No. 22H03571 and the joint research between Tohoku University and NTT Device Innovation Center, NTT Corporation.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sato, M. et al. (2023). A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_23
DOI: https://doi.org/10.1007/978-3-031-29927-8_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29926-1
Online ISBN: 978-3-031-29927-8
eBook Packages: Computer Science, Computer Science (R0)