A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders

Sato, Masayuki; Omori, Yuya; Egawa, Ryusuke; Nakamura, Ken; Kobayashi, Daisuke; Iwasaki, Hiroe; Komatsu, Kazuhiko; Kobayashi, Hiroaki

doi:10.1007/978-3-031-29927-8_23

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13798)

Included in the following conference series:

International Conference on Parallel and Distributed Computing: Applications and Technologies


Abstract

A hardware video encoder based on recent video coding standards such as HEVC and VVC must efficiently handle a massive number of memory accesses during motion vector search. To address this, this paper first evaluates the memory access behavior of a hardware video encoding pipeline. This preliminary evaluation suggests that the early stages of the pipeline, which access wide areas of the reference frames for a rough search, behave quite differently from the subsequent stages, which access small areas of them for a precise search. This paper therefore proposes a partitioned memory architecture for the hardware video encoding pipeline. The architecture adopts a split cache structure consisting of a front-end cache and a back-end cache. The front-end cache stores shrunk reference frames and supplies them to the rough search in the early stages, while full-resolution reference frames for the precise search are supplied only to the subsequent stages through the back-end cache. As a result, this structure can reduce the memory bandwidth requirement. On the other hand, the split cache structure cannot reuse data loaded by the early stages, which increases cache misses in the subsequent stages and may cause memory accesses to miss the deadlines required for real-time encoding. To solve this problem, this paper also designs and implements a coding tree unit (CTU) prefetcher for the back-end cache. The CTU prefetcher loads the data used by the subsequent stages without waiting for the results of the early stages. The evaluation results show that the proposed memory system reduces both the cache miss rate and the deadline miss rate in the subsequent stages. Consequently, the proposed memory architecture can help satisfy the demands of real-time encoding while reducing energy consumption.
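The abstract does not describe the prefetcher's internals, so the Python sketch below is only a toy, timing-free model of the idea under stated assumptions. It simulates just the back-end cache as a fully associative LRU cache, treats the rough-search motion vectors (MVs) as given inputs, and prefetches the co-located CTU plus a fixed margin before the precise search runs, i.e. without using the rough-search result. The frame size, the 64-pixel CTU (HEVC allows CTUs up to 64x64, VVC up to 128x128), the 16-pixel block granularity, the margin and refinement sizes, and every name (`LRUCache`, `window_blocks`, `ctu_prefetch`, `precise_search`, `run`) are hypothetical illustrations, not the authors' design. The front-end cache and the shrunk frames it holds are not modelled.

```python
from collections import OrderedDict

# All constants below are illustrative assumptions, not values from the paper.
FRAME_W, FRAME_H = 1920, 1080  # reference frame size in pixels
CTU = 64                       # CTU size in pixels
BLK = 16                       # hypothetical cache-block granularity in pixels


class LRUCache:
    """Fully associative LRU cache keyed by block address (bx, by)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _fill(self, addr):
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used block

    def access(self, addr):
        """Demand access from the precise-search stage."""
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
        else:
            self.misses += 1
            self._fill(addr)

    def prefetch(self, addr):
        """Prefetch fill: updates the contents but not the hit/miss counters."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
        else:
            self._fill(addr)


def window_blocks(x0, y0, w, h):
    """Blocks covering pixels [x0, x0+w) x [y0, y0+h), clipped to the frame."""
    x_lo, y_lo = max(x0, 0), max(y0, 0)
    x_hi, y_hi = min(x0 + w, FRAME_W) - 1, min(y0 + h, FRAME_H) - 1
    return [(bx, by)
            for by in range(y_lo // BLK, y_hi // BLK + 1)
            for bx in range(x_lo // BLK, x_hi // BLK + 1)]


def ctu_prefetch(cache, ctu_x, ctu_y, margin):
    """Load the co-located CTU plus a margin, without waiting for the rough MV."""
    side = CTU + 2 * margin
    for addr in window_blocks(ctu_x * CTU - margin, ctu_y * CTU - margin, side, side):
        cache.prefetch(addr)


def precise_search(cache, ctu_x, ctu_y, mv, refine):
    """Access a small refinement window around the CTU displaced by the rough MV."""
    side = CTU + 2 * refine
    x0 = ctu_x * CTU + mv[0] - refine
    y0 = ctu_y * CTU + mv[1] - refine
    for addr in window_blocks(x0, y0, side, side):
        cache.access(addr)


def run(prefetch, mvs, capacity=256, margin=32, refine=8):
    """Process one CTU per rough-search MV along a CTU row; return (misses, hits)."""
    cache = LRUCache(capacity)
    for i, mv in enumerate(mvs):
        ctu_x, ctu_y = 1 + i, 1  # start one CTU away from the frame border
        if prefetch:
            ctu_prefetch(cache, ctu_x, ctu_y, margin)
        precise_search(cache, ctu_x, ctu_y, mv, refine)
    return cache.misses, cache.hits


mvs = [(4, -8), (-20, 16), (24, 24), (0, 0)]  # made-up rough-search MVs
print("no prefetch (misses, hits):", run(False, mvs))
print("CTU prefetch (misses, hits):", run(True, mvs))
```

In this model, as long as each MV component stays within `margin - refine` pixels of the co-located position, every precise-search access hits a prefetched block, while a larger MV such as `(100, 0)` still causes misses. Whether such prefetches arrive before the real-time deadline depends on timing, which this sketch deliberately ignores.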



Acknowledgements

This work was partially supported by Grant-in-Aid for Scientific Research (B) No. 22H03571 and the joint research between Tohoku University and NTT Device Innovation Center, NTT Corporation.

Author information

Authors and Affiliations

Tohoku University, Sendai, Miyagi, 980-8579, Japan
Masayuki Sato, Ryusuke Egawa, Ken Nakamura, Hiroe Iwasaki, Kazuhiko Komatsu & Hiroaki Kobayashi
NTT Device Innovation Center, Nippon Telegraph and Telephone Corporation, Atsugi, Kanagawa, 243-0124, Japan
Yuya Omori, Ken Nakamura & Daisuke Kobayashi
Tokyo Denki University, Adachi, Tokyo, 120-8551, Japan
Ryusuke Egawa
Tokyo University of Agriculture and Technology, Fuchu, Tokyo, 183-8538, Japan
Hiroe Iwasaki


Corresponding author

Correspondence to Masayuki Sato.

Editor information

Editors and Affiliations

Tohoku University, Aoba-ku, Japan
Hiroyuki Takizawa
Sun Yat-sen University, Guangzhou, China
Hong Shen
The University of Tokyo, Tokyo, Japan
Toshihiro Hanawa
Seoul National University of Science and Technology, Seoul, Korea (Republic of)
Jong Hyuk Park
Griffith University, Queensland, QLD, Australia
Hui Tian
Tokyo Denki University, Tokyo, Japan
Ryusuke Egawa


About this paper

Cite this paper

Sato, M. et al. (2023). A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_23


DOI: https://doi.org/10.1007/978-3-031-29927-8_23
Published: 08 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29926-1
Online ISBN: 978-3-031-29927-8
eBook Packages: Computer Science, Computer Science (R0)
