Parallel spiral search algorithm applied to integer motion estimation

https://doi.org/10.1016/j.image.2021.116279

Abstract

Thanks to its flexible coding structure, high-efficiency video coding (HEVC) achieves greater bit-rate savings than the previous standard, H.264. However, this flexibility also increases the complexity of integer-pixel motion estimation (IME). To speed up the encoding process, we propose a parallel spiral search (PSS) algorithm with the following characteristics and advantages. First, the proposed algorithm is hardware-friendly: PSS has a fixed search order that removes the dependency between search points, and a high data reuse level that facilitates pipelining in hardware implementations. Second, the PSS algorithm processes all prediction unit (PU) blocks in parallel, which speeds up the rate-distortion (RD) calculation. Finally, an early termination strategy is proposed to skip unnecessary search points and further reduce search time. Experimental results show that the proposed algorithm outperforms other popular hardware-oriented IME algorithms in terms of coding speed at the same loss of RD performance. Compared with the default full search algorithm (FSA) in the HEVC test model HM-16.7, the proposed algorithm achieves an average time saving ratio of up to 92.55%, with a BD-PSNR loss of 0.056 dB and a BD-BR increase of 1.38%.

Introduction

Building on H.264/AVC, high-efficiency video coding (HEVC) [1] adds a special set of image segmentation modes, including the partitioning of coding units (CU), prediction units (PU), and transform units (TU). HEVC can reduce the bit rate by up to 50% [2], [3] compared with H.264 at the same peak signal-to-noise ratio (PSNR). The outstanding coding efficiency of HEVC is due to its advanced coding structure and various advanced technologies; however, these also make HEVC more complex than H.264 [4]. Inter-prediction accounts for 80% [5], [6] of the complexity of the entire encoding process, and motion estimation (ME) accounts for approximately 70% [6] of the inter-prediction time. Therefore, reducing the ME search time can effectively reduce the complexity of the entire encoding process.

ME finds the block in the reference frame that best matches the current block. Most ME algorithms are based on the block-matching algorithm (BMA) [7]. Classical ME search algorithms can be roughly divided into the full search algorithm (FSA) and fast search algorithms. The FSA traverses all the search points in the search window, which guarantees that the best matching block is found. FSA is hardware-friendly because it has a regular search order. However, its complexity is too high for real-time video coding. Fast search algorithms, by contrast, examine only a small subset of the search points and can therefore reduce the complexity of ME effectively. Many productive fast search methods have been proposed. Before 2000, several classical search algorithms with a fixed search step were proposed, including three-step search (TSS) [8], four-step search (FSS) [9], and the new three-step search (NTSS) [10]. These algorithms save a great deal of time by skipping many invalid points in the search window. However, they can easily fall into a local optimum, because the matching criterion often has several minima in which they can be trapped. In response to this problem, Tham et al. [11] proposed the unrestricted center-biased diamond search (UCBDS) algorithm, which is more effective and robust than the previous techniques. Based on UCBDS, several more efficient algorithms were proposed, such as new diamond search [12], adaptive rood pattern search (ARPS) [13], and hexagon-based search (HS) [14]. Although UCBDS has a fast search speed, this type of algorithm is poorly suited to hardware implementation because its search points are variable and irregular. When such algorithms are implemented in hardware, they tend to have low data reuse levels and need a long time to read pixel data.
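As a rough illustration of the full-search idea described above, the following Python sketch evaluates every candidate displacement in a square window and keeps the one with the lowest cost. The use of the sum of absolute differences (SAD) as the matching criterion, the frame layout (lists of rows), and the names `sad` and `full_search` are assumptions made for this example; they are not taken from the paper or from HM.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (bx, by) and the n x n block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement (dx, dy) within
    +/- search_range and return (best_mv, best_cost)."""
    height, width = len(ref), len(ref[0])
    # Start from the zero displacement so there is always a valid result.
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, bx, by, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates whose block would leave the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Every candidate is visited in a fixed raster order, so the memory access pattern is regular, which is the property that makes FSA attractive for hardware. The price is (2R + 1)^2 cost evaluations per block for a search range of R.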

Classical fast search algorithms are therefore poorly suited to hardware implementation, whereas the FSA is hardware-friendly but requires a great deal of search time. To provide an efficient solution in terms of both hardware implementation and time cost, we propose a novel IME algorithm called the parallel spiral search (PSS) algorithm. The proposed IME algorithm is hardware-friendly, searches as fast as most fast search algorithms, and shows insignificant loss in PSNR performance and bit rate. In addition, we develop an early termination strategy to further decrease the complexity of IME.

The remainder of this paper is presented as follows. Section 2 presents the background and studies related to this paper. Section 3 introduces the specific flow of the PSS algorithm, the early termination strategy, and features of the PSS algorithm. Section 4 compares the results of this research with those of previous studies. Section 5 provides a summary of this paper.

Section snippets

Background

The HEVC standard uses a quad-tree structure. The basic units in the quad-tree structure are the CU, PU, and TU, where the CU is the root node of both the PU and the TU. Each root-CU is split into one or four leaf-CUs. The coding tree unit (CTU) can be set to 64 × 64, and the smallest leaf-CU is 8 × 8. The number of all CU partition modes in a CTU can then be calculated as $1 + (1 + (1 + 1^4)^4)^4 = 83\,522$.
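The count of 83 522 follows from a simple recursion: a CU of the minimum size has exactly one mode, and a larger CU is either left unsplit (one mode) or split into four sub-CUs that are partitioned independently. The sketch below reproduces that arithmetic; the function name `count_cu_partitions` and its parameters are hypothetical and serve only this illustration.

```python
def count_cu_partitions(cu_size, min_cu_size=8):
    """Number of quad-tree partition modes of a square CU.

    A CU at the minimum size cannot be split, so it has one mode.
    A larger CU has one unsplit mode plus every combination of the
    modes of its four half-size sub-CUs.
    """
    if cu_size < min_cu_size:
        raise ValueError("cu_size must not be smaller than min_cu_size")
    if cu_size == min_cu_size:
        return 1
    sub_modes = count_cu_partitions(cu_size // 2, min_cu_size)
    return 1 + sub_modes ** 4


# Intermediate values for 8x8, 16x16, 32x32, and 64x64 blocks.
counts = [count_cu_partitions(size) for size in (8, 16, 32, 64)]
```

For a 64 × 64 CTU the intermediate values are 1, 2, 17, and 83 522, which matches the nested expression term by term.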

When CU size is 8 × 8, it has only one CU partition mode because the smallest leaf-CU is 8 × 8 and it

Limiting factors

In the HEVC test model (HM), the TZS algorithm uses a diamond pattern (or square pattern) to search for the best matched block, and the stride length grows as an integer power of 2. The diamond pattern of the TZS algorithm is shown in Fig. 2. After traversing all points of the diamond pattern (the round dots), the algorithm sets the best matched point as the starting point of the next diamond pattern (the pentagram dots). The process ends when twice the number of best matched points of the diamond search
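To make the power-of-2 stride concrete, the following simplified sketch lists the strides up to a given search range and the candidate points of one diamond around a center. It is not HM's exact point layout: the choice of four vertices plus four edge midpoints, and the names `stride_schedule` and `diamond_points`, are assumptions for illustration. The termination rule is left out because its description is cut off in this excerpt.

```python
def stride_schedule(search_range):
    """Strides 1, 2, 4, ... that do not exceed search_range."""
    strides = []
    stride = 1
    while stride <= search_range:
        strides.append(stride)
        stride *= 2
    return strides


def diamond_points(center, stride):
    """Candidate points on a diamond of the given stride around center.

    Stride 1 yields the four vertices only; larger strides also add the
    four edge midpoints (an assumed, simplified layout).
    """
    cx, cy = center
    points = [
        (cx, cy - stride),
        (cx - stride, cy),
        (cx + stride, cy),
        (cx, cy + stride),
    ]
    if stride >= 2:
        half = stride // 2
        points += [
            (cx - half, cy - half),
            (cx + half, cy - half),
            (cx - half, cy + half),
            (cx + half, cy + half),
        ]
    return points
```

Because the next diamond is centered on whichever point matched best, the set of pixels to fetch depends on earlier results, which is the irregularity that makes this kind of search harder to pipeline than a fixed-order traversal.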

Test results of PSS

The preceding scheme is implemented in the HEVC encoder by modifying the reference software HM-16.7. We used 14 sets of test video sequences (i.e., from Classes A to E) from a variety of video sequences with different intensity levels. PSS is compared with the original FSA under the simulation environment of Table 6; the comparison results for performance and complexity are shown in Fig. 11 and Fig. 12, respectively (i.e., a-e are from Classes A to E, respectively, and f belongs to

Conclusion

The PSS algorithm has been presented in this paper. The proposed PSS starts searching from the end point of the UMVP and traverses the search window in spiral order. During the traversal, all the symmetrical PU blocks in the same CTU are processed in parallel, and the best MV and the corresponding costs of all PUs are updated simultaneously. Finally, the search process is terminated by the early termination strategy. The proposed algorithm can significantly reduce search time
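The traversal summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the square spiral layout, the names `spiral_offsets` and `spiral_search`, the caller-supplied `cost_fn`, and the fixed `stop_threshold` used as a stand-in for the early termination strategy are all hypothetical, since the excerpt does not give the actual criterion. The parallel processing of PU blocks is not modeled.

```python
def spiral_offsets(search_range):
    """Yield (dx, dy) offsets in a square spiral that starts at (0, 0)
    and covers every point with |dx|, |dy| <= search_range once."""
    x = y = 0
    dx, dy = 1, 0
    step_len = 1
    total = (2 * search_range + 1) ** 2
    emitted = 1
    yield (0, 0)
    while emitted < total:
        for _ in range(2):  # two consecutive legs share one length
            for _ in range(step_len):
                x, y = x + dx, y + dy
                if abs(x) <= search_range and abs(y) <= search_range:
                    emitted += 1
                    yield (x, y)
            dx, dy = -dy, dx  # turn by 90 degrees
        step_len += 1


def spiral_search(cost_fn, start_mv, search_range, stop_threshold):
    """Visit candidates around start_mv in spiral order and stop as soon
    as the best cost is at or below stop_threshold (assumed criterion).
    Returns (best_mv, best_cost, visited)."""
    best_mv, best_cost, visited = None, None, 0
    for dx, dy in spiral_offsets(search_range):
        mv = (start_mv[0] + dx, start_mv[1] + dy)
        cost = cost_fn(mv)
        visited += 1
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = mv, cost
        if best_cost <= stop_threshold:
            break
    return best_mv, best_cost, visited
```

The visiting order depends only on `search_range`, never on the costs seen so far, so the sequence of reference pixels to fetch is known in advance. Early termination merely truncates that fixed sequence.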

CRediT authorship contribution statement

Long-Zhao Shi: Conception or design of the work, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported by the Natural Science Foundation of Fujian Province, China (grant no. 2018J01801).

Long-Zhao Shi approved the version of the manuscript to be published.

References (32)

  • Sze V. et al., High efficiency video coding (HEVC)
  • Sullivan G.J. et al., Overview of the high efficiency video coding (HEVC) standard, IEEE Trans. Circuits Syst. Video Technol. (2012)
  • Ohm J.-R. et al., Comparison of the coding efficiency of video coding standards–including high efficiency video coding (HEVC), IEEE Trans. Circuits Syst. Video Technol. (2012)
  • Bossen F. et al., HEVC complexity and implementation analysis, IEEE Trans. Circuits Syst. Video Technol. (2012)
  • Kim S. et al., A novel fast and low-complexity motion estimation for UHD HEVC, Proc. Natl. Acad. Sci. USA (2013)
  • Kim J. et al., An SAD-based selective bi-prediction method for fast motion estimation in high efficiency video coding, ETRI J. (2012)
  • Cafforio C. et al., Methods for measuring small displacements of television images, IEEE Trans. Inform. Theory (1976)
  • Koga T., Motion compensated interframe coding for video conferencing, IEEE Proc. Natl. Telecommun. Conf. (1981)
  • Po L.M. et al., A novel four-step search algorithm for fast block motion estimation, IEEE Trans. Circuits Syst. Video Technol. (1996)
  • Li R. et al., New three-step search algorithm for block motion estimation, IEEE Trans. Circuits Syst. Video Technol. (1994)
  • Tham J.Y. et al., A novel unrestricted center-biased diamond search algorithm for block motion estimation, IEEE Trans. Circuits Syst. Video Technol. (1998)
  • Zhu S. et al., A new diamond search algorithm for fast block-matching motion estimation, IEEE Trans. Image Process. (2000)
  • Nie Y. et al., Adaptive rood pattern search for fast block-matching motion estimation (2002)
  • Zhu C. et al., Hexagon-based search pattern for fast block motion estimation, IEEE Trans. Circuits Syst. Video Technol. (2002)
  • Kim I.-K. et al., Block partitioning structure in the HEVC standard, IEEE Trans. Circuits Syst. Video Technol. (2012)
  • Sullivan G.J. et al., Rate–distortion optimization for video compression, IEEE Signal Process. Mag. (1998)