
Efficient finite impulse response filters in massively-parallel recursive systems

Fast recursive systems computation with applications to low-level vision using GPUs

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

This paper presents strategies to massively parallelize complete recursive systems. Each algorithm handles systems with both feedforward and feedback coefficients, which allows high-complexity filtering operators to be computed. The final algorithm is linear in time and memory, exposes a high number of parallel tasks, and is implemented on graphics processing units (GPUs). The key to the final algorithm is the derivation of closed-form formulas that combine non-recursive and recursive linear filters, based on an efficient state-of-the-art block-based strategy. This work considers applications to early vision, so the GPU implementation runs on images, computing an approximation of the Gaussian filter and its first and second derivatives. Finally, comparison results show that this work outperforms prior state-of-the-art algorithms, achieving real-time image filtering on ultra-high-definition videos.

References

  1. CUDPP library: CUDA Data Parallel Primitives. http://cudpp.github.io (2013)

  2. cuFFT library: CUDA Fast Fourier Transform. http://developer.nvidia.com/cuFFT (2007)

  3. Deriche, R.: Fast algorithms for low-level vision. IEEE Trans. Pattern Anal. Machine Intell. 12(1), 78–87 (1990)

  4. Deriche, R.: Recursively implementing the Gaussian and its derivatives. In: Proceedings of the 2nd Conference on Image Processing, pp. 263–267 (1992)

  5. Göddeke, D., Strzodka, R.: Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid. IEEE Trans. Parallel Distrib. Syst. 22(1), 22–32 (2011)

  6. Hale, D.: Recursive Gaussian filters. Tech. Rep. CWP-546, Center for Wave Phenomena, Colorado School of Mines, Golden, Colorado (2006)

  7. Kirk, D.B., Hwu, W.W.: Programming massively parallel processors. Morgan Kaufmann (2010)

  8. Merrill, D., Grimshaw, A.: Parallel scan for stream architectures. Tech. Rep. CS2009-14. University of Virginia (2009)

  9. Nehab, D., Maximo, A., Lima, R.S., Hoppe, H.: GPU-efficient recursive filtering and summed-area tables. ACM Trans. Graphics 30(6), 176 (2011)

  10. NVIDIA: CUDA SDK Examples. http://developer.nvidia.com/cuda-toolkit (2014)

  11. Oppenheim, A.V., Schafer, R.W.: Discrete-time signal processing, 3rd edn. Prentice Hall (2010)

  12. Podlozhnyuk, V.: Image convolution with CUDA. NVIDIA whitepaper (2007)

  13. Ruijters, D., Thévenaz, P.: GPU prefilter for accurate cubic B-spline interpolation. Comput. J. 55(1), 15–20 (2012)

  14. Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Proceedings of Graphics Hardware, pp. 97–106 (2007)

  15. Stone, H.S.: An efficient parallel algorithm for the solution of a tridiagonal linear system of equations. J. ACM 20(1), 27–38 (1973)

  16. Sung, W., Mitra, S.: Efficient multi-processor implementation of recursive digital filters. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 257–260 (1986)

  17. van Vliet, L.J., Young, I.T., Verbeek, P.W.: Recursive Gaussian derivative filters. In: Proceedings of the 14th International Conference on Pattern Recognition, vol. 1, pp. 509–514 (1998)

  18. Zhang, Y., Cohen, J., Owens, J.D.: Fast tridiagonal solvers on the GPU. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2010)

Author information

Corresponding author

Correspondence to André Maximo.

Appendix: Linear filter matrices

The \(r\times r\) matrices \(A_{ C }\) and \(A_{ F }\) correspond to all multiply-add operations in (1) and (2), grouping the \(b[k]\) feedforward and \(a[k]\) feedback coefficients when applying the \(C_1\) and \(F\) filters to an \(r\)-vector. The \(b\times r\) matrices \(A_{ CP }\) and \(A_{ FP }\) are the result of \(C_1(I,{\mathbf{0}})\) and \(F(I,{\mathbf{0}})\), where \(I\) is an \(r\times r\) identity matrix and \(\mathbf {0}\) is a \(b\times r\) matrix of zeros. To illustrate, for \(r=2\) we have

$$\begin{aligned} A_{ CP }&= \left[\begin{array}{ll} b[2] & b[1] \\ 0 & b[2] \\ 0 & 0 \\ \vdots & \vdots \end{array}\right] \end{aligned} \tag{13}$$
$$\begin{aligned} A_{ FP }&= \left[\begin{array}{ll} -a[2] & -a[1] \\ a[1]\,a[2] & a[1]^2 - a[2] \\ -a[1]^2\,a[2] + a[2]^2 & -a[1]^3 + 2\,a[1]\,a[2] \\ \vdots & \vdots \end{array}\right] \end{aligned} \tag{14}$$

and the \(r\times r\) matrix \(T(A_{ FP })\) contains the last \(r\) rows of \(A_{ FP }\). Next, the \(b\times b\) matrices \(A_{ C_1\!B }\) and \(A_{ FB }\) are the result of \(C_1(\mathbf {0},I)\) and \(F(\mathbf {0},I)\), where \(I\) is a \(b\times b\) identity matrix and \(\mathbf {0}\) is an \(r\times b\) matrix of zeros. To illustrate,

$$\begin{aligned} A_{ C_1\!B }&= \left[\begin{array}{llll} b[0] & 0 & 0 & \cdots \\ b[1] & b[0] & 0 & \cdots \\ b[2] & b[1] & b[0] & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}\right] \end{aligned} \tag{15}$$
$$\begin{aligned} A_{ FB }&= \left[\begin{array}{llll} 1 & 0 & 0 & \cdots \\ -a[1] & 1 & 0 & \cdots \\ a[1]^2 - a[2] & -a[1] & 1 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{array}\right], \end{aligned} \tag{16}$$

and the \(r\times b\) matrix \(T(A_{ FB })\) contains the last \(r\) rows of \(A_{ FB }\). Finally, the reverse filter matrices \(A_{ CE }\) from \(C_2({\mathbf {0}},I)\), \(A_{ RE }\) from \(R({\mathbf {0}},I)\), \(A_{ C_2\!B }\) from \(C_2(I,{\mathbf{0}})\), \(A_{ RB }\) from \(R(I,{\mathbf{0}})\) and all their row-processing counterparts (\(\,\,\cdot ^{{\scriptscriptstyle \mathcal {T}}}\)) are analogous.
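The forward matrices above can be checked numerically with a short sequential sketch. The Python below is an illustration only, not the paper's GPU implementation. It assumes the conventions implied by the signs in (14) and (16): \(C_1\) computes \(y[n]=\sum_{k=0}^{r} b[k]\,x[n-k]\) and \(F\) computes \(y[n]=x[n]-\sum_{k=1}^{r} a[k]\,y[n-k]\). All helper names (`apply_c1`, `apply_f`, `prologue_matrix`, `block_matrix`, `tail`) are hypothetical, and the block size is called `bsize` to avoid a clash with the coefficient list `b`. The integer coefficients are chosen for exact arithmetic, not for filter stability.

```python
# Illustrative sketch only; helper names are hypothetical, not from the paper.
# Assumed conventions (read off the signs in (14) and (16)):
#   C1: y[n] = sum_{k=0..r} b[k] * x[n-k]
#   F : y[n] = x[n] - sum_{k=1..r} a[k] * y[n-k]


def apply_c1(b, prologue, block):
    """Causal feedforward pass; `prologue` holds the r inputs before the block."""
    r = len(b) - 1
    x = list(prologue) + list(block)
    return [sum(b[k] * x[r + n - k] for k in range(r + 1))
            for n in range(len(block))]


def apply_f(a, prologue, block):
    """Causal feedback pass; `prologue` holds the r outputs before the block."""
    r = len(a) - 1  # a[0] is an unused placeholder
    y = list(prologue)
    for n, xn in enumerate(block):
        y.append(xn - sum(a[k] * y[r + n - k] for k in range(1, r + 1)))
    return y[r:]


def unit(n, j):
    """j-th column of the n x n identity matrix."""
    return [1 if i == j else 0 for i in range(n)]


def from_columns(cols):
    """Assemble a row-major matrix from a list of columns."""
    return [list(row) for row in zip(*cols)]


def prologue_matrix(filt, coeffs, r, bsize):
    """filt(I, 0): identity prologue, zero block -> bsize x r matrix."""
    return from_columns([filt(coeffs, unit(r, j), [0] * bsize) for j in range(r)])


def block_matrix(filt, coeffs, r, bsize):
    """filt(0, I): zero prologue, identity block -> bsize x bsize matrix."""
    return from_columns([filt(coeffs, [0] * r, unit(bsize, j)) for j in range(bsize)])


def tail(m, r):
    """T(.): the last r rows of a matrix."""
    return m[-r:]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


r, bsize = 2, 4
b = [5, 7, 11]   # b[0], b[1], b[2]
a = [1, 2, 3]    # a[0] unused, a[1], a[2]

A_CP = prologue_matrix(apply_c1, b, r, bsize)
A_FP = prologue_matrix(apply_f, a, r, bsize)
A_C1B = block_matrix(apply_c1, b, r, bsize)
A_FB = block_matrix(apply_f, a, r, bsize)

# First rows of A_FP match the closed forms in (14).
assert A_FP[0] == [-a[2], -a[1]]
assert A_FP[1] == [a[1] * a[2], a[1] ** 2 - a[2]]
assert A_FP[2] == [-a[1] ** 2 * a[2] + a[2] ** 2, -a[1] ** 3 + 2 * a[1] * a[2]]

# By linearity, filtering a block equals A_FP * prologue + A_FB * block.
p, x = [1, 2], [3, 4, 5, 6]
y = apply_f(a, p, x)
assert y == [u + v for u, v in zip(matvec(A_FP, p), matvec(A_FB, x))]

# The last r outputs of the block follow from the tail matrices alone.
epilogue = [u + v for u, v in zip(matvec(tail(A_FP, r), p),
                                  matvec(tail(A_FB, r), x))]
assert epilogue == y[-r:]
```

The last check suggests why the tail matrices \(T(A_{ FP })\) and \(T(A_{ FB })\) are singled out: the \(r\) values a block hands to its successor can be obtained without touching the block's interior rows. The reverse matrices could presumably be built the same way by running the filters right to left.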

About this article

Cite this article

Maximo, A. Efficient finite impulse response filters in massively-parallel recursive systems. J Real-Time Image Proc 12, 603–611 (2016). https://doi.org/10.1007/s11554-015-0510-x

  • DOI: https://doi.org/10.1007/s11554-015-0510-x
