Efficient finite impulse response filters in massively-parallel recursive systems

Maximo, André

doi:10.1007/s11554-015-0510-x

Fast recursive systems computation with applications to low-level vision using GPUs

Original Research Paper
Published: 28 May 2015

Journal of Real-Time Image Processing
Volume 12, pages 603–611 (2016)

André Maximo¹

Abstract

This paper presents strategies to massively parallelize complete recursive systems. Each algorithm handles systems with both feedforward and feedback coefficients, allowing high-complexity filtering operators to be computed. The final algorithm is linear in time and memory, exposes a high number of parallel tasks, and is implemented on graphics processing units (GPUs). The key to the final algorithm is the derivation of closed-form formulas that combine non-recursive and recursive linear filters, based on an efficient state-of-the-art block-based strategy. Applications to early vision are considered in this work; hence the GPU implementation runs on images, computing an approximation of the Gaussian filter and its first and second derivatives. Finally, comparison results show that this work outperforms prior state-of-the-art algorithms, enabling real-time image filtering on ultra-high-definition videos.

References

CUDPP library: CUDA Data Parallel Primitives. http://cudpp.github.io (2013)
cuFFT library: CUDA Fast Fourier Transform. http://developer.nvidia.com/cuFFT (2007)
Deriche, R.: Fast algorithms for low-level vision. IEEE Trans. Pattern Anal. Machine Intell. 12(1), 78–87 (1990)
Deriche, R.: Recursively implementing the Gaussian and its derivatives. In: Proceedings of the 2nd Conference on Image Processing, pp. 263–267 (1992)
Göddeke, D., Strzodka, R.: Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid. IEEE Trans. Parallel Distrib. Syst. 22(1), 22–32 (2011)
Hale, D.: Recursive Gaussian filters. Tech. Rep. CWP-546, Center for Wave Phenomena, Colorado School of Mines, Golden, Colorado (2006)
Kirk, D.B., Hwu, W.W.: Programming massively parallel processors. Morgan Kaufmann (2010)
Merrill, D., Grimshaw, A.: Parallel scan for stream architectures. Tech. Rep. CS2009-14. University of Virginia (2009)
Nehab, D., Maximo, A., Lima, R.S., Hoppe, H.: GPU-efficient recursive filtering and summed-area tables. ACM Trans. Graphics 30(6), 176 (2011)
NVIDIA: CUDA SDK Examples. http://developer.nvidia.com/cuda-toolkit (2014)
Oppenheim, A.V., Schafer, R.W.: Discrete-time signal processing, 3rd edn. Prentice Hall (2010)
Podlozhnyuk, V.: Image convolution with CUDA. NVIDIA whitepaper (2007)
Ruijters, D., Thévenaz, P.: GPU prefilter for accurate cubic B-spline interpolation. Comput. J. 55(1), 15–20 (2012)
Sengupta, S., Harris, M., Zhang, Y., Owens, J.D.: Scan primitives for GPU computing. In: Proceedings of Graphics Hardware, pp. 97–106 (2007)
Stone, H.S.: An efficient parallel algorithm for the solution of a tridiagonal linear system of equations. J. ACM 20(1), 27–38 (1973)
Sung, W., Mitra, S.: Efficient multi-processor implementation of recursive digital filters. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 257–260 (1986)
van Vliet, L.J., Young, I.T., Verbeek, P.W.: Recursive Gaussian derivative filters. In: Proceedings of the 14th International Conference on Pattern Recognition, vol. 1, pp. 509–514 (1998)
Zhang, Y., Cohen, J., Owens, J.D.: Fast tridiagonal solvers on the GPU. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2010)

Author information

Authors and Affiliations

ACM Member, Rio de Janeiro, Brazil
André Maximo

Corresponding author

Correspondence to André Maximo.

Appendix: Linear filter matrices

The $r\times r$ matrices $A_{ C }$ and $A_{ F }$ correspond to all multiply-add operations in (1) and (2), grouping the $b[k]$ feedforward and $a[k]$ feedback coefficients when applying the $C_1$ and $F$ filters to an $r$-vector. The $b\times r$ matrices $A_{ CP }$ and $A_{ FP }$ are the results of $C_1(I,{\mathbf{0}})$ and $F(I,{\mathbf{0}})$, where $I$ is an $r\times r$ identity matrix and $\mathbf {0}$ is a $b\times r$ matrix of zeros. To illustrate, for $r=2$ we have

$$\begin{aligned} A_{ CP }&= \left[ \begin{array}{ll} b[2]&b[1] \\ 0&b[2] \\ 0&0 \\ \vdots&\vdots \end{array}\right]\end{aligned}$$

(13)

$$\begin{aligned} A_{ FP }&= \left[\begin{array}{ll} -a[2]&-a[1] \\ a[1]\,a[2]&a[1]^2 - a[2] \\ -a[1]^2\,a[2] + a[2]^2&-a[1]^3 + 2\,a[1]\,a[2] \\ \vdots&\vdots \end{array}\right] \end{aligned}$$

(14)

and the $r\times r$ matrix $T(A_{ FP })$ contains the last $r$ rows of $A_{ FP }$. Next, the $b\times b$ matrices $A_{ C_1\!B }$ and $A_{ FB }$ are the results of $C_1(\mathbf {0},I)$ and $F(\mathbf {0},I)$, where $I$ is a $b\times b$ identity matrix and $\mathbf {0}$ is an $r\times b$ matrix of zeros. To illustrate,

$$\begin{aligned} A_{ C_1\!B }&= \left[\begin{array}{llll} b[0]&0&0&\cdots \\ b[1]&b[0]&0&\cdots \\ b[2]&b[1]&b[0]&\cdots \\ \vdots&\vdots&\vdots&\ddots \end{array}\right]\end{aligned}$$

(15)

$$\begin{aligned} A_{ FB }&= \left[\begin{array}{llll} 1&0&0&\cdots \\ -a[1]&1&0&\cdots \\ a[1]^2 - a[2]&-a[1]&1&\cdots \\ \vdots&\vdots&\vdots&\ddots \end{array}\right] \end{aligned},$$

(16)

and the $r\times b$ matrix $T(A_{ FB })$ contains the last $r$ rows of $A_{ FB }$. Finally, the reverse filter matrices $A_{ CE }$ from $C_2({\mathbf {0}},I)$, $A_{ RE }$ from $R({\mathbf {0}},I)$, $A_{ C_2\!B }$ from $C_2(I,{\mathbf{0}})$, $A_{ RB }$ from $R(I,{\mathbf{0}})$ and all their row-processing counterparts ($\,\,\cdot ^{{\scriptscriptstyle \mathcal {T}}}$) are analogous.
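The prologue and block matrices above can be checked numerically by pushing an identity prologue with a zero block (or a zero prologue with an identity block) through a sequential filter. The following is a minimal pure-Python sketch for illustration only, not the GPU implementation described in the paper. It assumes the causal recurrences $y[n]=\sum_{k=0}^{r} b[k]\,x[n-k]$ for $C_1$ and $y[n]=x[n]-\sum_{k=1}^{r} a[k]\,y[n-k]$ for $F$, a sign convention inferred from (14) and (16). The function names, the block size of 4, and the coefficient values are hypothetical choices made for this example.

```python
def causal_fir(b, prologue, block):
    """C_1 (assumed form): y[n] = sum_{k=0..r} b[k] * x[n-k].
    `prologue` holds the r inputs x[-r..-1] that precede the block."""
    r = len(b) - 1
    x = list(prologue) + list(block)
    return [sum(b[k] * x[r + n - k] for k in range(r + 1))
            for n in range(len(block))]


def causal_iir(a, prologue, block):
    """F (assumed form): y[n] = x[n] - sum_{k=1..r} a[k] * y[n-k].
    `prologue` holds the r outputs y[-r..-1]; a[0] is unused padding
    so that list indices match the a[k] of the text."""
    r = len(a) - 1
    y = list(prologue)
    for n, xn in enumerate(block):
        y.append(xn - sum(a[k] * y[r + n - k] for k in range(1, r + 1)))
    return y[r:]


def filter_matrix(filt, coeffs, r, block_size, side):
    """Build a filter matrix column by column: column j is the response to
    the j-th unit vector placed in the prologue (side='P') or in the
    block (side='B'), with the other argument set to zeros."""
    size = r if side == "P" else block_size
    cols = []
    for j in range(size):
        unit = [1.0 if i == j else 0.0 for i in range(size)]
        if side == "P":
            cols.append(filt(coeffs, unit, [0.0] * block_size))
        else:
            cols.append(filt(coeffs, [0.0] * r, unit))
    return [list(row) for row in zip(*cols)]  # columns -> rows


def matvec(A, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in A]


# Hypothetical example values: order r = 2, block size 4.
r, block_size = 2, 4
b = [2.0, 3.0, 5.0]  # b[0], b[1], b[2]
a = [1.0, 3.0, 7.0]  # a[0] padding, a[1], a[2]

A_CP = filter_matrix(causal_fir, b, r, block_size, "P")   # C_1(I, 0)
A_FP = filter_matrix(causal_iir, a, r, block_size, "P")   # F(I, 0)
A_C1B = filter_matrix(causal_fir, b, r, block_size, "B")  # C_1(0, I)
A_FB = filter_matrix(causal_iir, a, r, block_size, "B")   # F(0, I)
T_A_FP = A_FP[-r:]  # T(.) keeps the last r rows

# Linearity: filtering a block x after a prologue p equals A_FP p + A_FB x.
p, x = [1.0, 2.0], [3.0, -1.0, 4.0, 0.0]
direct = causal_iir(a, p, x)
combined = [u + v for u, v in zip(matvec(A_FP, p), matvec(A_FB, x))]
```

With these values, the first rows of `A_FP` and `A_FB` reproduce the symbolic entries of (14) and (16), and `combined` agrees with `direct`. That agreement follows from the linearity of $F$, which is the property that allows a block's dependence on its prologue to be expressed in closed form.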

About this article

Cite this article

Maximo, A. Efficient finite impulse response filters in massively-parallel recursive systems. J Real-Time Image Proc 12, 603–611 (2016). https://doi.org/10.1007/s11554-015-0510-x

Received: 15 January 2015
Accepted: 11 May 2015
Published: 28 May 2015
Issue Date: October 2016
DOI: https://doi.org/10.1007/s11554-015-0510-x
