Abstract
This paper presents strategies to massively parallelize complete recursive systems. Each algorithm handles systems with both feedforward and feedback coefficients, which allows high-complexity filtering operators to be computed. The final algorithm is linear in time and memory, exposes a high number of parallel tasks, and is implemented on graphics processing units (GPUs). The key to the final algorithm is the derivation of closed-form formulas that combine non-recursive and recursive linear filters, building on an efficient state-of-the-art block-based strategy. This work targets applications in early vision, so the GPU implementation runs on images and computes an approximation of the Gaussian filter and its first and second derivatives. Finally, comparison results show that this work outperforms prior state-of-the-art algorithms, achieving real-time image filtering on ultra-high-definition videos.
Appendix: Linear filter matrices
The \(r\times r\) matrices \(A_{ C }\) and \(A_{ F }\) correspond to all multiply-add operations in (1) and (2), grouping the feedforward coefficients \(b[k]\) and the feedback coefficients \(a[k]\) when applying the \(C_1\) and \(F\) filters to a vector of length \(r\). The \(b\times r\) matrices \(A_{ CP }\) and \(A_{ FP }\) are the results of \(C_1(I,{\mathbf{0}})\) and \(F(I,{\mathbf{0}})\), where \(I\) is the \(r\times r\) identity matrix and \(\mathbf {0}\) is a \(b\times r\) matrix of zeros. To illustrate, for \(r=2\) we have
and the \(r\times r\) matrix \(T(A_{ FP })\) contains the last \(r\) rows of \(A_{ FP }\). Next, the \(b\times b\) matrices \(A_{ C_1\!B }\) and \(A_{ FB }\) are the results of \(C_1(\mathbf {0},I)\) and \(F(\mathbf {0},I)\), where \(I\) is the \(b\times b\) identity matrix and \(\mathbf {0}\) is an \(r\times b\) matrix of zeros. To illustrate,
and the \(r\times b\) matrix \(T(A_{ FB })\) contains the last \(r\) rows of \(A_{ FB }\). Finally, the reverse filter matrices \(A_{ CE }\) from \(C_2({\mathbf {0}},I)\), \(A_{ RE }\) from \(R({\mathbf {0}},I)\), \(A_{ C_2\!B }\) from \(C_2(I,{\mathbf{0}})\), and \(A_{ RB }\) from \(R(I,{\mathbf{0}})\), together with all their row-processing counterparts (\(\,\,\cdot ^{{\scriptscriptstyle \mathcal {T}}}\)), are defined analogously.
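The construction of \(A_{ FP }\), \(A_{ FB }\) and their \(T(\cdot)\) tails can be sketched numerically by feeding identity and zero blocks through a forward recursive filter. The Python sketch below is illustrative only and rests on assumptions not stated in this appendix: the recurrence is taken as \(y[i] = x[i] - \sum_{k=1}^{r} a[k]\,y[i-k]\) with unit feedforward gain (the sign and normalization used in (1) and (2) may differ), the filter runs down columns, the coefficient values are arbitrary, and all function and variable names are hypothetical.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def zeros(rows, cols):
    """rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


def matmul(a, m):
    """Plain matrix product a @ m."""
    return [[sum(a[i][k] * m[k][j] for k in range(len(m)))
             for j in range(len(m[0]))] for i in range(len(a))]


def add(a, m):
    """Element-wise matrix sum."""
    return [[u + v for u, v in zip(ra, rm)] for ra, rm in zip(a, m)]


def recursive_filter(prologue, block, fb):
    """Assumed form of F(prologue, block), applied down each column.

    y[i] = x[i] - sum_{k=1..r} fb[k-1] * y[i-k], where the r rows of
    `prologue` supply the outputs that precede the block (oldest first).
    """
    r = len(fb)
    n = len(block[0])
    hist = [row[:] for row in prologue]  # the last r outputs, oldest first
    out = []
    for x_row in block:
        y_row = [x_row[j] - sum(fb[k] * hist[-1 - k][j] for k in range(r))
                 for j in range(n)]
        out.append(y_row)
        hist = hist[1:] + [y_row]  # slide the window of previous outputs
    return out


r, b = 2, 4          # filter order and block size
fb = [-0.5, 0.25]    # hypothetical feedback coefficients a[1], a[2]

A_FP = recursive_filter(identity(r), zeros(b, r), fb)  # F(I, 0): b x r
A_FB = recursive_filter(zeros(r, b), identity(b), fb)  # F(0, I): b x b
T_A_FP = A_FP[-r:]   # last r rows of A_FP: r x r
T_A_FB = A_FB[-r:]   # last r rows of A_FB: r x b

# Linearity check: filtering a block directly should match A_FP p + A_FB x.
p = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]                    # r x 3 prologue
x = [[float(i + j) for j in range(3)] for i in range(b)]  # b x 3 block
direct = recursive_filter(p, x, fb)
via_matrices = add(matmul(A_FP, p), matmul(A_FB, x))
max_err = max(abs(u - v)
              for rd, rv in zip(direct, via_matrices)
              for u, v in zip(rd, rv))
```

Because the recurrence is linear, the response to a prologue and a block splits into a prologue-only part and a block-only part. The final check compares the direct filter output against \(A_{ FP }\,p + A_{ FB }\,x\) for one sample prologue and block.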
Cite this article
Maximo, A. Efficient finite impulse response filters in massively-parallel recursive systems. J Real-Time Image Proc 12, 603–611 (2016). https://doi.org/10.1007/s11554-015-0510-x
DOI: https://doi.org/10.1007/s11554-015-0510-x