A New Overlap Save Algorithm for Fast Block Convolution and Its Implementation Using FFT

Kuk, Jung Gap; Kim, Seyun; Cho, Nam Ik

doi:10.1007/s11265-010-0466-9

Published: 27 March 2010

Volume 63, pages 143–152, (2011)
Journal of Signal Processing Systems

Abstract

Convolution of data with a long-tap filter is often implemented by the overlap save algorithm (OSA) using the fast Fourier transform (FFT). However, the traditional OSA contains redundant computations, because the FFT is applied to the overlapped data (the concatenation of the previous block and the current block) even though the DFT computations are recursive. In this paper, we first analyze the redundancy by decomposing the OSA into two processes, one related to the previous block and one to the current block. We then eliminate the redundant computations by introducing a new transform that is applied only to the current data, not to the whole overlapped data. Hence the transform size is halved compared to the traditional OSA. The new transform has the form of a DFT and can be implemented by defining a new butterfly structure. In this paper, however, we implement it as a cascade of twiddle factors and a conventional FFT, so that existing FFT libraries for PCs and DSPs can be used. The computational complexity in this case is analyzed and compared with that of existing methods. In the experiments, the proposed method is applied to several block convolutions and partitioned-block convolutions. The CPU time is reduced by more than the arithmetic analysis predicts, which implies that the reduced transform size gives an additional advantage in data manipulation.
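
For orientation, the traditional OSA that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration and not the authors' code: the names `dft`, `overlap_save` and `direct_convolution` are hypothetical, a naive O(N²) DFT stands in for an FFT library call, and the block length is assumed to be at least the filter length minus one. Each step transforms 2N samples (previous block plus current block), which is the overlapped transform the proposed method avoids.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT, used here only as a stand-in for an FFT library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
               for m in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out

def overlap_save(signal, taps, block):
    """Traditional overlap save for real data: transform size is 2 * block."""
    assert len(taps) <= block + 1, "assumed: filter fits into one overlap"
    size = 2 * block
    H = dft(list(taps) + [0.0] * (size - len(taps)))
    prev = [0.0] * block                      # previous block (zeros at start)
    out = []
    for start in range(0, len(signal), block):
        cur = list(signal[start:start + block])
        cur += [0.0] * (block - len(cur))     # zero-pad the last block
        X = dft(prev + cur)                   # FFT of the overlapped data
        y = dft([a * b for a, b in zip(X, H)], inverse=True)
        out.extend(v.real for v in y[block:]) # first half is aliased, discard
        prev = cur
    return out[:len(signal)]

def direct_convolution(signal, taps):
    """Reference: direct time-domain FIR filtering."""
    return [sum(taps[k] * signal[n - k]
                for k in range(len(taps)) if n - k >= 0)
            for n in range(len(signal))]
```

Calling `overlap_save(sig, taps, block)` on a short real sequence and comparing it with `direct_convolution(sig, taps)` is a quick way to confirm that only the second half of each 2N-point output block is kept.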

References

Oppenheim, A. V., & Schafer, R. W. (1989). Discrete-time signal processing. Englewood Cliffs: Prentice-Hall.
Agarwal, R. C., & Burrus, C. S. (1978). Number theoretic transforms to implement fast digital convolution. Proceedings of IEEE, 63(4), 550–560.
Mou, Z.-J., & Duhamel, P. (1991). Short-length FIR filters and their use in fast nonrecursive filtering. IEEE Transactions on Signal Processing, 39(6), 1322–1332.
Duhamel, P. (1986). Implementation of “split-radix” FFT algorithms for complex, real, and real-symmetric data. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-34(2), 285–295.
Johnson, S. G., & Frigo, M. (2007). A modified split-radix FFT with fewer arithmetic operations. IEEE Transactions on Signal Processing, 55(1), 111–119.
Vetterli, M. (1988). Running FIR and IIR filtering using multirate filter banks. Transactions on Acoustics, Speech, and Signal Processing, 36(5), 730–738.
Gardner, W. G. (1995). Efficient convolution without input–output delay. Journal of Audio Engineering Society, 43(3), 127–136.
Torger, A., & Farina, A. (2001). Real-time partitioned convolution for ambiophonics surround sound. In IEEE workshop on applications of signal processing to audio and acoustics (pp. 21–24).
Shynk, J. J. (1992). Frequency-domain and multirate adaptive filtering. IEEE Signal Processing Magazine, 9, 14–37.
Farina, A., Glasgal, R., Armelloni, E., & Torger, A. (2001). Ambiophonic principles for the recording and reproduction of surround sound for music. In 19th AES conference (pp. 21–24).
Matusiak, R. (1997). Implementing fast Fourier transform algorithms of real-valued sequences with the TMS320 DSP family. Application Report of Texas Instruments.
Prati, G. (1978). A discrete adaptive equalizer based on the overlap save filtering technique. In Canadian communications and power conference (pp. 141–144).
Narasimha, M. J. (2006). Modified overlap add and overlap save convolution algorithms for real signals. IEEE Signal Processing Letters, 13(11), 669–671.
Kuk, J. G., Kim, S. Y., & Cho, N. I. (2009). An overlap save algorithm for block convolution with reduced complexity. In IEEE international conference on acoustics, speech and signal processing (pp. 605–608).
Intel Performance Libraries. Intel integrated performance primitives website. http://software.intel.com/en-us/intel-ipp/.

Acknowledgements

This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Knowledge Economy (MKE).

Author information

Authors and Affiliations

Department of Electrical Engineering, Institute of New Media & Communications (INMC), Seoul National University, Seoul, 151-744, South Korea
Jung Gap Kuk, Seyun Kim & Nam Ik Cho

Corresponding author

Correspondence to Jung Gap Kuk.

Appendices

Appendix 1: Properties of QDFT

1.1 Reversibility of QDFT

Proof

Substituting the forward QDFT for $X_k^q$ in the inverse QDFT of Eq. 17, we have

$$ \frac{1}{N} \sum\limits_{k=0}^{N-1}\left(\sum\limits_{m=0}^{N-1} x_m W_N^{m(k+3/4)}\right)W_N^{-n(k+3/4)} $$

(22)

and several algebraic steps give

$$ \begin{array}{rl} & \frac{1}{N} \sum\limits_{k=0}^{N-1}\left(\sum\limits_{m=0}^{N-1} x_m W_N^{m(k+3/4)}\right)W_N^{-n(k+3/4)} \\ =& \frac{1}{N} \sum\limits_{k=0}^{N-1}\sum\limits_{m=0}^{N-1} x_m W_N^{(m-n)(k+3/4)} \\ =& \sum\limits_{m=0}^{N-1} x_m \frac{1}{N} \sum\limits_{k=0}^{N-1} W_N^{(m-n)(k+3/4)}\\ =& \sum\limits_{m=0}^{N-1} x_m \frac{1}{N} W_N^{(m-n)(3/4)} \sum\limits_{k=0}^{N-1} W_N^{(m-n)k}. \end{array} $$

(23)

The summation $\sum_{k=0}^{N-1} W_N^{(m-n)k}$ in Eq. 23 is zero for all values of m except when m − n = pN (p ∈ ℤ), in which case it equals N. Together with the factor 1/N, it can therefore be replaced by an infinite sum of Kronecker delta functions over p, and Eq. 23 reduces to

$$ \begin{array}{rl} & \sum\limits_{m=0}^{N-1} x_m W_N^{(m-n)(3/4)} \sum\limits_{p=-\infty}^{\infty} \delta_{m,n+pN} \\ =& \sum\limits_{p=-\infty}^{\infty} x_{n+pN} W_N^{pN(3/4)}. \end{array} $$

(24)

In Eq. 24, the summand is non-zero only when p = 0, because $x_n$ is defined as 0 outside [0, N − 1], and the surviving term is $x_n$. Hence the reversibility of QDFT is proved. □
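
The reversibility can also be checked numerically. The sketch below assumes the standard definition $W_N = e^{-j2\pi/N}$ and uses direct O(N²) sums; the helper names `w`, `qdft`, `iqdft` and `qdft_via_dft` are hypothetical. The last function illustrates the cascade mentioned in the abstract: multiplying $x_n$ by the twiddle factor $W_N^{3n/4}$ and then applying an ordinary DFT gives the same result as the QDFT.

```python
import cmath

def w(n_points, exponent):
    """W_N^exponent, assuming W_N = exp(-2j*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)

def qdft(x):
    """Forward QDFT: X_k = sum_m x_m W_N^{m(k + 3/4)}."""
    n = len(x)
    return [sum(x[m] * w(n, m * (k + 0.75)) for m in range(n))
            for k in range(n)]

def iqdft(X):
    """Inverse QDFT: x_m = (1/N) sum_k X_k W_N^{-m(k + 3/4)}."""
    n = len(X)
    return [sum(X[k] * w(n, -m * (k + 0.75)) for k in range(n)) / n
            for m in range(n)]

def qdft_via_dft(x):
    """Twiddle x_m by W_N^{3m/4}, then take a conventional DFT."""
    n = len(x)
    twiddled = [x[m] * w(n, 0.75 * m) for m in range(n)]
    return [sum(twiddled[m] * w(n, m * k) for m in range(n))
            for k in range(n)]
```

In a practical implementation the inner DFT of `qdft_via_dft` would be replaced by a library FFT call; the direct sum is used here only to keep the sketch dependency-free.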

1.2 Convolution Property of QDFT

Property. Multiplication of two sequences in the QDFT domain, $X_k^q G_k^q$, corresponds to $\sum_{k=0}^n x_k g_{n-k} +j\sum_{k=n+1}^{N-1} x_k g_{n-k+N}$ in the time domain.

Proof

The N-point QDFT based block convolution can be written as $\frac{1}{N} \sum_{k=0}^{N-1}X_k^q G_k^q W_N^{-n(k+\frac{3}{4})}$ and several algebraic steps give

$$ \begin{array}{rl} & \frac{1}{N} \sum\limits_{k=0}^{N-1}X_k^q G_k^q W_N^{-n(k+\frac{3}{4})}\\ =& \frac{1}{N}\sum\limits_{k=0}^{N-1}\sum\limits_{l=0}^{N-1}x_l W_N^{l(k+\frac{3}{4})} \sum\limits_{m=0}^{N-1}g_m W_N^{m(k+\frac{3}{4})} W_N^{-n(k+\frac{3}{4})}\\ =& \frac{1}{N}\sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=0}^{N-1} g_m \sum\limits_{k=0}^{N-1} W_N^{(l+m-n)(k+\frac{3}{4})}\\ =& \sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=0}^{N-1} g_m W_N^{\frac{3}{4}(l+m-n)} \frac{1}{N}\sum\limits_{k=0}^{N-1} W_N^{(l+m-n)k}. \end{array} $$

(25)

In Eq. 25, the summation over the index k is zero for all values of l and m except when l + m − n = pN (p ∈ ℤ), in which case it equals N. Together with the factor 1/N, it can therefore be replaced by an infinite sum of Kronecker delta functions over p. We may also extend the limits of m to infinity, with the understanding that the x and g sequences are defined as 0 outside [0, N − 1]. Continuing with the derivation, we have

$$ \begin{array}{rl} & \sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=-\infty}^{\infty} g_m W_N^{\frac{3}{4}(l+m-n)} \sum\limits_{p=-\infty}^{\infty} \delta_{m,n-l+pN} \\ =& \sum\limits_{l=0}^{N-1} x_l \sum\limits_{p=-\infty}^{\infty} j^p g_{n-l+pN} \end{array} $$

(26)

where $g_{n-l+pN}$ has non-zero values only if p = 0 or p = 1. Hence this can be rewritten as

$$ \sum\limits_{l=0}^n x_l g_{n-l} +j\sum\limits_{l=n+1}^{N-1} x_l g_{n-l+N} $$

and the convolution property of QDFT is proved. □
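
A small numerical check of this property, again assuming $W_N = e^{-j2\pi/N}$ and using hypothetical helper names, is sketched below. For real x and g, the real part of the result holds the non-wrapped part of the convolution and the imaginary part holds the wrapped-around part, which is what lets a single N-point transform separate the two contributions.

```python
import cmath

def qdft(x):
    """Forward QDFT by direct summation (W_N = exp(-2j*pi/N) assumed)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * (k + 0.75) / n)
                for m in range(n)) for k in range(n)]

def iqdft(X):
    """Inverse QDFT by direct summation."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * m * (k + 0.75) / n)
                for k in range(n)) / n for m in range(n)]

def qdft_product(x, g):
    """Time-domain sequence corresponding to X_k^q * G_k^q."""
    return iqdft([a * b for a, b in zip(qdft(x), qdft(g))])

def property_rhs(x, g):
    """sum_{l<=n} x_l g_{n-l} + j * sum_{l>n} x_l g_{n-l+N}, for each n."""
    n_pts = len(x)
    out = []
    for n in range(n_pts):
        head = sum(x[l] * g[n - l] for l in range(n + 1))
        tail = sum(x[l] * g[n - l + n_pts] for l in range(n + 1, n_pts))
        out.append(head + 1j * tail)
    return out
```

Comparing `qdft_product(x, g)` with `property_rhs(x, g)` element by element for a few short sequences confirms the property up to floating-point rounding.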

Appendix 2: Direct radix-2 and radix-4 Implementation of QDFT

QDFT can be implemented in a similar way to the conventional FFT. To generalize the discussion, we consider the N-point transform $T_s^N$ as

$$ T_s^N : X_k^s=\sum\limits_{n=0}^{N-1} x_n W_N^{n(k+s)} $$

(27)

where s is the shift of the frequency index. From Eq. 27, s = 0 gives the DFT, s = 1/2 the ODFT, and s = 3/4 the QDFT. Applying the Cooley–Tukey decomposition to $T_s^N$ yields two shorter transforms of size N/2, and thus the radix-2 structure, as

$$ \begin{array}{rll} T_{s/2}^{N/2} &: X_{2k}^s=\sum\limits_{n=0}^{N/2-1} (x_n+W_2^s x_{n+N/2} )W_{N/2}^{n(k+s/2)}\\ T_{(s+1)/2}^{N/2} &: X_{2k+1}^s=\sum\limits_{n=0}^{N/2-1} (x_n-W_2^s x_{n+N/2} )W_{N/2}^{n(k+(s+1)/2)} \end{array} $$

(28)

where 0 ≤ k < N/2. That is, the N-point transform $T_s^N$ is decomposed into two N/2-point transforms $T_{s/2}^{N/2}$ and $T_{(s+1)/2}^{N/2}$. The butterfly structure corresponding to Eq. 28 is shown in Fig. 2, where a directed line means that the data is multiplied by −1.
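
The radix-2 split can be applied recursively, with the shift changing from s to s/2 and (s + 1)/2 at each level. The following is a minimal sketch under the assumptions that $W_N = e^{-j2\pi/N}$ and that N is a power of two; `direct_transform` and `radix2_transform` are hypothetical names, and the recursion is written for clarity rather than speed.

```python
import cmath

def direct_transform(x, s):
    """T_s^N by direct summation: X_k = sum_n x_n W_N^{n(k+s)}."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * (k + s) / n)
                for m in range(n)) for k in range(n)]

def radix2_transform(x, s):
    """T_s^N via the radix-2 split; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]              # W_1^{0*(k+s)} = 1
    half = n // 2
    w2s = cmath.exp(-1j * cmath.pi * s)     # W_2^s
    sum_in = [x[m] + w2s * x[m + half] for m in range(half)]
    diff_in = [x[m] - w2s * x[m + half] for m in range(half)]
    out = [0j] * n
    out[0::2] = radix2_transform(sum_in, s / 2)          # X_{2k}
    out[1::2] = radix2_transform(diff_in, (s + 1) / 2)   # X_{2k+1}
    return out
```

With s = 0 the recursion reduces to an ordinary decimation-in-frequency FFT; with s = 3/4 it computes the QDFT, which can be checked against `direct_transform(x, 0.75)`.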

$T_s^N$ can also be implemented with a radix-4 structure, in which $T_s^N$ is decomposed into four shorter transforms of size N/4: $T_{s/4}^{N/4}$, $T_{(s+1)/4}^{N/4}$, $T_{(s+2)/4}^{N/4}$ and $T_{(s+3)/4}^{N/4}$, as in Eq. 29.

$$ \begin{array}{rll} T_{s/4}^{N/4} &: X_{4k}^s = \sum\limits_{n=0}^{N/4-1} (x_n + W_4^s x_{n+N/4} + W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad + W_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+s/4)} \\ T_{(s+1)/4}^{N/4} &: X_{4k+1}^s = \sum\limits_{n=0}^{N/4-1} (x_n - jW_4^s x_{n+N/4} - W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad + jW_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+1)/4)}\\ T_{(s+2)/4}^{N/4} &: X_{4k+2}^s = \sum\limits_{n=0}^{N/4-1} (x_n - W_4^s x_{n+N/4} + W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad - W_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+2)/4)}\\ T_{(s+3)/4}^{N/4} &: X_{4k+3}^s = \sum\limits_{n=0}^{N/4-1} (x_n + jW_4^s x_{n+N/4} - W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad - jW_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+3)/4)} \end{array} $$

(29)

The butterfly structure of Eq. 29 is shown in Fig. 3, where a directed line and a dotted line mean that the data is multiplied by −1 and j, respectively. A line that is both directed and dotted therefore means multiplication by −j.
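
One radix-4 stage can be sketched in the same spirit. Substituting n + pN/4 (p = 0, 1, 2, 3) into Eq. 27 for the output index 4k + r shows that the sample $x_{n+pN/4}$ is weighted by $W_4^{p(r+s)} = (W_4^r W_4^s)^p$ with $W_4 = -j$, and that the r-th sub-transform has shift (s + r)/4. The code below uses these weights directly; it assumes $W_N = e^{-j2\pi/N}$ and N divisible by 4, the function names are hypothetical, and the four N/4-point sub-transforms are evaluated by direct summation for brevity.

```python
import cmath

def direct_transform(x, s):
    """T_s^N by direct summation: X_k = sum_n x_n W_N^{n(k+s)}."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * (k + s) / n)
                for m in range(n)) for k in range(n)]

def radix4_stage(x, s):
    """One radix-4 split of T_s^N; len(x) is assumed divisible by 4."""
    n = len(x)
    quarter = n // 4
    w4s = cmath.exp(-2j * cmath.pi * s / 4)     # W_4^s
    out = [0j] * n
    for r in range(4):
        rot = (-1j) ** r                        # W_4^r: 1, -j, -1, j
        # Butterfly: combine the four quarter-spaced samples for branch r.
        folded = [sum((rot * w4s) ** p * x[m + p * quarter] for p in range(4))
                  for m in range(quarter)]
        # N/4-point transform with shift (s + r)/4 produces X_{4k+r}.
        out[r::4] = direct_transform(folded, (s + r) / 4)
    return out
```

For a 16-point input and s = 3/4, `radix4_stage(x, 0.75)` should agree with `direct_transform(x, 0.75)` up to rounding; in a full implementation the sub-transforms would themselves be split recursively.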

About this article

Cite this article

Kuk, J.G., Kim, S. & Cho, N.I. A New Overlap Save Algorithm for Fast Block Convolution and Its Implementation Using FFT. J Sign Process Syst 63, 143–152 (2011). https://doi.org/10.1007/s11265-010-0466-9

Received: 11 July 2009
Revised: 23 February 2010
Accepted: 25 February 2010
Published: 27 March 2010
Issue Date: April 2011
DOI: https://doi.org/10.1007/s11265-010-0466-9
