Abstract
Phase correlation is one of the classic methods for sparse motion or displacement estimation. It is renowned in the literature for its high precision and its insensitivity to illumination variations. We propose several important enhancements to the phase correlation (PhC) method which render it more robust in situations where a motion measurement is not possible (low structure, too much noise, too dissimilar image content in the corresponding measurement windows). This allows the method to perform self-diagnosis in adverse situations.
Furthermore, we extend the PhC method by a robust scheme for detecting and classifying the presence of multiple motions and estimating their uncertainties. Experimental results on the Middlebury Stereo Dataset and on the KITTI Optical Flow Dataset show the potential offered by the enhanced method in contrast to the PhC implementation of OpenCV.
1 Introduction
Phase Correlation (PhC) is one of the four classical methods for local motion estimation, together with discrete matching (a.k.a. block matching), differential matching, and spatio-temporal optical flow measurement (structure tensor). Although the fundamentals of PhC date back to the 1970s [2, 4, 5], the precise relations between these families of approaches have not yet been analyzed thoroughly in the literature. Depending on the characteristics of the data to be processed, and to some degree also on the scientific community considered (computer vision, geophysical data analysis, time delay estimation, ...), different families of methods are preferred for the task of estimating displacements or 2D motion. For instance, [6, 7] are early papers proposing the normalized cross-correlation metric. How to optimize such a metric is a separate issue: discrete matching stands in contrast to differential approaches, led by the classic Lucas & Kanade method [1].
Since the Fourier transform is the core element of the PhC method, PhC is strongly robust against geometric and photometric distortions [8–10]. Like the classical differential matching schemes, the PhC method, with certain extensions, can achieve subpixel matching accuracy [11]; an accuracy of better than 1/100 pixel was claimed in [12, 13]. Additional details on different variants of PhC algorithms can be found in [10, 14–17, 21, 23]. Some of them describe different ways to achieve subpixel matching accuracy; others emphasize the advantages of PhC for estimating homogeneous displacements for larger images (image registration). Some recent papers considered the use of PhC-based stereo algorithms for remote sensing tasks applied to aerial imagery [18] and for interferometric SAR image co-registration [19]. We refer also to [20], where PhC is used to stabilize video sequences against illumination changes and camera shake. Besides some completely novel approaches, we extend in the present paper several ideas that already appear in [20] and put them on a more systematic basis.
We emphasize that the method presented here does not aim at the computation of dense motion fields; instead it a) makes the classical PhC robust, and b) extends the PhC method towards obtaining distributions of the motion vectors that appear in a given patch. In applications where the patch is assumed to undergo a homogeneous translational motion (image registration), this is already the desired result, whereas for complex motion fields these distributions provide valuable prior information for systematically initializing and guiding a subsequent sparse or (semi-)dense motion estimation procedure.
2 Approach
This section embeds the plain PhC method as it is described in the literature into a framework that checks for potential problematic situations (due to invalid or ambiguous input data) and performs a series of self-checks and filtering steps that are necessary to employ the method in an autonomous mode without user intervention. We provide solid and proven procedures for tuning the different parameters that appear in the enhanced PhC method. The presentation of the PhC method and the proposed extensions are described here for one-dimensional signals; the generalization to more dimensions is straightforward.
Let \(y[x_{n}]\) and \(z[x_{n}]\) be two observations of the same discrete signal \(s[x_{n}]\), where \(z[x_{n}]\) contains a shift by a displacement d:
$$y[x_{n}] = s[x_{n}], \qquad z[x_{n}] = s[x_{n} - d]$$
The orthonormal Fourier transform over a discrete area of size N yields:
$$Y[f_{k}] = \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} y[x_{n}]\, e^{-2\pi i f_{k} x_{n}/N}, \qquad Z[f_{k}] = Y[f_{k}]\, e^{-2\pi i f_{k} d/N}$$
For further examination, we isolate the displacement- and frequency-dependent phase shift between the two signals and introduce the cross-power spectrum \(P[f_{k}]\) and its inverse Fourier transform, the delta array \(p[x_{n}]\):
$$P[f_{k}] = \frac{Z[f_{k}]\, Y^{*}[f_{k}]}{\left| Z[f_{k}]\, Y^{*}[f_{k}] \right|} = e^{-2\pi i f_{k} d/N}, \qquad p[x_{n}] = \mathcal{F}^{-1}\!\left\{ P[f_{k}] \right\} \propto \delta[x_{n} - d]$$
The delta array \(p[x_{n}]\) consists of an ideal \(\delta \)-impulse which indicates the relative shift between the two signals \(y[x_{n}]\) and \(z[x_{n}]\). In a realistic setting, with noiseFootnote 1, multiple motionsFootnote 2 and without periodicity of the imagesFootnote 3, the delta array is more complex and needs to be analyzed in detail to obtain reliable results.
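In the idealized noise-free, periodic case, this whole chain reduces to a few lines of code. The following is a minimal one-dimensional sketch in NumPy; function and variable names are ours, not from the paper:

```python
import numpy as np

def phase_correlation_1d(y, z, eps=1e-12):
    # Orthonormal Fourier transforms of both observations
    Y = np.fft.fft(y, norm="ortho")
    Z = np.fft.fft(z, norm="ortho")
    # Cross-power spectrum: keep only the relative phase
    cross = Z * np.conj(Y)
    P = cross / np.maximum(np.abs(cross), eps)
    # Delta array: the inverse transform peaks at the displacement d
    return np.real(np.fft.ifft(P, norm="ortho"))

# Toy example: z is y cyclically shifted by d = 5
rng = np.random.default_rng(0)
y = rng.standard_normal(64)
z = np.roll(y, 5)
p = phase_correlation_1d(y, z)
d_hat = int(np.argmax(p))   # d_hat == 5
```

In this periodic toy setting the delta array is an exact impulse; the checks of the following subsections deal with the realistic case where it is not.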
In the following Sects. 2.1–2.4 we introduce several checks and filtering steps which must be performed for the PhC to actually yield reliable and precise results. Steps which need to be applied separately to both patches (\(y[x_{n}]\), \(Y[f_{k}]\) or \(z[x_{n}]\), \(Z[f_{k}]\)) are only denoted for the first patch (second patch accordingly).
2.1 Structure Check
First we check whether both image patches show sufficient structure to allow displacement estimation. We compute the gray scale variance of the patches in a weighted manner, using the weights \(w[x_{n}]\) of the anti-leakage window:
$$\sigma_{y}^{2} = \sum_{n} w[x_{n}] \left( y[x_{n}] - \bar{y}_{w} \right)^{2}, \qquad \bar{y}_{w} = \sum_{n} w[x_{n}]\, y[x_{n}]$$
Then we compare it against an experimentally determined threshold \(\tau _{1}\):
$$\sigma_{y}^{2} > \tau_{1}$$
In our experiments with different datasetsFootnote 4 we found \(\tau _{1} \approx 90\) to be a good threshold to distinguish between structured and unstructured patches. Of course this value varies with the noise level of the input images. Due to the normalization of the weights, \(w[x_{n}]\), it is independent of the chosen patch size.
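The structure check can be sketched as follows. This is only an illustration: the weighted variance and \(\tau_{1} \approx 90\) come from the text above, while the window choice and all names are our assumptions:

```python
import numpy as np

def structure_check(patch, w, tau1=90.0):
    # Weighted gray scale mean and variance (weights are normalized to sum to 1)
    mean_w = np.sum(w * patch)
    var_w = np.sum(w * (patch - mean_w) ** 2)
    return var_w > tau1, var_w

N = 32
w = np.hanning(N)
w /= w.sum()                                      # normalized anti-leakage weights
flat = np.full(N, 128.0)                          # no structure -> check fails
textured = 128.0 + 40.0 * np.sin(np.arange(N))    # strong structure -> check passes

ok_flat, _ = structure_check(flat, w)
ok_tex, _ = structure_check(textured, w)
```

Because the weights are normalized, the same \(\tau_{1}\) can be reused across patch sizes, as stated above.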
2.2 Spectral Significance Filtering
After the transition to the frequency domain, we need to identify those significant spectral coefficients \(Y[f_{k}]\) and \(Z[f_{k}]\) which represent the main structure of the image patches and thus allow us to determine the displacement d. Therefore we need to suppress the influence of the DC (\(f_{k}=0\)) spectral component of the signal (mean value compensation) as well as the components whose spectral magnitudes are dominated by noise (noise suppression).
Mean Compensation. Since most of the structural information of the image is encoded in the low frequency AC (\(f_{k} \ne 0\)) spectral components, it is important to compensate for the gray scale mean before the anti-leakage window \(w[x_{n}]\) is applied. Otherwise, these low frequency components would be superimposed by the gray scale mean of the original image patch when the convolution with the Fourier transform of the anti-leakage window \(w[x_{n}]\) is performedFootnote 5.
Noise Suppression. We also need to suppress those spectral components of \(Y[f_{k}]\) and \(Z[f_{k}]\) whose magnitudes are on the order of the noise floor, because their phases are dominated by noise and do not carry any displacement information. To do so, we compute the frequency distributions of \({|Y[f_{k}]|}\) and \({|Z[f_{k}]|}\) and look for the first interval which is mainly dominated by noise. As a fast approximation, we compute the threshold \(\tau _2\) as the mean of those magnitudes which lie in the smaller half of the frequency distribution. In general \(\tau _2\) may be slightly too large, but this effect is negligible.
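The significance filtering of this subsection, combining mean compensation, windowing, and the \(\tau_{2}\) noise threshold, can be sketched as follows. Taking "the smaller half of the frequency distribution" to mean the lower half of the sorted magnitudes is our reading; names are ours:

```python
import numpy as np

def significant_spectrum(patch, w):
    # Mean compensation BEFORE applying the anti-leakage window
    centered = (patch - np.sum(w * patch) / np.sum(w)) * w
    Y = np.fft.fft(centered, norm="ortho")
    mags = np.abs(Y)
    # tau_2: mean of the smaller half of the magnitude distribution
    tau2 = np.sort(mags)[: mags.size // 2].mean()
    mask = mags > tau2
    mask[0] = False                 # DC component is always suppressed
    return mask, Y, tau2

# Toy patch: one strong cosine (bins 3 and 61) plus weak noise
N = 64
n = np.arange(N)
w = np.hanning(N)
rng = np.random.default_rng(0)
patch = 100.0 + 30.0 * np.cos(2 * np.pi * 3 * n / N) + rng.normal(0.0, 0.5, N)
mask, Y, tau2 = significant_spectrum(patch, w)
```

The returned mask keeps the structural bins of the cosine while discarding the DC bin and most noise-dominated bins.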
2.3 Delta Array Check
After significance filtering has been applied, the cross-power spectrum \(P[f_{k}]\) and the delta array \(p[x_{n}]\) are computed for the significant components (see Eqs. 3 and 4). The inverse Fourier transform is an orthonormal transformation, thus:
$$\sum_{n} \left| p[x_{n}] \right|^{2} = \sum_{k} \left| P[f_{k}] \right|^{2} = N_{\text{sig}}$$
where \(N_{\text{sig}}\) denotes the number of significant spectral components, each contributing \(|P[f_{k}]| = 1\).
In the ideal case all the energy is concentrated in one \(\delta \)-impulse which represents the displacement d. Hence we are only interested in those values of \(p[x_{n}]\) which hold a significant share of the energy known beforehand (see Eq. 10) and thus represent a dominant motion. The other values, which possess a much lower energy, are suppressed by computing a threshold \(\tau _{3}\) based on the histogram of the distribution of \({|p[x_{n}]|}^{2}\). We set the histogram range to \([0, N_{\text {sig}}]\) (see Eq. 10), the number of bins to the geometric mean \( m_{win} \) of the window side lengths, and \(\tau _3\) to the right border of the first bin. In our experiments we verified that energies which represent a relevant motion always lie above this threshold. This check fails if the energies of all delta array values lie below \(\tau _3\).
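With the histogram defined this way, \(\tau_{3}\) reduces to \(N_{\text{sig}} / m_{win}\); a small sketch of the check (toy values, names ours):

```python
import numpy as np

def delta_array_check(p, n_sig, win_shape):
    # m_win: geometric mean of the window side lengths
    m_win = np.sqrt(win_shape[0] * win_shape[1])
    # tau_3: right border of the first of m_win histogram bins over [0, N_sig]
    tau3 = n_sig / m_win
    energy = np.abs(p) ** 2
    significant = energy > tau3
    return significant, tau3        # the check fails if nothing is significant

# Toy delta array: two dominant peaks plus a low noise floor
p = np.full(64, 0.1)
p[5], p[20] = np.sqrt(30.0), 3.0
significant, tau3 = delta_array_check(p, n_sig=40, win_shape=(128, 64))
```

Here only the two peaks survive the threshold; the background values of the delta array are suppressed.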
2.4 Delta Array Clustering
So far, the tests were described for the one-dimensional case, but for the next check we need the actual two-dimensional representation of the signal. Therefore the delta array is written as \(p[\mathbf {x}_{n}]\). In the absence of noise, the inverse Fourier transform of \(P[\mathbf {f}_{k}]\) contains a single \(\delta \) peak, or multiple \(\delta \) peaks in the case of multiple motions. For real data, these peaks get smeared out and there is some background noise in the delta array. Hence we only examine the significant (Eq. 11) values of the delta array \(p[\mathbf {x}_{n}]\). We define the sets:
These two sets will serve as input for a weighted K-means clustering algorithm.
Initial Phase. The first mean chosen is the point with the largest weight. We iteratively determine \(K-1\) further candidates as those with the largest cumulative Euclidean distance to the already chosen ones. A set of K covariance matrices \(\mathbf {\Sigma }_{k}\) is initialized with two-dimensional identity matrices.
Labeling Phase. For each point we calculate the Mahalanobis distance to the current K means \(\mathbf {m}_{k}\) and assign the point to the cluster of the mean with minimum distance. Subsequently, we compute means and covariance matrices of the updated clusters using the values of the delta array as weights. This is repeated until either the clusters converge or a predefined maximum number of iterations is reached.
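The two phases can be sketched as a simplified weighted K-means. The regularization term `reg` is our addition to keep the covariances invertible for degenerate clusters; everything else follows the description above:

```python
import numpy as np

def weighted_kmeans(points, weights, K, max_iter=50, reg=1e-6):
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Initial phase: start at the largest weight, then farthest-point candidates
    means = [points[np.argmax(weights)]]
    for _ in range(1, K):
        cum = np.array([sum(np.linalg.norm(q - m) for m in means) for q in points])
        means.append(points[np.argmax(cum)])
    means = np.array(means)
    covs = np.stack([np.eye(2)] * K)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Labeling phase: assign each point to the nearest mean (Mahalanobis)
        inv = np.stack([np.linalg.inv(c + reg * np.eye(2)) for c in covs])
        d2 = np.array([[(q - means[k]) @ inv[k] @ (q - means[k]) for k in range(K)]
                       for q in points])
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # clusters converged
        labels = new_labels
        # Update weighted means and covariances per cluster
        for k in range(K):
            sel = labels == k
            if not sel.any():
                continue
            wk = weights[sel] / weights[sel].sum()
            means[k] = wk @ points[sel]
            diff = points[sel] - means[k]
            covs[k] = (wk[:, None] * diff).T @ diff
    return means, covs, labels

# Two well-separated motion clusters with uniform weights
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal([0.0, 0.0], 0.1, (30, 2)),
                 rng.normal([10.0, 5.0], 0.1, (30, 2))])
means, covs, labels = weighted_kmeans(pts, np.ones(60), K=2)
```

For well-separated clusters the recovered means lie close to the true cluster centers after a few iterations.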
This algorithm returns K means \(\mathbf {m}_{k}\) and covariance matrices \(\mathbf {\Sigma }_{k}\) which describe the distribution within each cluster. To find the optimal K, the algorithm is run for different values of K and a cost function which sums up the areas of the covariance ellipses and penalizes large values of K (Occam’s razor) is minimized:
We determined the values of the parameters in our experiments to be
where \(\det (\mathbf {\Sigma }_{0})\) is the area of the covariance ellipse in the case of only one cluster (\(K = 1\)).
2.5 Multiresolution
Since the estimation of a relative displacement of the signal in two regarded patches is limited by the patch sizeFootnote 6 and works best when most of the image content is present in both patches, the previously presented steps are performed iteratively on different resolution scales of the image. We employed a Gaussian pyramid with two levels and a scaling of 2 for each image dimension. We used the same patch size on both pyramid levels, performed a first motion estimation on the upper (=lower resolution) pyramid level and transferred the result to the original scale by shifting the patch windows relative to each other according to the (correctly scaled) motion vector determined in the upper pyramid level. This way we ensure that we can deal also with large displacements.
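The coarse-to-fine procedure can be sketched as follows, restricted to integer displacements. A 2×2 block mean stands in for the Gaussian pyramid (a simplification of the setup above), and all names are ours:

```python
import numpy as np

def phc_shift_2d(a, b, eps=1e-12):
    # Integer-displacement 2-D phase correlation (peak of the delta array)
    A, B = np.fft.fft2(a), np.fft.fft2(b)
    cross = B * np.conj(A)
    p = np.real(np.fft.ifft2(cross / np.maximum(np.abs(cross), eps)))
    idx = np.unravel_index(np.argmax(p), p.shape)
    # Map indices to signed shifts (at most half the patch size, cf. footnote 6)
    return tuple(int(i - s) if i > s // 2 else int(i) for i, s in zip(idx, p.shape))

def downsample2(img):
    # 2x2 block mean as a stand-in for one Gaussian pyramid level
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def two_level_phc(img1, img2, top, left, size):
    # Coarse level: same patch size, half resolution
    c1, c2 = downsample2(img1), downsample2(img2)
    p1 = c1[top // 2: top // 2 + size, left // 2: left // 2 + size]
    p2 = c2[top // 2: top // 2 + size, left // 2: left // 2 + size]
    dy0, dx0 = (2 * d for d in phc_shift_2d(p1, p2))
    # Fine level: shift the second window by the scaled coarse motion
    f1 = img1[top: top + size, left: left + size]
    f2 = img2[top + dy0: top + dy0 + size, left + dx0: left + dx0 + size]
    dy1, dx1 = phc_shift_2d(f1, f2)
    return dy0 + dy1, dx0 + dx1

# Toy example: the second image is the first one shifted by (6, -4)
rng = np.random.default_rng(2)
img1 = rng.standard_normal((128, 128))
img2 = np.roll(img1, (6, -4), axis=(0, 1))
d = two_level_phc(img1, img2, top=40, left=40, size=32)
```

Shifting the fine-level window by the scaled coarse motion is what allows displacements larger than half the patch size to be handled.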
3 Experiments
Our enhanced PhC approach allows us to estimate multiple motion distributions. The proposed method is evaluated on the optical flow dataset from the KITTI Vision Benchmark Suite [22] and on the Middlebury Stereo Dataset [3]. Since the PhC, by construction, aims at determining the distribution of motion vectors and not a dense motion field, we could not apply the metrics of these benchmarks, which expect a dense motion field. Therefore we can only compare our method against the PhC implementation of OpenCV, which is based on the work of Stone et al. [16], and against the ground truth data of the training datasets of the two mentioned benchmarks.
3.1 Middlebury Stereo Dataset
In this experiment, we intend to show that our proposed approach is able to estimate multiple motions within a defined patch. We also want to demonstrate that these estimates are correct and precise. However, PhC is of course only able to detect motions if the moving objects show enough structure. Therefore, we chose the dataset from 2001 as its images exhibit well structured elements.
The 6 image pairs of this stereo dataset are each divided into 6 centered patches of size \(128 \times 128\) pixels. These patches are shown as black rectangles in Fig. 2. Since this dataset was originally created for a stereo benchmark, the images are recorded by a left and a right camera. Thus we can assume that the captured scene is only translated horizontally, although the PhC is of course not aware of this. The provided disparity maps reflect exactly this behavior: objects which are farther away from the camera exhibit a lower displacement than objects in the near field. The disparity values (represented by the gray level values) describe the ‘motion’ of an object between the two images. For example, two different motions are present in the first patch of the disparity map 2a. The aim of this experiment is to detect exactly these multiple motions within a patch. The total number of displacements available in all patches of a specific image pair is listed in the second column of Table 1. Another aspect which has to be considered is that the objects do not necessarily lie in a fronto-parallel plane w.r.t. the camera, and hence the translation of an object cannot be described by one single disparity value. This means that we observe disparity value ranges, not singular values. In our experiment, we computed all such ranges in each patch; they serve as ground truth ranges.
For each patch we executed our enhanced PhC algorithm. The results are shown in Table 1, where they are compared against the OpenCV version of the PhC. Evidently our enhanced PhC is able to estimate a significantly larger number of motions than the OpenCV PhC. Moreover, the calculated displacements are more precise and more reliable than the OpenCV ones. We estimated every single detected motion correctly, which means that our computed displacements fall within the above stated ranges of the ground truth data. In contrast, the OpenCV version does not determine all of its detected motions correctly, as can be seen in the last column of Table 1.
Using the Middlebury Stereo Dataset, we showed that our enhanced PhC can detect and correctly estimate multiple motions within a patch if the individual objects possess enough structure and cover a reasonable percentage of the patch.
3.2 KITTI Optical Flow Dataset
In the second part of our experiments, we use real-world driving scenes and show that our algorithm outperforms the OpenCV PhC in terms of precision and reliability while simultaneously measuring the uncertainties of the estimated motions. Furthermore we show that our four self-diagnostic checks are both useful and work correctly. For the evaluation of our new approach, we took all 194 image pairs from the KITTI Optical Flow Training Dataset [22]. We did not evaluate our PhC method on the Test Dataset, because ground truth data is not available there and the benchmark only accepts sparse or dense motion fields, which we cannot provide since we only estimate motion distributions within a defined patch. For that reason we analyze the performance of our PhC by dividing each image into 45 non-overlapping patches of \(128\times 64\) pixels (cf. Fig. 3) and compare the results of our PhC against those from OpenCV and the ground truth from the training dataset. Unfortunately, KITTI does not provide ground truth for the entire image because their LIDAR scanner has only a limited field of view. This is why we could do the evaluation only on 6942 of the 8730 possible patches. The KITTI data provides an almost dense motion field within a patch, but we can only compare motion distributions characterized by a mean \(\mathbf {m}_{n,gt}\) and a covariance matrix \(\mathbf {\Sigma }_{n,gt}\). Therefore, we determine the parameters of the motion distribution from the ground truth data for the patches \(\{c_n\}_{n=1,\ldots ,3752}\), where K motions \(\mathbf {x}_i\) occur in a patch \(c_n\), in the following way:
$$\mathbf{m}_{n,gt} = \frac{1}{K} \sum_{i=1}^{K} \mathbf{x}_{i}, \qquad \mathbf{\Sigma}_{n,gt} = \frac{1}{K} \sum_{i=1}^{K} \left( \mathbf{x}_{i} - \mathbf{m}_{n,gt} \right) \left( \mathbf{x}_{i} - \mathbf{m}_{n,gt} \right)^{\top}$$
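Reducing the ground-truth flow vectors of a patch to such distribution parameters can be sketched as follows (our reading of the description above: an unweighted sample mean and covariance; names are ours):

```python
import numpy as np

def gt_motion_distribution(flows):
    # flows: (K, 2) array of ground-truth motion vectors x_i within a patch
    x = np.asarray(flows, dtype=float)
    m = x.mean(axis=0)                       # mean motion m_gt
    d = x - m
    sigma = d.T @ d / len(x)                 # covariance Sigma_gt
    return m, sigma

# Two horizontal motions of 1 px and 3 px
m, sigma = gt_motion_distribution([[1.0, 0.0], [3.0, 0.0]])
```

For this toy patch the mean is (2, 0) and the covariance has variance 1 along the horizontal axis only.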
With the described self-diagnosis checks, we determined that on 3752 of the 6942 patches the PhC can provide a reliable motion estimate. On the other patches, one of our proposed checks failed due to low structure, too much noise, or too dissimilar image patches caused by large displacements. In Fig. 4a and b, the ordered displacements in the horizontal and vertical directions, respectively, are shown for all possible patches. These plots show that our PhC stays much closer to the trend of the ground truth than the OpenCV implementation. Many of the motions computed by OpenCV are either outliers or lie close to the zero line. In contrast, our PhC produces only a few outliers. Considering the given integer precision, our results comply well with the ground truth.
In the last part of this experiment, we want to show that our motion distribution parametersFootnote 7 are well estimated. We chose a relatively pessimistic approach by evaluating \(\mathbf {m}_{n, eval}\) and \(\mathbf {\Sigma }_{n, eval}\) for each patch \( c_n \) in the following way:
Figure 5a and b show the histograms of the Euclidean lengths of the deviations \(\{\mathbf {m}\}_{n=1,\ldots ,3752, eval}\) between the ground truth and our PhC and the OpenCV PhC, respectively. The results show that our PhC provides fewer estimates than the OpenCV PhC, simply because we recognize patches where no reliable motion estimate is possible. Secondly, slightly more displacements are estimated correctly, as can be seen in the histogram bins [0, 5]. However, the main advantage of our PhC is that only very few estimates deviate by more than 30 pixels from the ground truth. The OpenCV PhC, on the other hand, yields more than 2000 motion estimates which exhibit a deviation of more than 30 pixels w.r.t. the ground truth. To evaluate the uncertainty of an estimate, expressed by the ‘size’ of the covariance matrix, we compute the area \(A_n\) of \(\mathbf {\Sigma }_{n, eval}\) as \(A_n = \det (\mathbf {\Sigma }_{n, eval})\), which corresponds to the \(1\sigma \)-area covered by the covariance ellipse. Figure 5c shows the histogram of \( \{A\}_{n=1,\ldots ,3752}\). Some motions have a relatively high uncertainty (large \(A_{n}\)), but most of the estimated motion distributions are compact (small \(A_{n}\)), which means that their displacements are reliable and precise.
We have evaluated our work on two different datasets. Our enhanced PhC clearly outperforms the OpenCV implementation. We achieve good estimates of motion distributions if the moving objects possess enough structure and cover a significant part of the patch. As already stated, the purpose of our approach is not to compute a dense optical flow field, but to estimate the dominant motions and their uncertainties. The particular advantage of our approach is that we obtain, with very moderate computational effort, reliable information about the distribution of the optical flow vectors within a patch, including the case of multiple motions. The runtime of our PhC is roughly 1 ms for a \( 256 \times 256\) pixel patch, without any use of multithreading or GPU support, on a common PC.
4 Summary and Conclusion
We have shown that the classical PhC method can be made significantly more robust against different sources of malfunction. This has been achieved by a systematic analysis of the effects of noise and of the conditioning of the input data (texture, similarity). Obviously, the spatial precision of the method can be extended into the subpixel range by using existing schemes for providing subpixel resolution to phase correlation [11–13]. This, however, is independent of the method improvements presented here. We refrained from using any of these schemes in order to present the effects of our modifications under ‘clean room conditions’, unaffected by other modifications. We emphasize that the standard PhC is a good motion estimator for patches with a homogeneous translational motion field, whereas our extended PhC provides distributions for multiple motions in a patch, which can be used by local methods that need a good initialization.
Notes
- 1.
\(y[x_{n}] \rightarrow y[x_{n}] + u[x_{n}]\), where \(u[x_{n}]\) is assumed to be \(\mathcal {N}(0,\sigma _{s}^{2})\) i.i.d.
- 2.
Due to independent motions within the image or geometric effects (e.g. zoom).
- 3.
\(y[x_{n}] \rightarrow w[x_{n}] \cdot y[x_{n}]\), where \(w[x_{n}]\) is an anti-leakage window (e.g. Tukey).
- 4.
We used a 0.5 Tukey window on Middlebury Stereo and KITTI Optical Flow datasets.
- 5.
Convolution theorem: \(\mathcal {F}(w[x_{n}] \cdot y[x_{n}]) = \mathcal {F}(w[x_{n}]) * \mathcal {F}(y[x_{n}]) = W[f_{k}] * Y[f_{k}]\).
- 6.
Displacements of at most half the patch size are detectable.
- 7.
We do not assume any specific probability distribution.
References
Lucas, B., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679 (1981)
Knapp, C., Carter, G.: The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 24(4), 320–327 (1976)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47(1–3), 7–42 (2002)
Pratt, W.K.: Correlation techniques of image registration. IEEE Trans. Aerosp. Electron. Syst. AES–10(3), 353–358 (1974)
Kuglin, C.D., Hines, D.C.: The Phase Correlation Image Alignment Method. In: Proceedings of the International Conference on Cybernetics and Society, pp. 163–165 (1975)
Kumar, B.V.K., Hassebrook, L.: Performance measures for correlation filters. Appl. Opt. 29(20), 2997–3006 (1990)
Greenfeld, J.S.: An operator-based matching system. Photogram. Eng. Remote Sens. 8(57), 1049–1055 (1991)
Tian, Q., Huhns, M.N.: Algorithms for subpixel registration. Comput. Vis. Graph. Image Process. 35(2), 220–233 (1986)
Brown, L.G.: A survey of image registration techniques. ACM Comput. Surv. 24(4), 325–376 (1992)
Zitová, B., Flusser, J.: Image registration methods: a survey. Image Vis. Comput. 21(11), 977–1000 (2003)
Pearson, J.J., Hines, D.C., Golosman, S., Kuglin, C.D.: Video-rate image correlation processor. In: Proceedings of Application of Digital Image Processing, pp. 191–205 (1977)
Takita, K., Sasaki, Y., Higuchi, T., Kobayashi, K.: High-accuracy subpixel image registration based on phase-only correlation. IEICE Trans. Fundam. Electron. Commun. 86(8), 1925–1934 (2003)
Takita, K., Muquit, M.A., Aoki, T., Higuchi, T.: A subpixel correspondence search technique for computer vision applications. IEICE Trans. Fundam. Electron. Commun. E87–A(8), 1913–1923 (2004)
Kirichuk, V.S., Peretjagin, G.I.: Establishing similarity between fragments and a standard. Optoelectron. Instrum. Data Process. 22(4), 83–87 (1986)
Fleet, D.J.: Disparity from local weighted phase-correlation. In: Proceedings of the International Conference on Systems, Man and Cybernetics, pp. 48–56 (1994)
Stone, H.S., Orchard, M., Chang, E.C., Martucci, S.: A fast direct Fourier-based algorithm for subpixel registration of images. IEEE Trans. Geosci. Remote Sens. 39(10), 2235–2243 (2001)
Guizar-Sicairos, M., Thurman, S.T., Fienup, J.R.: Efficient subpixel image registration algorithms. Opt. Lett. 33(2), 156–158 (2008)
Jung, I.K., Lacroix, S.: High resolution terrain mapping using low altitude aerial stereo imagery. In: Proceedings of the International Conference on Computer Vision, pp. 946–951 (2003)
Abdelfattah, R., Nicolas, J.M., Tupin, F.: Interferometric SAR image coregistration based on the Fourier Mellin invariant descriptor. IEEE Int. Geosci. Remote Sens. Symp. 3, 1334–1336 (2002)
Eisenbach, J., Mertz, M., Conrad, C., Mester, R.: Reducing camera vibrations and photometric changes in surveillance video. In: Proceedings of the International Conference on Advanced Video and Signal Based Surveillance, pp. 69–74 (2013)
Foroosh, H., Zerubia, J.B., Berthod, M.: Extension of phase correlation to subpixel registration. IEEE Trans. Image Process. 11(3), 188–200 (2002)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proceedings of the Conference on Computer Vision and Pattern Recognition, pp. 3354–3361 (2012)
Morgan, G.L.K., Jian, G.L., Yan, H.: Precise subpixel disparity measurement from very narrow baseline stereo. IEEE Trans. Geosci. Remote Sens. 48(9), 3424–3433 (2010)
© 2016 Springer International Publishing Switzerland
Ochs, M., Bradler, H., Mester, R. (2016). Enhanced Phase Correlation for Reliable and Robust Estimation of Multiple Motion Distributions. In: Bräunl, T., McCane, B., Rivera, M., Yu, X. (eds) Image and Video Technology. PSIVT 2015. Lecture Notes in Computer Science(), vol 9431. Springer, Cham. https://doi.org/10.1007/978-3-319-29451-3_30