
Pattern Recognition

Volume 40, Issue 6, June 2007, Pages 1756-1762

A note on the convergence of the mean shift

https://doi.org/10.1016/j.patcog.2006.10.016

Abstract

Mean shift is an effective iterative algorithm widely used in the computer vision community. However, to our knowledge, its convergence, a key aspect of any iterative algorithm, has not been rigorously proved up to now. In this paper, by imposing some additional, commonly acceptable conditions, its convergence is proved.

Introduction

The mean shift algorithm is a simple iterative statistical method introduced by Fukunaga and Hostetler [1], which shifts each data point to the weighted average of a sample set. The theory was studied further in Refs. [2], [3], [4], [5]. In recent years, it has been widely applied in the computer vision community [3], [6], for tasks such as tracking, image segmentation, discontinuity-preserving smoothing, filtering, and edge detection.

Let $\{x_i,\,1\le i\le n\}$ be an i.i.d. (independently and identically distributed) sample data set from a probability density function $f(x)$, $x\in\mathbb{R}^m$. If $f(x)$ is estimated by
$$\hat{f}(x)=\sum_{i=1}^{n}w_i\,k(\beta\|x_i-x\|^2),$$
Cheng [2] gave the mean shift procedure $\{y_j,\,j=1,2,\ldots\}$ as the weighted averages of the samples $\{x_i,\,1\le i\le n\}$,
$$y_{j+1}=\sum_{i=1}^{n}w_{ij}\,x_i,$$
to seek the mode of $\hat{f}(x)$, where $w_i,w_{ij}>0$, $w_i$ being the weight of sample $x_i$,
$$w_{ij}=\frac{w_i\,k'(\beta\|x_i-y_j\|^2)}{\sum_{i=1}^{n}w_i\,k'(\beta\|x_i-y_j\|^2)},\qquad \sum_{i=1}^{n}w_{ij}=1,$$
$\beta>0$, $k(x)$ is the profile function defined in Ref. [2] (sometimes called the window or kernel), and $k'(x)$ is the derivative of $k(x)$. Cheng [2] proved the convergence of the mean shift sequence $\{y_j,\,j=1,2,\ldots\}$ under the following two assumptions:

  • (1)

    $k(x)=\mathrm{e}^{-x}$.

  • (2)

    The idealized mode in the density surface of the random variable $x$ is $q(x)=\mathrm{e}^{-\gamma\|x\|^2}$, $\gamma<\beta$.
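For concreteness, the mean shift update above can be written in a few lines of code. The following is a minimal Python sketch under assumption (1), the exponential profile $k(x)=\mathrm{e}^{-x}$; all names are illustrative and not taken from the original papers.

```python
import numpy as np

def mean_shift_step(y, X, w, beta=1.0):
    """One mean shift update y_{j+1} = sum_i w_ij x_i.

    X : (n, m) array of samples x_i; w : (n,) sample weights w_i.
    Uses the exponential profile k(x) = exp(-x), so k'(x) = -exp(-x)
    and the signs cancel in the normalized weights w_ij.
    """
    d2 = np.sum((X - y) ** 2, axis=1)   # ||x_i - y_j||^2
    kp = w * np.exp(-beta * d2)         # w_i |k'(beta ||x_i - y_j||^2)|
    w_ij = kp / kp.sum()                # normalized so sum_i w_ij = 1
    return w_ij @ X                     # weighted average of the samples

# Iterate from a starting point until the shift is small.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w = np.full(200, 1.0 / 200)
y = X[0]
for _ in range(100):
    y_next = mean_shift_step(y, X, w)
    if np.linalg.norm(y_next - y) < 1e-8:
        break
    y = y_next
```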

However, since the true value of $\gamma$ is unknown, it is difficult to ensure that assumption (2) above is satisfied in real applications. Hence, the applicability of the result is limited to some extent. Refs. [3], [7], [8] attempted to prove the convergence of the mean shift sequence $\{y_j,\,j=1,2,\ldots\}$ under the weaker assumption that $k(x)$ is simply a convex and monotonically decreasing profile and $w_i=1/n$. However, as shown in the following, the proofs in Refs. [3], [7], [8] are incorrect.

In Refs. [7], [8], the proofs essentially depend on the false claim that "$\|y_{j+1}-y_j\|$ converges to zero" implies "$\{y_j,\,j=1,2,\ldots\}$ converges". Here is a counterexample:

Counter Example 1

Let $y_j=\sum_{i=1}^{j}1/i$; then
$$\|y_{j+1}-y_j\|=\frac{1}{j+1}\to 0\quad(j\to\infty).$$
However, it is well known that $\{y_j,\,j=1,2,\ldots\}$ does not converge and is not a Cauchy sequence.
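A quick numeric illustration of Counter Example 1 (the partial sums of the harmonic series); the snippet is a plain demonstration, not from the paper:

```python
import numpy as np

# Partial sums y_j = sum_{i=1}^{j} 1/i of the harmonic series:
# the steps |y_{j+1} - y_j| = 1/(j+1) vanish, yet y_j grows without bound (~ ln j).
j = np.arange(1, 100001)
y = np.cumsum(1.0 / j)
print(y[100] - y[99])   # step at j = 100: 1/101 ~ 0.0099
print(y[-1])            # y_100000 ~ 12.09 and still increasing
```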

In the convergence proof of the mean shift sequence $\{y_j,\,j=1,2,\ldots\}$ in Ref. [3], the key step is
$$\|y_{j+m}-y_{j+m-1}\|^2+\cdots+\|y_{j+1}-y_j\|^2\ge\|y_{j+m}-y_j\|^2.\qquad(1)$$
However, inequality (1) does not hold. Here is a counterexample:

Counter Example 2

Let $m=2$; then
$$\|y_{j+2}-y_j\|^2=\|y_{j+2}-y_{j+1}+y_{j+1}-y_j\|^2=\|y_{j+2}-y_{j+1}\|^2+\|y_{j+1}-y_j\|^2+2(y_{j+2}-y_{j+1})^{\mathrm T}(y_{j+1}-y_j).$$
From Theorem 2 in Ref. [3], the following inequality holds:
$$(y_{j+2}-y_{j+1})^{\mathrm T}(y_{j+1}-y_j)\ge 0.$$
Hence,
$$\|y_{j+2}-y_j\|^2\ge\|y_{j+2}-y_{j+1}\|^2+\|y_{j+1}-y_j\|^2.$$
This conflicts with inequality (1). Concretely, let $y_j=1$, $y_{j+1}=2$ and $y_{j+2}=3$; then
$$\|y_{j+2}-y_j\|^2=\|3-1\|^2=4$$
and
$$\|y_{j+2}-y_{j+1}\|^2+\|y_{j+1}-y_j\|^2=1+1=2.$$
Therefore,
$$\|y_{j+2}-y_j\|^2>\|y_{j+2}-y_{j+1}\|^2+\|y_{j+1}-y_j\|^2.$$
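The arithmetic of Counter Example 2 can be checked directly; the snippet below is a plain verification under the scalar values chosen above:

```python
# y_j = 1, y_{j+1} = 2, y_{j+2} = 3: consecutive shifts (both equal to 1)
# have a non-negative inner product, yet inequality (1) is violated.
y0, y1, y2 = 1.0, 2.0, 3.0
step_sum = (y2 - y1) ** 2 + (y1 - y0) ** 2   # sum of squared steps: 2
total = (y2 - y0) ** 2                       # squared total displacement: 4
print(step_sum >= total)                     # False: inequality (1) fails
```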

In addition to the convergence problem, the current mean shift algorithm has two other main limitations:

  • (1)

    Insufficient attention has been paid to the differences and the anisotropy of the local structure around different samples. For example, as shown in Fig. 1, since the sample distribution in the neighborhood of $x_2$ is denser than that of $x_1$, the scale for $x_2$ should ideally be smaller than that for $x_1$. In addition, the sample distribution is highly anisotropic in the neighborhood of $x_2$, and this should be taken into account. In Refs. [2], [3], [7], however, the relative scale and local structure are treated identically for all samples and in every direction. In Ref. [8], only the difference in relative scale between samples is accounted for.

  • (2)

    Insufficient attention has been paid to the differences between sample contributions. As is well known, peripheral samples, often more corrupted by noise, are less reliable. Hence, different samples should ideally be treated differently. In Refs. [3], [7], [8], the contributions are assumed to be the same for all samples. In Ref. [2], although contribution differences are considered, the local structure is not taken into account.

In the next section, we outline some means to extend the current mean shift algorithm and alleviate these two limitations by accounting for the anisotropy of the local structure around every sample, the differences in relative scale, and the relative importance/reliability of the samples. In addition, the convergence of the iterative sequence $\{y_j,\,j=1,2,\ldots\}$ and of its function values $\{\hat{f}(y_j),\,j=1,2,\ldots\}$ under the extended algorithm is rigorously proved, by adding a modest constraint, in Section 3. Experimental results evaluating the contribution of the proposed algorithm are given in Section 4. Conclusions and remarks are given in Section 5.


Preliminaries

Definition 1

A function $k(x)$ is called a bounded kernel if, on $[0,+\infty)$, it satisfies:

  • (1)

    $k(x)\ge 0$.

  • (2)

    Monotonically decreasing: $k(x_1)\ge k(x_2)$ for $0\le x_1\le x_2<+\infty$.

  • (3)

    $\int_0^{\infty}k(x)\,\mathrm{d}x<\infty$.

  • (4)

    $0<k(0)<+\infty$.
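As a concrete check, the exponential profile $k(x)=\mathrm{e}^{-x}$ satisfies all four conditions; the snippet below verifies them numerically (a simple illustration, not part of the original paper):

```python
import numpy as np

# Definition 1 checked for the profile k(x) = exp(-x) on a dense grid over [0, 50].
k = lambda x: np.exp(-x)
x = np.linspace(0.0, 50.0, 500001)

print(np.all(k(x) >= 0))              # (1) non-negative
print(np.all(np.diff(k(x)) <= 0))     # (2) monotonically decreasing
print(np.sum(k(x)) * (x[1] - x[0]))   # (3) Riemann sum of the integral: ~1.0 < inf
print(0.0 < k(0.0) < np.inf)          # (4) 0 < k(0) = 1 < +inf
```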

In this work, given a bounded kernel $k(x)$, the density estimate of the random variable $x$ is defined as
$$\hat{f}_{H,k}(x)=\sum_{i=1}^{n}w_i\,K_i(x),$$
where $K_i(x)=c_{k,i,h}\,k(\|x-x_i\|^2_{H_i})$, $\|x-x_i\|^2_{H_i}=(x-x_i)^{\mathrm T}H_i(x-x_i)$, $H_i=\Sigma_i^{-1}/h^2$, $H=\{H_i,\,1\le i\le n\}$, $\sum_{i=1}^{n}w_i=1$, $w_i>0$. Here $h>0$ adjusts the window size as a whole, $c_{k,i,h}>0$ is a constant ensuring that $K_i(x)$ is a probability density function, and $w_i$ is the weight of sample $x_i$.
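A minimal sketch of this estimator, assuming the exponential profile $k(t)=\mathrm{e}^{-t}$, for which the normalizing constant has a closed form via the Gaussian integral; function and variable names are our own:

```python
import numpy as np

def f_hat(x, X, w, H):
    """Density estimate f_hat_{H,k}(x) = sum_i w_i K_i(x) with profile k(t) = exp(-t).

    X : (n, m) samples; w : (n,) weights summing to 1;
    H : (n, m, m) positive definite matrices H_i = Sigma_i^{-1} / h^2.
    For this profile, c_{k,i,h} = sqrt(det H_i) / pi^{m/2}, since
    int exp(-(x - x_i)^T H_i (x - x_i)) dx = pi^{m/2} / sqrt(det H_i).
    """
    n, m = X.shape
    val = 0.0
    for i in range(n):
        d = x - X[i]
        maha = d @ H[i] @ d                                  # ||x - x_i||^2_{H_i}
        c_i = np.sqrt(np.linalg.det(H[i])) / np.pi ** (m / 2)
        val += w[i] * c_i * np.exp(-maha)
    return val

# Example: isotropic H_i = I for 100 two-dimensional samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
H = np.repeat(np.eye(2)[None], 100, axis=0)
w = np.full(100, 0.01)
print(f_hat(np.zeros(2), X, w, H))
```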

Convergence

By convergence of the mean shift, we mean that both $\{\hat{f}_{H,k}(y_j),\,j=1,2,\ldots\}$ and the iterative sequence $\{y_j,\,j=1,2,\ldots\}$ converge.

Definition 2

A function $k:[0,+\infty)\to\mathbb{R}$ is convex if there exists a bounded and continuous $k'(x)$ satisfying
$$k(x_2)-k(x_1)>k'(x_1)(x_2-x_1),\qquad x_1\ge 0,\ x_2\ge 0,\ x_1\ne x_2.$$
For example, the profile $k(x)=\mathrm{e}^{-x}$ is convex in this sense, since $\mathrm{e}^{-x}$ is strictly convex with bounded, continuous derivative on $[0,+\infty)$. We have the following two theorems:

Theorem 1

If the kernel $k(x)$ is convex, then the sequence $\{\hat{f}_{H,k}(y_j),\,j=1,2,\ldots\}$ is monotonically increasing and convergent.
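Theorem 1 is easy to observe empirically. The sketch below runs the mean shift iteration in the isotropic special case $H_i=I$ with profile $k(t)=\mathrm{e}^{-t}$ (where the constants $c_{k,i,h}$ are equal and can be dropped) and checks that $\hat{f}_{H,k}(y_j)$ never decreases; this is an illustration under our assumptions, not a proof:

```python
import numpy as np

# Empirical check: density values f_hat(y_j) are non-decreasing along the iterates.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
w = np.full(100, 1.0 / 100)

f_hat = lambda y: np.sum(w * np.exp(-np.sum((X - y) ** 2, axis=1)))

y = X[0]
values = [f_hat(y)]
for _ in range(30):
    kp = w * np.exp(-np.sum((X - y) ** 2, axis=1))
    y = (kp / kp.sum()) @ X          # one mean shift step
    values.append(f_hat(y))
print(np.all(np.diff(values) >= -1e-12))   # True: monotonically increasing
```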

Theorem 2

If the kernel $k(x)$ is convex, and the number of critical points of $\hat{f}_{H,k}(x)$ is finite on $S_0=\{y\,|$ …

Experiments

To evaluate the contribution of this work, we conducted clustering experiments with the mean shift algorithm. In this section, we report and analyze the experimental results on simulated data (Fig. 2). For convenience of presentation, $\mathrm{MS}(\{\Sigma_i\},\{w_i\},h)$ denotes the mean shift procedure with local structures $\{\Sigma_i\}$, weights $\{w_i\}$ and scale $h$.

In the experiments, the local structure $\Sigma_i$ and weight $w_i$ are estimated by
$$\hat{\Sigma}_i=\frac{1}{k}\sum_{j=1}^{k}(x_{i_j}-x_i)(x_{i_j}-x_i)^{\mathrm T},\qquad \hat{w}_i=\frac{1}{\|\hat{\Sigma}_i\|},$$
where $x_{i_1},\ldots,x_{i_k}$ are the $k$ nearest neighbors of $x_i$, and $x_i$ …
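A sketch of this estimation step. Note that the snippet view truncates the original definition, so the matrix norm used in the reconstructed weight formula (Frobenius here) and the final normalization are our assumptions:

```python
import numpy as np

def estimate_local_structure(X, k=10):
    """Estimate Sigma_i from the k nearest neighbors of each sample.

    Returns local covariance estimates Sigma_hat_i and weights
    w_hat_i proportional to 1 / ||Sigma_hat_i|| (norm choice is an assumption),
    normalized so that sum_i w_i = 1.
    """
    n, m = X.shape
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # pairwise sq. distances
    Sigma = np.empty((n, m, m))
    w = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]        # k nearest neighbors, skipping x_i itself
        D = X[nbrs] - X[i]
        Sigma[i] = (D.T @ D) / k                 # (1/k) sum_j (x_ij - x_i)(x_ij - x_i)^T
        w[i] = 1.0 / np.linalg.norm(Sigma[i])    # hypothetical Frobenius-norm choice
    return Sigma, w / w.sum()
```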

Conclusion and discussions

In this paper, we first extended the mean shift algorithm by introducing an arbitrary positive definite matrix to account for the differences and the anisotropy of the local structure around different samples, and by assigning a weight to every sample to account for its relative importance and reliability. Most importantly, we gave a rigorous convergence proof for the extended algorithm. The convergence conditions in this work can be easily satisfied and are slightly different from those in Refs. [2], …

Acknowledgments

The authors appreciate the inspiring comments from the anonymous reviewers. This work was supported by the National Natural Science Foundation of China under Grant No. 60121302 and the National Key Basic Research and Development Program (2004CB318107).


References (9)

  • K. Fukunaga, L.D. Hostetler, The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Trans. Inf. Theory (1975).

  • Y.Z. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell. (1995).

  • D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. (2002).

  • M. Fashing, C. Tomasi, Mean shift is a bound optimization, IEEE Trans. Pattern Anal. Mach. Intell. (2005).


About the Author—XIANGRU LI is a teacher at Shandong University of Science and Technology. He received his Ph.D. degree in 2006 from the Institute of Automation, Chinese Academy of Sciences. His research interests are in statistical machine learning and data mining.

About the Author—ZHANYI HU is a professor at the Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree (Docteur d'Etat) in computer vision from the University of Liege, Belgium, in 1993. His research interests include camera calibration and 3D reconstruction, vision-guided robot navigation, and image-based modeling and rendering.

About the Author—FUCHAO WU is a professor at the Institute of Automation, Chinese Academy of Sciences. His research interests are in computer vision, including 3D reconstruction, active vision, and image-based modeling and rendering.
