
Locally Orderless Tracking

Published in: International Journal of Computer Vision

Abstract

Locally Orderless Tracking (LOT) is a visual tracking algorithm that automatically estimates the amount of local (dis)order in the target. This lets the tracker specialize in both rigid and deformable objects on-line and with no prior assumptions. We provide a probabilistic model of the target variations over time. We then rigorously show that this model is a special case of the Earth Mover’s Distance optimization problem where the ground distance is governed by some underlying noise model. This noise model has several parameters that control the cost of moving pixels and changing their color. We develop two such noise models and demonstrate how their parameters can be estimated on-line during tracking to account for the amount of local (dis)order in the target. We also discuss the significance of this on-line parameter update and demonstrate its contribution to the performance. Finally we show LOT’s tracking capabilities on challenging video sequences, both commonly used and new, displaying performance comparable to state-of-the-art methods.





Author information

Correspondence to Shaul Oron.

Additional information

Communicated by I. S. Kweon.

Appendices

Appendix 1: Additional Noise Models

1.1 Uniform Noise

A uniform distribution with parameter \(r\) can again be used as the location and/or appearance noise model. Due to the independence assumed between appearance and location, the parameters \(p,q,r,D\) will be used without the superscripts \(A,L\).

$$\begin{aligned} Pr(p|q,r)= \begin{cases} \frac{1}{(2r)^D} &{} ||p-q||_\infty \le r \\ 0 &{} \text {otherwise} \end{cases} \end{aligned}$$
(20)

where \(D\) is the dimension of \(p\) and \(q\). The ground distance in this case is:

$$\begin{aligned} d(p,q)= \begin{cases} D \cdot \log (2r) &{} ||p-q||_\infty \le r \\ \infty &{} \text {otherwise} \end{cases} \end{aligned}$$
(21)

Under this distance, changing the appearance and/or location of a pixel by less than a certain quantum costs nothing (the same as not moving it at all), while changing it by more than that is not allowed.
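As a minimal sketch of this ground distance (the function name and array interface are ours, not from the paper), Eq. (21) can be written as:

```python
import numpy as np

def uniform_ground_distance(p, q, r):
    """Ground distance of Eq. (21): a flat cost D*log(2r) inside the
    L-infinity ball of radius r around q, and an infinite cost outside
    (such a match is simply not allowed)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    D = p.size  # dimension of the (appearance or location) space
    if np.max(np.abs(p - q)) <= r:
        return D * np.log(2 * r)
    return np.inf
```

Note that the infinite branch is exactly what can make the signature EMD problem infeasible, which motivates the mixture model below.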

This model may pose problems, as certain mismatches are not allowed at all, and the signature EMD problem can become infeasible in some cases, i.e., yield an \(\infty \) distance. A mixture of two uniforms might therefore be a better choice.

1.2 Uniform-Mixture Noise

Using a mixture of two uniforms provides one low cost for small perturbations and a second, high (but not \(\infty \)) cost for large ones. This means any match is allowed, but at a high cost. The support of the second uniform should cover the entire space. We formulate this model using a mixture variable \(h \sim Bernoulli(\alpha )\) and marginalizing over it:

$$\begin{aligned} Pr(p|q,r,\alpha )= \alpha \, Pr(p|q,h=0)+(1-\alpha )\, Pr(p|q,h=1) \end{aligned}$$
(22)

where \(P(p|q,h=\{0,1\})\) are both uniform distributions. The ground distance is given by:

$$\begin{aligned} d(p,q)= \begin{cases} -\log \left( \frac{\alpha }{(2r)^D}+\frac{1-\alpha }{S}\right) &{} ||p-q||_\infty \le r \\ -\log \left( \frac{1-\alpha }{S}\right) &{} \text {otherwise} \end{cases} \end{aligned}$$
(23)

where \(S\) is the hyper-volume of the entire space (e.g. for un-normalized RGB space \(S=(2^8)^3\) which is the RGB cube volume).
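A corresponding sketch for Eq. (23), again with names of our choosing, is:

```python
import numpy as np

def mixture_ground_distance(p, q, r, alpha, S):
    """Ground distance of Eq. (23) for the uniform-mixture noise model.
    S is the hyper-volume of the whole space, e.g. S = (2**8)**3 for
    un-normalized RGB. Far matches are allowed but expensive."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    D = p.size
    if np.max(np.abs(p - q)) <= r:  # cheap branch: small perturbation
        return -np.log(alpha / (2 * r) ** D + (1 - alpha) / S)
    return -np.log((1 - alpha) / S)  # costly but finite branch
```

As \(\alpha \rightarrow 1\) this recovers the pure uniform model of Eq. (21), with the far branch diverging to \(\infty \); as \(\alpha \rightarrow 0\) both branches approach the same constant cost.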

1.2.1 Uniform-Mixture Parameter Estimation

This model has two parameters \(\varTheta =\{\alpha ,r\}\). We use the EMD correspondence mapping \(f_{ij}\) and the ground distance matrix \(d_{ij}=d(p_i,q_j)\), from which we build a CDF of the transported distance. We denote this CDF by \(c(r):[0,R] \rightarrow [0,1]\), where \(R\) is the maximal distance mass can move in our subspace, i.e., \(c(r)=\frac{\sum _{i,j:d_{ij}\le r}f_{ij}\cdot d_{ij}}{ \sum _{i,j}f_{ij}\cdot d_{ij}}\) for all \(r\). We can now estimate \(\alpha \) and \(r\) using an ML consideration:

$$\begin{aligned} \log Pr(P|Q,r,\alpha )&= \sum _i \log Pr(p_i|q_j) \\ &= \sum _{i \in D_1} \log \left( \frac{\alpha }{(2r)^D}+\frac{1-\alpha }{S}\right) + \sum _{i \in D_2} \log \left( \frac{1-\alpha }{S}\right) \\ &= N\left[ c(r)\cdot \log \left( \frac{\alpha }{(2r)^D}+\frac{1-\alpha }{S}\right) + (1-c(r))\cdot \log \left( \frac{1-\alpha }{S}\right) \right] \end{aligned}$$
(24)

where \(D_1=\{i:||p_i-q_j||_{\infty }\le r\}\), \(D_2=\{i:||p_i-q_j||_{\infty }> r\}\) and \(N\) is the total mass. If we only want to estimate \(r\), leaving \(\alpha \) constant, we can numerically find the \(r\) that maximizes (24). To estimate both \(r\) and \(\alpha \), we differentiate (24) with respect to \(\alpha \) and set the derivative to 0, which leads to:

$$\begin{aligned} \alpha = \frac{c(r)S-(2r)^D}{S-(2r)^D} \end{aligned}$$
(25)
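For completeness, the differentiation step behind Eq. (25) can be sketched as follows, writing \(u=(2r)^D\) as a shorthand of ours. Setting the derivative of (24) with respect to \(\alpha \) to zero gives

$$\begin{aligned} N\left[ \frac{c(r)\left( \frac{1}{u}-\frac{1}{S}\right) }{\frac{\alpha }{u}+\frac{1-\alpha }{S}} - \frac{1-c(r)}{1-\alpha }\right] = 0 \end{aligned}$$

and cross-multiplying,

$$\begin{aligned} c(r)(1-\alpha )(S-u) = \bigl (1-c(r)\bigr )\bigl (\alpha S + (1-\alpha )u\bigr ) \;\Rightarrow \; c(r)S = \alpha (S-u) + u \end{aligned}$$

which rearranges to Eq. (25).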

Plugging this result back into Eq. (24) we see that we need to find:

$$\begin{aligned} \underset{r}{\mathrm{argmax}} \left( c(r)\cdot \log \frac{c(r)}{(2r)^D} + (1-c(r))\cdot \log \frac{1-c(r)}{S-(2r)^D} \right) \end{aligned}$$
(26)

Equation (26) can be solved numerically given \(c(r)\), built using the EMD result, and \(\alpha \) is then calculated from Eq. (25).
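The estimation procedure above can be sketched numerically as follows. This is an illustrative grid search, not the authors' implementation; the function and variable names are ours, and `r_grid` is a hypothetical set of candidate radii:

```python
import numpy as np

def estimate_mixture_params(flows, dists, S, D, r_grid):
    """Sketch of Eqs. (25)-(26): build the transported-distance CDF c(r)
    from the EMD flows f_ij and ground distances d_ij, pick the r in
    r_grid maximizing (26), then recover alpha from (25)."""
    flows = np.asarray(flows, dtype=float)
    dists = np.asarray(dists, dtype=float)
    total = np.sum(flows * dists)

    def c(r):  # fraction of the transport cost carried over distances <= r
        mask = dists <= r
        return np.sum(flows[mask] * dists[mask]) / total

    def objective(r):  # Eq. (26), with c(r) clipped away from {0, 1}
        cr = np.clip(c(r), 1e-12, 1 - 1e-12)
        u = (2.0 * r) ** D
        return cr * np.log(cr / u) + (1 - cr) * np.log((1 - cr) / (S - u))

    r = max(r_grid, key=objective)
    alpha = (c(r) * S - (2.0 * r) ** D) / (S - (2.0 * r) ** D)  # Eq. (25)
    return r, alpha
```

For instance, if all of the transported mass moves a distance of exactly 0.5, the search settles on the smallest candidate radius covering it and pushes \(\alpha \) toward 1, i.e., a pure low-noise uniform.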

Appendix 2: Proof of Proposition 2

Proof

For all \(i,j\) in (7), we take all the variables \(\{f_{k_1j},\ldots , f_{k_{w^p_i}j}\}\) that correspond to \(w^p_i\) similar pixels (with singleton weights). We then collapse each set into a single variable representing their sum \(g_{ij}=\sum _{l=1}^{w^p_i}f_{k_lj}\). This can be done as their coefficients (\(d_{k_lj}\)) in the optimization argument \({\sum _{ij}}f_{ij}d_{ij}\) are the same. Thus the \(w^p_i\) constraints of the form \(\sum _jf_{k_lj}=1\) can be replaced with a single constraint demanding \(\sum _jg_{ij}=w^p_i\) and the \(w^q_j\) constraints of the form \(\sum _if_{ik_l}=1\) can be replaced with a single constraint demanding \(\sum _ig_{ij}=w^q_j\). We then obtain the following integer linear program (ILP):

$$\begin{aligned}&\min \sum _{i=1}^{n_1}\sum _{j=1}^{n_2} g_{ij}d_{ij}\\ \text {such that}\quad&\sum _{j=1}^{n_2} g_{ij}=w^p_i,\qquad \sum _{i=1}^{n_1} g_{ij}=w^q_j,\qquad g_{ij} \in \{0,1,\ldots ,\min (w^p_i,w^q_j)\} \end{aligned}$$
(27)

By construction, the space of feasible solutions w.r.t. optimization problem (7) did not change, i.e., \(\min {{\sum _{i=1}^m}}{{\sum _{j=1}^m}}f_{ij}d_{ij} = \min {\sum _{i=1}^{n_1}}{\sum _{j=1}^{n_2}}g_{ij}d_{ij}\), where the \(d_{ij}\) on the left and right sides of the equation are set according to the appropriate source and sink nodes. Again, this is true since every \(g_{ij}\) is simply a sum of \(f_{ij}\) having the same ground distance \(d_{ij}\). If we now write (27) in canonical form (as we did in Proposition 1), we see that the matrix \(A\) is again totally unimodular, which means that the relaxed linear programming (LP) problem has an integral solution. This relaxed LP is exactly optimization problem (8), and given a solution (i.e., the \(g_{ij}\)) to this problem we can always find an assignment to the \(f_{ij}\) that satisfies (7). This is true since we can always break the compact signatures back down into the pixel-wise problem with singleton bins, which, as we have shown, has the same minimum.\(\square \)
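As a small numeric sanity check of this argument (our own toy example, not from the paper, using `scipy.optimize.linprog` as an off-the-shelf LP solver), the pixel-wise problem with singleton weights and its collapsed-signature version reach the same minimum:

```python
import numpy as np
from scipy.optimize import linprog

def transport_lp(supply, demand, cost):
    """Solve min sum_ij f_ij d_ij subject to the row/column mass
    constraints, with integrality relaxed to f_ij >= 0 (total
    unimodularity guarantees an integral optimum anyway)."""
    m, n = cost.shape
    A_eq, b_eq = [], []
    for i in range(m):            # sum_j f_ij = supply_i
        row = np.zeros((m, n)); row[i, :] = 1.0
        A_eq.append(row.ravel()); b_eq.append(supply[i])
    for j in range(n):            # sum_i f_ij = demand_j
        col = np.zeros((m, n)); col[:, j] = 1.0
        A_eq.append(col.ravel()); b_eq.append(demand[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Pixel-wise problem: two identical source pixels at 0 and one at 2,
# all with singleton (unit) weights; sinks at 1, 1 and 3.
src, snk = np.array([0.0, 0.0, 2.0]), np.array([1.0, 1.0, 3.0])
full = transport_lp(np.ones(3), np.ones(3), np.abs(src[:, None] - snk[None, :]))

# Collapsed signatures: identical pixels merged, weights summed.
s, sw = np.array([0.0, 2.0]), np.array([2.0, 1.0])
t, tw = np.array([1.0, 3.0]), np.array([2.0, 1.0])
compact = transport_lp(sw, tw, np.abs(s[:, None] - t[None, :]))
```

Both problems move every unit of mass a distance of 1, so `full` and `compact` coincide, as the proof predicts.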

Appendix 3: Proof of Proposition 3

Proof

It is enough to look at a single step of uniting two clusters. Assume we unite \(p_{n_1-1},p_{n_1}\) into a single cluster \(\hat{p}_{n_1-1}\). For the weight/flow assignment \(f_{ij}\) we have:

$$\begin{aligned} \sum _{i=1}^{n_1}\sum _{j=1}^{n_2} f_{ij}d_{ij} = \sum _{i=1}^{n_1-2}\sum _{j=1}^{n_2} f_{ij}d_{ij} + \sum _{j=1}^{n_2}\left[ f_{n_1-1,j}\,d(p_{n_1-1},q_j)+f_{n_1,j}\,d(p_{n_1},q_j)\right] \end{aligned}$$
(28)

Denoting \( C = {\sum _{i=1}^{n_1-2}}{\sum _{j=1}^{n_2}}f_{ij}d_{ij}\) and using the triangle inequality we have:

$$\begin{aligned}&C + \sum _{j=1}^{n_2}\left[ f_{n_1-1,j}\,d(p_{n_1-1},q_j)+f_{n_1,j}\,d(p_{n_1},q_j)\right] \\&\quad \le C + \sum _{j=1}^{n_2}\left[ f_{n_1-1,j}\bigl (d(p_{n_1-1},\hat{p}_{n_1-1})+d(\hat{p}_{n_1-1},q_j)\bigr ) + f_{n_1,j}\bigl (d(p_{n_1},\hat{p}_{n_1-1})+d(\hat{p}_{n_1-1},q_j)\bigr )\right] \end{aligned}$$
(29)

Reorganizing the last expression by collecting elements related to the distance between the original clusters and their crude version and elements related to the distance between the crude cluster and its assignment leads to,

$$\begin{aligned}&C+\sum _{j=1}^{n_2}(f_{n_1-1,j}+f_{n_1,j})\,d(\hat{p}_{n_1-1},q_j) + w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})+w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1})\\&\quad = C+\sum _{j=1}^{n_2}\hat{f}_{n_1-1,j}\,d(\hat{p}_{n_1-1},q_j) + w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})+w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1}) \end{aligned}$$
(30)

where \(\hat{f}_{n_1-1,j} = f_{n_1-1,j} + f_{n_1,j}\). The expression \({\sum _{i=1}^{n_1-2}}{\sum _{j=1}^{n_2}} f_{ij}d_{ij} +{\sum _{j=1}^{n_2}}\hat{f}_{n_1-1,j}d(\hat{p}_{n_1-1},q_j)\) appearing in the last line is the optimization argument of \(EMD(\widehat{P},Q,d)\). Let us now fix the variables \(\{f_{ij}\}_{i=1}^{n_1-2},\hat{f}_{n_1-1,j}\) to the argmin values of this problem (the values achieving the minimum for \(EMD(\widehat{P},Q,d)\)). Using the inequality obtained in (29)–(30) we have

$$\begin{aligned} EMD(\widehat{P},Q,d)&= \sum _{i=1}^{n_1-2}\sum _{j=1}^{n_2} f_{ij}d_{ij} + \sum _{j=1}^{n_2}\hat{f}_{n_1-1,j}\,d(\hat{p}_{n_1-1},q_j)\\&\ge \sum _{i=1}^{n_1}\sum _{j=1}^{n_2} f_{ij}d_{ij} - w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})-w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1})\\&\ge \min _{f_{ij}}\sum _{i=1}^{n_1}\sum _{j=1}^{n_2} f_{ij}d_{ij} - w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})-w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1})\\&= EMD(P,Q,d)-w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})-w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1}) \end{aligned}$$
(31)

Since \(w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})+w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1})>0\), and the same triangle-inequality argument applied in the reverse direction bounds \(EMD(\widehat{P},Q,d)-EMD(P,Q,d)\) by the same quantity, it follows that,

$$\begin{aligned} \bigl |EMD(P,Q,d) - EMD(\widehat{P},Q,d)\bigr | \le w_{n_1-1}\,d(p_{n_1-1},\hat{p}_{n_1-1})+w_{n_1}\,d(p_{n_1},\hat{p}_{n_1-1}) \end{aligned}$$
(32)

In an analogous way it can be shown that,

$$\begin{aligned} \bigl |EMD(P,Q,d) - EMD(P,\widehat{Q},d)\bigr | \le w_{n_2-1}\,d(q_{n_2-1},\hat{q}_{n_2-1})+w_{n_2}\,d(q_{n_2},\hat{q}_{n_2-1}) \end{aligned}$$
(33)

The proposition follows by repeating this argument for all \(\hat{p}_i,\hat{q}_j\). \(\square \)
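To illustrate the proposition on a toy example (our own, in 1D, where `scipy.stats.wasserstein_distance` computes the EMD with ground distance \(|x-y|\) for unit total mass): merging two clusters into one perturbs the EMD by at most the mass-weighted merge distortion:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# P has three clusters; we merge the first two into their centroid p_hat.
p_vals, p_w = np.array([0.0, 1.0, 5.0]), np.array([0.25, 0.25, 0.5])
q_vals, q_w = np.array([0.0, 5.0]), np.array([0.5, 0.5])

p_hat = (p_w[0] * p_vals[0] + p_w[1] * p_vals[1]) / (p_w[0] + p_w[1])
ph_vals = np.array([p_hat, p_vals[2]])      # crude version of P
ph_w = np.array([p_w[0] + p_w[1], p_w[2]])  # merged weights

emd_full = wasserstein_distance(p_vals, q_vals, p_w, q_w)
emd_crude = wasserstein_distance(ph_vals, q_vals, ph_w, q_w)

# Mass-weighted distortion introduced by the merge (right-hand side above).
bound = p_w[0] * abs(p_vals[0] - p_hat) + p_w[1] * abs(p_vals[1] - p_hat)
```

With an off-centroid merge (e.g. \(\hat{p}=0\) in this example) the bound is attained with equality.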


Cite this article

Oron, S., Bar-Hillel, A., Levi, D. et al. Locally Orderless Tracking. Int J Comput Vis 111, 213–228 (2015). https://doi.org/10.1007/s11263-014-0740-6
