Multiple Object Segmentation and Tracking by Bayes Risk Minimization

Sixta, Tomáš; Flach, Boris

doi:10.1007/978-3-319-46723-8_70

Multiple Object Segmentation and Tracking by Bayes Risk Minimization

Tomáš Sixta¹⁸ &
Boris Flach¹⁸

Conference paper
First Online: 02 October 2016

13k Accesses

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 9901))

Abstract

Motion analysis of cells and subcellular particles like vesicles, microtubules or membrane receptors is essential for understanding various processes, which take place in living tissue. Manual detection and tracking is usually infeasible due to large number of particles. In addition the images are often distorted by noise caused by limited resolution of optical microscopes, which makes the analysis even more challenging. In this paper we formulate the task of detection and tracking of small objects as a Bayes risk minimization. We introduce a novel spatio-temporal probabilistic graphical model which models the dynamics of individual particles as well as their relations and propose a loss function suitable for this task. Performance of our method is evaluated on artificial but highly realistic data from the 2012 ISBI Particle Tracking Challenge [8]. We show that our approach is fully comparable or even outperforms state-of-the-art methods.

You have full access to this open access chapter, Download conference paper PDF

1 Introduction

Multiple object tracking (MOT) is an important tool for understanding various processes in living tissues by analyzing the dynamics and interactions of cells and subcellular particles. Tracking biological cells is challenging, because they may appear and disappear at any moment, reproduce or merge (e.g. macrophage engulfing a pathogen) and often cannot be visually distinguished from each other. Existing techniques for MOT (see [5] for an excellent review) often consist of two separate modules: a detector and a tracker. The reliability of the detector is crucial for their success, because without feedback from the tracker it is impossible to recover from detection errors. Unfortunately, current visualization techniques (e.g. fluorescence microscopy) are prone to noise, which motivates the development of more elaborate tracking methods.

Network flow based methods formulate the MOT as a min-cost flow problem in a network, which is constructed using the detector output and local motion information. A typical example is [16]. A similar formulation is adopted in [2] and solved by the k-shortest paths algorithm. Integer linear programming (ILP) can be used to handle aspects of MOT, which are difficult to model by a flow-based formulation (object-object interactions). The ILP framework allows to express detection and tracking by a single objective [3, 14], which avoids the problem of propagating detection errors. This additional power however comes at a cost, because ILP is NP-hard in general and can be solved only approximately by an LP relaxation [12].

A natural way to deal with the uncertainty of the detector is to use a probabilistic model. The model in [4] employs interacting Kalman filters with several candidate detections. Many works start by assuming that a sparse tracking graph has been already created and can be used as an input. Nodes of the graph then correspond either to the detection candidates [7] or short tracklets [15] and potential functions are used to model the shape prior, particle dynamics, interactions among the objects or their membership in social groups [9]. Node labels may carry information about presence or absence of an object, its position and/or about its depth [13] to resolve occlusions. In [1] the detection and data association tasks are solved jointly in a single probabilistic framework. Their model however requires the number of objects to be known a priori, which severely restricts its applicability to tracking biological particles.

The optimal output of a probabilistic tracker is determined by the loss function. Despite its importance it receives surprisingly little attention in the literature. In fact the zero-one loss has become a de facto standard and although it often leads to acceptable results, we believe that better loss could lead to more robust and accurate trackers.

In this paper we propose a novel probabilistic model for MOT, which can deal with unknown and changing number of objects, model their interactions and recover from detection errors. The tracking task is formulated as minimization of a Bayes risk with a loss function specifically designed for this purpose. The optimal set of trajectories is found by a greedy optimization algorithm.

2 Multiple Object Tracking

2.1 The Model

We assume that we are given a sequence of n images on domain $D\subset \mathbb {Z}^2$ and corresponding measurements $U=\{U_1, U_2,...,U_{n}\}$. Each $U_t$ contains a detection score for each pixel – a number from interval [0, 1]. They are however not final decisions – $U_t(i)=1$ does not necessarily imply the presence of an object at pixel i and $U_t(i)=0$ does not exclude an object in that position. To account for the noise in the images and resulting uncertainty of detection scores we model a conditional distribution of a set of trajectories $p_\theta (\mathbf{S}|U)$.

The basic building block of the model is a trajectory represented as a sequence of n discrete positions of a single particle. A special label is used to indicate, that the particle is not in the scene. This allows to deal with variable and unknown number of particles, as the only requirement for the model is to have at least as many trajectories as particles in the scene. The probability of a set of trajectories $\mathbf S$ given measurements U is defined by a graphical model, which involves terms for individual trajectories, interaction potentials and hard constraints, which ensure, that the set of trajectories is physically possible:

$$\begin{aligned} p_\theta (\mathbf{S}|U)=\frac{\exp \left( \sum _{s \in \mathbf{S}}\phi _\theta (s,U)+\sum _{s,s' \in \mathbf{S}}\psi _\theta (s,s')+\sum _{s \in \mathbf{S}}\kappa _\theta (s)\right) }{Z(\theta )}, \end{aligned}$$

(1)

where $Z(\theta )$ is the normalization constant. The single trajectory potentials $\phi _\theta (s,U)$ are sums of contributions of the appearance model, the motion model (proportional to the distance between expected and real position of the particle) and penalties for trajectories, that enter the scene (this prevents the tracker to model a single particle by several short trajectories). The interaction potentials $\psi _\theta (s,s')$ prevent two trajectories from getting too close to each other, which eliminates the need for rather heuristic non-maximum suppression. The hard constraints $\kappa _\theta (s)$ are $0/-\infty $ valued functions used mainly to ensure, that no particle exceeds some predefined maximum velocity.

2.2 The Loss Function

In the context of the Bayes risk minimization a good loss function should reflect the properties of the physical system. For example the commonly used zero-one loss is not a proper choice, because it cannot distinguish between solutions, which are only slightly off and completely wrong. Another common choice, the additive L$_2$ loss, is also inappropriate, because it is unclear how to evaluate the loss for two sets of trajectories with different cardinality. The loss function we propose is based on the Baddeley’s Delta Loss, which was originally developed for binary images [11]. Let I be a binary image on discrete domain $D\subset \mathbb {Z}^2$ and $\delta _I$ its distance transform with respect to a truncated pixel distance $\min \{dist(i,j),\epsilon \}$ (Fig. 1a). The value of the Baddeley’s loss for two binary images I and $\hat{I}$ is the squared Euclidean distance of $\delta _I$ and $\delta _{\hat{I}}$:

$$\begin{aligned} B(I,\hat{I})=\sum _{i\in D} \left( \delta _I(i)-\delta _{\hat{I}}(i)\right) ^2. \end{aligned}$$

(2)

To make the loss applicable for the tracking task we have to define an appropriate set of sites. The simplest option is to take pixels of the input images and foreground sites would be positions where the trajectories intersect the frames. The resulting loss has however a serious flaw because it cannot distinguish between parallel and crossing trajectories (Fig. 1b). To solve this issue we also consider all conceivable tracklets of length two and the foreground sites are those which are part of some trajectory (Fig. 1c). The distance between two tracklets is sum of distances of their positions in individual frames. Because of the maximum velocity constraint we often have to consider only a subset of the tracklets.

Our loss function is then defined as Baddeley’s loss over all tracklets of length one (pixels) and two (we denote them as TR(D)):

$$\begin{aligned} L(\mathbf{S},\hat{\mathbf{S}})= \sum _{a\in TR(D)} \left( \delta _\mathbf{S}(a)-\delta _{\hat{\mathbf{S}}}(a)\right) ^2. \end{aligned}$$

(3)

This loss function has several appealing properties. It is permutation invariant, does not require to artificially match trajectories from $\mathbf{S}$ and $\hat{\mathbf{S}}$ and is well defined for sets of trajectories with different cardinality.

3 The Inference

In the Bayesian framework the optimal set of trajectories $\mathbf{S}^*$ minimizes the expected risk $\mathbb {E}_{p_\theta }\left[ L(\mathbf{S},\hat{\mathbf{S}})\right] $ with respect to $\hat{\mathbf{S}}$. If we substitute the proposed loss function (3) and discard terms, that cannot influence the optimal solution, we end up with the following optimization task:

$$\begin{aligned} \mathbf{S}^*=\mathop {\text {argmin}}\limits _{\hat{\mathbf{S}}}\sum _{a\in TR(D)}\left( \delta _{\hat{\mathbf{S}}}(a)^2-2\delta _{\hat{\mathbf{S}}}(a)\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] \right) . \end{aligned}$$

(4)

This is a difficult energy minimization problem that in addition requires to calculate for each tracklet the expected distance to the nearest trajectory which is also non-trivial. We estimate $\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] $ using an MCMC sampling method and minimize the risk by a greedy algorithm. Both approaches are described in the following subsections.

3.1 Model Sampling

Our probabilistic model (1) is in fact a fully connected Conditional Random Field, where each variable represents a single trajectory. Usual procedure for Gibbs sampling is to iteratively simulate from conditional distributions of single variables, i.e. hidden Markov chains in our case. Although this can be done by a dynamic programming algorithm [10], its time complexity $O(n|D|^2)$ makes it prohibitively slow for our purposes^{Footnote 1}. Instead, we use Gibbs sampling in a more restricted way by sampling single components of the trajectory sequence (i.e. single position or special label), which is tractable and enjoys the same asymptotic properties.

However, in our case Gibbs sampling (both full and restricted) suffers from poor mixing, so in addition we use the following Metropolis-Hastings scheme:

1.
Take two trajectories and some frame t
2.
Cut both trajectories in frame t and swap the sequences for subsequent frames
3.
Accept the new labelling with probability $\min \left\{ 1,\frac{p_\theta (\mathbf{S'}|U)}{p_\theta (\mathbf{S}|U)}\right\} $, where $\mathbf{S'}$ denotes the set of trajectories after the cut-swap procedure was performed.

The resulting MCMC algorithm consists of iteratively repeating the restricted Gibbs and cut-swap schemes. The expected values $\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] $ are estimated by averaging over the generated samples. Calculation of $\delta _\mathbf{S}(a)$ for a given sample $\mathbf{S}$ is straightforward and for many commonly used distance functions it can be done in O(|TR(D)|) [6].

3.2 Risk Minimization

We minimize the risk (4) by a greedy algorithm (Algorithm 1), which iteratively adds new trajectories and extends existing ones. The procedure $Extend(\mathbf{S^*},\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] ,t)$ iterates over all the trajectories in $\mathbf{S^*}$, that reached the $(t-1)$-th frame and for each of them calculates the change of the risk for all possible extensions to the t-th frame. The extension with the largest decrease of the risk is selected and $\mathbf{S^*}$ is changed accordingly. The procedure stops, when no trajectory extension can decrease the risk. The procedure $Start(\mathbf{S^*},\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] ,t)$ iteratively adds new trajectories starting in the t-th frame, such that adding each trajectory leads to the largest possible decrease of the risk. If adding a new trajectory would increase the risk, the algorithm proceeds to the next frame. This algorithm is easy to implement but it remains open whether some optimality guarantees could be proved or whether there is a different algorithm with better properties. The main focus of this paper is however on the model and the loss function itself so we postpone this issue to the future work.

4 Experimental Results

4.1 The Data

We test our method on artificial, but highly realistic data from the 2012 ISBI Particle Tracking Challenge [8]. It contains simulated videos of fluorescence microscopy images of vesicles in the cytoplasm, microtubule transport, membrane receptors and infecting viruses (3D) with three different densities (on average 100, 500 and 1000 particles per frame) and four different signal-to-noise ratios (SNR): 1, 2, 4 and 7 (see Fig. 2). Motion types correspond to the dynamics of real biological particles: Brownian for vesicles, directed (near constant velocity) for microtubules and random switching between these two for receptors and viruses. The particles may appear and disappear randomly on any position and any frame. The data contain ambiguities similar to those in real data, including noise, clutter, parallel trajectories, intersecting and visual merging and splitting.

4.2 Evaluation Objectives

To make our method comparable to the other contributions to the 2012 ISBI Particle Tracking Challenge, we use the most important evaluation objectives from the challenge: the true positive rate $\alpha (S^*,S_{GT})\in \left[ 0,1\right] $, the false positive rate $\beta (S^*,S_{GT})\in \left[ 0,\alpha (S^*,S_{GT})\right] $ and root mean square error (RMSE) of true positives. True positive rate takes value 1 if for every trajectory in the ground truth $S_{GT}$ there is a corresponding trajectory in $S^*$ with exactly the same properties and it is smaller than 1 if some trajectories are misplaced or missing ($\alpha (S^*,S_{GT})=0$ means that $S^*=\emptyset $). Spurious trajectories, which cannot be paired with any trajectory from $S_{GT}$, are taken into account by the second objective $\beta (S^*,S_{GT})$. It is equal to $\alpha (S^*,S_{GT})$ if there are none, whereas small $\beta (S^*,S_{GT})$ indicates lots of them. Instead of creating a single measure of quality, these objectives are used separately in the challenge thus a direct comparison of two trackers is possible only if one outperforms the other in both of them. We refer the reader to [8] for their detailed definition and description of other auxiliary objectives used in the challenge.

4.3 Tracking Vesicles

We demonstrate the performance of our method on sequences of low density vesicles for all four levels of SNR. Each sequence consists of 100 images ($512\times 512$ px) and contains around 500 physical particles. Vesicles appear as brighter blobs in the images but due to the noise the pixels values are not reliable detection scores. Instead we enhanced the contrast and reduced the noise by the following filter:

1.
Convolve the input images with $3\times 3$ Gaussian kernel ($\sigma =1$) to filter out the noise
2.
Choose two thresholds $0<t_1<t_2<1$, replace all the pixel values lower than $t_1$ by $t_1$ and all the pixel values higher than $t_2$ by $t_2$
3.
Normalize the pixel values to interval [0, 1] and use them as detection scores

The appearance model was a simple linear function of the detection scores and we used Brownian motion assumption. In all four experiments we used the same model parameters and only the detector thresholds $t_1$ and $t_2$ were tuned using the first image of the sequence. Maximum velocity of the particles was bounded to 15 pixels per frame.

As shown in Fig. 4, for SNR 2, 4 and 7 our method outperforms all the methods from the challenge in objectives $\alpha $ and $\beta $ and it is highly competitive in RMSE. Similarly to the other methods, it performs poorly on the data with SNR 1. This is caused by our oversimplified detector, which fails in that case beyond the model’s ability to recover. We believe that better results could be achieved by incorporating the tracker and an improved detector into a joint model.

5 Conclusion and Future Work

In this paper we have proposed a novel probabilistic model and loss function for multiple object tracking. The model can recover from detection errors and allows to model interactions between the objects and therefore does not need heuristics like non-maximum suppression. We demonstrate the performance of the method on data from the 2012 ISBI particle tracking challenge and show, that it outperforms state-of-the-art methods in most cases. In the future work we will incorporate the detection and tracking into a joint model which will allow the tracker to recover from detection errors even on larger scale. We will also focus on automated parameter learning as well as on better detector and inference algorithm.

Notes

1.
For example for images with $512\times 512$ pixels $|D|^2=2^{36}$.

References

Aeschliman, C., Park, J., Kak, A.C.: A probabilistic framework for joint segmentation and tracking. In: CVPR, pp. 1371–1378. IEEE (2010)
Google Scholar
Berclaz, J., Fleuret, F., Turetken, E., Fua, P.: Multiple object tracking using k-shortest paths optimization. IEEE Trans. Pattern Anal. Mach. Intell. 33(9), 1806–1819 (2011)
Article Google Scholar
Chenouard, N., Bloch, I., Olivo-Marin, J.: Multiple hypothesis tracking for cluttered biological image sequences. IEEE Trans. Pattern Anal. Mach. Intell. 35(11), 2736–2750 (2013)
Article Google Scholar
Godinez, W.J., Rohr, K.: Tracking multiple particles in fluorescence time-lapse microscopy images via probabilistic data association. IEEE Trans. Med. Imaging 34(2), 415–432 (2015)
Article Google Scholar
Luo, W., Zhao, X., Kim, T.: Multiple object tracking: a review. CoRR abs/1409.7618 (2014)
Google Scholar
Meijster, A., Roerdink, J., Hesselink, W.H.: A general algorithm for computing distance transforms in linear time. In: Goutsias, J., Vincent, L., Bloomberg, D.S. (eds.) Mathematical Morphology and its Applications to Image and Signal Processing, pp. 331–340. Springer, Boston (2000)
Google Scholar
Nillius, P., Sullivan, J., Carlsson, S.: Multi-target tracking - linking identities using bayesian network inference. In: CVPR, vol. 2, pp. 2187–2194. IEEE Computer Society (2006)
Google Scholar
Olivo-Marin, J.C., Meijering, E.: Objective comparison of particle tracking methods. Nat. Methods 11(3), 281–289 (2014)
Article Google Scholar
Pellegrini, S., Ess, A., Gool, L.: Improving data association by joint modeling of pedestrian trajectories and groupings. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6311, pp. 452–465. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15549-9_33
Chapter Google Scholar
Rao, V., Teh, Y.W.: Fast MCMC sampling for Markov jump processes and extensions. J. Mach. Learn. Res. 14, 3207–3232 (2013). arXiv:1208.4818
MathSciNet MATH Google Scholar
Rue, H., Syversveen, A.R.: Bayesian object recognition with baddeley’s delta loss. Adv. Appl. Probab. 30(1), 64–84 (1998)
Article MathSciNet MATH Google Scholar
Türetken, E., Wang, X., Becker, C.J., Fua, P.: Detecting and tracking cells using network flow programming. CoRR abs/1501.05499 (2015)
Google Scholar
Wang, C., de La Gorce, M., Paragios, N.: Segmentation, ordering and multi-object tracking using graphical models. In: ICCV, pp. 747–754. IEEE (2009)
Google Scholar
Wu, Z., Thangali, A., Sclaroff, S., Betke, M.: Coupling detection and data association for multiple object tracking. In: CVPR, pp. 1948–1955. IEEE (2012)
Google Scholar
Yang, B., Huang, C., Nevatia, R.: Learning affinities and dependencies for multi-target tracking using a crf model. In: CVPR, pp. 1233–1240. IEEE (2011)
Google Scholar
Zhang, L., Li, Y., Nevatia, R.: Global data association for multi-object tracking using network flows. In: CVPR. IEEE Computer Society (2008)
Google Scholar

Download references

Acknowledgements

This work has been supported by the Grant Agency of the CTU Prague under Project SGS15/156/OHK3/2T/13 and by the Czech Science Foundation under Project 16-05872S.

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Electrical Engineering, Center for Machine Perception, Czech Technical University in Prague, Prague, Czech Republic
Tomáš Sixta & Boris Flach

Authors

Tomáš Sixta
View author publications
You can also search for this author in PubMed Google Scholar
Boris Flach
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tomáš Sixta .

Editor information

Editors and Affiliations

University College London , London, United Kingdom
Sebastien Ourselin
The Hebrew University of Jerusalem , Jerusalem, Israel
Leo Joskowicz
Harvard Medical School , Boston, Massachusetts, USA
Mert R. Sabuncu
Istanbul Technical University , Istanbul, Turkey
Gozde Unal
Harvard Medical School and Brigham and Women's Hospital, Boston, Massachusetts, USA
William Wells

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Sixta, T., Flach, B. (2016). Multiple Object Segmentation and Tracking by Bayes Risk Minimization. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science(), vol 9901. Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_70

Download citation

DOI: https://doi.org/10.1007/978-3-319-46723-8_70
Published: 02 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46722-1
Online ISBN: 978-3-319-46723-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)