1 Introduction

Multiple object tracking (MOT) is an important tool for understanding various processes in living tissues by analyzing the dynamics and interactions of cells and subcellular particles. Tracking biological cells is challenging, because they may appear and disappear at any moment, reproduce or merge (e.g. macrophage engulfing a pathogen) and often cannot be visually distinguished from each other. Existing techniques for MOT (see [5] for an excellent review) often consist of two separate modules: a detector and a tracker. The reliability of the detector is crucial for their success, because without feedback from the tracker it is impossible to recover from detection errors. Unfortunately, current visualization techniques (e.g. fluorescence microscopy) are prone to noise, which motivates the development of more elaborate tracking methods.

Network flow based methods formulate the MOT as a min-cost flow problem in a network, which is constructed using the detector output and local motion information. A typical example is [16]. A similar formulation is adopted in [2] and solved by the k-shortest paths algorithm. Integer linear programming (ILP) can be used to handle aspects of MOT, which are difficult to model by a flow-based formulation (object-object interactions). The ILP framework allows to express detection and tracking by a single objective [3, 14], which avoids the problem of propagating detection errors. This additional power however comes at a cost, because ILP is NP-hard in general and can be solved only approximately by an LP relaxation [12].

A natural way to deal with the uncertainty of the detector is to use a probabilistic model. The model in [4] employs interacting Kalman filters with several candidate detections. Many works start by assuming that a sparse tracking graph has been already created and can be used as an input. Nodes of the graph then correspond either to the detection candidates [7] or short tracklets [15] and potential functions are used to model the shape prior, particle dynamics, interactions among the objects or their membership in social groups [9]. Node labels may carry information about presence or absence of an object, its position and/or about its depth [13] to resolve occlusions. In [1] the detection and data association tasks are solved jointly in a single probabilistic framework. Their model however requires the number of objects to be known a priori, which severely restricts its applicability to tracking biological particles.

The optimal output of a probabilistic tracker is determined by the loss function. Despite its importance it receives surprisingly little attention in the literature. In fact the zero-one loss has become a de facto standard and although it often leads to acceptable results, we believe that better loss could lead to more robust and accurate trackers.

In this paper we propose a novel probabilistic model for MOT, which can deal with unknown and changing number of objects, model their interactions and recover from detection errors. The tracking task is formulated as minimization of a Bayes risk with a loss function specifically designed for this purpose. The optimal set of trajectories is found by a greedy optimization algorithm.

2 Multiple Object Tracking

2.1 The Model

We assume that we are given a sequence of n images on domain \(D\subset \mathbb {Z}^2\) and corresponding measurements \(U=\{U_1, U_2,...,U_{n}\}\). Each \(U_t\) contains a detection score for each pixel – a number from interval [0, 1]. They are however not final decisions – \(U_t(i)=1\) does not necessarily imply the presence of an object at pixel i and \(U_t(i)=0\) does not exclude an object in that position. To account for the noise in the images and resulting uncertainty of detection scores we model a conditional distribution of a set of trajectories \(p_\theta (\mathbf{S}|U)\).

The basic building block of the model is a trajectory represented as a sequence of n discrete positions of a single particle. A special label is used to indicate, that the particle is not in the scene. This allows to deal with variable and unknown number of particles, as the only requirement for the model is to have at least as many trajectories as particles in the scene. The probability of a set of trajectories \(\mathbf S\) given measurements U is defined by a graphical model, which involves terms for individual trajectories, interaction potentials and hard constraints, which ensure, that the set of trajectories is physically possible:

$$\begin{aligned} p_\theta (\mathbf{S}|U)=\frac{\exp \left( \sum _{s \in \mathbf{S}}\phi _\theta (s,U)+\sum _{s,s' \in \mathbf{S}}\psi _\theta (s,s')+\sum _{s \in \mathbf{S}}\kappa _\theta (s)\right) }{Z(\theta )}, \end{aligned}$$
(1)

where \(Z(\theta )\) is the normalization constant. The single trajectory potentials \(\phi _\theta (s,U)\) are sums of contributions of the appearance model, the motion model (proportional to the distance between expected and real position of the particle) and penalties for trajectories, that enter the scene (this prevents the tracker to model a single particle by several short trajectories). The interaction potentials \(\psi _\theta (s,s')\) prevent two trajectories from getting too close to each other, which eliminates the need for rather heuristic non-maximum suppression. The hard constraints \(\kappa _\theta (s)\) are \(0/-\infty \) valued functions used mainly to ensure, that no particle exceeds some predefined maximum velocity.

Fig. 1.
figure 1

(a) Distance transform of 2D binary images with truncated L\(_1\) norm (\(\epsilon =2\)). (b) Two sets of trajectories with identical single frame distance transforms. (c) Set of sites (tracklets) for two consecutive frames. Drawn with thicker line are trajectories in \(\mathbf{S}\).

2.2 The Loss Function

In the context of the Bayes risk minimization a good loss function should reflect the properties of the physical system. For example the commonly used zero-one loss is not a proper choice, because it cannot distinguish between solutions, which are only slightly off and completely wrong. Another common choice, the additive L\(_2\) loss, is also inappropriate, because it is unclear how to evaluate the loss for two sets of trajectories with different cardinality. The loss function we propose is based on the Baddeley’s Delta Loss, which was originally developed for binary images [11]. Let I be a binary image on discrete domain \(D\subset \mathbb {Z}^2\) and \(\delta _I\) its distance transform with respect to a truncated pixel distance \(\min \{dist(i,j),\epsilon \}\) (Fig. 1a). The value of the Baddeley’s loss for two binary images I and \(\hat{I}\) is the squared Euclidean distance of \(\delta _I\) and \(\delta _{\hat{I}}\):

$$\begin{aligned} B(I,\hat{I})=\sum _{i\in D} \left( \delta _I(i)-\delta _{\hat{I}}(i)\right) ^2. \end{aligned}$$
(2)

To make the loss applicable for the tracking task we have to define an appropriate set of sites. The simplest option is to take pixels of the input images and foreground sites would be positions where the trajectories intersect the frames. The resulting loss has however a serious flaw because it cannot distinguish between parallel and crossing trajectories (Fig. 1b). To solve this issue we also consider all conceivable tracklets of length two and the foreground sites are those which are part of some trajectory (Fig. 1c). The distance between two tracklets is sum of distances of their positions in individual frames. Because of the maximum velocity constraint we often have to consider only a subset of the tracklets.

Our loss function is then defined as Baddeley’s loss over all tracklets of length one (pixels) and two (we denote them as TR(D)):

$$\begin{aligned} L(\mathbf{S},\hat{\mathbf{S}})= \sum _{a\in TR(D)} \left( \delta _\mathbf{S}(a)-\delta _{\hat{\mathbf{S}}}(a)\right) ^2. \end{aligned}$$
(3)

This loss function has several appealing properties. It is permutation invariant, does not require to artificially match trajectories from \(\mathbf{S}\) and \(\hat{\mathbf{S}}\) and is well defined for sets of trajectories with different cardinality.

3 The Inference

In the Bayesian framework the optimal set of trajectories \(\mathbf{S}^*\) minimizes the expected risk \(\mathbb {E}_{p_\theta }\left[ L(\mathbf{S},\hat{\mathbf{S}})\right] \) with respect to \(\hat{\mathbf{S}}\). If we substitute the proposed loss function (3) and discard terms, that cannot influence the optimal solution, we end up with the following optimization task:

$$\begin{aligned} \mathbf{S}^*=\mathop {\text {argmin}}\limits _{\hat{\mathbf{S}}}\sum _{a\in TR(D)}\left( \delta _{\hat{\mathbf{S}}}(a)^2-2\delta _{\hat{\mathbf{S}}}(a)\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] \right) . \end{aligned}$$
(4)

This is a difficult energy minimization problem that in addition requires to calculate for each tracklet the expected distance to the nearest trajectory which is also non-trivial. We estimate \(\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] \) using an MCMC sampling method and minimize the risk by a greedy algorithm. Both approaches are described in the following subsections.

3.1 Model Sampling

Our probabilistic model (1) is in fact a fully connected Conditional Random Field, where each variable represents a single trajectory. Usual procedure for Gibbs sampling is to iteratively simulate from conditional distributions of single variables, i.e. hidden Markov chains in our case. Although this can be done by a dynamic programming algorithm [10], its time complexity \(O(n|D|^2)\) makes it prohibitively slow for our purposesFootnote 1. Instead, we use Gibbs sampling in a more restricted way by sampling single components of the trajectory sequence (i.e. single position or special label), which is tractable and enjoys the same asymptotic properties.

However, in our case Gibbs sampling (both full and restricted) suffers from poor mixing, so in addition we use the following Metropolis-Hastings scheme:

  1. 1.

    Take two trajectories and some frame t

  2. 2.

    Cut both trajectories in frame t and swap the sequences for subsequent frames

  3. 3.

    Accept the new labelling with probability \(\min \left\{ 1,\frac{p_\theta (\mathbf{S'}|U)}{p_\theta (\mathbf{S}|U)}\right\} \), where \(\mathbf{S'}\) denotes the set of trajectories after the cut-swap procedure was performed.

The resulting MCMC algorithm consists of iteratively repeating the restricted Gibbs and cut-swap schemes. The expected values \(\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] \) are estimated by averaging over the generated samples. Calculation of \(\delta _\mathbf{S}(a)\) for a given sample \(\mathbf{S}\) is straightforward and for many commonly used distance functions it can be done in O(|TR(D)|) [6].

3.2 Risk Minimization

We minimize the risk (4) by a greedy algorithm (Algorithm 1), which iteratively adds new trajectories and extends existing ones. The procedure \(Extend(\mathbf{S^*},\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] ,t)\) iterates over all the trajectories in \(\mathbf{S^*}\), that reached the \((t-1)\)-th frame and for each of them calculates the change of the risk for all possible extensions to the t-th frame. The extension with the largest decrease of the risk is selected and \(\mathbf{S^*}\) is changed accordingly. The procedure stops, when no trajectory extension can decrease the risk. The procedure \(Start(\mathbf{S^*},\mathbb {E}_{p_\theta }\left[ \delta _\mathbf{S}(a)\right] ,t)\) iteratively adds new trajectories starting in the t-th frame, such that adding each trajectory leads to the largest possible decrease of the risk. If adding a new trajectory would increase the risk, the algorithm proceeds to the next frame. This algorithm is easy to implement but it remains open whether some optimality guarantees could be proved or whether there is a different algorithm with better properties. The main focus of this paper is however on the model and the loss function itself so we postpone this issue to the future work.

figure a

4 Experimental Results

4.1 The Data

We test our method on artificial, but highly realistic data from the 2012 ISBI Particle Tracking Challenge [8]. It contains simulated videos of fluorescence microscopy images of vesicles in the cytoplasm, microtubule transport, membrane receptors and infecting viruses (3D) with three different densities (on average 100, 500 and 1000 particles per frame) and four different signal-to-noise ratios (SNR): 1, 2, 4 and 7 (see Fig. 2). Motion types correspond to the dynamics of real biological particles: Brownian for vesicles, directed (near constant velocity) for microtubules and random switching between these two for receptors and viruses. The particles may appear and disappear randomly on any position and any frame. The data contain ambiguities similar to those in real data, including noise, clutter, parallel trajectories, intersecting and visual merging and splitting.

Fig. 2.
figure 2

2012 ISBI Particle Tracking Challenge Data. (a) Four biological scenarios were simulated (from left to right): vesicles, microtubules, receptors and viruses. (b) Each scenario contains images with four different SNR (from left to right, illustrated on images of vesicles): 1, 2, 4 and 7. (c) Images with three different particle densities are available for each SNR level. For the sake of clarity only \(150\times 150\) px segments of the original \(512\times 512\) px images are shown.

4.2 Evaluation Objectives

To make our method comparable to the other contributions to the 2012 ISBI Particle Tracking Challenge, we use the most important evaluation objectives from the challenge: the true positive rate \(\alpha (S^*,S_{GT})\in \left[ 0,1\right] \), the false positive rate \(\beta (S^*,S_{GT})\in \left[ 0,\alpha (S^*,S_{GT})\right] \) and root mean square error (RMSE) of true positives. True positive rate takes value 1 if for every trajectory in the ground truth \(S_{GT}\) there is a corresponding trajectory in \(S^*\) with exactly the same properties and it is smaller than 1 if some trajectories are misplaced or missing (\(\alpha (S^*,S_{GT})=0\) means that \(S^*=\emptyset \)). Spurious trajectories, which cannot be paired with any trajectory from \(S_{GT}\), are taken into account by the second objective \(\beta (S^*,S_{GT})\). It is equal to \(\alpha (S^*,S_{GT})\) if there are none, whereas small \(\beta (S^*,S_{GT})\) indicates lots of them. Instead of creating a single measure of quality, these objectives are used separately in the challenge thus a direct comparison of two trackers is possible only if one outperforms the other in both of them. We refer the reader to [8] for their detailed definition and description of other auxiliary objectives used in the challenge.

4.3 Tracking Vesicles

We demonstrate the performance of our method on sequences of low density vesicles for all four levels of SNR. Each sequence consists of 100 images (\(512\times 512\) px) and contains around 500 physical particles. Vesicles appear as brighter blobs in the images but due to the noise the pixels values are not reliable detection scores. Instead we enhanced the contrast and reduced the noise by the following filter:

  1. 1.

    Convolve the input images with \(3\times 3\) Gaussian kernel (\(\sigma =1\)) to filter out the noise

  2. 2.

    Choose two thresholds \(0<t_1<t_2<1\), replace all the pixel values lower than \(t_1\) by \(t_1\) and all the pixel values higher than \(t_2\) by \(t_2\)

  3. 3.

    Normalize the pixel values to interval [0, 1] and use them as detection scores

The appearance model was a simple linear function of the detection scores and we used Brownian motion assumption. In all four experiments we used the same model parameters and only the detector thresholds \(t_1\) and \(t_2\) were tuned using the first image of the sequence. Maximum velocity of the particles was bounded to 15 pixels per frame.

Fig. 3.
figure 3

Tracking results for the sequence of vesicles with SNR=7 (\(150\times 150\) px segments). Objects found by the tracker are marked by crosses and ground truth by circles.

Fig. 4.
figure 4

Objectives \(\alpha \), \(\beta \) and RMSE from the 2012 ISBI particle tracking challenge for tracking low-density vesicles (SNR 1, 2, 4, and 7). Competing methods are numbered according to [8]. Team 4 did not submit results for low density vesicles.

As shown in Fig. 4, for SNR 2, 4 and 7 our method outperforms all the methods from the challenge in objectives \(\alpha \) and \(\beta \) and it is highly competitive in RMSE. Similarly to the other methods, it performs poorly on the data with SNR 1. This is caused by our oversimplified detector, which fails in that case beyond the model’s ability to recover. We believe that better results could be achieved by incorporating the tracker and an improved detector into a joint model.

5 Conclusion and Future Work

In this paper we have proposed a novel probabilistic model and loss function for multiple object tracking. The model can recover from detection errors and allows to model interactions between the objects and therefore does not need heuristics like non-maximum suppression. We demonstrate the performance of the method on data from the 2012 ISBI particle tracking challenge and show, that it outperforms state-of-the-art methods in most cases. In the future work we will incorporate the detection and tracking into a joint model which will allow the tracker to recover from detection errors even on larger scale. We will also focus on automated parameter learning as well as on better detector and inference algorithm.