Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Cell growth and migration play key roles in cancer progression: abnormal cell growth can lead to formation of tumors and cancer cells can spread to other parts of the body, a process known as metastasis. In vitro experiments are essential to understand these mechanisms and for developing anti-cancer drugs. In these experiments, the cells are typically observed with conventional light microscopes. Thanks to recent advances in CMOS sensor technology, lens-free microscopy (LFM) [4, 13] has become a promising alternative. In LFM a part of the incident wavefront originating from the light source is scattered by the sample, in this case the cell. The scattered light then interferes with the unscattered part of the wavefront and the resulting interference pattern is recorded with a CMOS sensor. The components required for LFM are extremely small and cheap. Thus, LFM provides the means for a wide range of applications where a conventional light microscope would be either too big or simply too expensive, such as the continuous monitoring of growing cell cultures inside standard incubators [9].

Fig. 1.
figure 1

The cell lineage tracing problem with LFM data. We aim to detect all cells and establish their relation over time, i.e. determine the lineage forest. While the LFM technology allows for frequent image acquisition (3 min/frame in this case), challenges arise due to overlapping interference patterns of close objects, fluctuating shape and size of the cells appearance, and particles that generate similar patterns as the cells. The detail views show cell locations as a circle and identify their lineage tree.

To quantify the clinically relevant information on cell growth and migration from the large amount of images that are acquired in such continuous monitoring, reliable automatic image analysis methods are crucial. Counting the number of cells in a time series of images gives access to the dynamics of cell growth. Locating and tracing individual cells provides information about cell motility, and over the course of a sequence, reconstructing the lineage trees gives insights into cell cycle timings and allows more selective analysis of cell sub-cultures.

There are several works on these tasks in traditional light microscopy, e.g. focussing on cell segmentation [15], detection and counting [8, 10, 16] or tracking [1, 6, 7, 12, 14], but very few deal with LFM data. One of the few exceptions is [3] which employs a regression framework for estimating the total cell count per image. We aim at the more complex goal of not only counting but also localizing cells and reconstructing their spatio-temporal lineage forest (c.f. Fig. 1). Methods for the latter task range from Kalman filtering [1] to keep track of moving cells, or iteratively composing tracklets by using the Viterbi algorithm [11], and have been compared in [12]. More recently, Jug et al. [7] have proposed a mathematically rigorous framework for lineage reconstruction, the so-called moral lineage tracing problem (MLTP). The MLTP differs fundamentally from all mathematical abstractions of cell tracking whose feasible solutions are either disjoint paths or disjoint trees of detections. Unlike these approaches that select only one detection for each cell in every image, feasible solutions of the MLTP select and cluster an arbitrary set of such detections for each cell. This renders the lineage trees defined by feasible solutions of the MLTP robust to the addition of redundant detections, a property we will exploit in this work.

In this paper, we contribute a framework for analysis of LFM sequences. First, we design and benchmark robust cell detectors for LFM time-lapse sequences derived from most recent work on convolutional neural networks and residual learning. Second, we discuss the MLTP in the context of LFM data. In particular, we define a probability measure for which the MLTP is a maximum a posteriori (MAP) estimator. This allows us to define the costs in the objective function of the MLTP w.r.t. probabilities that we estimate from image data. We validate it experimentally on two annotated sequences. Finally, we demonstrate the capability of our approach to quantify biologically relevant parameters from sequences of two in vitro experiments with skin cancer cells.

2 Methods

We consider the lineage tracing task as a MAP inference over a hypothesis graph containing a multitude of potential lineage forests. We discuss the probability measure and its MAP estimator, the MLTP in Sect. 2.1. In order to construct the hypothesis graph from a sequence of LFM images, we devise a cell detector in Sect. 2.2, which estimates a cell probability map for each given image. The workflow is illustrated in Fig. 2.

Fig. 2.
figure 2

Illustration of our workflow. From left to right: (1) Raw microscopy image, (2) image overlayed with cell probability map generated by the detector, (3) nodes of the hypothesis graph with spatial edges constructed from cell probabilities, (4) optimized lineage where spatial edges that were cut are removed, and (5) each cluster is represented as one cell with its lineage tree identifier. Temporal edges are not depicted for simplicity.

2.1 Lineage Tracing

Hypothesis Graph. We construct a spatio-temporal hypothesis graph \(G= (V, E)\) as follows: For every image \(I_t\) in the sequence, we apply a cell detector and define one node \(v \in V_t\) for every local maximum in \(P\left( c_s=1\vert I_t\right) \), the estimated probability map for finding a cell at a particular location s in image \(I_t\). Additionally, we define hypothesized successors to each node that has one or more favourable parents in the previous frame but no immediate successor. This helps avoiding gaps in the final tracklets. The nodes \(v \in V\) represent cells, yet do not need to be unique, i.e. one cell may give rise to several nodes. We then construct edges in space \(E_t^{\mathrm {sp}} = \{ uv \in V_t \times V_t : d(u,v) \le d_{\mathrm {max}}\}\), i.e. between any two nodes that lie within a distance of \(d_{\mathrm {max}}\), and in the same fashion, we construct temporal edges \(E_t^{\mathrm {tmp}} = \{ uv \in V_t \times V_{t+1} : d(u,v) \le d_{\mathrm {max}}\}\) between nodes in adjacent frames.

Probabilistic Model. We introduce a family of probability measures, each defining a conditional probability of any lineage forest, given an image sequence. We describe the learning of this probability from a training set of annotated image sequences as well as the inference of a maximally probable lineage forest, given a previously unseen image sequence. The resulting MAP estimation problem will assume the form of an MLTP with probabilistically justified costs.

First, we encode subgraphs in terms of cut edges with binary indicator variables \(\mathbf {x}\in \{ 0, 1\}^{ E }\). If edge uv is cut, i.e. \(x_{uv}=1\), it means that nodes u and v do not belong together. In order to ensure that the solution describes a lineage forest, we rely on the formulation of the MLTP [7], which describes the set of inequalities that are required to do so. In short, these constraints ensure: (1) spatial and temporal consistency, i.e. if nodes u and v as well as v and w belong together, then u and w must also belong together. (2) Distinct tracklets cannot merge at a later point in time. These are the so called morality constraints. (3) Bifurcation constraints allow cells to split in no more than two distinct successors. We will denote the set of \(\mathbf {x}\) that describe valid lineage forests with \(X_G\). For a more extensive discussion of these constraints, we refer to [7, 14]. We next model the measure of probability:

$$\begin{aligned} P\left( \mathbf {x}\vert X_G, \mathbf {\Theta }\right)&\propto P\left( X_G\vert \mathbf {x}\right) \prod _{uv \in E} P\left( x_{uv}\vert \mathbf {\Theta }\right) \prod _{v\in V} P\left( x_v^+\vert \mathbf {\Theta }\right) \prod _{v\in V} P\left( x_v^-\vert \mathbf {\Theta }\right) , \end{aligned}$$
(1)
$$\begin{aligned} \text {where } P\left( X_G\vert \mathbf {x}\right)&\propto {\left\{ \begin{array}{ll} 1 &{}\text {if } \mathbf {x}\in X_G, \\ 0 &{} \mathrm {otherwise} \end{array}\right. } . \end{aligned}$$
(2)

It is comprised of four parts. First, we have \(P\left( X_G\vert \mathbf {x}\right) \) representing the uniform prior over \(X_G\). Second, the cut probability \(P\left( x_{uv}\vert \mathbf {\Theta }\right) \) describing the probability of u and v being part of the same cell (either in space or along time), and third and fourth, the birth and termination probabilities \(P\left( x_v^+\vert \mathbf {\Theta }\right) \) and \(P\left( x_v^-\vert \mathbf {\Theta }\right) \) for each node \(v\in V\). The variables \(x_v^+, x_v^- \in \{0, 1\}\) are indicating whether the respective event, birth or termination, occurs at node v. \(\mathbf {\Theta }\) denotes the joint set of parameters. We use these parts to incorporate the following assumptions: Two detections u and v that are close are more likely to originate from the same cell, hence we choose \(P\left( x_{uv}=1\vert \mathbf {\Theta }\right) = \min (\frac{d(u,v)}{\theta ^{\mathrm {sp}}}, 1)\). Similarly, two successive detections u at t and v at \(t+1\) are more likely to be related the closer they are, is captured by \(P\left( x_{uv}=1\vert \mathbf {\Theta }\right) = \min (\frac{d(u,v)}{\theta ^{\mathrm {tmp}}}, 1)\). Finally, we assume that birth and termination events occur at a low rate, which is incorporated by \(P\left( x_v^+=1\vert \mathbf {\Theta }\right) = \theta ^+\) and \(P\left( x_v^-=1\vert \mathbf {\Theta }\right) = \theta ^-\). We fit these parameters \(\mathbf {\Theta }\) on training data in a maximum likelihood fashion: For \(\theta ^{-}\) and \(\theta ^{+}\) this boils down to calculating the relative frequency of the respective events on the annotated lineage. For the spatial and temporal parameters \(\theta ^{\mathrm {sp}}\) and \(\theta ^{\mathrm {tmp}}\), we first complement the lineage forest with edges within \(d_{\max }\) as \(\mathcal {E}\). We then maximize the log-likelihood \(\log \mathcal {L}(\theta ) = \sum _{uv \in \mathcal {E}} \log P\left( x_{uv}\vert \theta \right) \) by an extensive search over the interval \(\theta \in \left[ \theta _{\min }, \theta _{\max }\right] \), where we found \(\left[ 1, 80\right] \) to be appropriate.

The MAP estimate \(\mathbf {x}^*= \mathop {\arg \,\mathrm{max}}\nolimits _{\mathbf {x}\in \mathbf {X}} P\left( \mathbf {x}\vert \mathbf {\Theta }, X_G\right) \) can be written as solution to the MLTP:

$$\begin{aligned} \min \bigg \{ \sum _{uv \in E} c_{uv} x_{uv} + \sum _{v \in V} c_v^+ x_v^+ + \sum _{v \in V} c_v^- x_v^- \bigg \vert \mathbf {x}\in X_G\cap X_V \bigg \} , \end{aligned}$$
(3)

where the coefficients become \(c_{uv} = -\log \frac{P\left( x_{uv}=1\vert \mathbf {\Theta }\right) }{1-P\left( x_{uv}=1\vert \mathbf {\Theta }\right) }\) for edges, and vice versa for \(c_v\) of the node events. \(X_V\) is the set of \(\mathbf {x}\) that satisfy the auxiliary constraints which tie birth and termination indicator variables \(x_v^-\) and \(x_v^+\) to the respective edge variables. We optimize (3) with the KLB algorithm described in [14].

2.2 Cell Detection with Residual Networks

Cells in LFM images are usually only marked at their center of mass and not segmented since their interference pattern, i.e. their appearance in the image, does not accurately describe their true shape and would therefore be ambiguous in many cases. Thus, we are interested in a detector that outputs the set of cell centers in image \(I_t\). Strong performance of the detector is crucial for the lineage reconstruction as its errors can affect the final lineage trees over many frames. To achieve this, we build on the recent work on residual networks [5]. However, instead of directly regressing bounding boxes or center coordinates in a sliding window fashion, we train our network, denoted with \(f(I_t)\), on a surrogate task: We approximate \(f(I_t) \approx P\left( c_s = 1\vert I_t\right) \), the probablity map of finding a cell at a particular location s in \(I_t\). This detector is fully convolutional and its output \(f(I_t)\) has the same size as \(I_t\). We found this to facilitate the training as it enlargens the spatial support of the sparse cell center annotations and gracefully handles the strongly varying cell density. Similar findings were made with techniques that learn a distance transform to detect cells, e.g. in [8]. We describe next how we arrive at a suitable architecture for this task and how to construct \(P\left( c_s = 1\vert I_t\right) \) from point-wise cell annotations.

Network Architecture. We start from the architecture of ResNet-50 [5]. We first truncate the network at layer 24 to obtain a fully convolutional detector. We found that truncating in the middle of the original ResNet-50, i.e. at layer 24, results in best resolution of the output response maps and allows to distinguish close cells. We then add one convolutional layer of \(1 \times 1 \times 256\) and one up-convolutional layer (also known as deconvolutional layer) of \(8 \times 8 \times 1\) with a stride of 8. The former combines all feature channels, while the latter compensates for previous pooling operations and ensures that the predicted cell probability map has the same resolution as the input image \(I_t\). Finally, a sigmoid activation function is used in the last layer to ensure that \(f(I_t)\) is within the interval \(\left[ 0, 1 \right] \) at any point.

Loss Function and Training. We sample training images of size \(224 \times 224\) from all frames of the training corpus. For each training image \(I_k\), we construct a corresponding cell probability map \(P\left( c_s=1\vert I_k\right) \) by placing a Gaussian kernel \(G^\sigma \) with \(\sigma =8\) at each annotated center. This implicitly represents the assumption that all cells have about the same extent, which is reasonable for our microscopy data. Each image is normalized to zero mean and unit variance. During training, we minimize the cross entropy loss between the predicted map \(f(I_t)\) and \(P\left( c_s\vert I_t\right) \) in order to let our network approximate the constructed cell probability map. We fine tune the network (pre-trained weights from ResNet-50 [5]) with a learning rate of \(10^{-3}\) for 100 epochs with batch size of 8. In each epoch, we sample 4000 training images. Since the annotated dataset for training is typically small and shows strong correlation between cells in consecutive frames, we used dropout of 0.5 after the last convolutional layer to avoid overfitting.

3 Experiments and Results

Datasets. We use a sequence depicting A549 cells, annotated over 250 frames in a region of interest (ROI) of \(1295 \times 971\,\mathrm {px}\), for all training purposes. For testing, we annotated two distinct sequences monitoring 3T3 cells of 350 and 300 frames in a ROI of \(639 \times 511\,\mathrm {px}\) (3T3-I) and \(1051 \times 801\,\mathrm {px}\) (3T3-II), respectively. Images were acquired at an interval of \(3\,\mathrm {min}\) with \(1.4\,\mathrm {\upmu m} \times 1.4\,\mathrm {\upmu m}\) per pixel.

Benchmarking Detectors. We compare four different network configurations, including the described ResNet-23, ResNet-11, a variant of it which was truncated at layer 11, the UNet [15] and CNN-4. In UNet, we obtained better results when replacing the stacks in the expansive path with single up-convolution layers which are merged with the corresponding feature maps from contracting path. CNN-4 is a plain vanilla CNN with three \(5\times 5\) convolutional layers followed by max pooling and finally, one up-convolutional layer of \(8 \times 8 \times 1\) to compensate for the down-sampling operations. We use the same training procedure (Sect. 2.2) for all detectors, but adjust the learning rate for UNet and CNN-4 to \(10^{-2}\).

We match annotated cells to detections within each frame with the hungarian algorithm and consider only matches closer than \(10\,\mathrm {px}\) (\(\approx \) a cell center region) as a true positive (TP). Unmatched annotations are counted as false negative (FN), unmatched detections as false positive (FP). The results are presented in Fig. 3, where we find the ResNet-23 to be the most robust detector.

Fig. 3.
figure 3

Performance of different detectors over all test frames. Boxplots depict median as orange line, mean as black square and outliers as grey +. The F1 scores are shown in %. We find that ResNet-23 is the most robust detector in our experiment with an average F1 of 94.1%. It is followed by the UNet with 89.2%, ResNet-11 with 85.1% and finally, CNN-4 with only 72.2%.

Lineage Tracing. To compare the quality of different lineages, we match again annotations and detections within each frame to calculate the number of TP, FP and FN as described before. We then determine the number of false links, i.e. how often two matched nodes do not have the same parent. From these, we calculate multiple object detection accuracy (MODA) and multiple object tracking accuracy (MOTA) [2]. Moreover, we derive the number of edit operations needed to get from the predicted lineage to the ground truth lineage, and calculate the tracking accuracy (TRA) score proposed in [12]. We use unit weight for each type of edit (add or delete node or edge). This is justified by the fact that we have point annotations for cells instead of segmentations, making both addition and deletion equally expensive to correct.

For the MLTP, we compare the effect of varying \(\theta ^{\mathrm {tmp}}\), \(\theta ^{\mathrm {sp}}\) together with hypothesis graphs generated from the different detectors in Fig. 4. The optimal parameter choice for ResNet-23 is at 10, i.e. a relatively small merge radius of favourable merges, while the other detectors considerably benefit from wider ranges. In Table 1, we compare different lineage tracing approaches. Our baseline is linear assignment problem tracking (LAPT) [6]. The disjoint trees method (DTP), uses our ResNet-23 detections but solves the disjoint trees problem instead, i.e. it considers only one detection per cell. We find that MLTP outperforms both in terms of detection and tracking metrics.

Fig. 4.
figure 4

Sensitivity analysis of the lineage tracing model with different detectors. We increase both edge cut parameters \(\theta ^{\mathrm {tmp}}\) and \(\theta ^{\mathrm {sp}}\) together. While the optimal choice in combination with ResNet-23 is relatively small, i.e. at 10, the other detectors, which suffer from many spurious detections, benefit from a wider range. Most notably, the performance with CNN-4 improves up to a competitive TRA of \(84.8\%\).

Table 1. Quantitative evaluation of traced lineages. Precision, recall, F1 and MODA are averaged over all frames of a sequence, while MOTA and TRA are global scores for the entire lineage forest. All measures are in \(\%\). Disjoint trees (DTP) uses our ResNet-23 detections and is equivalent to MLTP with disabled spatial edges and no hypothesized successors. LAPT is linear assignment problem tracking [6] and our baseline.

Assessing Cell Population Dynamics. We apply our method on data from two experiments with skin cancer cells. In each, one population is exposed to an inhibitor substance while the control is not. Figure 5 depicts the resulting statistics. We observe the expected difference in growth rate, yet a more constrained motility of the control cells, which is caused by the limited space.

Fig. 5.
figure 5

Cell dynamics measured on two experiments with skin cancer cell lines. One population (blue) is exposed to an inhibitor substance, while the other (orange) is not. From left to right: Cell count over time, histograms on cell motility (\(\upmu \mathrm {m} / 3\,\mathrm {min}\)) and divisions \(r_{\mathrm {div}} / \mathrm {h}\). Cells that divide often are more abundant in the control group.

4 Conclusions

We have presented a framework for automatic analysis of LFM time-lapse sequences. It transfers two recently proposed methods, residual learning [5] and moral lineage tracing [7, 14], to the task at hand. We have shown experimentally that it is able to determine cell lineage forests of high quality and thereby quantify several measures of interest for the analysis of in vitro experiments, including cell population dynamics and motility.