1 Introduction

Most image partitioning algorithms are defined over a graph encoding purely attractive interactions. No matter whether a segmentation or clustering is then found agglomeratively (as in single linkage clustering/watershed) or divisively (as in spectral clustering or iterated normalized cuts), the user either needs to specify the desired number of segments or a termination criterion. An even stronger form of supervision is in terms of seeds, where one pixel of each segment needs to be designated as such either by a user or automatically. Unfortunately, clustering with automated seed selection remains a fragile and error-fraught process, because every missed or hallucinated seed causes an under- or oversegmentation error. Although the learning of good edge detectors boosts the quality of classical seed selection strategies (such as finding local minima of the boundary map, or thresholding boundary maps), non-local effects of seed placement along with strong variability in region sizes and shapes make it hard for any learned predictor to place exactly one seed in every true region.

Fig. 1.

Left: Overlay of raw data from the ISBI 2012 EM segmentation challenge and the edges for which attractive (green) or repulsive (red) interactions are estimated for each pixel using a CNN. Middle: vertical/horizontal repulsive interactions at intermediate/long range are shown in the top/bottom half. Right: Active mutual exclusion (mutex) constraints that the proposed algorithm invokes during the segmentation process. (Color figure online)

In contrast to the above class of algorithms, multicut/correlation clustering partitions vertices with both attractive and repulsive interactions encoded into the edges of a graph. Multicut has the great advantage that a “natural” partitioning of a graph can be found, without needing to specify a desired number of clusters, or a termination criterion, or one seed per region. Its great drawback is that its optimization is NP-hard.

The main insight of this paper is that when both attractive and repulsive interactions between pixels are available, a generalization of the watershed algorithm can be devised that segments an image without the need for seeds, stopping criteria or thresholds. It examines all graph edges, attractive and repulsive, sorted by their weight, and adds them to an active set iff they are not in conflict with previous, higher-priority decisions. The attractive subset of the resulting active set is a forest, with one tree representing each segment. However, the active set can have loops involving more than one repulsive edge. See Fig. 1 for a visual abstract.

In summary, our principal contribution, the Mutex Watershed, is a “best of both worlds” algorithm that combines the multicut’s desirable lack of hyperparameters with the small computational footprint of a Kruskal-type watershed algorithm.

The algorithm is presented in Sect. 3. In Sect. 4 we evaluate the algorithm against very strong baselines. We choose a challenging dataset for neuron segmentation from electron microscopy (EM) image stacks as benchmark. For this task, watershed segmentation is a key component: EM staining only highlights membrane boundaries, discouraging the use of region cues for segmentation. By incorporating long-range repulsions into the watershed procedure, we obtain an accurate segmentation directly from this step, avoiding costly agglomeration-based post-processing. In addition, we present preliminary results on the BSDS500, demonstrating the applicability of the proposed method to natural images. We describe our future plans, including extensions to semantic segmentation, in Sect. 5. Our implementation is available at https://github.com/hci-unihd/mutex-watershed.git.

2 Related Work

In the original watershed algorithm [1], seeds were automatically placed at all local minima of the boundary map. Unfortunately, this leads to severe over-segmentation. Defining better seeds has been a recurring theme of watershed research ever since. The simplest solution is offered by the seeded watershed algorithm [2]: It relies on an oracle (an external algorithm or a human) to provide seeds and assigns each pixel to its nearest seed in terms of minimax path distance. In the absence of an oracle, automatic seed selection is challenging because exactly one seed must be placed in every region. Simple methods, e.g. defining seeds by connected regions of low boundary probability, do not work: The segmentation quality is usually insufficient because multiple seeds are in the same region and/or seeds leak through the boundary.

This problem is typically addressed by biasing seed selection towards over-segmentation (with seeding at all minima being the extreme case). The watershed algorithm then produces superpixels that are merged into final regions by more or less elaborate postprocessing. This works better than using watersheds alone because it exploits the larger context afforded by superpixel adjacency graphs. Many criteria have been proposed to identify the regions to be preserved during merging, e.g. region dynamics [3], the waterfall transform [4], extinction values [5], region saliency [6], and \((\alpha ,\omega )\)-connected components [7]. A merging process controlled by criteria like these can be iterated to produce a hierarchy of segmentations where important regions survive to the next level. Variants of such hierarchical watersheds are reviewed and evaluated in [8].

These results highlight the close connection of watersheds to hierarchical clustering and minimum spanning trees/forests [9, 10], which inspired novel merging strategies and termination criteria. For example, [11] simply terminated hierarchical merging by fixing the number of surviving regions beforehand. The method of [12] incorporates predefined sets of generalized merge constraints into the clustering algorithm. Graph-based segmentation according to [13] defines a measure of quality for the current regions and stops when the merge costs would exceed this measure. Ultrametric contour maps [14] combine the gPb (global probability of boundary) edge detector with an oriented watershed transform. Superpixels are agglomerated until the ultrametric distance between the resulting regions exceeds a learned threshold. An optimization perspective is taken in [15], which introduces h-increasing energy functions and builds the hierarchy incrementally such that merge decisions greedily minimize the energy. The authors prove that the optimal cut corresponds to a different unique segmentation for every value of a free regularization parameter.

An important line of research is based on the observation that superior partitionings are obtained when the graph has both attractive and repulsive edges. Solutions that optimally balance attraction and repulsion do not require external stopping criteria such as predefined number of regions or seeds. This generalization leads to the NP-hard problem of correlation clustering or (synonymous) multicut (MC) partitioning [16]. Fortunately, modern integer linear programming solvers in combination with incremental constraint generation can solve problem instances of considerable size [17], and good approximations exist for even larger problems [18, 19].

Another beneficial extension is the introduction of additional long-range edges. Thanks to their larger field of view, the strength of these edges can often be estimated with greater certainty than is achievable for the local edges used in standard watersheds. This has been used in [20] to represent object size constraints by repulsive long-range edges, which is still an MC-type problem. When long-range edges are also allowed to be attractive, the problem turns into the more complicated lifted multicut (LMC) [21]. Realistic problem sizes can only be solved approximately [22, 23], but watershed superpixels followed by LMC postprocessing achieve state-of-the-art results on important benchmarks [24]. Long-range edges are also used in [25], but only as side losses for training the boundary detection CNN; they are not used explicitly in any downstream inference.

In general, striking progress in watershed-based segmentation has been achieved by learning boundary maps with convolutional neural networks (CNNs). This is nicely illustrated by the evolution of neuron segmentation for connectomics, an important field we also address in the experimental section. CNNs were introduced to this application in [26] and became, in much refined form [27], the winning entry of the ISBI 2012 Neuro-Segmentation Challenge [28]. Boundary maps and superpixels were further improved by progress in CNN architectures and data augmentation methods, using U-Nets [29], FusionNets [30] or inception modules [24]. Subsequent postprocessing with the GALA algorithm [31, 32], conditional random fields [33] or the lifted multicut [24] pushed the envelope of final segmentation quality. MaskExtend [34] applied CNNs to both boundary map prediction and superpixel merging, while flood-filling networks [35] eliminated superpixels altogether by training a recurrent neural network to perform region growing one region at a time.

Most networks mentioned so far learn boundary maps on pixels, but learning works equally well for edge-based watersheds, as was demonstrated in [36, 37] using CNN-generated edge weights according to [38, 39]. Tailoring the learning objective to the needs of the watershed algorithm by penalizing critical edges along minimax paths [39] or end-to-end training of edge weights and region growing [40] improved results yet again.

Outside of connectomics, [41] obtained superior boundary maps from CNNs by learning not just boundary strength, but also its gradient direction. Holistically-nested edge detection [42, 43] couples the CNN loss at multiple resolutions using deep supervision and is successfully used as a basis for watershed segmentation of medical images in [44].

The present paper combines all these concepts (hierarchical clustering, attractive and repulsive interactions, long-range edges, and CNN-based learning) into a novel efficient segmentation framework. It can be interpreted as a generalization of [12], because we also allow for soft constraints (which can be overridden by strong attractive edges), and constraints are generated on the fly by a neural network rather than predefined. Our method is also related to greedy additive edge contraction (GAEC) according to [22], but we handle attractive and repulsive interactions separately and define edge strength between clusters by a maximum instead of an additive rule.

3 The Mutex Watershed Algorithm

3.1 Definitions and Notation

We consider the problem of clustering a graph \(G(V, E^+\cup E^-, W^+\cup W^-)\) with both attractive and repulsive edge attributes. The scalar attribute \(w^+_e\in \mathbb {R}^+_0\) associated with edge \(e\in E^+\) is a merge affinity: the higher this number, the higher the inclination of the two incident vertices to be assigned to the same cluster. Similarly, \(w^-_e\in \mathbb {R}^+_0\) for \(e\in E^-\) is a split tendency: the higher this number, the greater the tendency of the incident vertices to be in different clusters.

In our application, each vertex corresponds to one pixel in the image to be segmented. Two vertices may have no edge connecting them; or an attractive edge \(e\in E^+\); or a repulsive edge \(e\in E^-\); or two edges at the same time, one attractive and one repulsive. Edges can be either local/short-range (when connecting two pixels that are immediately adjacent in the image) or long-range.

The Mutex Watershed algorithm, defined in Subsect. 3.3, maintains disjoint active sets \(A^+ \subseteq E^+\), \(A^- \subseteq E^-\), \(A^+ \cap A^- = \emptyset \), that encode merges and mutual exclusion constraints, respectively. Clusters are defined via the “connected” predicate:

$$\begin{aligned} \forall i,j\in V: \qquad \Pi _{i \rightarrow j} &= \{\text {path } \pi \text { from } i \text { to } j \text { with } \pi \subseteq E^+\}\\ \mathrm {connected}(i, j) &\Leftrightarrow \exists \text { path } \pi \in \Pi _{i \rightarrow j} \text { with } \pi \subseteq A^+ \subseteq E^+\\ \mathrm {cluster}(i) &= \{i\} \cup \{j: \mathrm {connected}(i,j)\} \end{aligned}$$

Conversely, the active subset \(A^-\subseteq E^-\) of repulsive edges defines mutual exclusion relations by using the following predicate:

$$\begin{aligned} \mathrm {mutex}(i, j) \Leftrightarrow \; &\exists \, e=(k,l)\in A^- \text { with } \\ &k \in \mathrm {cluster}(i) \text { and } l \in \mathrm {cluster}(j) \text { and } \\ &\mathrm {cluster}(i) \ne \mathrm {cluster}(j) \end{aligned}$$

Admissible active edge sets \(A^+\) and \(A^-\) must be chosen such that the resulting clustering is consistent, i.e. nodes engaged in a mutual exclusion constraint cannot be in the same cluster: \(\mathrm {mutex}(i,j) \Rightarrow \mathrm {not} \,\, \mathrm {connected}(i, j)\). The “connected” and “mutex” predicates can be efficiently evaluated using a union-find data structure.
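Both predicates can be made concrete in a few lines. The following is a minimal Python sketch (class and method names are ours, not the reference implementation linked in Sect. 1) of a union-find structure augmented with per-cluster hash sets of active mutex edges; the hash-set bookkeeping anticipates the complexity analysis of Sect. 3.4:

```python
class MutexUnionFind:
    """Union-find forest augmented with per-cluster mutex-edge sets.

    A minimal sketch: mutexes[r] holds the ids of active repulsive
    edges incident to the cluster whose union-find root is r.
    """
    def __init__(self, n_nodes):
        self.parent = list(range(n_nodes))
        self.rank = [0] * n_nodes
        self.mutexes = [set() for _ in range(n_nodes)]

    def find(self, i):
        # Path compression keeps the trees flat.
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def connected(self, i, j):
        return self.find(i) == self.find(j)

    def mutex(self, i, j):
        # Is there an active mutual exclusion between the clusters of
        # i and j? (Callers only invoke this when the clusters differ.)
        mi, mj = self.mutexes[self.find(i)], self.mutexes[self.find(j)]
        if len(mi) > len(mj):
            mi, mj = mj, mi
        return any(e in mj for e in mi)  # search smaller set in larger

    def add_mutex(self, i, j, edge_id):
        self.mutexes[self.find(i)].add(edge_id)
        self.mutexes[self.find(j)].add(edge_id)

    def merge(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return
        if self.rank[ri] < self.rank[rj]:
            ri, rj = rj, ri
        self.parent[rj] = ri
        if self.rank[ri] == self.rank[rj]:
            self.rank[ri] += 1
        # The merged cluster inherits both mutex sets; moving the
        # smaller set into the larger one bounds the average cost.
        if len(self.mutexes[ri]) < len(self.mutexes[rj]):
            self.mutexes[ri], self.mutexes[rj] = self.mutexes[rj], self.mutexes[ri]
        self.mutexes[ri] |= self.mutexes[rj]
        self.mutexes[rj] = set()
```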

Fig. 2.

Two equivalent representations of the seeded watershed clustering obtained using (a) a maximum spanning tree computation or (b) Algorithm 1. Both graphs share the weighted attractive (green) edges and seeds (hatched nodes). The infinitely attractive connections to the auxiliary node (gray) in (a) are replaced by infinitely repulsive (red) edges between each pair of seeds in (b). The two final clusterings are defined by the active sets (bold edges) and are identical. Node colors indicate the clustering result, but are arbitrary. (Color figure online)

3.2 Seeded Watershed from a Mutex Perspective

One interpretation of the proposed method is as a generalization of the edge-based watershed algorithm [9, 45, 46] or image foresting transform [47]. This algorithm can only ingest a graph with purely attractive interactions, \(G(V, E^+, W^+)\). Without further constraints, the algorithm would yield only the trivial result of a single cluster comprising all vertices. To obtain more interesting output, an oracle needs to provide seeds, namely precisely one node per cluster. These seed vertices are all connected to an auxiliary node (see Fig. 2(a)) by auxiliary edges with infinite merge affinity. A maximum spanning tree (MST) on this augmented graph can be found in linearithmic time; and the maximum spanning tree (or, in the case of degeneracy, at least one of the maximum spanning trees) will include the auxiliary edges. When the auxiliary edges are deleted from the MST, a forest results, with each tree representing one cluster [9, 45, 47].

We now reformulate this well-known algorithm in a way that will later emerge as a special case of the proposed Mutex Watershed: we eliminate the auxiliary node and edges, and replace them by a set of infinitely repulsive edges, one for each pair of seeds (Fig. 2(b)). Algorithm 1 is a variation of Kruskal’s MST algorithm operating on the seed mutex graph just defined, and gives results identical to seeded watershed on the original graph.

Algorithm 1. Kruskal's maximum spanning tree algorithm with an added mutex check (pseudocode figure; sketched below).
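Since the typeset pseudocode does not survive text extraction, here is a hedged Python sketch of Algorithm 1 (function and argument names are ours), built on the MutexUnionFind sketch from Sect. 3.1: every pair of seeds receives a predefined mutex constraint, standing in for the infinitely repulsive edges of Fig. 2(b), and attractive edges are then processed in descending order of weight as in Kruskal's algorithm.

```python
def seeded_watershed(n_nodes, edges, weights, seeds):
    """Algorithm 1 sketch: Kruskal's algorithm plus a mutex check.

    edges[e] = (i, j) are attractive edges with merge affinities
    weights[e]; seeds holds exactly one node per desired cluster.
    Returns the active set A+ (a maximum spanning forest).
    """
    uf = MutexUnionFind(n_nodes)
    # Predefined mutex constraints between every pair of seeds
    # replace the auxiliary node and edges of Fig. 2(a).
    mutex_id = len(edges)
    for a in range(len(seeds)):
        for b in range(a + 1, len(seeds)):
            uf.add_mutex(seeds[a], seeds[b], mutex_id)
            mutex_id += 1
    active = []
    for e in sorted(range(len(edges)), key=lambda e: -weights[e]):
        i, j = edges[e]
        if not uf.connected(i, j) and not uf.mutex(i, j):
            uf.merge(i, j)
            active.append(edges[e])
    return active
```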

This algorithm differs from Kruskal’s only by the check for mutual exclusion in the if-statement. Obviously, the modified algorithm has the same effect as the original algorithm, because the final set \(A^+\) is exactly the maximum spanning forest obtained after removing the auxiliary edges from the original solution.

In the sequel, we generalize this construction by admitting less-than-infinitely repulsive edges. Importantly, these can be dense and are hence much easier to estimate automatically than seeds with their strict requirement of only-one-per-cluster.

3.3 Mutex Watersheds

We now introduce the core contribution: an algorithm that is empirically no more expensive than an MST computation, but that can ingest both attractive and repulsive cues and partition a graph into a number of clusters that does not need to be specified beforehand. There is no requirement of one seed per cluster, nor even of a hyperparameter that would implicitly determine the number of resulting clusters.

The Mutex Watershed, Algorithm 2, proceeds as follows. Given a graph with sets of attractive and repulsive edges \(E^+\) and \(E^-\), with edge weights \(W^+\) and \(W^-\) respectively, sort all edges \(E^+\cup E^-\), attractive or repulsive, by their weight in descending order into a priority queue. Iteratively pop all edges from the queue and add them to the active set one by one, provided a set of conditions is satisfied. More specifically, if the next edge popped from the priority queue is attractive and its incident vertices are not yet in the same tree, then connect the respective trees, provided this is not ruled out by a mutual exclusion constraint. If, on the other hand, the popped edge is repulsive and its incident vertices are not yet in the same tree, then add a mutual exclusion constraint between the two trees.

Algorithm 2. The Mutex Watershed (pseudocode figure; sketched below).
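Rendered as a hedged Python sketch in the same style as above (names again ours): attractive and repulsive edges now share a single descending-weight queue, and mutex constraints are created on the fly.

```python
def mutex_watershed(n_nodes, edges, weights, is_repulsive):
    """Algorithm 2 sketch. edges[e] = (i, j); weights[e] is a merge
    affinity (attractive) or split tendency (repulsive), as flagged
    by is_repulsive[e]. Returns the final MutexUnionFind clustering."""
    uf = MutexUnionFind(n_nodes)
    # One priority queue over E+ and E-, highest weight first.
    for e in sorted(range(len(edges)), key=lambda e: -weights[e]):
        i, j = edges[e]
        if uf.connected(i, j):
            continue  # same tree: a repulsive edge here is simply ignored
        if is_repulsive[e]:
            uf.add_mutex(i, j, e)  # mutex constraint created dynamically
        elif not uf.mutex(i, j):
            uf.merge(i, j)  # attractive edge, not overruled by any mutex
    return uf
```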

The crucial difference to Algorithm 1 is that mutex constraints are no longer predefined, but created dynamically whenever a repulsive edge is found. However, new exclusion constraints can never override earlier, higher-priority merge decisions; a repulsive edge that conflicts with one is simply ignored. Similarly, an attractive edge must never override earlier, and thus higher-priority, must-not-link decisions.

3.4 Time Complexity Analysis

Before analyzing the time complexity of Algorithm 2, we first review the complexity of Kruskal’s algorithm. Using a union-find data structure, the time complexity of \(\mathrm {merge(i,~j)}\) and \(\mathrm {connected(i,~j)}\) is \(\mathcal {O}(\alpha (V))\), where \(\alpha \) is the slowly growing inverse Ackermann function, and the total runtime complexity is dominated by the initial sorting of the edges, \(\mathcal {O}(E \log E)\) [48].

To check for mutex constraints efficiently, we maintain a set of all active mutex edges

$$ M[C_i] = \{(u,~v) \in A^- | u \in C_i \vee v \in C_i\} $$

for every \(C_i = \mathrm {cluster}(i)\) using hash tables, where insertion of new mutex edges (i.e. \(\mathrm {addmutex}\)) and search have an average complexity of \(\mathcal {O}(1)\). Note that every cluster can be efficiently identified by the root node of its union-find tree. For \(\mathrm {mutex(i,~j)}\) we check whether \(M[C_i] \cap M[C_j] = \emptyset \) by searching for all elements of the smaller hash table in the larger hash table. Therefore \(\mathrm {mutex(i,~j)}\) has an average complexity of \(\mathcal {O}(\min (|M[C_i]|, |M[C_j]|))\). Similarly, during \(\mathrm {merge(i,~j)}\), mutex constraints are inherited by merging the two hash tables, which also has an average complexity of \(\mathcal {O}(\min (|M[C_i]|, |M[C_j]|))\).
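A toy example (node indices are arbitrary) illustrates how constraints are inherited through \(\mathrm {merge}\) in the sketch from Sect. 3.1:

```python
uf = MutexUnionFind(4)
uf.add_mutex(0, 3, edge_id=0)  # repulsive edge between clusters {0} and {3}
uf.merge(0, 1)                 # cluster {0, 1} inherits the mutex edge
assert uf.mutex(1, 3)          # the constraint now also covers node 1
assert not uf.mutex(1, 2)      # no constraint between {0, 1} and {2}
```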

In conclusion, the average runtime contributions of attractive edges, \(\mathcal {O}(|E^{+}| \cdot \alpha (V) + |E^{+}| \cdot M)\) (checking mutex constraints and possibly merging), and of repulsive edges, \(\mathcal {O}(|E^{-}| \cdot \alpha (V) + |E^{-}|)\) (insertion of one mutex edge), result in the following total average runtime complexity of Algorithm 2:

$$\begin{aligned} \mathcal {O}(E \log E + E \cdot \alpha (V) + EM). \end{aligned}$$
(1)

where M is the expected value of \(\min (|M[C_i]|, |M[C_j]|)\). Using \(\alpha (V) \in \mathcal {O}(\log V) \subseteq \mathcal {O}(\log E)\), this simplifies to

$$\begin{aligned} \mathcal {O}(E \log E \;+\; EM). \end{aligned}$$
(2)

In the worst case \(\mathcal {O}(M) = \mathcal {O}(E)\), so the Mutex Watershed algorithm has a worst-case runtime complexity of \(\mathcal {O}(E^2)\). Empirically, however, we find that \(\mathcal {O}(EM) \approx \mathcal {O}(E\log E)\) by measuring the runtime of the Mutex Watershed for different sub-volumes of the ISBI challenge (see Fig. 3), leading to an

$$\begin{aligned} \text {Empirical Mutex Watershed Complexity: }\mathcal {O}(E \log E) \end{aligned}$$
(3)
Fig. 3.

Runtime T of the Mutex Watershed (without sorting of edges) measured on sub-volumes of the ISBI challenge of different sizes (thereby varying the total number of edges E). We plot \(\frac{T}{|E|}\) over |E| in a logarithmic plot, which makes \(T \sim |E| \log (|E|)\) appear as a straight line. A logarithmic function (green line) is fitted to the measured \(\frac{T}{|E|}\) (blue crosses) with \(R^2 = 0.9896\). The good fit suggests that empirically \(T \approx \mathcal {O}(E \log E)\). (Color figure online)

4 Experiments

We evaluate the Mutex Watershed on the challenging task of neuron segmentation in electron microscopy (EM) image volumes. This application is of key interest in connectomics, a field of neuroscience that strives to reconstruct neural wiring diagrams spanning complete central nervous systems. The task requires segmentation of neurons from electron microscopy images of neural tissue: a challenging endeavor, since segmentation has to be based only on boundary information (cell membranes) and some of the boundaries are not very pronounced. Moreover, cells contain membrane-bound organelles, which have to be suppressed in the segmentation. Some of the neuron protrusions are very thin, but all of them have to be preserved in the segmentation to arrive at the correct connectivity graph. While a lot of progress has been made recently, only manual tracing yields sufficient accuracy for correct circuit reconstruction [49].

We validated the Mutex Watershed algorithm on the most popular neuron segmentation challenge: ISBI 2012 [28]. We estimate the edge weights using a CNN as described in Sect. 4.1 and compare with other entries in the leaderboard, as well as with other common post-processing methods applied to the same network predictions, in Sect. 4.2.

4.1 Estimating Edge Weights with a CNN

The common approach to EM segmentation is to predict which pixels belong to a cell membrane using a CNN. Different post-processing methods are used on top to obtain a segmentation; see Sect. 2 for an overview of these methods. The CNN can be trained to predict either boundary pixels [24, 27] or undirected affinities [25, 50], which express how likely it is for a pixel to belong to a different cell than its neighbors in the 6-neighborhood. In this case, the output of the network contains three channels, corresponding to the left, down and next-imaging-plane neighbors in 3D. The affinities do not have to be limited to immediate neighbors: in fact, [25] have shown that the introduction of long-range affinities is beneficial for the final segmentation, even if they are only used as an auxiliary loss during training. Building on the work of [25], we train a CNN to predict short- and long-range affinities and then use those directly as weights for the Mutex Watershed algorithm.

We estimate the affinities/edge weights for the neighborhood structure shown in Fig. 4. To that end, we define local attractive and long-range repulsive edges. The choice of this structure has to be motivated by the underlying data; we use a different pattern for in-plane and between-plane edges due to the anisotropy of the validation datasets. In more detail, we picked a sparse ring of in-plane repulsive edges plus additional longer-range in-plane edges, which were necessary to split regions reliably (see Fig. 4a). We also added connections to the indirect neighbors in the lower adjacent slice to ensure correct 3D connectivity (see Fig. 4b).

In total, \(C^+\) attractive and \(C^-\) repulsive edges are defined for each pixel, resulting in \(C^+ + C^-\) output channels in the network. We partition the set of attractive/repulsive edges into subsets \(H^+_{c}\) and \(H^-_{c}\) that contain all edges at a specific offset: \(E^+ = {\bigcup _{c}^{C^+}} H^+_{c}\) for attractive edges, and analogously for repulsive edges. Each subset \(H^+_{c}\) or \(H^-_{c}\) corresponds to a specific channel predicted by the network. We further assume that weights take values in [0, 1] and adopt the same conventions for attractiveness/repulsion as in Sect. 3. For more details on network architecture and training see Supplementary 1.

In our experiments, we pick a subset of repulsive edges, by using strides of 2 in the XY-plane in order to avoid artifacts caused by occasional very thick membranes. Note that the stride is not applied to local (attractive) edges, but only to long-range (repulsive) edges.
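To make the construction concrete, the following hedged sketch (function name and stride handling are ours and simplified relative to the exact pattern of Fig. 4) flattens one affinity channel per offset into the edge list consumed by the Mutex Watershed:

```python
import numpy as np

def affinities_to_edges(affinities, offsets, shape, stride=1):
    """Flatten affinity channels into (edge, weight) lists.

    affinities: array of shape (C,) + shape with values in [0, 1];
    offsets:    C pixel offsets, e.g. (0, 0, 1) or (0, 9, 9);
    stride:     subsampling factor (2 for our repulsive edges).
    """
    node_ids = np.arange(np.prod(shape)).reshape(shape)
    edges, weights = [], []
    for c, offset in enumerate(offsets):
        for index in np.ndindex(*shape):
            if any(i % stride for i in index):
                continue  # strided subsampling of this channel
            neighbor = tuple(i + o for i, o in zip(index, offset))
            if any(n < 0 or n >= s for n, s in zip(neighbor, shape)):
                continue  # offset points outside the volume
            edges.append((int(node_ids[index]), int(node_ids[neighbor])))
            weights.append(float(affinities[c][index]))
    return edges, weights
```

In this sketch, attractive channels would be flattened with stride 1 and repulsive channels with stride 2; the two edge lists, together with the repulsive flags, feed directly into the mutex_watershed sketch of Sect. 3.3.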

Fig. 4.

Local neighborhood structure of attractive (green) and repulsive (red) edges in the Mutex Watershed graph. Due to point symmetry to the origin, we only predict half of the directions with the neural network. (Color figure online)

4.2 ISBI Challenge

The ISBI 2012 EM Segmentation Challenge [28] is the neuron segmentation challenge with the largest number of competing entries. The challenge data contains two volumes of dimensions \(1.5 \times 2 \times 2\) microns with a resolution of \(50 \times 4 \times 4\) nm per pixel. The groundtruth is provided as binary membrane labels, which can easily be converted to a 2D, but not a 3D, segmentation. To train a 3D model, we follow the procedure described in [24].

Fig. 5.

Mutex Watershed applied to the ISBI challenge test data. For further images and a detailed comparison to the baseline segmentation methods, see Supplementary Sect. 2.

The test volume has private groundtruth; results can be submitted to the leaderboard. They are evaluated based on the Adapted Rand Score (Rand-Score) and the Variation of Information Score (VI-Score) [28], separately for each 2D slice.

Our method holds the top entry in the challenge’s leaderboard at the time of submission, see Table 1a. This is especially remarkable because it is simpler than the methods holding the other top entries. Similar to us, they rely on a CNN to predict boundary locations, but postprocess its output with the complex pipeline described in [24], which involves an NP-hard partitioning step.

In addition, we compare to baseline post-processing methods starting from our network predictions: thresholding (THRESH), two watershed variants (WS, WSDT) and one multicut variant (MC-LOCAL) take into account only short-range predictions; lifted multicut (LMC) and another multicut variant (MC-FULL) also use long-range predictions. For these baseline methods we have only produced 2D segmentations for the individual slices, either because the 3D results were inferior (THRESH, WS, WSDT) or infeasible to obtain (MC, LMC). In contrast, the Mutex Watershed benefited from 3D segmentation. See Table 1b for the evaluation results and Supplementary 2 for further details on the baseline methods and a qualitative comparison.

The three methods that use short- and long-range connectivity perform significantly better than the other methods. Somewhat surprisingly, MWS performs better than MC-FULL and LMC, which are based on an NP-hard partitioning problem. This might be explained by the lack of 3D information in the latter two approaches (solving the 3D model was infeasible).

4.3 Study on Natural Image Segmentation

We conducted preliminary experiments on the Berkeley segmentation dataset BSD500 [53] to study the Mutex Watershed’s applicability to natural images. Training a state-of-the-art edge detection network on this small dataset requires a set of dataset-specific optimization tricks, such as training with external data, multi-resolution architectures and auxiliary losses [43]. In this preliminary study we train a 2D version of the network used for the ISBI experiments to predict the 2D connectivity pattern depicted in Fig. 4a. To compensate for the small size of the training set, we present this network with the predictions from [42] as an additional input channel.

Table 1. Results on the ISBI 2012 EM Segmentation Challenge.

In order to isolate the influence of the quality of the underlying affinities, we run ablation experiments in which we interpolate (via a weighted average) between (a) affinities predicted by our neural network, (b) affinities obtained from the ground-truth and (c) uniform noise. We obtain Mutex Watershed segmentations from the interpolated affinities for the BSD test set, size-filter them (as the only post-processing step) and evaluate with the Rand Index. The “phase transition diagram” resulting from these experiments is shown in Fig. 6a; Fig. 6b lists the Rand Index and Variation of Information obtained for several points on this diagram.
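The interpolation itself is simple; a minimal sketch (array names are ours, and the mixing weights are barycentric coordinates on the triangle of Fig. 6a):

```python
import numpy as np

def mix_affinities(nn_aff, gt_aff, w_nn, w_gt, rng=None):
    """Weighted average of NN predictions, GT affinities and uniform
    noise; (w_nn, w_gt, 1 - w_nn - w_gt) are assumed to sum to one."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(0.0, 1.0, size=gt_aff.shape)
    return w_nn * nn_aff + w_gt * gt_aff + (1.0 - w_nn - w_gt) * noise
```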

Observe that the vertices corresponding to (a) and (c) can be interpreted as structured and unstructured noise on the ground-truth affinities, respectively. Hence, the results of our experiments show that the Mutex Watershed is fairly robust against both types of noise: when mixing the GT with noise, the quality of the segmentations is unaffected by up to 60% noise. When mixing GT with NN predictions, it is unaffected to an even higher degree.

In addition, we compare to the result of [22], who use an approach similar to ours and solve a lifted multicut based on long-range potentials extracted from a pre-computed probability map. In Supplementary 3, we show the segmentations resulting at different stages of interpolation between GT, NN predictions and noise.

Fig. 6.

(a) BSD500 segmentation quality of the MWS algorithm, given affinities from ground truth (top corner), from a neural network (right corner) or pure noise (left corner); plus hundreds of experiments on weighted combinations of the above. MWS segmentation quality (evaluated with the Rand index) degrades only once a large amount of noise is added to the affinities. (b) BSD500 scores at various interpolations between the neural network predictions (NN), ground-truth (GT) and noise. See Supplementary Sect. 3 for example images of the interpolated affinities. We include [22] as a reference point, because they also use long-range potentials in their segmentation method.

5 Conclusion

We have presented a fast algorithm for the clustering of graphs with both attractive and repulsive edges. The ability to consider both obviates the need for the stopping criteria or seeds that all popular algorithms except correlation clustering require. The proposed method has low computational complexity, in imitation of its close relative, Kruskal’s algorithm.

Finally, we have found that the proposed algorithm, when presented with informative edge costs from a good neural network, outperforms all known methods on a competitive bioimage partitioning benchmark, including methods that operate on the very same network predictions.

In future work we want to generalize our algorithm to semantic instance segmentation commonly found in natural image segmentation challenges [54,55,56].