Abstract
Graph cut algorithms can produce consistent, high-quality image segmentation masks by minimizing a predefined energy function over pixels. However, defining such a function is often impractical, especially for semantic segmentation, where pixel values must convey information about the class of each pixel. Convolutional neural networks such as U-Net, on the other hand, can learn to implicitly extract meaningful information from an image, but they lack explicit constraints, which can lead to rugged boundaries in the produced masks. In recent years, many solutions have been proposed to integrate graph cut algorithms into a neural network layer and thus combine the best of both worlds, but all of them fall short in either speed or quality of results. SoftCut, the approach proposed in this work, is a differentiable relaxation of the graph cut problem, equivalent to an intuitive electric circuit, that, when used as an output activation function, is shown to outperform both U-Net and submodular optimization in terms of IoU on real-world images from Cityscapes, while being faster than the latter.
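The electric-circuit view mentioned in the abstract can be illustrated with a small sketch: treat each pixel as a node in a resistive network, connect neighboring pixels with conductances, and tie every pixel to a source terminal held at potential 1 and a sink terminal held at 0. Solving the resulting linear (Laplacian) system yields node potentials in [0, 1] that act as a soft segmentation mask, and the solve is differentiable with respect to the conductances. This is only a minimal illustration of the general idea under assumed uniform edge conductances; the function name `softcut_potentials` and all parameters are hypothetical and do not reproduce the authors' actual formulation, in which a network would predict the per-pixel and per-edge conductances.

```python
import numpy as np

def softcut_potentials(src_cond, snk_cond, edge_cond=1.0):
    """Soft segmentation of an H x W grid via an electrical network.

    src_cond, snk_cond: (H, W) conductances tying each pixel to the
    source terminal (potential 1) and sink terminal (potential 0).
    edge_cond: uniform conductance between 4-connected neighbors
    (a learned model would predict these instead).
    Returns (H, W) node potentials in [0, 1].
    """
    H, W = src_cond.shape
    n = H * W
    idx = lambda r, c: r * W + c

    # Build the dense grid Laplacian (a real implementation would use
    # a sparse matrix and an iterative solver such as conjugate gradient).
    L = np.zeros((n, n))
    for r in range(H):
        for c in range(W):
            i = idx(r, c)
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < H and cc < W:
                    j = idx(rr, cc)
                    L[i, i] += edge_cond
                    L[j, j] += edge_cond
                    L[i, j] -= edge_cond
                    L[j, i] -= edge_cond

    # Terminal conductances enter the diagonal; the source (at potential 1)
    # also contributes to the right-hand side.
    L[np.arange(n), np.arange(n)] += src_cond.ravel() + snk_cond.ravel()
    b = src_cond.ravel().astype(float)

    v = np.linalg.solve(L, b)  # SPD system: potentials of the circuit
    return v.reshape(H, W)

# Pixels strongly tied to the source end up near 1, sink-tied pixels near 0.
src = np.full((4, 4), 0.01); src[:2, :] = 5.0
snk = np.full((4, 4), 0.01); snk[2:, :] = 5.0
mask = softcut_potentials(src, snk)
```

Because the potentials come from a single linear solve, gradients with respect to the conductances can flow through it (e.g. via implicit differentiation or an autograd-enabled solver), which is what makes such a relaxation usable as an output activation inside a network.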
Notes
1. Code available at https://github.com/alessiobonfiglio/softcut-lod.
References
Agrawal, A., Boyd, S.: Differentiating through log-log convex programs. arXiv (2020)
Amos, B., Kolter, J.Z.: OptNet: differentiable optimization as a layer in neural networks. In: Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 136–145. PMLR (2017)
Berthet, Q., Blondel, M., Teboul, O., Cuturi, M., Vert, J.P., Bach, F.: Learning with differentiable perturbed optimizers. In: Advances in Neural Information Processing Systems, vol. 33, pp. 9508–9519 (2020)
Borenstein, E., Ullman, S.: Class-specific, top-down segmentation. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 109–122. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-47967-8_8
Borse, S., Cai, H., Zhang, Y., Porikli, F.: HS3: learning with proper task complexity in hierarchically supervised semantic segmentation. In: 32nd British Machine Vision Conference 2021, BMVC 2021, Online, 22–25 November 2021, p. 175. BMVA Press (2021)
Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2016). https://doi.org/10.1109/cvpr.2016.350
Djolonga, J.: torch-submod (2017). https://github.com/josipd/torch-submod
Djolonga, J., Krause, A.: Differentiable learning of submodular models. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. IEEE (2016). https://doi.org/10.1109/cvpr.2016.90
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)
Laporte, F.: Torch sparse solve (2020). https://github.com/flaport/torch_sparse_solve
Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016). https://doi.org/10.1109/3dv.2016.79
Natarajan, E.P.: KLU: a high-performance sparse linear solver for circuit simulation problems. Ph.D. thesis, University of Florida (2005)
Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 12179–12188. IEEE (2021). https://doi.org/10.1109/iccv48922.2021.01196
Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Sun, H., Shi, Y., Wang, J., Tuan, H.D., Poor, H.V., Tao, D.: Alternating differentiation for optimization layers. In: The Eleventh International Conference on Learning Representations (2023)
Yakubovskiy, P.: Segmentation Models PyTorch (2020). https://github.com/qubvel/segmentation_models.pytorch
Zhang, X., et al.: DCNAS: densely connected neural architecture search for semantic image segmentation. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13956–13967. IEEE (2021). https://doi.org/10.1109/cvpr46437.2021.01374
Acknowledgements
This paper is supported by the FAIR (Future Artificial Intelligence Research) project, funded by the NextGenerationEU program within the PNRR-PE-AI scheme (M4C2, investment 1.3, line on Artificial Intelligence).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bonfiglio, A., Cannici, M., Matteucci, M. (2024). SoftCut: A Fully Differentiable Relaxed Graph Cut Approach for Deep Learning Image Segmentation. In: Nicosia, G., Ojha, V., La Malfa, E., La Malfa, G., Pardalos, P.M., Umeton, R. (eds) Machine Learning, Optimization, and Data Science. LOD 2023. Lecture Notes in Computer Science, vol 14505. Springer, Cham. https://doi.org/10.1007/978-3-031-53969-5_37
Print ISBN: 978-3-031-53968-8
Online ISBN: 978-3-031-53969-5