Elsevier

Pattern Recognition

Volume 112, April 2021, 107789
Label group diffusion for image and image pair segmentation

https://doi.org/10.1016/j.patcog.2020.107789

Highlights

  • Multivariate relationships are explored for the segmentation task based on a unary probability conversion model.

  • A multi-relational learning method is proposed to estimate the label group information on the TPG, which imposes a higher-order graph Laplacian to smooth the prior label.

  • A generalized LGD framework is established for semi-supervised image segmentation and unsupervised image pair co-segmentation tasks.

Abstract

Diffusion techniques are powerful for semi-supervised image segmentation because affinity propagation can capture the geometry of the data manifold. Conventional diffusion methods focus on a single label and therefore ignore interactions among labels. This work studies a generalized diffusion framework that considers label groups for diffusion (LGD). The proposed framework can effectively capture the interactions among image elements via a tensor product graph (TPG). A multivariate affinity framework is proposed to learn on higher-order TPGs. A label pair diffusion algorithm is naturally derived from the framework by considering second-order affinity (LG(2)D). We theoretically show that conventional label diffusion is the simplest case of the proposed framework (LG(1)D). Extensive experiments on image segmentation and image pair co-segmentation datasets demonstrate the superior performance of the proposed framework.

Introduction

Given a set of label prior information, the goal of image segmentation is to partition an image into several connected homogeneous regions according to pre-defined similarity measures. Segmented semantic regions or contours associated with real-world entities or scenes are the basis for further advanced image processing. Therefore, image segmentation is a key step from image processing to image analysis and a fundamental problem in computer vision [1], [2], [3], [4], [5].

Based on how the label prior information is obtained, segmentation schemes are generally classified into unsupervised, semi-supervised and fully supervised approaches. Unsupervised schemes automatically estimate label information based on feature clustering. However, it is difficult to determine the number of labels, and the segmentations produced by such approaches generally lack semantics. Semi-supervised schemes, also called interactive approaches, allow the user to provide simple interactions, such as scribbles, bounding boxes and target contours, to represent label information during segmentation. Such approaches can incorporate the users’ intentions to obtain results that meet their demands. Typical interactive segmentation methods include the graph cut (GC) based approaches [6], [7], [8], [9] and the random walk (RW) based approaches [10], [11], [12], [13], [14]. In GC, the unary and pairwise potentials are constructed to represent the region and boundary information of the image. Minimizing the binary cost function is equivalent to finding a minimum cut on a specific graph. In RW, the probability that a random walker starting at a pixel first reaches the foreground or background seeds is computed for each unseeded pixel. Then, each pixel is assigned to the group with the maximal probability. However, as pointed out in [1], the label distributions estimated from seeds based on low-level features are insufficient for distinguishing each object in many cases, such as in images with similar object appearances, complex textures, and difficult lighting conditions. In such cases, excessive user interactions are required to achieve desirable results. Fully supervised schemes, such as the popular deep-learning-based approaches [2], [3], collect a set of labeled image samples for model training. The trained convolutional neural networks have been proven to have superior performance in many areas of artificial intelligence. However, such approaches generally rely on a large amount of sample data. Labeling all the data is time-consuming, and in many specific tasks the available training data is limited. Furthermore, they also occasionally fail when a new category appears. Recently, many deep interactive segmentation methods have been proposed to better obtain the label prior information [1,4,5]. The users’ intentions can be semantically perceived from the simple interactions based on deep techniques.

Guided by the label prior information, the segmentation is executed based on the relationships between image elements. This paper mainly focuses on the similarity measures. Many segmentation methods, such as GC approaches, use Euclidean distances to compute the similarities. Euclidean distances are an adequate measure only if the data manifold is convex. However, the data manifold is often curved and is not a convex subset of the feature space [15]. Thus, Euclidean distances are generally inadequate, which causes the GC algorithms to suffer from the “short cut” and texture problems.

To overcome the above problem, many diffusion-based methods have been proposed [16], [17], [18], [19], in which the data manifold is represented as a weighted graph and the similarity values are iteratively propagated following the structure of the graph. The most common variants of the diffusion process are summarized in a recent survey paper [20]. Typical diffusion-based segmentation approaches include the popular RW algorithm. However, as pointed out in [15], although the intra-class affinities can be propagated by the diffusion process, the inter-class differences are easily lost. To overcome this problem, the diffusion process on the tensor product graph (TPG), obtained by the tensor product of the original graph with itself, was developed to better propagate the within-class similarities while preserving the between-class separation [15]. Further exploration and explanation of the advantage of TPG-based diffusion for image retrieval is given in [21].
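To make the idea of diffusion on a tensor product graph concrete, the following is a minimal sketch under stated assumptions. It uses a commonly cited formulation of TPG diffusion, in which the iteration Q ← A Q Aᵀ + I on the original N × N graph converges to the same result as solving a linear system on the N² × N² graph A ⊗ A. The toy affinity matrix, the 0.9 scaling, and the function names are illustrative assumptions and are not taken from this paper.

```python
import numpy as np


def tpg_diffusion_iterative(A, n_iter=200):
    """Iterate Q <- A Q A^T + I using only N x N matrices."""
    n = A.shape[0]
    Q = np.eye(n)
    for _ in range(n_iter):
        Q = A @ Q @ A.T + np.eye(n)
    return Q


def tpg_diffusion_closed_form(A):
    """Solve (I - A kron A) vec(Q) = vec(I) on the N^2 x N^2 product graph."""
    n = A.shape[0]
    product_graph = np.kron(A, A)  # tensor product of the graph with itself
    q = np.linalg.solve(np.eye(n * n) - product_graph, np.eye(n).reshape(-1))
    return q.reshape(n, n)


# Hypothetical toy affinity matrix (symmetric, zero diagonal).
rng = np.random.default_rng(0)
W = rng.random((5, 5))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)

# Row-normalize and scale so every row sums to 0.9; this keeps the
# spectral radius below 1, which the iteration needs to converge.
A = 0.9 * W / W.sum(axis=1, keepdims=True)

Q_iter = tpg_diffusion_iterative(A)
Q_closed = tpg_diffusion_closed_form(A)
```

The two results agree up to numerical precision, which is why the product graph never has to be built explicitly in practice: the iterative form costs N × N matrix products instead of an N² × N² solve.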

Inspired by these TPG-based approaches [15,21], in our previous paper [22] we extended the conventional label diffusion to label pair diffusion for the interactive image segmentation task. As described in [22], more accurate relationships between unlabeled data and labeled data can be learned on TPG, and more complex interactions between image element pairs and label pairs can be explored in the label pair diffusion process. For example, instead of measuring the relationships between N image elements and M labels as in label diffusion (N and M represent the number of image elements and labels, respectively), the relationships between N² image element pairs and M² label pairs are measured in label pair diffusion. In this case, the unary probability between each image element and each label is determined by the average influence of the multivariate relationships between all other elements and all other labels.
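The bookkeeping in this paragraph can be illustrated with a small sketch. It builds a score tensor over N² element pairs and M² label pairs and then converts it back to unary probabilities by averaging over the partner element and partner label. The exact conversion model of the paper is not reproduced in this preview, so the plain average, the row renormalization, and the names `P`, `U`, and `pair_to_unary` are assumptions made for illustration only.

```python
import numpy as np

N, M = 4, 3  # hypothetical number of image elements and labels

# Hypothetical unary probabilities: one row per element, rows sum to 1.
rng = np.random.default_rng(1)
U = rng.random((N, M))
U /= U.sum(axis=1, keepdims=True)

# Pairwise scores P[i, j, a, b] for element pair (i, j) and label pair (a, b).
# Here they are built as a simple product of unary terms (an assumption).
P = np.einsum("ia,jb->ijab", U, U)


def pair_to_unary(P):
    """Average over partner element j and partner label b, then renormalize."""
    unary = P.mean(axis=(1, 3))
    return unary / unary.sum(axis=1, keepdims=True)


U_recovered = pair_to_unary(P)
```

With this product construction the averaging step recovers the original unary probabilities exactly. In a real label pair diffusion the pairwise scores would come from the diffusion process instead, and the same averaging would aggregate the influence of the other elements and labels.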

In this paper, a theoretical framework of LGD is proposed, from which the label diffusion and label pair diffusion algorithms are naturally derived. Compared with the existing TPG-based approaches, we establish a generalized diffusion framework to extend the binary affinity diffusion to multivariate affinity learning on higher-order TPG. The detailed contributions of this paper are summarized as follows: first, multivariate relationships are explored for the segmentation task based on a unary probability conversion model; second, a multi-relational learning theory is proposed to estimate the label group information on TPG, which imposes a higher-order graph Laplacian to smooth the prior label; last, a generalized LGD framework is established for semi-supervised image segmentation and unsupervised image pair co-segmentation tasks.

Section snippets

Related work

Semi-supervised Image Segmentation: The most popular semi-supervised segmentation methods include GC [6] and RW [10]. In these approaches, graph theory-based optimization methods, such as the max-flow/min-cut algorithm [23], are generally used to obtain the globally optimal solution. The interactive graph cut method was first proposed in [6] to segment grayscale medical images. Superpixels produced by unsupervised algorithms were used to replace pixels to improve efficiency in Lazy snapping [24]

Generalized framework of LGD

Fig. 1 illustrates a schematic representation of the proposed theoretical framework, which generalizes the diffusion process to learn the relationship between image element groups and label groups on higher-order TPG.

Applications

Considering the algorithm complexity of LGD, we first focus on exploring the LG(2)D theory (K=2) for practical applications. Based on how the label prior is obtained, we apply LG(2)D to semi-supervised single image segmentation and unsupervised image pair co-segmentation tasks, respectively.

Experimental results

The parameter settings are summarized as follows: the constants β in Eq. (1) and K in Eq. (25) are fixed to 60 and 3, respectively; superpixels (SPs) are selected as image elements in our algorithm, and the popular SLIC algorithm [31] is used to generate 500 SPs; and the controlling parameter α in Eqs. (23, 28) is set to 0.99.
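As a rough illustration of how such settings typically enter a diffusion pipeline, the sketch below plugs β = 60 and α = 0.99 into a standard graph label propagation. Eqs. (1), (23) and (28) are not shown in this preview, so the Gaussian affinity, the symmetric normalization, and the closed form F = (I − αS)⁻¹Y are assumptions borrowed from conventional label propagation, not the paper's LG(2)D update. The six one-dimensional features stand in for the 500 superpixel descriptors.

```python
import numpy as np

BETA = 60.0   # affinity constant, value taken from the text
ALPHA = 0.99  # controlling parameter, value taken from the text


def gaussian_affinity(features, beta=BETA):
    """Assumed affinity: w_ij = exp(-beta * ||f_i - f_j||^2), zero diagonal."""
    diff = features[:, None, :] - features[None, :, :]
    W = np.exp(-beta * np.sum(diff ** 2, axis=2))
    np.fill_diagonal(W, 0.0)
    return W


def diffuse_labels(W, Y, alpha=ALPHA):
    """Conventional label diffusion: solve (I - alpha * S) F = Y."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))  # symmetric normalization D^-1/2 W D^-1/2
    return np.linalg.solve(np.eye(len(W)) - alpha * S, Y)


# Hypothetical toy data: six elements forming two well-separated groups.
features = np.array([[0.00], [0.05], [0.10], [0.60], [0.65], [0.70]])

# One seed per label: element 0 is foreground, element 5 is background.
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[5, 1] = 1.0

F = diffuse_labels(gaussian_affinity(features), Y)
labels = F.argmax(axis=1)
```

A value of α close to 1 lets the seed information travel far along the graph, so each unlabeled element ends up with the label of the seed in its own group even though only one element per group is labeled.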

For the semi-supervised image segmentation task, we compared LG(2)D with TPG diffusion (TPGD) [15], regularized TPG diffusion (RTPGD) [21], self-diffusion (SD) [16], probability diffusion

Conclusion

This paper studies LGD for the image segmentation task. The proposed framework effectively leverages multivariate relationships via higher-order TPG. The diffusion can utilize both local and global relations, which helps to capture the geometric structure of the data manifold. The qualitative and quantitative comparisons demonstrate the superior performance of the proposed framework over the state-of-the-art diffusion approaches in the single image segmentation and image pair co-segmentation tasks.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Natural Science Foundation of Jiangsu Province, China, under Grant BK20180458 and Grant BK20180069, in part by the National Science Foundation of China under Grant 61802188, Grant 62072241, Grant 61673220 and Grant 61972213, and in part by the Project funded by China Postdoctoral Science Foundation under Grant 2020M681530.

Tao Wang received the B.E. degree in Computer Science and the Ph.D. degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), China, in 2012 and 2017, respectively. Currently, he is an associate professor in the School of Computer Science and Engineering at NUST. His research interests include image segmentation and pattern recognition.

References (36)

  • T. Wang et al.

    Diffusive likelihood for interactive image segmentation

    Pattern Recognition

    (2018)
  • T. Wang et al.

    Multi-layer graph constraints for interactive image segmentation via game theory

    Pattern Recognition

    (2016)
  • N. Xu et al.

    Deep interactive object selection

    IEEE Conf. Comp. Vision and Pattern Recogn.

    (2016)
  • H. Noh et al.

    Learning deconvolution network for semantic segmentation

    IEEE Int. Conf. Comp. Vision

    (2015)
  • E. Shelhamer et al.

    Fully convolutional networks for semantic segmentation

    IEEE Trans. Pattern Anal. Mach. Intel.

    (2014)
  • D. Lin et al.

    Scribblesup: Scribble-supervised convolutional networks for semantic segmentation

  • N. Xu, B. Price, S. Cohen, J. Yang, and T. Huang. Deep GrabCut for object selection. arXiv preprint arXiv:1707.00243,...
  • Y. Boykov et al.

    Interactive graph cuts for optimal boundary & region segmentation of objects in ND images

  • A. Heimowitz et al.

    Image segmentation via probabilistic graph matching

    IEEE Trans. Image Process.

    (2016)
  • M. Jian et al.

    Interactive image segmentation using adaptive constraint propagation

    IEEE Trans. Image Process.

    (2016)
  • L. Grady

    Random walks for image segmentation

    IEEE Trans. Pattern Anal. Mach. Intel.

    (2006)
  • W. Casaca et al.

    Laplacian coordinates for seeded image segmentation

  • C.G. Bampis et al.

    Graph-driven diffusion and random walk schemes for image segmentation

    IEEE Trans. Image Process.

    (2016)
  • X. Dong et al.

    Sub-Markov random walk for image segmentation

    IEEE Trans. Image Process.

    (2016)
  • X. Yang et al.

    Affinity learning with diffusion on tensor product graph

    IEEE Trans. Pattern Anal. Mach. Intel.

    (2013)
  • B. Wang et al.

    Affinity learning via self-diffusion for image segmentation and clustering

  • B. Wang et al.

    Dynamic label propagation for semi-supervised multi-class multi-label classification

  • K.I. Kim et al.

    Context-guided diffusion for label propagation on graphs

    Zexuan Ji received the B.E. degree in Computer Science and Technology, and the Ph.D. degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), China, in 2007 and 2012, respectively. Currently, he is an associate professor with the School of Computer Science and Engineering at the Nanjing University of Science and Technology. He visited the Shenzhen Institutes of Advanced Technology for eight months from Oct. 2009, and the School of Information Technologies, University of Sydney, for one year from Nov. 2010. His current interests include medical imaging, image processing and pattern recognition.

    Jian Yang received the BS degree in mathematics from the Xuzhou Normal University in 1995. He received the MS degree in applied mathematics from the Changsha Railway University in 1998 and the PhD degree from the Nanjing University of Science and Technology (NUST), on the subject of pattern recognition and intelligence systems, in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza. From 2004 to 2006, he was a Postdoctoral Fellow at the Biometrics Centre of Hong Kong Polytechnic University. From 2006 to 2007, he was a Postdoctoral Fellow at the Department of Computer Science of New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 80 scientific papers in pattern recognition and computer vision. His journal papers have been cited more than 2000 times in the ISI Web of Science and 4000 times in Google Scholar. His “2DPCA” paper published in TPAMI 2004 has been cited more than 2000 times in Google Scholar. His research interests include pattern recognition, computer vision and machine learning. Currently, he is an associate editor of Pattern Recognition Letters and IEEE Trans. Neural Networks and Learning Systems.

    Quansen Sun received the Ph.D. degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), China, in 2006. He is a professor in the Department of Computer Science at NUST. His current interests include pattern recognition, image processing, remote sensing information system, and medical image analysis.

    Xiaobo Shen received the B.Sc. and Ph.D. degrees from the School of Computer Science and Engineering, Nanjing University of Science and Technology, China, in 2011 and 2017, respectively. He is currently a Professor with the School of Computer Science and Engineering, Nanjing University of Science and Technology. He has authored over 30 technical papers in prominent journals and conferences, such as the IEEE TNNLS, the IEEE TIP, the IEEE TCYB, NIPS, ACM MM, AAAI, and IJCAI. His primary research interests are multi-view learning, multi-label learning, network embedding, and hashing.

    Zhenwen Ren received the B.E. degree in Computer Science from Nanjing University of Science and Technology (NUST), China, in 2014. Currently, he is pursuing the Ph.D. degree in pattern recognition and intelligence system at NUST. His research interests include image segmentation and pattern recognition.

    Qi Ge received the B.Sc. degree from the College of Math & Physics, Nanjing University of Information Science & Technology, Nanjing, China, in 2006, the M.Sc. degree in Applied Mathematics from the College of Math & Physics, Nanjing University of Information and Technology, Nanjing, China, in 2009, and the Ph.D. degree in Pattern Recognition and Intelligent System from Nanjing University of Science and Technology, Nanjing, China, in 2013. Currently, she is an associate professor with the School of Telecommunications and Information Engineering at Nanjing University of Posts and Telecommunications. Her research interests include pattern recognition, image processing, and image segmentation.