Pattern Recognition

Volume 43, Issue 2, February 2010, Pages 445-456

Interactive image segmentation by maximal similarity based region merging

https://doi.org/10.1016/j.patcog.2009.03.004

Abstract

Efficient and effective image segmentation is an important task in computer vision and object recognition. Since fully automatic segmentation of natural images is usually very hard, interactive schemes that require only a few simple user inputs are good solutions. This paper presents a new region merging based interactive image segmentation method. The users only need to roughly indicate the location and region of the object and background by using strokes, which are called markers. A novel maximal-similarity based region merging mechanism is proposed to guide the merging process with the help of markers. A region R is merged with its adjacent region Q if R has the highest similarity with Q among all Q's adjacent regions. The proposed method automatically merges the regions that are initially segmented by mean shift segmentation, and then effectively extracts the object contour by labeling all the non-marker regions as either background or object. The region merging process is adaptive to the image content and does not require a similarity threshold to be set in advance. Extensive experiments show that the proposed scheme can reliably extract the object contour from a complex background.

Introduction

Image segmentation aims to separate the desired objects from the background. In general, the color and texture features of a natural image are so complex that fully automatic segmentation of the object from the background is very hard. Therefore, semi-automatic segmentation methods incorporating user interactions have been proposed [2], [4], [10], [13], [17], [20], [21], [24] and are becoming more and more popular. For instance, in the active contour model (ACM), i.e. the snake algorithm [2], a proper user selection of the initial curve can lead to good convergence to the true object contour. Similarly, in the graph cut algorithm [10], [11], [12], the prior information supplied by the user is critical to the segmentation performance.

Low level image segmentation methods, such as mean shift [5], [6], watershed [3], level set [15] and super-pixel [28], usually divide the image into many small regions. Although they may suffer from severe over-segmentation, these low level methods provide a good basis for subsequent high level operations, such as region merging. For example, in [17], [18], Li et al. combined graph cut with watershed pre-segmentation for better segmentation outputs, where the regions segmented by watershed, instead of the pixels of the original image, are regarded as the nodes of the graph. As a popular segmentation scheme for color images, mean shift [6] produces less over-segmentation than watershed while preserving the edge information of the object well (Fig. 1a shows an example). With less over-segmentation, the statistical features of each region, which will be exploited by the proposed region merging method, can be computed more robustly and then used to guide the region merging process.

In this paper, we propose a novel interactive region merging method based on the initial segmentation of mean shift. In the proposed scheme, the interactive information is introduced as markers, which are input by the user to roughly indicate the position and main features of the object and background. The markers can be simple strokes (e.g. the green and blue lines in Fig. 1b). The proposed method then calculates the similarity between different regions and merges them based on the proposed maximal-similarity rule with the help of these markers. The object is extracted from the background when the merging process ends (Fig. 1c shows an example of a segmentation result).

Although the idea of introducing markers into interactive segmentation was used in Meyer's watershed scheme [4] and the graph cut schemes [10], [11], [12], this paper is the first to use it to guide region merging for object contour extraction. The key contribution of the proposed method is a novel maximal similarity based region merging (MSRM) mechanism, which is adaptive to the image content and does not require a preset threshold. With the proposed region merging algorithm, the non-marker background regions are automatically merged and labeled, while the non-marker object regions are identified and prevented from being merged with the background. Once all the non-marker regions are labeled, the object contour can be readily extracted from the background. The proposed algorithm is very simple, yet it can successfully extract objects from complex scenes.
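To make the maximal-similarity rule concrete, the toy Python sketch below illustrates one merging pass. It is not the authors' implementation: it assumes regions are described by normalized color histograms compared with the Bhattacharyya coefficient (one plausible similarity choice), and for simplicity it propagates marker labels over a fixed region adjacency graph instead of contracting the graph after each merge.

```python
import numpy as np

def similarity(h1, h2):
    # Bhattacharyya coefficient between two normalized histograms
    # (a higher value means more similar regions).
    return float(np.sum(np.sqrt(h1 * h2)))

def merge_pass(hist, adj, labels, target):
    """Repeatedly apply the maximal-similarity rule: an unlabeled region Q
    is merged into a neighbor R carrying the `target` label whenever R is,
    among all of Q's adjacent regions, the one most similar to Q.
    Returns the number of merges performed."""
    merges = 0
    changed = True
    while changed:
        changed = False
        for q in adj:
            if labels.get(q) is not None or not adj[q]:
                continue
            best = max(adj[q], key=lambda r: similarity(hist[q], hist[r]))
            if labels.get(best) == target:
                labels[q] = target                         # Q inherits R's label
                hist[best] = 0.5 * (hist[best] + hist[q])  # pool region features
                merges += 1
                changed = True
    return merges

# Toy example: three regions in a chain 0-1-2; region 0 carries a
# background marker, region 2 an object marker, region 1 is unmarked.
hist = {0: np.array([0.9, 0.1]),   # background-like histogram
        1: np.array([0.8, 0.2]),   # unmarked, close to the background
        2: np.array([0.1, 0.9])}   # object-like histogram
adj = {0: {1}, 1: {0, 2}, 2: {1}}
labels = {0: "background", 2: "object"}
merge_pass(hist, adj, labels, "background")
```

In this example, the marked background region 0 is the most similar of region 1's neighbors, so region 1 is labeled background and the object/background boundary falls between regions 1 and 2; note that no threshold was supplied. The full method additionally merges unmarked regions among themselves and recomputes region features on the merged regions.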

The rest of the paper is organized as follows. Section 2 presents the proposed region merging algorithm. Section 3 performs extensive experiments to verify the proposed method. Section 4 concludes the paper.

Section snippets

Maximal-similarity based region merging

In our method, an initial segmentation is required to partition the image into homogeneous regions for merging. Any existing low level segmentation method, such as super-pixel [28], mean shift [5], [6], watershed [3] or level set [15], can be used for this step. In this paper, we choose the mean shift method for initial segmentation because it produces less over-segmentation and preserves the object boundaries well. In particular, we use the mean shift segmentation software—the EDISON

Experimental results

The proposed MSRM method is essentially an adaptive region merging method. With the markers input by the user, it will automatically merge regions and label the non-marker regions as object or background. In Section 3.1, we first evaluate the MSRM method qualitatively by several representative examples; in Section 3.2, we compare it quantitatively with the well-known graph cut algorithm; in Section 3.3, we test the MSRM under different color spaces, distance metrics and initial segmentation; at

Conclusion

This paper proposed a novel region merging based interactive image segmentation method. The image is initially segmented by mean shift segmentation, and the users only need to roughly indicate the main features of the object and background by using some strokes, which are called markers. Since the object regions have high similarity to the marked object regions, and the background regions likewise to the marked background regions, a novel maximal similarity based region merging mechanism was proposed to extract the object. The


References (30)

  • F. Meyer et al., Morphological segmentation, Journal of Visual Communication and Image Representation (1990)
  • Q. Yang et al., Progressive cut: an image cutout algorithm that models user intentions, IEEE Multimedia (2007)
  • P. Meer, Stochastic image pyramids, Computer Vision, Graphics, and Image Processing (CVGIP) (1989)
  • J.M. Jolion, The adaptive pyramid: a framework for 2D image analysis, Computer Vision, Graphics, and Image Processing (CVGIP): Image Understanding (1992)
  • T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Transactions on Communication Technology (1967)
  • M. Kass et al., Snakes: active contour models, International Journal of Computer Vision (1987)
  • L. Vincent et al., Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Transactions on Pattern Analysis and Machine Intelligence (1991)
  • Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence (1995)
  • D. Comaniciu et al., Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
  • C. Christoudias, B. Georgescu, P. Meer, Synergism in low level vision, in: Proceedings of the International Conference...
  • Q. Luo, T.M. Khoshgoftaar, Efficient image segmentation by mean shift clustering and MDL-guided region merging, in:...
  • J. Wang, B. Thiesson, Y. Xu, M.F. Cohen, Image and video segmentation by anisotropic kernel mean shift, in: Proceedings...
  • P. Felzenszwalb et al., Efficient graph-based image segmentation, International Journal of Computer Vision (2004)
  • Y. Boykov et al., An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)
  • V. Kolmogorov et al., What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence (2004)

    About the Author—JIFENG NING received his College Diploma from Shenyang College of Technology in 1996 and his Master of Engineering from Northwest A&F University in 2002. He is a lecturer with the College of Information Engineering, Northwest A&F University. Since 2004, he has been pursuing his Ph.D. degree in the State Key Laboratory of Integrated Service Networks, XIDIAN University. His research interests include computer vision, image segmentation and pattern recognition. He is now working as a research assistant in the Department of Computing, The Hong Kong Polytechnic University.

    About the Author—LEI ZHANG received the B.S. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, PR China, and the M.S. and Ph.D. degrees in Control Theory and Engineering from Northwestern Polytechnical University, Xi’an, PR China, in 1998 and 2001, respectively. From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006 he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. Since January 2006, he has been an Assistant Professor in the Department of Computing, The Hong Kong Polytechnic University. His research interests include Image and Video Processing, Biometrics, Pattern Recognition, Computer Vision, Multisensor Data Fusion and Optimal Estimation Theory, etc.

    About the Author—DAVID ZHANG graduated in Computer Science from Peking University in 1974 and received his M.Sc. and Ph.D. degrees in Computer Science and Engineering from the Harbin Institute of Technology (HIT), Harbin, PR China, in 1983 and 1985, respectively. He received the second Ph.D. degree in Electrical and Computer Engineering at the University of Waterloo, Waterloo, Canada, in 1994. From 1986 to 1988, he was a Postdoctoral Fellow at Tsinghua University, Beijing, China, and became an Associate Professor at Academia Sinica, Beijing, China. Currently, he is a Professor with the Hong Kong Polytechnic University, Hong Kong. He is Founder and Director of Biometrics Research Centers supported by the Government of the Hong Kong SAR (UGC/CRC). He is also Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG), Book Editor, The Kluwer International Series on Biometrics, and an Associate Editor of several international journals. His research interests include automated biometrics-based authentication, pattern recognition, biometric technology and systems. As a principal investigator, he has finished many biometrics projects since 1980. So far, he has published over 200 papers and 10 books.

    About the Author—CHENGKE WU received his B.Sc. in Wireless Communication from XIDIAN University in 1961. He is a professor with the School of Telecommunications Engineering and the State Key Laboratory of Integrated Service Networks, XIDIAN University. He was a visiting scholar at the University of Pennsylvania, USA, from 1980 to 1982, a visiting professor at Nancy University, France, from 1990 to 1991, and a visiting professor at The Chinese University of Hong Kong in 2000, 2001 and 2002. Professor Wu's research interests include image/video coding and transmission, multimedia, computer vision, etc. As a principal investigator, Professor Wu has successfully completed many projects, including the 863 High Technology Program of China and a Natural Science Foundation of China (NSFC) Key Grant. He has won many awards and published four monographs and over 100 technical papers.

    This work is partially supported by the National Science Foundation Council of China under Grants 60532060 and 60775020.
