Interactive image segmentation by maximal similarity based region merging☆
Introduction
Image segmentation aims to separate the desired objects from the background. In general, the color and texture features in a natural image are so complex that fully automatic segmentation of the object from the background is very hard. Therefore, semi-automatic segmentation methods incorporating user interactions have been proposed [2], [4], [10], [13], [17], [20], [21], [24] and are becoming more and more popular. For instance, in the active contour model (ACM), i.e. the snake algorithm [2], a proper selection of the initial curve by the user can lead to good convergence to the true object contour. Similarly, in the graph cut algorithm [10], [11], [12], the prior information obtained from the user is critical to the segmentation performance.
Low level image segmentation methods, such as mean shift [5], [6], watershed [3], level set [15] and super-pixel [28], usually divide the image into many small regions. Although they may produce severe over-segmentation, these low level methods provide a good basis for subsequent high level operations, such as region merging. For example, in [17], [18], Li et al. combined graph cut with watershed pre-segmentation for better segmentation outputs, where the regions produced by watershed, instead of the pixels of the original image, are regarded as the nodes of the graph cut. As a popular segmentation scheme for color images, mean shift [6] produces less over-segmentation than watershed while preserving the edge information of the object well (Fig. 1a shows an example). Because of the reduced over-segmentation, the statistical features of each region, which will be exploited by the proposed region merging method, can be calculated more robustly and then used to guide the region merging process.
In this paper, we propose a novel interactive region merging method based on the initial segmentation of mean shift. In the proposed scheme, the interactive information is introduced as markers, which are input by the user to roughly indicate the position and main features of the object and background. The markers can be simple strokes (e.g. the green and blue lines in Fig. 1b). The proposed method then calculates the similarity of different regions and merges them with the help of these markers, based on the proposed maximal similarity rule. The object is then extracted from the background when the merging process ends (Fig. 1c shows an example of the segmentation result).
Although the idea of introducing markers into interactive segmentation was used in Meyer's watershed scheme [4] and the graph cut schemes [10], [11], [12], this paper is the first to use it to guide region merging for object contour extraction. The key contribution of the proposed method is a novel maximal similarity based region merging (MSRM) mechanism, which is adaptive to image content and does not require a preset threshold. With the proposed region merging algorithm, the non-marker background regions are automatically merged and labeled, while the non-marker object regions are identified and prevented from being merged with the background. Once all the non-marker regions are labeled, the object contour can be readily extracted from the background. The proposed algorithm is very simple, yet it can successfully extract objects from complex scenes.
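To make the maximal-similarity rule concrete, the sketch below shows one merging pass over a region adjacency graph. It is an illustrative reconstruction, not the paper's implementation: the function names (`msrm_step`, `bhattacharyya`), the equal-weight histogram pooling, and the use of the Bhattacharyya coefficient as the similarity measure are assumptions (the reference list cites the Bhattacharyya measure, but the truncated text does not confirm how similarity is computed).

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two normalized histograms
    (1 = identical, 0 = no overlap)."""
    return float(np.sum(np.sqrt(h1 * h2)))

def msrm_step(regions, adjacency, labels):
    """One pass of maximal-similarity merging.

    regions:   dict region_id -> normalized color histogram (np.ndarray)
    adjacency: dict region_id -> set of neighboring region ids
    labels:    dict region_id -> 'background', 'object', or None (unmarked)

    A background region B absorbs an unmarked neighbor N only when B is the
    most similar of all N's neighbors (the maximal-similarity rule), so no
    fixed similarity threshold is needed. Returns True if any merge happened.
    """
    merged = False
    for b in [r for r, l in labels.items() if l == 'background']:
        for n in list(adjacency.get(b, set())):
            if labels.get(n) is not None:     # skip already-labeled regions
                continue
            # similarity of n to each of its own neighbors
            sims = {q: bhattacharyya(regions[n], regions[q])
                    for q in adjacency[n]}
            if max(sims, key=sims.get) != b:  # b is not n's best match
                continue
            # merge n into b: equal-weight histogram pooling for simplicity
            # (a full implementation would weight by region size)
            regions[b] = (regions[b] + regions[n]) / 2.0
            adjacency[b] |= adjacency.pop(n)
            adjacency[b] -= {b, n}
            for q in adjacency[b]:
                adjacency[q].discard(n)
                adjacency[q].add(b)
            del regions[n]
            labels.pop(n, None)
            merged = True
    return merged
```

Calling such a step repeatedly until it returns False would mimic the adaptive merging process described above: the stopping condition emerges from the maximal-similarity test itself rather than from a tuned threshold.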
The rest of the paper is organized as follows. Section 2 presents the proposed region merging algorithm. Section 3 performs extensive experiments to verify the proposed method. Section 4 concludes the paper.
Section snippets
Maximal-similarity based region merging
In our method, an initial segmentation is required to partition the image into homogeneous regions for merging. Any existing low level segmentation method, such as super-pixel [28], mean shift [5], [6], watershed [3] and level set [15], can be used for this step. In this paper, we choose the mean shift method for initial segmentation because it produces less over-segmentation and preserves the object boundaries well. In particular, we use the mean shift segmentation software—the EDISON
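Given any such initial segmentation (a per-pixel label map), the per-region statistics and the region adjacency graph that a merging stage needs can be built as sketched below. This is a minimal illustration under assumed inputs: it presumes pixel colors have already been quantized into `n_bins` histogram bins, and the helper names are hypothetical, not from the paper.

```python
import numpy as np

def region_histograms(label_map, quantized, n_bins):
    """Normalized color histogram for every region of an initial
    over-segmentation (e.g. mean shift output).

    label_map: H x W integer array, one region label per pixel
    quantized: H x W integer array of quantized colors in [0, n_bins)
    Returns dict region_id -> histogram summing to 1.
    """
    hists = {}
    for r in np.unique(label_map):
        mask = label_map == r
        h = np.bincount(quantized[mask], minlength=n_bins).astype(float)
        hists[int(r)] = h / h.sum()
    return hists

def region_adjacency(label_map):
    """Region adjacency graph from 4-connected pixel neighborhoods."""
    adj = {int(r): set() for r in np.unique(label_map)}
    # compare horizontally and vertically adjacent pixels
    for a, b in [(label_map[:, :-1], label_map[:, 1:]),
                 (label_map[:-1, :], label_map[1:, :])]:
        diff = a != b
        for x, y in zip(a[diff], b[diff]):
            adj[int(x)].add(int(y))
            adj[int(y)].add(int(x))
    return adj
```

Because mean shift yields relatively few, relatively large regions, these per-region histograms are statistically more stable than they would be over a heavily over-segmented watershed output, which is the motivation given above for choosing mean shift.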
Experimental results
The proposed MSRM method is essentially an adaptive region merging method. With the markers input by the user, it automatically merges regions and labels the non-marker regions as object or background. In Section 3.1, we first evaluate the MSRM method qualitatively on several representative examples; in Section 3.2, we compare it quantitatively with the well-known graph cut algorithm; in Section 3.3, we test MSRM under different color spaces, distance metrics and initial segmentations; at
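The quantitative comparison mentioned above requires scoring a segmentation against a ground-truth object mask. The exact metric used in the experiments is in the truncated full text; as one common choice, a Jaccard index (intersection over union) could be computed as follows, offered purely as an illustrative sketch.

```python
import numpy as np

def jaccard(pred, gt):
    """Jaccard index (intersection over union) of two binary object masks.
    1.0 means the predicted mask matches the ground truth exactly."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union else 1.0
```

Such a per-image score makes segmentations produced by different methods (e.g. MSRM versus graph cut) directly comparable on the same test images.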
Conclusion
This paper proposed a novel region merging based interactive image segmentation method. The image is initially segmented by mean shift, and the user only needs to roughly indicate the main features of the object and background with some strokes, which are called markers. Since the unmarked object regions have high similarity to the marked object regions, and likewise for the background regions, a novel maximal similarity based region merging mechanism was proposed to extract the object. The
References (30)
- F. Meyer, S. Beucher, Morphological segmentation, Journal of Visual Communication and Image Representation (1990)
- C. Wang et al., Progressive cut: an image cutout algorithm that models user intentions, IEEE Multimedia (2007)
- P. Meer, Stochastic image pyramids, Computer Vision, Graphics, and Image Processing (CVGIP) (1989)
- J.M. Jolion, A. Montanvert, The adaptive pyramid: a framework for 2D image analysis, CVGIP: Image Understanding (1992)
- T. Kailath, The divergence and Bhattacharyya distance measures in signal selection, IEEE Transactions on Communication Technology (1967)
- M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models, International Journal of Computer Vision (1987)
- L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Transactions on Pattern Analysis and Machine Intelligence (1991)
- Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence (1995)
- D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (2002)
- C. Christoudias, B. Georgescu, P. Meer, Synergism in low level vision, in: Proceedings of the International Conference...
- P.F. Felzenszwalb, D.P. Huttenlocher, Efficient graph-based image segmentation, International Journal of Computer Vision
- Y. Boykov, V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence
- V. Kolmogorov, R. Zabih, What energy functions can be minimized via graph cuts?, IEEE Transactions on Pattern Analysis and Machine Intelligence
About the Author—JIFENG NING received his College Diploma from Shenyang College of Technology in 1996 and his Master of Engineering degree from Northwest A&F University in 2002. He is a lecturer with the College of Information Engineering, Northwest A&F University. Since 2004, he has been pursuing his Ph.D. degree in the State Key Laboratory of Integrated Service Networks, XIDIAN University. His research interests include computer vision, image segmentation and pattern recognition. He is now working as a research assistant in the Department of Computing, The Hong Kong Polytechnic University.
About the Author—LEI ZHANG received the B.S. degree in 1995 from Shenyang Institute of Aeronautical Engineering, Shenyang, PR China, and the M.S. and Ph.D. degrees in Electrical Engineering from Northwestern Polytechnical University, Xi'an, PR China, in 1998 and 2001, respectively. From 2001 to 2002, he was a research associate in the Department of Computing, The Hong Kong Polytechnic University. From January 2003 to January 2006 he worked as a Postdoctoral Fellow in the Department of Electrical and Computer Engineering, McMaster University, Canada. Since January 2006, he has been an Assistant Professor in the Department of Computing, The Hong Kong Polytechnic University. His research interests include image and video processing, biometrics, pattern recognition, computer vision, multisensor data fusion and optimal estimation theory.
About the Author—DAVID ZHANG graduated in Computer Science from Peking University in 1974 and received his M.Sc. and Ph.D. degrees in Computer Science and Engineering from the Harbin Institute of Technology (HIT), Harbin, PR China, in 1983 and 1985, respectively. He received the second Ph.D. degree in Electrical and Computer Engineering at the University of Waterloo, Waterloo, Canada, in 1994. From 1986 to 1988, he was a Postdoctoral Fellow at Tsinghua University, Beijing, China, and became an Associate Professor at Academia Sinica, Beijing, China. Currently, he is a Professor with the Hong Kong Polytechnic University, Hong Kong. He is Founder and Director of Biometrics Research Centers supported by the Government of the Hong Kong SAR (UGC/CRC). He is also Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG), Book Editor, The Kluwer International Series on Biometrics, and an Associate Editor of several international journals. His research interests include automated biometrics-based authentication, pattern recognition, biometric technology and systems. As a principal investigator, he has finished many biometrics projects since 1980. So far, he has published over 200 papers and 10 books.
About the Author—CHENGKE WU received his B.Sc. in Wireless Communication from XIDIAN University in 1961. He is a professor with the School of Telecommunications Engineering and the State Key Laboratory of Integrated Service Networks, XIDIAN University. He was a visiting scholar at the University of Pennsylvania, USA from 1980 to 1982, a visiting professor at Nancy University, France, from 1990 to 1991, and a visiting professor at The Chinese University of Hong Kong in 2000, 2001 and 2002. Professor Wu's research interests include image/video coding and transmission, multimedia, and computer vision. As a principal investigator, Professor Wu has successfully completed many projects, including the 863 High Technology Program of China and a Natural Science Foundation of China (NSFC) Key Grant. He has won many awards, and published four monographs and over 100 technical papers.
☆ This work is partially supported by the National Science Foundation Council of China under Grants 60532060 and 60775020.