Journal of Visual Communication and Image Representation
Sub-scene segmentation using constraints based on Gestalt principles
Introduction
Image segmentation aims to partition an image into non-overlapping homogeneous regions and is fundamental to many image processing and computer vision applications, such as object and saliency detection [1], [2], semantic annotation [3], [4], event detection [5], and hierarchical scene understanding [6].
Despite years of research, image segmentation remains very challenging because it is inherently ill-posed and ambiguous [7]. An image can be perceived and segmented in various ways because people have different preferences. Besides, the “correct” segmentation may differ from one visual task to another. To address this ambiguity, Arbelaez et al. proposed collecting human-labeled boundaries as ground truth and performing the segmentation in a supervised manner [8]. Following this trend, supervised methods usually emphasize estimating boundary probabilities rather than producing integrated regions [9], [10]. The drawback is that the estimated boundaries may be discontinuous, and disjointed edges impair visual perception when closed contours are preferred [11].
Unsupervised image segmentation is even more challenging and demanding. Because its general purpose is to derive segments suited to human perception, relying on perceptual rules from psychology is inevitably one of the major directions. Perceptual rules have been carefully studied and are used in many unsupervised segmentation methods [7], [12], [13], [14], [15], [16]. The most widely used are the Gestalt principles [17]. Gestalt is a psychology term that means “unified whole”. It refers to the theory that describes how people tend to group visual elements when certain principles are fulfilled. It includes principles such as continuity, closure, similarity and proximity. However, because these principles are abstract psychological concepts, they are difficult to quantify mathematically. In practice, only a few principles, such as similarity and proximity, are used in the literature, and they are interpreted in a simplified way. Under the similarity principle, the regions with the most similar appearance are considered for merging [7], [12], [13], [14], [15], [16]. Under the proximity principle, only neighboring regions are actually merged [6], [16], [18]. A deeper realization of the perceptual rules is needed to further improve image segmentation.
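The simplified reading of the two principles described above can be sketched as follows. This is only an illustration, not the method of any cited work: the region features (mean colour tuples), the Euclidean distance, and the function name `most_similar_neighbors` are all assumptions made for the example.

```python
def most_similar_neighbors(features, adjacency):
    """Return the adjacent region pair with the smallest feature distance.

    features:  dict region_id -> tuple of floats (hypothetical mean colour).
    adjacency: set of frozenset({a, b}) for regions that share a border.
    """
    def distance(a, b):
        # Euclidean distance between feature vectors (assumed similarity measure).
        return sum((x - y) ** 2 for x, y in zip(features[a], features[b])) ** 0.5

    best_pair, best_dist = None, float("inf")
    # Proximity (simplified): only neighboring pairs are candidates.
    for pair in adjacency:
        a, b = sorted(pair)
        d = distance(a, b)
        # Similarity (simplified): keep the most similar candidate.
        if d < best_dist:
            best_pair, best_dist = (a, b), d
    return best_pair, best_dist


features = {
    0: (10.0, 10.0, 10.0),
    1: (12.0, 10.0, 10.0),
    2: (200.0, 200.0, 200.0),
    3: (11.0, 10.0, 10.0),
}
adjacency = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
print(most_similar_neighbors(features, adjacency))  # ((0, 1), 2.0)
```

In this toy input, regions 0 and 3 are the most similar overall, but they are never compared because they do not share a border. That is the restriction to neighboring pairs that the simplified proximity interpretation imposes.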
Besides, it has long been recognized that there is a “semantic gap” between segmented patches and semantic entities that can be readily used. Both Malisiewicz and Efros [1] and Jianping et al. [6] stated that homogeneous segmented patches may not correspond to physical objects in the real world. The fundamental reason for this semantic gap lies in the current objectives of image segmentation, which focus on detecting precise boundaries [10] and producing homogeneous regions. As a result, any slight change in the image is captured, and it is considered acceptable for an object to be segmented into several parts. However, these parts need to be integrated to meet human expectations. It is therefore necessary to use global image context or to apply corresponding perceptual rules to piece together the segmented parts and form semantically consistent regions [4], [6].
In this paper, an unsupervised sub-scene segmentation method is proposed to narrow the semantic gap. The notion of the sub-scene is derived intuitively from how humans perceive a scene. A person who sees a scene may partition it into several sub-scenes, where each sub-scene fulfills a certain “function” and the meaning of the entire scene is probably derived by combining the functions of the sub-scenes. The notion of sub-scene used in this paper may appear similar to semantic segmentation such as [19], [20]. However, the major difference is that the sub-scene here is not confined to fixed categories set beforehand, and it does not need to go through a training step either. Several perceptual rules based on human psychology are explored, such as proximity grouping, the area of influence of objects, and harmony, and they are transformed into constraints that can be applied to low-level features. With a self-determined retrieval approach, sub-scenes can be generated automatically. The contributions of the proposed method are:
- 1. Proximity grouping is formulated more appropriately using influence areas instead of being restricted to neighboring pairs;
- 2. Balancing between proximity grouping and similarity grouping is achieved by a self-determined optimal retrieval strategy; and
- 3. The unimportant details are ignored and a more integrated segmentation result is achieved.
The paper is organized as follows. In Section 2, the proposed method is presented in detail. Section 3 describes the experiments on three datasets, where each dataset emphasizes a different aspect of scenes. Comparison and discussion are given for each of them. Section 4 concludes the paper.
Section snippets
Problem formulation
Consider I as a given image. One way to partition the image into M regions is $\{R_1, R_2, \ldots, R_M\}$, where $R_i$ represents region i. The common split-and-merge approach to image segmentation is to first generate a number of superpixels and then either merge them gradually until a stop criterion is satisfied, or carry the merging through to the end and then select the optimal segmentation from the entire process. The optimal segmentation is the partition that minimizes the cost
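The second variant of the split-and-merge approach (merge to the end, then select the best partition from the whole process) can be sketched as below. This is a minimal illustration under stated assumptions, not the formulation of this paper: superpixels are reduced to lists of intensities arranged in a 1-D chain so that adjacency is simply "next in the list", and the cost (within-region squared error plus a hypothetical penalty `lam` per region) stands in for the paper's cost, which is not shown in this snippet.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the region mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def merge_hierarchy(superpixels):
    """Greedily merge adjacent regions until one remains.

    Returns every intermediate partition, from the initial superpixels
    down to the single-region partition.
    """
    partition = [list(r) for r in superpixels]
    history = [[list(r) for r in partition]]
    while len(partition) > 1:
        # Index i of the adjacent pair (i, i + 1) with the closest means.
        i = min(
            range(len(partition) - 1),
            key=lambda k: abs(mean(partition[k]) - mean(partition[k + 1])),
        )
        partition[i:i + 2] = [partition[i] + partition[i + 1]]
        history.append([list(r) for r in partition])
    return history


def cost(partition, lam):
    """Hypothetical cost: homogeneity error plus a penalty per region."""
    return sum(sse(r) for r in partition) + lam * len(partition)


def select_optimal(history, lam=50.0):
    """Pick the partition with the lowest cost over the whole merging process."""
    return min(history, key=lambda p: cost(p, lam))


superpixels = [[10, 12], [11, 13], [100, 102], [101, 99]]
history = merge_hierarchy(superpixels)
print(select_optimal(history))  # [[10, 12, 11, 13], [100, 102, 101, 99]]
```

With these toy values the two-region partition wins: merging further makes the homogeneity error jump sharply, while stopping earlier pays the per-region penalty for no real gain. The value of `lam` controls that trade-off and is an arbitrary choice here.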
Experiments and results
The experiments are conducted on three datasets in order to evaluate the proposed method thoroughly. The first dataset is the Berkeley segmentation dataset [29]. This dataset is widely used in the research community, which allows us to compare our results with other state-of-the-art methods. However, it may not be well suited to sub-scene evaluation because many scenes in the dataset are close-up shots. Thus, the indoor scene dataset [30] and the Stanford background dataset [31] have
Conclusions
In this paper, a new method of sub-scene segmentation is proposed. The sub-scene segments are meaningful entities which ignore unimportant details compared to conventional segmentation results. The unsupervised sub-scene segmentation is conducted by using properties including proximity grouping, area of influence, similarity and harmony which are explored based on psychological principles. These properties are formulated into constraints and a self-determined optimal retrieval is conducted to
References (35)
- T. Malisiewicz, A.A. Efros, Improving spatial support for objects via multiple segmentations, in: British Machine...
- S. Goferman, L. Zelnik-Manor, A. Tal, Context-aware saliency detection, in: IEEE Transactions on Pattern Analysis and...
- J. Shotton, M. Johnson, R. Cipolla, Semantic texton forests for image categorization and segmentation, in: Computer...
- C. Xi, A. Jain, A. Gupta, L. S. Davis, Piecing together the segmentation jigsaw using context, in: Computer Vision and...
- L. Li-Jia, F.-F. Li, What, where and who? Classifying events by scene and object recognition, in: International...
- F. Jianping, G. Yuli, L. Hangzai, R. Jain, Mining multilevel image semantics via hierarchical classification, in: IEEE...
- B. Peng, L. Zhang, D. Zhang, A survey of graph theoretical approaches to image segmentation, in: Pattern Recognition,...
- P. Arbelaez, Boundary extraction in natural images using ultrametric contour maps, in: Computer Vision and Pattern...
- J. Mairal, M. Leordeanu, F. Bach, M. Hebert, J. Ponce, Discriminative sparse image models for class-specific edge...
- P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, in: IEEE...