Multimodality semantic segmentation based on polarization and color images
Introduction
Semantic segmentation, also known as scene/image parsing or image understanding, aims to divide an image into predefined, meaningful, non-overlapping regions (e.g., car, grass, road). As an important task in intelligent vehicle (IV) applications, its ultimate goal is to equip an IV with the ability to understand the surrounding environment. Other IV tasks, such as pedestrian detection, obstacle detection or road surface estimation, could benefit from semantic segmentation.
The substantial development of image classification, object detection, and superpixel segmentation in the past few years has boosted research in supervised scene parsing. However, challenges ranging from feature representation to model design and optimization are still not fully resolved. As for feature extraction, most methods extract features from RGB or gray-level images. Since local low-level features are sensitive to perspective variations, researchers have tried to address this problem in a multimodal manner, combining other information with RGB images to obtain better performance, such as RGB-D images [1] and geometry information [2]. In addition, under some special illumination conditions, such as reflective (overly bright) or darkly shaded surfaces, the real texture or feature information is masked, which limits the algorithm’s performance. To address this limitation, we adopt polarization images as a new source of information in a multimodal image parsing algorithm to improve the classification result.
Light becomes polarized once it is reflected from a surface. Its polarization properties depend on the surface material, the surface geometry, the surface roughness, etc., so these characteristics are implicitly encoded in the polarization state of the light. From this point of view, polarization attributes can describe surface features that color images cannot offer. Notably, these attributes remain distinguishable under strong reflection or in shadow areas, where color-image-based methods fail to produce reliable results.
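As an illustration of how such polarization attributes can be turned into images, the sketch below computes the degree of (linear) polarization and the angle of polarization, the DOP and AOP images named in the Conclusion, from the linear Stokes parameters. The acquisition setup is an assumption: this excerpt does not say how the polarization images were captured, so the code assumes four intensity images taken through a linear polarizer at 0°, 45°, 90° and 135°. The function name `dop_aop_from_intensities` and the `eps` guard are hypothetical.

```python
import numpy as np

def dop_aop_from_intensities(i0, i45, i90, i135, eps=1e-8):
    """Per-pixel degree and angle of linear polarization.

    Assumption: i0, i45, i90, i135 are intensity images captured through a
    linear polarizer oriented at 0, 45, 90 and 135 degrees.
    """
    i0, i45, i90, i135 = (np.asarray(a, dtype=np.float64)
                          for a in (i0, i45, i90, i135))
    # Linear Stokes parameters.
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical
    s2 = i45 - i135                      # +45 vs. -45 degrees
    # Degree of linear polarization in [0, 1]; eps avoids division by zero.
    dop = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    # Angle of polarization in radians, in (-pi/2, pi/2].
    aop = 0.5 * np.arctan2(s2, s1)
    return np.clip(dop, 0.0, 1.0), aop

# Toy 1x1 "image" of fully horizontally polarized light of unit intensity.
dop, aop = dop_aop_from_intensities([[1.0]], [[0.5]], [[0.0]], [[0.5]])
```

The two outputs can then be treated like ordinary gray-level images when extracting local features.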
In computer vision, many indoor polarization applications under ideal lighting conditions have been developed since the early 1990s, e.g., surface modeling, shape recovery, and reflectance analysis. However, few outdoor applications have been realized, because outdoor incident and reflected light are extremely complex. To the best of our knowledge, no work in the literature has applied polarization to semantic segmentation; this is the first work that attempts to use polarization information as features for outdoor image processing applications.
In this paper, we propose to combine polarization images (derived from the polarization state of each pixel) with color images to improve the accuracy of semantic image segmentation. More specifically, HOG, LBP and LAB features are extracted from the polarization images and the color images independently. These features are concatenated and fed into a joint boosting classifier, a feature-selection-based classifier known for the ease with which it integrates new sources of features. During training, the classifier randomly selects polarization features and color features from the input space to produce the polarization-based semantic segmentation results. For comparison, we repeat the same algorithm with the HOG, LBP and LAB features extracted from the color images only; training another joint boosting classifier on these features gives the color-based semantic segmentation results. The comparison shows that including the polarization features improves the accuracy of the semantic segmentation.
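The feature-level fusion described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: a basic 8-neighbour LBP code and the raw channel value stand in for the full HOG, LBP and LAB descriptors, and the joint boosting classifier is not reproduced. The names `lbp8`, `pixel_features` and `fused_features` are hypothetical.

```python
import numpy as np

def lbp8(img):
    """Basic 8-neighbour LBP code per pixel (borders padded by replication)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # Set this bit where the neighbour is at least as bright as the centre.
        code += (neighbour >= img).astype(np.uint8) * np.uint8(1 << bit)
    return code

def pixel_features(channels):
    """Stack each channel and its LBP code into an (H*W, 2*C) matrix."""
    columns = []
    for channel in channels:
        channel = np.asarray(channel, dtype=np.float64)
        columns.append(channel.ravel())
        columns.append(lbp8(channel).ravel().astype(np.float64))
    return np.stack(columns, axis=1)

def fused_features(color_channels, polar_channels):
    """Concatenate color-based and polarization-based features per pixel."""
    return np.hstack([pixel_features(color_channels),
                      pixel_features(polar_channels)])

# Toy data: three color channels and two polarization channels (e.g. DOP, AOP).
rng = np.random.default_rng(0)
color = [rng.random((4, 5)) for _ in range(3)]
polar = [rng.random((4, 5)) for _ in range(2)]
x_color_only = pixel_features(color)       # baseline input, shape (20, 6)
x_fused = fused_features(color, polar)     # multimodal input, shape (20, 10)
```

In this layout the color-only baseline is simply the leading block of columns of the fused matrix, so a feature-selection-based classifier trained on the fused input is free to pick features from either modality.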
Section snippets
Semantic segmentation
As classical methods in image parsing, bottom-up semantic segmentation methods usually follow this pipeline [3]: (1) Grouping nearby pixels into image patches according to local homogeneity. For this step, there exist methods such as K-means, mean shift, Simple Linear Iterative Clustering (SLIC) [4], normalized cut [5], etc.; (2) Extracting local features, e.g., HOG, LBP, texture or curvature, from each patch; (3) Feeding the extracted features and hand-labeled ground truth to a
Polarization applied on semantic segmentation
In this section, we describe the proposed multimodality semantic segmentation algorithm using polarization and color images. This method follows four steps, as shown in Fig. 2. First, we use local descriptors to describe the input image. This step is applied to both polarization and color images, so as to integrate information from different sources. These local descriptor vectors are then quantized through a clustered codebook, which produces the codebook maps shown in Fig. 3. As the final step, the
Efficient application
In practice, we apply two strategies to improve the time efficiency of the algorithm during the training process.
First, we propose to apply a pixel sampling process before feeding the features into the training model. The reason is that using all the pixels in the image is too time-consuming, and that neighboring pixels usually carry similar information. In [10], center-pixel subsampling was performed over a 3 × 3 or 5 × 5 grid to reduce the number of training samples. Since this process is
Data set
The experiments were conducted on our polar-image data sets, which contain 21 images of 320 × 240 pixels. The Day-set includes 10 images and the Dusk-set 11 images (examples shown in Fig. 4). The Dusk-set used 6 images for training and 5 images for testing, while the Day-set used 6 images for training and 4 images for testing. These images were labeled using LabelMe [24]. We defined 6 classes: car, road, tree, sky, building, and grass. Pixels which do not correspond to any of these classes are
Conclusion
In this paper, we have proposed a method to apply polarization images to semantic segmentation. The HOG, LBP and LAB features have been extracted from the polarization images, namely DOP and AOP. These features have been concatenated with the color-based features as the input of the joint boosting classifier. This classifier has been chosen because, being essentially a feature-selection-based classifier, it is well suited to combining different features. In this way, the polarization-based feature has
Fan Wang received the B.S. degree in Electronic and Information Engineering from Xidian University, Xi’an, China. She is currently pursuing the Ph.D. degree with the LITIS Laboratory, INSA de Rouen, France. Her current research interests concern the applications of polarization images in computer vision and intelligent vehicles.
References (24)
- et al. Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation. J. Visual Commun. Image Represent. (2016)
- Polarization vision: a new sensory approach to image understanding. Image Vis. Comput. (1997)
- et al. Object separation by polarimetric and spectral imagery fusion. Comput. Vis. Image Underst. (2009)
- et al. Indoor scene understanding with RGB-D images: bottom-up segmentation, object detection and semantic segmentation. Int. J. Comput. Vis. (2015)
- et al. Superparsing. Int. J. Comput. Vis. (2013)
- et al. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. (2012)
- et al. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2000)
- et al. Turbopixels: fast superpixels using geometric flows. IEEE Trans. Pattern Anal. Mach. Intell. (2009)
- Object recognition from local scale-invariant features. Proceedings of the Seventh IEEE International Conference on Computer Vision (1999)
- et al. Histograms of oriented gradients for human detection. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005 (2005)
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell.
- Semantic texton forests for image categorization and segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2008
Cited by (21)
Global feature-based multimodal semantic segmentation
2024, Pattern Recognition
Polarization-based optical characterization for color texture analysis and segmentation
2022, Pattern Recognition Letters
Citation Excerpt: The rotation of the polarization is illustrated in Fig. 2 where the two glasses of the polarizer filter are depicted with different hatching. A similar approach is presented in [3], where only the DoP and AoP are used in conjunction with several local texture description - HOG, LBP and Color feature – for the purpose of semantic image segmentation. The authors of [32] propose a convolutional neural network named “Efficient Attention-bridged Fusion Network” (EAFNet) by fusing features from RGB and polarization images (DoP and AoP) captured with an integrated multimodal sensor used in autonomous driving.
Fabrication and performance analysis of infrared InGaAs polarimetric detector with complete coverage of superpixel-structured grating
2022, Infrared Physics and Technology
Citation Excerpt: Polarization is one of the basic characteristics of light. The polarization characteristics of reflected or radiated light of an object reflect its own properties, including surface roughness [1], surface material [2], surface geometry [3], tissue characteristics [4], edge characteristics [5], etc. Polarization provide information that is largely irrelevant to spectral and intensity images [1].
Optical flow estimation using channel attention mechanism and dilated convolutional neural networks
2019, Neurocomputing
Citation Excerpt: Moreover, these methods cannot learn weights from large amount of data and most of them are time consuming for real applications. Recently, convolutional neural networks have made rapid progress in many computer vision tasks, such as image classification [6], object recognition [7], semantic segmentation [8], depth estimation [9], and person re-identification [10]. Learning optical flow based on convolutional neural networks is first proposed by Dosovitskiy et al. [11], which designs a novel network named FlowNet based on encoder-decoder architecture.
A comprehensive review of fruit and vegetable classification techniques
2018, Image and Vision Computing
Citation Excerpt: The concept of multi-feature fusion as a combination of rotation-invariants Local Binary Patterns (LBP), RGB histogram distribution, weighted histograms, region connection statistics and multi-label k-nearest neighbour fusion has been analysed with the existing techniques of automated annotation in Ref. [106]. This concept has been used for segmentation of images using Histogram of Oriented Gradients (HOG) and LBP as feature fusion on RGB and polarised images separately, and improved segmentation results has been presented in [107]. This concept can be used with other significant classifiers for better segmentation.
Segmentation of images by color features: A survey
2018, Neurocomputing
Citation Excerpt: The CIF compensates the difficulty of the LBP-based operator on describing color distributions. Wang et al. [168] proposed to combine the polarization images, resulted from polarization state of each pixel, with the color images to improve the accuracy of image semantic segmentation. The combination method, more specifically, is through the HOG feature [29] and LBP [124] features that are extracted on both the polarization image and the color images independently.
Samia Ainouz received her Ph.D. degree in image processing from Louis Pasteur University, Strasbourg. She carried out her postdoctoral work in 3D vision at Le2i UMR 6306 CNRS Lab. Since September 2008, she has worked as an associate professor in the LITIS Lab with the Intelligent Transportation Systems Team. Her main research interests are polarization imaging, stereovision, catadioptric vision, and applications of these techniques to intelligent vehicles.
Chunfeng Lian is currently pursuing the Ph.D. degree with the LITIS Laboratory, University of Rouen, France. His research interests include information fusion, pattern recognition, and medical image analysis.
Abdelaziz Bensrhair graduated with an M.Sc. in electrical engineering (1989) and a Ph.D. degree in computer science (1992) at the University of Rouen, France. From 1992 to 1999, he was an assistant professor in the Physic and Instrumentation Department, University of Rouen. He is currently a professor in the information systems architecture department, head of the Intelligent Transportation Systems Division and co-director of the Computer Science, Information Processing, and Systems Laboratory (LITIS) of the National Institute of Applied Science Rouen (INSAR).