Multimodality semantic segmentation based on polarization and color images

doi:10.1016/j.neucom.2016.10.090

Neurocomputing

Volume 253, 30 August 2017, Pages 193-200

https://doi.org/10.1016/j.neucom.2016.10.090

Abstract

Semantic segmentation assigns a meaningful class label to every pixel in an image. It enables intelligent devices to understand the scene and has received considerable attention in recent years. Traditional imaging systems typically apply their methods to RGB, RGB-D, or even RGB combined with geometric information. However, in outdoor applications, strong reflections or poor illumination tend to obscure the real shape or texture of objects, thus limiting the performance of semantic segmentation algorithms. To tackle this problem, this paper adopts polarization imaging, as it can provide complementary information by describing some imperceptible light properties, which vary across materials. SLIC superpixel segmentation is used to speed up the system. HOG and LBP features are extracted from both color and polarization images. After quantization using visual codebooks, a Joint Boosting classifier is trained to label each pixel based on the quantized features. The proposed method was evaluated on both the Day-set and the Dusk-set. The experimental results show that the polarization setup provides complementary information that improves semantic segmentation accuracy. In particular, a large improvement on the Dusk-set shows its potential for intelligent vehicle applications under dark illumination conditions.

Introduction

Semantic segmentation, also known as scene/image parsing or image understanding, aims to divide an image into predefined, meaningful, non-overlapping regions (e.g., car, grass, road). As an important task in intelligent vehicle (IV) applications, its ultimate goal is to equip IVs with the ability to understand the surrounding environment. Other IV tasks, such as pedestrian detection, obstacle detection, or road surface estimation, could benefit from semantic segmentation.

The substantial development of image classification, object detection, and superpixel segmentation in the past few years has boosted research in supervised scene parsing. However, challenges ranging from feature representation to model design and optimization are still not fully resolved. As for feature extraction, most methods extract features from RGB or gray-level images. Since local low-level features are sensitive to perspective variations, researchers have tried to solve this problem in a multimodal manner, combining other information with RGB images to achieve better performance, such as RGB-D images [1] and geometric information [2]. In addition, some special illumination cases, such as reflective (overly bright) surfaces or dark shaded surfaces, tend to mask the real texture or feature information, hence limiting the algorithm's performance. Considering this limitation, we adopt polarization images as a new source of information in a multimodal image parsing algorithm to improve the classification result.

Light becomes polarized once it is reflected from a surface. The polarization properties of the reflected light depend on the surface material, the surface geometry, the surface roughness, and so on, so these characteristics are implicitly encoded in the polarization state of the light. From this point of view, polarization attributes can describe some surface features that cannot be obtained from color images. It is worth noting that these attributes remain distinguishable under strong reflection or in shadow areas, where color-image-based methods fail to produce reliable results.
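
As an illustration of how a per-pixel polarization state can be turned into image channels, the following minimal Python sketch computes the degree of polarization (DOP) and angle of polarization (AOP), the two polarization images named in the conclusion of this paper. The acquisition scheme (four intensity measurements behind a linear polarizer at 0°, 45°, 90° and 135°) and the function names are assumptions made for illustration; the text available here does not specify how the images were acquired.

```python
import math


def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes parameters of one pixel.

    Assumes four intensities measured behind a linear polarizer
    oriented at 0, 45, 90 and 135 degrees (an illustrative setup).
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # +45 vs. -45 degree component
    return s0, s1, s2


def dop_aop(s0, s1, s2):
    """Degree (0..1) and angle (radians) of linear polarization."""
    if s0 <= 0:
        # No light reached the pixel: polarization is undefined, return zeros.
        return 0.0, 0.0
    dop = math.sqrt(s1 * s1 + s2 * s2) / s0
    aop = 0.5 * math.atan2(s2, s1)
    return min(dop, 1.0), aop


# Light fully polarized at 45 degrees: DOP is 1 and AOP is pi / 4.
dop, aop = dop_aop(*stokes_from_intensities(0.5, 1.0, 0.5, 0.0))
print(dop, aop)
```

Applying these two functions to every pixel would yield a DOP image and an AOP image on which the same local descriptors as on the color image can be computed.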

In computer vision, many indoor polarization applications under ideal lighting conditions have been proposed since the early 1990s, e.g., surface modeling, shape recovery, and reflectance analysis. However, few outdoor applications have been realized, because outdoor incident and reflected light are extremely complex. To the best of our knowledge, no work in the literature has applied polarization to semantic segmentation; this is the first work that attempts to use polarization information as features for outdoor image processing applications.

In this paper, we propose to combine polarization images (derived from the polarization state of each pixel) with color images to improve the accuracy of semantic image segmentation. More specifically, HOG, LBP and LAB features are extracted from the polarization images and the color images independently. These features are concatenated and fed into a joint boosting classifier, a feature-selection-based classifier known for the ease with which it integrates new sources of features. During training, the classifier randomly selects different polarization features and color features from the input space to produce the polarization-based semantic segmentation results. For comparison, we repeat the same algorithm with HOG, LBP and LAB features extracted from the color images only. After training another joint boosting classifier, the color-based semantic segmentation results are obtained. The comparison shows that the accuracy of semantic segmentation improves when polarization features are included.
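
The idea of extracting the same descriptor from each modality and concatenating the results can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the basic 8-neighbour LBP code as a stand-in for the full HOG, LBP and LAB feature set, it skips codebook quantization and the joint boosting classifier, and the helper names `lbp_code` and `pixel_features` are hypothetical.

```python
def lbp_code(img, r, c):
    """Basic 8-neighbour LBP code of the interior pixel (r, c).

    `img` is a 2-D list of numbers. A neighbour contributes a 1 bit
    when its value is greater than or equal to the centre value.
    """
    center = img[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def pixel_features(channels, r, c):
    """Concatenate one LBP code per channel into a single feature vector."""
    return [lbp_code(ch, r, c) for ch in channels]


# Toy 3 x 3 patches: a gray-level color channel and two polarization channels.
gray = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
dop = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
aop = [[9, 0, 0], [0, 5, 0], [0, 0, 0]]

color_only = pixel_features([gray], 1, 1)             # color-based input
multimodal = pixel_features([gray, dop, aop], 1, 1)   # color + polarization input
print(color_only, multimodal)
```

The multimodal vector simply extends the color-only vector with entries from the polarization channels, which is what lets a feature-selection-based classifier pick whichever source is more discriminative for a given class.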

Section snippets

Semantic segmentation

As classical methods in image parsing, bottom-up semantic segmentation methods usually follow this pipeline [3]: (1) Grouping nearby pixels into image patches according to local homogeneity. For this step, there exist methods such as K-means, mean shift, Simple Linear Iterative Clustering (SLIC) [4], and normalized cut [5]; (2) Extracting local features, e.g., HOG, LBP, texture or curvature, from each patch; (3) Feeding the extracted features and hand-labeled ground truth to a

Polarization applied on semantic segmentation

In this section, we describe the proposed multimodality semantic segmentation algorithm using polarization and color images. The method follows four steps, as shown in Fig. 2. First, we use local descriptors to describe the input image. This step is applied to both polarization and color images, so as to integrate information from different sources. These local descriptor vectors are then quantized through a clustered codebook, which produces the codebook maps shown in Fig. 3. As the final step, the

Efficient application

In practice, we apply two strategies to improve the time efficiency of the algorithm during training.

Firstly, we propose to apply a pixel sampling process before feeding all features into the training model. The reason is that using all the pixels in the image is too time-consuming, and that neighboring pixels usually carry similar information. In [10], center-pixel subsampling was performed over a 3 × 3 or 5 × 5 grid to reduce the number of training samples. Since this process is

Data set

The experiments were conducted on our polar-image data sets, which contain 21 images of 320 × 240 pixels. The Day-set includes 10 images and the Dusk-set 11 images (examples shown in Fig. 4). The Dusk-set used 6 images for training and 5 images for testing, while the Day-set used 6 images for training and 4 images for testing. These images were labeled using LabelMe [24]. We defined 6 classes: car, road, tree, sky, building, and grass. Pixels which do not correspond to any of these classes are

Conclusion

In this paper, we have proposed a method for applying polarization images to semantic segmentation. HOG, LBP and LAB features have been extracted from the polarization images, namely DOP and AOP. These features have been concatenated with the color-based features to form the input of the joint boosting classifier. This classifier was chosen because it is essentially a feature-selection-based classifier and is therefore well suited to combining different features. In this way, the polarization-based feature has

Fan Wang received the B.S. degree in Electronic and Information Engineering from Xidian University, Xi’an, China. She is currently pursuing the Ph.D. degree with the Laboratory LITIS, INSA de Rouen, France. Her current research interests concern the applications of polarization images in computer vision and intelligent vehicles.

References (24)

H. Zhu et al.
Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation
J. Visual Commun. Image Represent.
(2016)
L.B. Wolff
Polarization vision: a new sensory approach to image understanding
Image Vis. Comput.
(1997)
Y. Zhao et al.
Object separation by polarimetric and spectral imagery fusion
Comput. Vis. Image Underst.
(2009)
S. Gupta et al.
Indoor scene understanding with RGB-D images: bottom-up segmentation, object detection and semantic segmentation
Int. J. Comput. Vis.
(2015)
J. Tighe et al.
Superparsing
Int. J. Comput. Vis.
(2013)
R. Achanta et al.
SLIC superpixels compared to state-of-the-art superpixel methods
IEEE Trans. Pattern Anal. Mach. Intell.
(2012)
J. Shi et al.
Normalized cuts and image segmentation
IEEE Trans. Pattern Anal. Mach. Intell.
(2000)
A. Levinshtein et al.
Turbopixels: fast superpixels using geometric flows
IEEE Trans. Pattern Anal. Mach. Intell.
(2009)
D.G. Lowe
Object recognition from local scale-invariant features
Proceedings of the seventh IEEE International Conference on Computer vision, 1999
(1999)
N. Dalal et al.
Histograms of oriented gradients for human detection
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005. CVPR 2005
(2005)
T. Ojala et al.
Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
IEEE Trans. Pattern Anal. Mach. Intell.
(2002)
J. Shotton et al.
Semantic texton forests for image categorization and segmentation
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR 2008
(2008)

Samia Ainouz received her Ph.D. degree in image processing from Louis Pasteur University, Strasbourg. She carried out her postdoctoral work in 3D vision at Le2i UMR 6306 CNRS Lab. Since September 2008, she has worked as an associate professor in the LITIS Lab with the Intelligent Transportation Systems Team. Her main research interests are polarization imaging, stereovision, catadioptric vision, and applications of these techniques to intelligent vehicles.

Chunfeng Lian is currently pursuing the Ph.D. degree with the Laboratory LITIS, University of Rouen, France. His research interests include information fusion, pattern recognition, and medical image analysis.

Abdelaziz Bensrhair graduated with an M.Sc. in electrical engineering (1989) and a Ph.D. degree in computer science (1992) from the University of Rouen, France. From 1992 to 1999, he was an assistant professor in the Physics and Instrumentation Department, University of Rouen. He is currently a professor in the Information Systems Architecture Department, head of the Intelligent Transportation Systems Division, and co-director of the Computer Science, Information Processing, and Systems Laboratory (LITIS) of the National Institute of Applied Science Rouen (INSAR).
