
Neurocomputing

Volume 253, 30 August 2017, Pages 193-200

Multimodality semantic segmentation based on polarization and color images

https://doi.org/10.1016/j.neucom.2016.10.090

Abstract

Semantic segmentation assigns a meaningful class label to every pixel in an image. It enables intelligent devices to understand the scene and has received considerable attention in recent years. Traditional imaging systems apply their methods to RGB, RGB-D, or RGB combined with geometric information. However, in outdoor applications, strong reflections or poor illumination can hide the real shape or texture of objects, thus limiting the performance of semantic segmentation algorithms. To tackle this problem, this paper adopts polarization imaging, which provides complementary information by describing imperceptible light properties that vary across materials. For acceleration, SLIC superpixel segmentation is used to speed up the system. HOG and LBP features are extracted from both color and polarization images. After quantization using visual codebooks, a Joint Boosting classifier is trained to label each pixel based on the quantized features. The proposed method was evaluated on both a Day-set and a Dusk-set. The experimental results show that the polarization setup provides complementary information that improves the semantic segmentation accuracy. In particular, the large improvement on the Dusk-set demonstrates its potential for intelligent vehicle applications under low-illumination conditions.

Introduction

Semantic segmentation, also known as scene/image parsing or image understanding, aims to divide an image into predefined, meaningful, non-overlapping regions (e.g., car, grass, road). As an important task in intelligent vehicle (IV) applications, its ultimate goal is to equip IVs with the ability to understand the surrounding environment. Other IV tasks, such as pedestrian detection, obstacle detection, or road surface estimation, could benefit from semantic segmentation.

The substantial development of image classification, object detection, and superpixel segmentation in the past few years has boosted research in supervised scene parsing. However, challenges ranging from feature representation to model design and optimization are still not fully resolved. Regarding feature extraction, most methods extract features from RGB or gray-level images. Since local low-level features are sensitive to perspective variations, researchers have tried to address this problem in a multimodality manner, combining other information with RGB images to obtain better performance, such as RGB-D images [1] and geometric information [2]. From another perspective, special illumination cases, such as reflective surfaces (too bright) or dark shaded surfaces, can mask the real texture or feature information, hence limiting an algorithm’s performance. Considering this limitation, we adopt polarization images as a new source of information, within a multimodality image parsing algorithm, to improve the classification result.

Light becomes polarized once it is reflected from a surface. The polarization properties of the light are related to the surface material, the surface geometry, the roughness of the surface, etc., so these characteristics are implicitly encoded in the light polarization state. From this point of view, polarization attributes can describe surface features that cannot be offered by color images. It is worth noting that these attributes remain distinguishable under high reflection or in shadowed areas, where color-image-based methods fail to produce reliable results.
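
As a point of reference (not taken from the paper), the two polarization images used later in this work, the degree of polarization (DOP) and the angle of polarization (AOP), can be computed from intensity images acquired behind a linear polarizer at 0°, 45°, 90°, and 135° through the first three Stokes parameters. The sketch below assumes such a four-angle acquisition; all names are illustrative.

```python
# Minimal sketch: DOP and AOP from four polarizer-angle intensity images
# via the linear Stokes parameters S0, S1, S2. Assumes float arrays of
# equal shape; not the authors' implementation.
import numpy as np

def linear_stokes(i0, i45, i90, i135):
    """Return the S0, S1, S2 Stokes components from four polarizer-angle images."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # +45 deg vs. -45 deg component
    return s0, s1, s2

def dop_aop(i0, i45, i90, i135, eps=1e-8):
    """Per-pixel degree and angle of linear polarization."""
    s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
    dop = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)   # in [0, 1]
    aop = 0.5 * np.arctan2(s2, s1)                  # in [-pi/2, pi/2]
    return dop, aop
```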

In computer vision, polarization has been used in many indoor applications under ideal lighting conditions since the early 1990s, e.g., surface modeling, shape recovery, and reflectance analysis. However, few outdoor applications have been realized, because outdoor incident and reflected light are extremely complex. To the best of our knowledge, no work in the literature has applied polarization to semantic segmentation; this is the first work that attempts to utilize polarization information as features for outdoor image processing applications.

In this paper, we propose to combine polarization images (derived from the polarization state of each pixel) with color images to improve the accuracy of image semantic segmentation. More specifically, the combination relies on HOG, LBP, and LAB features extracted independently on both the polarization images and the color images. These features are concatenated and fed into a joint boosting classifier, a feature-selection-based classifier known for its ability to integrate new sources of features. During training, the classifier selects different polarization features and color features from the input space to produce the polarization-based semantic segmentation results. For comparison, we repeat the same algorithm but extract the HOG, LBP, and LAB features from the color images only; after training another joint boosting classifier, we obtain the color-based semantic segmentation results. The comparison shows that the semantic segmentation accuracy is improved thanks to the included polarization features.
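
A hedged sketch of this fusion step is given below. It computes HOG and LBP descriptors independently on the color image and on the polarization channels and concatenates them into a single feature vector; scikit-learn's GradientBoostingClassifier stands in for the joint boosting classifier, and all helper names, patch sizes, and parameters are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for Joint Boosting

def to_uint8(img):
    """Rescale a single-channel image to 8-bit, as expected by the LBP operator."""
    img = img.astype(np.float64)
    img = (img - img.min()) / (np.ptp(img) + 1e-8)
    return (255 * img).astype(np.uint8)

def patch_descriptor(patch_u8):
    """HOG + uniform-LBP histogram for one 8-bit grayscale patch (e.g. 32 x 32)."""
    h = hog(patch_u8, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    lbp = local_binary_pattern(patch_u8, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    return np.concatenate([h, lbp_hist])

def fused_descriptor(color_patch, dop_patch, aop_patch):
    """Concatenate descriptors computed independently on each modality."""
    gray = to_uint8(color_patch.mean(axis=2))            # simple luminance proxy
    return np.concatenate([patch_descriptor(gray),
                           patch_descriptor(to_uint8(dop_patch)),
                           patch_descriptor(to_uint8(aop_patch))])

# X: fused descriptors of labeled patches, y: class labels (car, road, tree, ...)
# clf = GradientBoostingClassifier().fit(X, y)
```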

Section snippets

Semantic segmentation

As very classical methods in image parsing, bottom-up semantic segmentation methods usually follow this pipeline [3]: (1) grouping nearby pixels into image patches according to local homogeneity, using methods such as K-means, mean shift, Simple Linear Iterative Clustering (SLIC) [4], or normalized cuts [5]; (2) extracting local features, e.g., HOG, LBP, texture, or curvature, from each patch; (3) feeding the extracted features and hand-labeled ground truth to a
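
A minimal sketch of step (1) of this pipeline with SLIC is shown below, using scikit-image; the number of superpixels and the pooled statistic are illustrative choices, not those of the paper.

```python
# Group pixels into SLIC superpixels and pool a per-superpixel statistic
# (mean color) that later feature-extraction steps could build on.
import numpy as np
from skimage.segmentation import slic

def superpixel_means(image, n_segments=400, compactness=10.0):
    """Segment an RGB image with SLIC and return the mean color per superpixel."""
    labels = slic(image, n_segments=n_segments, compactness=compactness, start_label=0)
    means = np.zeros((labels.max() + 1, image.shape[2]))
    for k in range(labels.max() + 1):
        means[k] = image[labels == k].mean(axis=0)
    return labels, means
```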

Polarization applied on semantic segmentation

In this section, we describe the proposed multimodality semantic segmentation algorithm using polarization and color images. This method follows four steps, as shown in Fig. 2. First, we use local descriptors to describe the input image. This step is applied to both the polarization and color images, so as to integrate information from different sources. These local descriptor vectors are then quantized through a clustered codebook, which forms the codebook maps shown in Fig. 3. As the final step, the
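
The quantization step can be illustrated with a standard bag-of-visual-words construction: k-means learns the codebook from training descriptors, and each descriptor is then replaced by the index of its nearest codeword. The codebook size below is an illustrative assumption, not the paper's setting.

```python
# Visual-codebook quantization sketch: cluster descriptors with k-means,
# then map every descriptor to the index of its nearest visual word.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_words=200, seed=0):
    """Cluster training descriptors (N x D array) into a visual vocabulary."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(descriptors)

def quantize(descriptors, codebook):
    """Replace each descriptor by the index of its nearest codeword."""
    return codebook.predict(descriptors)
```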

Efficient application

In the real application, we apply two strategies to improve the time efficiency of the algorithm during the training process.

Firstly, we propose to apply a pixel sampling process before feeding all features into the training model. The reason is that using all the pixels in the image is too time-consuming, and that neighboring pixels always carry similar information. In [10], center-pixel subsampling was performed over a 3 × 3 or 5 × 5 grid to reduce the number of training samples. Since this process is
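
The grid subsampling idea can be sketched as follows: only the center pixel of every g × g cell is kept, so that largely redundant neighboring pixels do not all enter the training set. The helper name and grid size below are illustrative.

```python
# Keep only the center pixel of each g x g cell of the image grid.
import numpy as np

def grid_subsample(height, width, g=5):
    """Return (row, col) indices of the center pixel of every g x g cell."""
    rows = np.arange(g // 2, height, g)
    cols = np.arange(g // 2, width, g)
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return rr.ravel(), cc.ravel()

# Example: for a 240 x 320 image and g = 5, this keeps 48 * 64 = 3072 pixels.
```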

Data set

The experiment was conducted on our polarization-image data sets, which contain 21 images at 320 × 240 pixels. The Day-set includes 10 images and the Dusk-set 11 images (examples shown in Fig. 4). The Dusk-set used 6 images for training and 5 for testing, while the Day-set used 6 images for training and 4 for testing. These images were labeled using LabelMe [24]. We defined six classes: car, road, tree, sky, building, and grass. Pixels which do not correspond to any of these classes are

Conclusion

In this paper, we have proposed a method to apply polarization images to semantic segmentation. The HOG, LBP, and LAB features have been extracted from the polarization images, namely the DOP and AOP. These features have been concatenated with the color-based features as the input of the joint boosting classifier. This classifier has been used since it adapts well to combining different features, being principally a feature-selection-based classifier. In this way, the polarization-based feature has

References (24)

  • H. Zhu et al.

    Beyond pixels: A comprehensive survey from bottom-up to semantic image segmentation and cosegmentation

    J. Visual Commun. Image Represent.

    (2016)
  • L.B. Wolff

    Polarization vision: a new sensory approach to image understanding

    Image Vis. Comput.

    (1997)
  • Y. Zhao et al.

    Object separation by polarimetric and spectral imagery fusion

    Comput. Vis. Image Underst.

    (2009)
  • S. Gupta et al.

    Indoor scene understanding with RGB-D images: bottom-up segmentation, object detection and semantic segmentation

    Int. J. Comput. Vis.

    (2015)
  • J. Tighe et al.

    Superparsing

    Int. J. Comput. Vis.

    (2013)
  • R. Achanta et al.

    SLIC superpixels compared to state-of-the-art superpixel methods

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • J. Shi et al.

    Normalized cuts and image segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • A. Levinshtein et al.

    TurboPixels: fast superpixels using geometric flows

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • D.G. Lowe

    Object recognition from local scale-invariant features

    Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999

    (1999)
  • N. Dalal et al.

    Histograms of oriented gradients for human detection

    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005

    (2005)
  • T. Ojala et al.

    Multiresolution gray-scale and rotation invariant texture classification with local binary patterns

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2002)
  • J. Shotton et al.

    Semantic texton forests for image categorization and segmentation

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008

    (2008)

    Fan Wang received the B.S. degree in Electronic and Information Engineering from the Xidian University, Xi’an, China. She is currently pursuing the Ph.D. degree with the Laboratory LITIS, INSA de Rouen, France. Her current research interests concern the applications of the Polarization image in computer vision and intelligent vehicle.

    Samia Ainouz received her Ph.D. degree in image processing from Louis Pasteur University, Strasbourg. She carried out her postdoctoral work in 3D vision at Le2i UMR 6306 CNRS Lab. Since September 2008, she has worked as an associate professor in the LITIS Lab with the Intelligent Transportation Systems Team. Her main research interests are polarization imaging, stereovision, catadioptric vision, and applications of these technics to intelligent vehicles.

    Chunfeng Lian is currently pursuing the Ph.D. degree with the Laboratory LITIS University of Rouen, France. His research interests include information fusion, pattern recognition, and medical image analysis.

    Abdelaziz Bensrhair graduated with an M.Sc in electrical engineering (1989) and a Ph.D. degree in computer science (1992) at the University of Rouen, France. From 1992 to 1999, he was an assistant professor in the Physic and Instrumentation Department, University of Rouen. He is currently a professor in information systems architecture department, head of Intelligent Transportation Systems Division and co-director of the Computer Science, Information Processing, and Systems, Laboratory (LITIS) of the National Institute of Applied Science Rouen (INSAR).
