Computers & Graphics

Volume 104, May 2022, Pages 152-161

Technical Section
Annotation is easy: Learning to generate a shadow mask

https://doi.org/10.1016/j.cag.2022.04.003

Highlights

  • We can generate a shadow mask from only a few scribbles.

  • We design loss functions to ensure consistency between the pseudo shadow mask and the image.

  • Training on re-annotated data improves the performance of state-of-the-art detectors.

Abstract

Pixel-level annotation for supervised learning tends to be tedious and inaccurate for complex scenes, leading models to produce incomplete predictions, especially on self shadows and shadow boundaries. This paper presents a weakly supervised graph convolutional network (WSGCN) for generating an accurate pseudo shadow mask from only a few annotation scribbles. Given these limited annotations and an a priori superpixel segmentation mask, we seek a robust graph construction strategy and a label propagation method. Specifically, our network operates on superpixel graphs, allowing us to reduce the data dimensions by several orders of magnitude. Then, under the guidance of scribbles, we formulate the generation of a shadow mask as a weakly supervised learning task and learn a 2-layer graph convolutional network (GCN) separately for each training image. Experimental results on the benchmark datasets SBU and ISTD show that our network achieves impressive performance with only a few thousand parameters, and that training on our re-annotated data further improves the performance of state-of-the-art detectors.

Introduction

As a common phenomenon in our daily life, shadows in images provide hints for extracting scene geometry [1], [2], light direction [3], and camera location and parameters [4], which are conducive to different high-level image understanding tasks such as image segmentation [5], object detection [6], and tracking [7]. Accurate shadow detection therefore benefits all of these applications.

A shadow image consists of sunlit, umbra, and penumbra regions [8], [9], [10]. A weaker light source or a larger distance between the ground and the occluder widens the penumbra and blurs the shadow boundary (soft shadows). Compared with soft shadows, hard shadows are easier to see, since they have clearer edges and darker colors, and one can employ simple filtering operations plus some manual adjustment to obtain the corresponding shadow masks. The edges of soft shadows, however, are difficult to identify even by eye, so annotating them is time-consuming and error-prone. Moreover, current datasets mainly focus on hard and cast shadows, while self shadows, i.e., regions on an object that do not receive direct light [10], are frequently neglected, and the corresponding annotations for soft shadows are incomplete.

Existing methods locate shadows by designing physical models of color and illumination [11], [12], by using data-driven methods based on hand-crafted features [1], [13], [14], [15], [16], [17], [18], or by learning discriminative features [19] via convolutional neural networks (CNNs) [10], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]. For detecting complex shadows, Le et al. [24] propose to augment the training set by weakening the shadow area of each original training image. However, we notice that these augmented images often look unrealistic, because their non-shadow background closely resembles that of the original training image, leading to limited generalizability.

The above methods may ignore tiny shadow regions, incorrectly identify shadow-like materials, and miss non-obvious or soft shadows due to low contrast and weak boundaries. In recent years, large-scale benchmarks (i.e., SBU [21], ISTD [31]) have been created for supervised learning, but they still have three main drawbacks: (1) more than 98% of ISTD consists of hard shadows, so training on it does not transfer well to soft shadow scenes; (2) ISTD contains only 0.1 K+ unique scenes and 20+ occluders, and most of its samples share the same background with different shadow positions, limiting the diversity of cast shadows; (3) although the shadow scenes in SBU are diverse, its training set contains a large amount of noisy labels, because its masks are generated in an optimization manner [17] based on least-squares support vector machines (LSSVM) [32]; these masks often neglect self shadows and produce incomplete shadow annotations for ambiguous areas (especially shadow boundaries).

In recent years, deep learning-based shadow detectors [10], [22], [23], [25], [26], [27], [28], [29], [30] have consistently improved the state of the art despite these “inaccurate” datasets, applying deeper neural networks and more robust feature extraction modules. According to the study of memorization effects [33], deep neural networks tend to learn clean, easy samples first and then gradually overfit the noisy samples. Accurate annotations may therefore further improve these shadow detection models, but producing them is time-consuming, as shown in Fig. 1(b): manually annotating Fig. 1(a) in Photoshop can take more than 80 seconds. This motivates the following unexplored question: how can we annotate a shadow image quickly and accurately?

We are motivated by the fact that graphs naturally model complex data representations [34] and can serve as weakly supervised classifiers by propagating given labels among adjacent nodes. To provide annotations quickly and accurately for various scenes, we introduce a novel weakly supervised framework for label propagation that learns a GCN [34] for every training image, requiring only a few annotation scribbles drawn within seconds (Fig. 1(c)). We first perform superpixel segmentation for each training image and combine it with the scribble information to formulate a weakly supervised learning problem. In short, with the scribble supervision and the superpixel representation, we can train a GCN separately for each image, allowing our model to fit various complex scenarios well. Finally, we use the re-annotated masks to train a shadow detector.
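To make the superpixel-graph construction above concrete, the sketch below builds a region graph from an image: each node carries the mean color of a region, and edges connect spatially adjacent regions. For self-containment it partitions the image into square blocks as a stand-in for a real superpixel algorithm such as SLIC; all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_region_graph(image, block=8):
    """Partition an H x W x 3 image into block x block regions (a stand-in
    for superpixels). Returns mean-color node features and an adjacency
    matrix linking 4-connected neighbouring regions."""
    h, w, _ = image.shape
    gh, gw = h // block, w // block
    n = gh * gw
    feats = np.zeros((n, 3))
    for i in range(gh):
        for j in range(gw):
            patch = image[i * block:(i + 1) * block, j * block:(j + 1) * block]
            feats[i * gw + j] = patch.reshape(-1, 3).mean(axis=0)
    adj = np.zeros((n, n))
    for i in range(gh):
        for j in range(gw):
            u = i * gw + j
            if i + 1 < gh:            # edge to the region below
                adj[u, u + gw] = adj[u + gw, u] = 1
            if j + 1 < gw:            # edge to the region on the right
                adj[u, u + 1] = adj[u + 1, u] = 1
    return feats, adj
```

Working on such a graph rather than raw pixels is what reduces the data dimensions by several orders of magnitude: a 480 x 640 image has about 300 K pixels but only a few hundred superpixel nodes.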

The main contributions of this work cover the following three aspects:

  • We introduce a novel framework based on superpixel segmentation and graph convolutional networks, which can easily obtain a pseudo shadow mask from only a few scribbles.

  • We formulate the problem of generating a shadow mask as a weakly supervised learning problem and then train a separate GCN for each input image. At the same time, we design a set of loss functions to ensure the consistency between the pseudo shadow mask and image contents.

  • Training on our re-annotated data can further improve the performance of the state-of-the-art detectors.

Section snippets

Shadow detection

Early works [11], [12], [35] mainly explore illumination models and color information to locate shadow regions. Most of these methods only work well on high-quality and well-constrained images [11], [12]. Later, data-driven strategies design hand-crafted features [1], [13], [14], [15], [16], [17], [18] on the annotated data and feed these features into different classifiers for shadow detection. Although achieving an impressive improvement, these strategies usually have a degraded

Proposed method

This section introduces a GCN-based label propagation scheme that predicts a label for each pixel and then generates a pseudo shadow mask. Since the generation of shadow masks is an offline process, we train a separate GCN to optimize label propagation for every input shadow image. As an extension of CNNs, a GCN can handle data represented by an irregular graph and lets us take non-adjacent pixels into account, which is why we choose a GCN instead of a CNN.
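A minimal NumPy sketch of the 2-layer GCN forward pass and the scribble-masked loss is given below, in the spirit of Kipf and Welling [34]. It is an illustrative re-implementation under our own naming, not the authors' code: each node classifies its superpixel as shadow or non-shadow, and the cross-entropy is evaluated only on scribble-annotated nodes, which is what makes the setting weakly supervised.

```python
import numpy as np

def normalize_adj(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by GCNs."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(feats, adj_norm, w1, w2):
    """Two-layer GCN: hidden ReLU layer, then a per-node softmax over
    the two classes {non-shadow, shadow}."""
    h = np.maximum(adj_norm @ feats @ w1, 0.0)   # layer 1 + ReLU
    logits = adj_norm @ h @ w2                   # layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def scribble_loss(probs, labels, mask):
    """Cross-entropy evaluated only on scribble-annotated nodes."""
    idx = np.where(mask)[0]
    return -np.mean(np.log(probs[idx, labels[idx]] + 1e-12))
```

Because each image gets its own tiny GCN (two weight matrices, a few thousand parameters), per-image training is cheap, and labels spread from the scribbled nodes to unlabeled ones through the normalized adjacency.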

Fig. 2 depicts the outline

Experimental results

This section introduces the implementation details, evaluation datasets, and metrics, and then compares our results both quantitatively and qualitatively from two aspects: (1) the quality of the generated pseudo shadow masks and (2) the shadow detection performance. Finally, we perform thorough ablation studies to analyze the combinations of loss functions and the number of random nodes.

Conclusion

To further improve the performance of existing shadow detectors, we present a GCN-based annotation strategy for generating an accurate pseudo shadow mask in a weakly supervised manner. Under the guidance of superpixel segmentation, we can obtain a complete mask with only a few annotation scribbles. Unlike traditional optimization-based methods, we create a separate GCN model for each training image and achieve much better generalizability for most complex scenes. The learning process

CRediT authorship contribution statement

Xian-Tao Wu: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Yi Wang: Validation, Data curation. Yi Wan: Project administration, Funding acquisition. Wen Wu: Methodology, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Natural Science Foundation of Xinjiang Autonomous Region in China (No. 2020D01A48), the Basic Research Foundation of Wenzhou City in Zhejiang Province (No. G2020018), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2019D01C062, 2019D01C041, 2019D01C205, 2020D01C028), the National Natural Science Foundation of China (No. 12061071), the Higher Education of Xinjiang Uygur Autonomous Region (XJEDU2019Y006, XJEDU2020Y003), Tianshan Innovation

References (47)

  • Ahmad F. et al. Multi-location wideband synthetic aperture imaging for urban sensing applications. J Franklin Inst B (2008)
  • Wu W. et al. Shadow removal via dual module network and low error shadow dataset. Comput Graph (2021)
  • Tian J. et al. New spectrum ratio properties and features for shadow detection. Pattern Recognit (2016)
  • Okabe T. et al. Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions
  • Lalonde J.-F. et al. Estimating natural illumination from a single outdoor image
  • Junejo I.N. et al. Estimating geo-temporal location of stationary cameras using shadow trajectories
  • Ecins A. et al. Shadow free segmentation in still images using local density measure
  • Cucchiara R. et al. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell (2003)
  • Nadimi S. et al. Physical models for moving shadow and object detection in video. IEEE Trans Pattern Anal Mach Intell (2004)
  • Nielsen M, Madsen CB. Graph cut based segmentation of soft shadows for seamless removal and augmentation. In: SCIA’07...
  • Gryka M. et al. Learning to remove soft shadows. ACM Trans Graph (2015)
  • Hu X. et al. Revisiting shadow detection: A new benchmark dataset for complex world. IEEE Trans Image Process (2021)
  • Finlayson G.D. et al. On the removal of shadows from images. IEEE Trans Pattern Anal Mach Intell (2005)
  • Finlayson G.D. et al. Entropy minimization for shadow removal. Int J Comput Vis (2009)
  • Zhu J. et al. Learning to recognize shadows in monochromatic natural images
  • Huang X. et al. What characterizes a shadow boundary under the sun and sky?
  • Guo R. et al. Single-image shadow detection and removal using paired regions
  • Vicente TFY, Yu C-P, Samaras D. Single image shadow detection using multiple cues in a supermodular MRF. In: BMVC...
  • Vicente TFY, Hoai M, Samaras D. Noisy label recovery for shadow detection in unfamiliar domains. In: Proceedings of the...
  • Vicente T.F.Y. et al. Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans Pattern Anal Mach Intell (2017)
  • Tyagi K. Automated multistep classifier sizing and training for deep learner (2018)
  • Khan S.H. et al. Automatic feature learning for robust shadow detection
  • Vicente T.F.Y. et al. Large-scale training of shadow detectors with noisily-annotated shadow examples
This article was recommended for publication by L. Liu.