Annotation is easy: Learning to generate a shadow mask☆
Introduction
As a common phenomenon in daily life, shadows in images provide cues for extracting scene geometry [1], [2], light direction [3], and camera location and parameters [4], which benefit high-level image understanding tasks such as image segmentation [5], object detection [6], and tracking [7]. Accurate shadow detection therefore leads to higher accuracy in these applications.
A shadow image consists of sunlit, umbra, and penumbra regions [8], [9], [10]. Both a weaker light source and a longer distance between the ground and the occluder produce a wider penumbra and thus less distinct shadow boundaries (soft shadows). Compared with soft shadows, hard shadows are more visible, since they have clearer edges and darker colors; simple filtering operations plus some manual adjustment suffice to obtain their shadow masks. However, since the edges of soft shadows are difficult to identify even by eye, annotating soft shadows is time-consuming and error-prone. Moreover, current datasets mainly focus on hard and cast shadows; self shadows, i.e., regions on an object that receive no direct light [10], are frequently neglected, and the corresponding annotations for soft shadows are incomplete.
Existing methods locate shadows by designing physical models of color and illumination [11], [12], by training data-driven classifiers on hand-crafted features [1], [13], [14], [15], [16], [17], [18], or by learning discriminative features [19] via convolutional neural networks (CNNs) [10], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30]. For detecting complex shadows, Le et al. [24] propose to augment the training set by attenuating the shadow regions of the original training images. However, these augmented images often look unrealistic, because their non-shadow background closely resembles that of the original training image, which limits generalizability.
The above methods may ignore tiny shadow regions, incorrectly identify shadow-like materials, and misrecognize non-obvious or soft shadows due to low contrast and weak boundaries. In recent years, large-scale benchmarks (e.g., SBU [21], ISTD [31]) have been created for supervised learning, but they still have three main drawbacks: (1) more than 98% of ISTD consists of hard shadows, so models trained on it do not work well on soft shadow scenes; (2) ISTD contains only 0.1 K+ unique scenes and 20+ occluders, and most samples share the same background with different shadow positions, limiting the diversity of cast shadows; (3) although SBU contains diverse shadow scenes, its training set carries a large amount of label noise, because the masks are generated by an optimization method [17] based on least-squares support vector machines (LSSVM) [32]; these masks often neglect self shadows and produce incomplete annotations in ambiguous areas (especially along shadow boundaries).
In recent years, deep learning-based shadow detectors [10], [22], [23], [25], [26], [27], [28], [29], [30] have consistently improved the state of the art on these "inaccurate" datasets by applying deeper neural networks and more robust feature-extraction modules. According to the study of memorization effects [33], deep neural networks tend to learn clean, easy samples first and then gradually overfit noisy samples. Accurate annotations could therefore further improve these shadow detection models, but producing them is time-consuming, as shown in Fig. 1(b): manually annotating Fig. 1(a) in Photoshop can take more than 80 seconds. This motivates a largely unexplored question: how can we annotate a shadow image both quickly and accurately?
Graphs naturally model complex data representations [34] and can serve as weakly supervised classifiers by propagating given labels among adjacent nodes. Motivated by this, and to quickly and accurately provide annotations for various scenes, we introduce a novel weakly supervised framework for label propagation that learns a GCN [34] for each training image, requiring only a few annotation scribbles drawn within seconds (Fig. 1(c)). We first perform superpixel segmentation on each training image and combine it with the scribble information to formulate a weakly supervised learning problem. With this scribble supervision and the superpixel representation, we train a separate GCN for each image, so the model fits diverse, complex scenes well. Finally, we use the re-annotated masks to train a shadow detector.
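To make the pipeline concrete, the sketch below builds a superpixel adjacency graph of the kind a per-image GCN could propagate scribble labels over. This is an illustration only: the paper's actual superpixel algorithm and graph-construction details are not shown in this excerpt, so the 4-connected adjacency rule here is an assumption.

```python
import numpy as np

def superpixel_graph(seg):
    """Build a boolean adjacency matrix over superpixels.

    `seg` is an (H, W) integer map of superpixel ids; two superpixels
    are connected if any of their pixels share a 4-neighbour border.
    """
    n = int(seg.max()) + 1
    adj = np.zeros((n, n), dtype=bool)
    # horizontally / vertically adjacent pixels with different ids
    h = seg[:, :-1] != seg[:, 1:]
    v = seg[:-1, :] != seg[1:, :]
    adj[seg[:, :-1][h], seg[:, 1:][h]] = True
    adj[seg[:-1, :][v], seg[1:, :][v]] = True
    adj |= adj.T  # make the graph undirected
    return adj

# toy 4x4 image partitioned into four 2x2 superpixels
seg = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
adj = superpixel_graph(seg)
```

In the toy map, superpixel 0 borders 1 and 2 but not the diagonal 3, so the graph has exactly the four edges of a ring.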
The main contributions of this work are threefold:
- •
We introduce a novel framework based on superpixel segmentation and graph convolutional networks, which can easily obtain a pseudo shadow mask from only a few scribbles.
- •
We formulate the generation of a shadow mask as a weakly supervised learning problem and train a separate GCN for each input image. In addition, we design a set of loss functions to ensure consistency between the pseudo shadow mask and the image content.
- •
Training on our re-annotated data further improves the performance of state-of-the-art detectors.
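The "set of loss functions" mentioned in the contributions can be illustrated with a minimal sketch. The exact terms used in the paper are not given in this excerpt, so the combination below (partial cross-entropy on scribbled nodes plus a pairwise smoothness term over adjacent superpixels) is an assumption about their likely form:

```python
import numpy as np

def scribble_losses(probs, scribble, adj, lam=0.1):
    """Illustrative weakly supervised loss (hypothetical form).

    probs    : (N, 2) predicted (non-shadow, shadow) probabilities per node
    scribble : (N,) labels in {-1: unknown, 0: non-shadow, 1: shadow}
    adj      : (N, N) boolean superpixel adjacency
    """
    eps = 1e-8
    known = scribble >= 0
    # partial cross-entropy: supervise only the scribbled nodes
    ce = -np.log(probs[known, scribble[known]] + eps).mean()
    # smoothness: adjacent superpixels should predict similarly
    i, j = np.nonzero(np.triu(adj))
    smooth = np.square(probs[i] - probs[j]).sum(axis=1).mean()
    return ce + lam * smooth

# a 4-node chain with two scribbled nodes (node 0 shadow, node 2 not)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0],
                [0, 1, 0, 1], [0, 0, 1, 0]], dtype=bool)
scribble = np.array([1, -1, 0, -1])
confident = np.array([[0.05, 0.95], [0.5, 0.5], [0.95, 0.05], [0.5, 0.5]])
uniform = np.full((4, 2), 0.5)
loss_confident = scribble_losses(confident, scribble, adj)
loss_uniform = scribble_losses(uniform, scribble, adj)
```

Predictions that agree with the scribbles incur a much lower loss than an uninformative uniform prediction, which is what drives the label propagation.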
Shadow detection
Early works [11], [12], [35] mainly exploit illumination models and color information to locate shadow regions; most of these methods only work well on high-quality, well-constrained images [11], [12]. Later, data-driven strategies design hand-crafted features [1], [13], [14], [15], [16], [17], [18] on annotated data and feed these features into different classifiers for shadow detection. Although achieving an impressive improvement, these strategies usually have a degraded
Proposed method
This section introduces a GCN-based label propagation scheme that predicts a label for each pixel and then generates a pseudo shadow mask. Since mask generation is an offline process, we train a separate GCN to optimize label propagation for every input shadow image. As an extension of CNNs to irregular graph-structured data, a GCN lets us take non-adjacent pixels into account, which is why we choose a GCN instead of a CNN.
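A single propagation step of the kind a GCN performs can be sketched as follows, using the standard renormalized rule H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W) [34]. The node features, layer sizes, and weights here are illustrative, not the paper's actual configuration:

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    a_hat = adj.astype(float) + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)     # ReLU

# four superpixel nodes in a ring, 3-d colour features, 2 output classes
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 1],
                [1, 0, 0, 1], [0, 1, 1, 0]], dtype=bool)
rng = np.random.default_rng(0)
h = gcn_layer(adj, rng.random((4, 3)), rng.random((3, 2)))
```

Because the normalized adjacency mixes each node's features with those of its neighbours, scribble information placed on a few nodes spreads across the superpixel graph as layers are stacked.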
Fig. 2 depicts the outline
Experimental results
This section introduces the implementation details, evaluation datasets, and metrics, and then compares our results quantitatively and qualitatively from two aspects: (1) the quality of the generated pseudo shadow masks and (2) shadow detection performance. Finally, we perform thorough ablation studies on the combinations of loss functions and the number of random nodes.
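The excerpt does not name the evaluation metrics; the balanced error rate (BER) is the standard metric in shadow detection, and a sketch of its computation is shown below for reference (whether the paper reports BER specifically is an assumption):

```python
import numpy as np

def balanced_error_rate(pred, gt):
    """BER (%) = 100 * (1 - 0.5 * (TP/(TP+FN) + TN/(TN+FP))).

    Averages the error on shadow and non-shadow pixels, so large
    non-shadow regions cannot dominate the score.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    return 100.0 * (1.0 - 0.5 * (tp / (tp + fn) + tn / (tn + fp)))

# toy example: 6 pixels, one false positive and one false negative
pred = np.array([1, 1, 0, 0, 1, 0])
gt = np.array([1, 0, 0, 0, 1, 1])
ber = balanced_error_rate(pred, gt)  # sensitivity 2/3, specificity 2/3
```

Lower BER is better; a perfect mask scores 0.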
Conclusion
To further improve the performance of existing shadow detectors, we present a GCN-based annotation strategy for generating an accurate pseudo shadow mask in a weakly supervised manner. Guided by superpixel segmentation, we can obtain a complete mask from only a few annotation scribbles. Unlike traditional optimization-based methods, we train a separate GCN model for each training image and achieve much better generalizability for most complex scenes. The learning process
CRediT authorship contribution statement
Xian-Tao Wu: Conceptualization, Methodology, Writing – original draft, Writing – review & editing. Yi Wang: Validation, Data curation. Yi Wan: Project administration, Funding acquisition. Wen Wu: Methodology, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work is supported by the Natural Science Foundation of Xinjiang Autonomous Region in China (No. 2020D01A48), the Basic Research Foundation of Wenzhou City in Zhejiang Province (No. G2020018), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (No. 2019D01C062, 2019D01C041, 2019D01C205, 2020D01C028), the National Natural Science Foundation of China (No. 12061071), the Higher Education of Xinjiang Uygur Autonomous Region (XJEDU2019Y006, XJEDU2020Y003), Tianshan Innovation
References (47)
- et al. Multi-location wideband synthetic aperture imaging for urban sensing applications. J Franklin Inst B (2008)
- et al. Shadow removal via dual module network and low error shadow dataset. Comput Graph (2021)
- et al. New spectrum ratio properties and features for shadow detection. Pattern Recognit (2016)
- et al. Attached shadow coding: Estimating surface normals from shadows under unknown reflectance and lighting conditions
- et al. Estimating natural illumination from a single outdoor image
- et al. Estimating geo-temporal location of stationary cameras using shadow trajectories
- et al. Shadow free segmentation in still images using local density measure
- et al. Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans Pattern Anal Mach Intell (2003)
- et al. Physical models for moving shadow and object detection in video. IEEE Trans Pattern Anal Mach Intell (2004)
- Nielsen M, Madsen CB. Graph cut based segmentation of soft shadows for seamless removal and augmentation. In: SCIA'07...
- Learning to remove soft shadows. ACM Trans Graph
- Revisiting shadow detection: A new benchmark dataset for complex world. IEEE Trans Image Process
- On the removal of shadows from images. IEEE Trans Pattern Anal Mach Intell
- Entropy minimization for shadow removal. Int J Comput Vis
- Learning to recognize shadows in monochromatic natural images
- What characterizes a shadow boundary under the sun and sky?
- Single-image shadow detection and removal using paired regions
- Leave-one-out kernel optimization for shadow detection and removal. IEEE Trans Pattern Anal Mach Intell
- Automated multistep classifier sizing and training for deep learner
- Automatic feature learning for robust shadow detection
- Large-scale training of shadow detectors with noisily-annotated shadow examples
☆ This article was recommended for publication by L. Liu.