Multi-spectral dataset and its application in saliency detection
Introduction
Saliency detection has recently become a promising topic [1], [2], [3], [4]. The goal of saliency detection is to extract salient areas from an input image and present the result as a gray-scale image: the whiter a pixel is, the more likely it is to be salient. Since the detected saliency map can be utilized in various applications, such as recognition [5], segmentation [6], and tracking [7], research on this subject has attracted much attention [8], [9], [10].
Generally, methods for saliency detection can be categorized into local-based and global-based schemes [11]. Local-based methods calculate a region’s saliency from its contrast to a small neighborhood [12], [13], [14]. Global-based methods evaluate saliency with respect to the statistical characteristics of the whole image [15], [16]. In either case, saliency detection is mostly conducted on natural images taken by ordinary cameras. These cameras respond to wavelengths from about 390 to 700 nm, known as the visible spectrum [17]; the obtained images are regular RGB images. Information from the electromagnetic spectrum beyond this range is lost during the imaging process. However, the lost bands may also be valuable for vision tasks, because the more supporting information we have, the more rational the resulting decisions. This is not only common sense for humans but is also borne out by other applications in computer vision. For example, after the introduction of the SIFT descriptor [18] for gray-scale images, CSIFT [19], [20] was developed to incorporate the color bands into the descriptor, and more recently MSIFT [21] was presented to include the near-infrared band for a richer descriptor. In face recognition research, early work focused primarily on gray-scale or RGB images; later, light bands beyond the visible spectrum [22] were introduced to mitigate lighting problems. The same is true for boundary detection [23] and tracking [24]: incorporating more cues improves performance. In remote sensing, the number of bands is not limited to one or several, but can reach tens or hundreds [25], [26], [27].
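The local-based scheme mentioned above can be illustrated with a minimal sketch: each pixel's saliency is taken as its contrast against the mean of a small surrounding window. This is a simplified illustration of the general idea, not any specific method from the cited works; the function name and window-based formulation are our own.

```python
import numpy as np

def local_contrast_saliency(img, radius=1):
    """Toy local-contrast saliency: a pixel's saliency is the absolute
    difference between its intensity and the mean intensity of its
    (2*radius+1) x (2*radius+1) neighborhood, normalized to [0, 1]."""
    h, w = img.shape
    padded = np.pad(img.astype(float), radius, mode="edge")
    sal = np.zeros((h, w))
    k = 2 * radius + 1
    for y in range(h):
        for x in range(w):
            window = padded[y:y + k, x:x + k]
            sal[y, x] = abs(float(img[y, x]) - window.mean())
    # Normalize so the result reads as a gray-scale saliency map.
    return sal / sal.max() if sal.max() > 0 else sal
```

On a bright square against a uniform dark background, such a measure responds strongly at the square's boundary and is zero in flat regions, which is the behavior local-contrast methods exploit.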
Considering the success of including light bands beyond the visible range in many applications, in this work we construct a multi-spectral dataset containing both near-infrared (NIR) and regular RGB images. Several datasets containing NIR images have been presented before, for example the PolyU-NIRFD dataset [22] for face recognition and the NIR–RGB dataset [21] for scene categorization. However, these datasets are designed for specific purposes and cannot be readily utilized for saliency detection. To this end, the presented dataset is constructed in the hope of providing a new platform for saliency research.
The rest of this paper is organized as follows. Section 2 presents the proposed multi-spectral dataset. Section 3 introduces the distinguishing properties of the near-infrared band. Section 4 applies the presented dataset to saliency detection. Finally, Section 5 concludes the paper.
Multi-spectral dataset
Since more cues tend to provide richer information, we hope that a camera can capture the NIR and RGB spectra simultaneously. However, most existing datasets contain images captured in the RGB bands only, so the information of all four bands cannot be obtained at the same time. Though the NIR–RGB dataset [21] has images of both bands, each pair is taken consecutively with two cameras, so the contents of the two images in a pair are not identical. When these images are employed, they have to be
NIR spectrum
The NIR spectrum lies between the visible band and the thermal infrared band. It shares properties with both visible and thermal infrared light, yet differs from each. Firstly, unlike thermal infrared, NIR light is reflected by objects in much the same way as visible light. Secondly, like thermal infrared, it is invisible to the human eye and thus reveals an “unseen” characteristic that visible light does not.
To understand the relationship and differences between the RGB
Saliency detection
To demonstrate the effectiveness of the presented dataset, we conduct experiments on saliency detection. Saliency maps are first extracted from the RGB and NIR images. The obtained maps are then combined to produce the final results. The purpose of these experiments is to answer the following two questions: 1) whether or not the incorporation of the NIR band can improve saliency detection performance; 2) which kind of model is best for combining the saliency maps
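The combination step described above can be sketched as the simplest possible regression model: a linear fusion whose per-band weights are fit by least squares against the ground-truth mask. This is only an illustration under our own assumptions (the function names `fit_fusion_weights` and `fuse` are hypothetical); the paper evaluates several regression models, of which a linear model is merely the most basic.

```python
import numpy as np

def fit_fusion_weights(sal_rgb, sal_nir, ground_truth):
    """Fit linear weights w so that w[0]*S_rgb + w[1]*S_nir
    approximates the binary ground-truth mask in a least-squares sense."""
    X = np.stack([sal_rgb.ravel(), sal_nir.ravel()], axis=1)
    y = ground_truth.ravel().astype(float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def fuse(sal_rgb, sal_nir, w):
    """Combine the two saliency maps with the learned weights,
    clipping the result to the valid gray-scale range [0, 1]."""
    fused = w[0] * sal_rgb + w[1] * sal_nir
    return np.clip(fused, 0.0, 1.0)
```

In practice the weights would be learned on a training split of the dataset and applied to held-out image pairs; richer regression models replace the linear map with a more flexible function of the two band-wise saliency values.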
Conclusion
In this work, a multi-spectral dataset is presented to serve as a new platform for saliency research. Different from existing datasets, ours contains pairs of RGB and NIR images, which provides more valuable information for detecting the salient areas of an image. Experiments demonstrate the effectiveness of incorporating the NIR band in saliency detection. We also test several regression models for combining the RGB and NIR bands. Results show that it is not appropriate to employ one
Acknowledgment
This work is supported by the State Key Program of National Natural Science of China (Grant No. 61232010), the National Natural Science Foundation of China (Grant No. 61172143 and 61105012), and the Natural Science Foundation Research Project of Shaanxi Province (Grant No. 2012JM8024).
References (37)
- et al., Assessing the contribution of color in visual attention, Comput. Vis. Image Understand. (2005)
- et al., Selective visual attention enables learning and recognition of multiple objects in cluttered scenes, Comput. Vis. Image Understand. (2005)
- et al., Dynamic visual attention on the sphere, Comput. Vis. Image Understand. (2010)
- et al., A framework for visual-context-aware object detection in still images, Comput. Vis. Image Understand. (2010)
- et al., A computer vision model for visual-object-based attention and eye movements, Comput. Vis. Image Understand. (2008)
- et al., Images as sets of locally weighted features, Comput. Vis. Image Understand. (2012)
- et al., Performance evaluation of local colour invariants, Comput. Vis. Image Understand. (2009)
- et al., Directional binary code with application to PolyU near-infrared face database, Pattern Recogn. Lett. (2010)
- et al., Saliency detection by multiple-instance learning, IEEE Trans. Cybernetics (2013)
- et al., Attentional selection for object recognition – a gentle way, Biol. Motivated Comput. Vis. (2002)
- Unsupervised extraction of visual attention objects in color images, IEEE Trans. Circ. Syst. Video Technol.
- A model of saliency-based visual attention for rapid scene analysis, IEEE Trans. Pattern Anal. Mach. Intell.