1 Introduction

A Range image provides geometrical information (depth), not only the 2D information of an RGB image. Moreover, feature extraction in Range images is generally invariant to scale, rotation and illumination [1]. RGB-D images are acquired by means of low-cost 3D acquisition systems, such as the Microsoft Kinect sensor [14].

On the other hand, interest point detection is an essential phase in developing a local feature extractor [2]. In this phase, data (for instance, texture in intensity images) are obtained to characterize the keypoints. Recently, Steder et al. [12] introduced the Normal Aligned Radial Feature (NARF) for 3D object recognition from a Range image. Other local detectors and descriptors for 2D images, or their 3D counterparts, include the Harris corner detector [5, 6], SURF (Speeded-Up Robust Features) [7], and FAST (Features from Accelerated Segment Test).

The proposal presented here accomplishes a robust and balanced transformation from a dense to a sparse process. First, the surface of the Homogeneity image \(H_{m}\) is constructed from the texture features of the Range image, and the dense normal vectors of the Homogeneity image, referred to here as \(\mathbf {N}_{D}\), are computed. Then, in the sparse process, a Gaussian distribution is used to select the most representative normal vectors of \(\mathbf {N}_{D}\) in each x-, y- and z- direction, forming the sparse normal vectors referred to here as \(\mathbf {N}_{S}\). Additionally, an analysis of the neighborhood of each component of \(\mathbf {N}_{S}\) is carried out through its pdf (Probability Density Function) in order to describe that particular region. Afterwards, the interesting points are obtained from each directional surface, highlighting the contours of the objects in the scene. Experimental tests have been performed with two different datasets: (1) the Middlebury benchmark proposed by Pal et al. [3] and Hirschmüller et al. [8], and (2) our DSPLab dataset [9]. Finally, a comparative analysis between our proposal and different state-of-the-art keypoint detectors demonstrates a high performance in almost all the tests.

The main contributions of this study are as follows. First, the use of the homogeneity texture feature as a local surface descriptor for Range images; in particular, it is proposed to highlight the homogeneous regions because they represent the smooth curvatures of the Range image. Second, the sparse process allows a transformation from the \(\mathbb {R}^{3}\) to the \(\mathbb {R}\) space, implying a significant reduction in the cardinality of the descriptor vector (the \(\mathbf {N}_{S}\) vector). These descriptors allow the representation of the scene through the separation of the foreground and background planes. Finally, objects are defined by the keypoints highlighting their contours, at a low computational cost. In the remainder of this paper, the proposed technique is described in Sects. 2 and 3. Section 4 discusses the results with a thorough qualitative and quantitative analysis. Finally, brief conclusions are given in Sect. 5.

Fig. 1.

Global diagram of the Sparse Normal Detector technique (SND).

2 Methodology

This section describes each phase of our proposal, based on the Sparse Normal Detector (SND) technique. Figure 1 illustrates a global block diagram of the proposed strategy. The Range images used are those of the Middlebury dataset and Depth images acquired by a Kinect sensor, so the input image may come either from controlled conditions or from real indoor scenes. In the dense process, the homogeneity texture feature constructs a dense surface from the Range image to highlight the uniform regions. Then, the normal vectors of the Homogeneity image (called dense normal vectors) are computed. Later, in the sparse process, the \(\mathbf {M}_{x}, \mathbf {M}_{y}\) and \(\mathbf {M}_{z}\) surfaces are built up by selecting dense normal vectors according to a Gaussian distribution. This process is called the Sparse Normal descriptor (SN-descriptor), and it carries out a transformation from \(\mathbb {R}^{3}\) to \(\mathbb {R}\). The vectors contained in the \(\mathbf {M}_{x}, \mathbf {M}_{y}\) and \(\mathbf {M}_{z}\) surfaces are the sparse normal vectors, which represent the most distinctive values of these surfaces. Finally, the interesting points corresponding to each object in the scene are computed from the sparse vectors, as well as from the computed pdf information.
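As a structural outline only (not code from the paper), the pipeline of Fig. 1 can be summarized as follows; the helper functions are the hypothetical sketches given in the following sections, so this fragment composes those sketches rather than standing alone.

```python
def snd_pipeline(range_img, t_u):
    """Outline of the SND pipeline of Fig. 1, composing the hypothetical
    helper sketches defined in Sects. 2.1-3 below (local_homogeneity,
    dense_normals, directional_thresholds, sparse_normals, skd_keypoints)."""
    Hm = local_homogeneity(range_img)                        # dense process: H_m surface
    NDx, NDy, NDz = dense_normals(Hm)                        # dense normal vectors N_D
    tx, ty, tz = directional_thresholds(NDx, NDy, NDz, t_u)  # thresholds of Eqs. (1)-(3)
    keypoints = {}
    for axis, Nd, t in (("x", NDx, tx), ("y", NDy, ty), ("z", NDz, tz)):
        NS, mask = sparse_normals(Nd, t)                     # sparse surfaces M_x, M_y, M_z
        keypoints[axis] = skd_keypoints(NS, mask, axis)      # interest points per surface
    return keypoints
```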

2.1 Dense Process: The Homogeneity Image (\(H_{m}\))

The Range image is a grey-level image with areas of smooth intensity variation representing the depth of different objects. Therefore, in this paper we suggest that extracting the homogeneity feature is similar to obtaining a surface with a minimal change of energy. Furthermore, the change in the flow obtained from the homogeneity feature forms geodesic curves that highlight the contour and the curvature among regions [13]. In this case, the representation of the geodesic curves is similar to depicting iso-elevation lines (lines that join points of equal height), usually used on topographic maps. On the other hand, the information obtained from the Homogeneity image allows us to represent objects as minimal surfaces (see Fig. 2(c)), with a Gaussian curvature established as a positive hemisphere [4, 11] (see Fig. 2, top row). Therefore, the extraction of normal vectors for all the points in such a hemisphere yields a dense representation of the Range image surface. These vectors are orthogonal to all tangent vectors of the \(H_{m}\) surface, yielding the dense descriptor referred to as \(\mathbf {N}_{D}\). The \(H_{m}\) image contains the homogeneity texture feature of the Range image, and it is obtained using the Sum and Difference of Histograms (SDH) technique presented in [10]. A window of \(3 \times 3\) pixels is applied in the SDH, yielding an image of \(K \times L\) size defined in the range of 0 to 255 grey levels. This process is illustrated in the first block of Fig. 1. Note that equal values of homogeneity are depicted as iso-elevation lines (see the Homogeneity image in the first block of Fig. 1).
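A minimal sketch of how the homogeneity image could be computed, assuming Unser's standard SDH homogeneity measure over a single displacement; the displacement vector, the exact normalization and the rescaling to 0-255 are assumptions, not details taken from [10].

```python
import numpy as np

def local_homogeneity(range_img, disp=(0, 1), win=3):
    """Assumed sketch of the homogeneity texture feature via Unser's Sum and
    Difference of Histograms, evaluated over a win x win neighborhood of each
    pixel for one displacement vector disp = (dy, dx)."""
    img = range_img.astype(np.int64)
    dy, dx = disp
    # difference image for the chosen displacement
    diff = img[:img.shape[0] - dy, :img.shape[1] - dx] - img[dy:, dx:]
    K, L = diff.shape
    r = win // 2
    Hm = np.zeros((K, L))
    for i in range(r, K - r):
        for j in range(r, L - r):
            d = diff[i - r:i + r + 1, j - r:j + r + 1].ravel()
            vals, counts = np.unique(d, return_counts=True)
            Pd = counts / counts.sum()                      # local difference histogram
            # homogeneity is high where the local differences are near zero
            Hm[i, j] = np.sum(Pd / (1.0 + vals.astype(np.float64) ** 2))
    # rescale to the 0-255 grey-level range used in the text (assumed step)
    return 255.0 * (Hm - Hm.min()) / (Hm.max() - Hm.min() + 1e-12)
```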

Fig. 2.

Column (a) depicts the surface of the Range image as a positive hemisphere; column (b) shows the \(H_{m}\) surface as a highlight of (a). The surface in the y-direction is formed through sparse vectors. The labelled region is considered a smooth flat region.

2.2 Sparse Process: The Sparse Normal Descriptor (SN-Descriptor)

The Sparse Normal descriptor process (referred to here as the SN-descriptor) divides the dense normal vectors (\(\mathbf {N}_{D}\)) into clusters containing the most representative values of the surface for each x-, y- and z- direction. This process is illustrated in the first step of the sparse process in Fig. 1. When the surface of \(H_{m}\) is decomposed into its normal vectors, the homogeneous surface is “broken up”. That is, new flat surfaces called \(\mathbf {M}_{x}\), \(\mathbf {M}_{y}\) and \(\mathbf {M}_{z}\) are built up in the \(\mathbb {R}^{2}\) space. Hence, the components of each directional surface are referred to as the sparse normal vectors (\(\mathbf {N}_{S}\)). Every \(\mathbf {N}_{S}\) vector on the \(\mathbf {M}_{x}\), \(\mathbf {M}_{y}\) or \(\mathbf {M}_{z}\) surfaces is a mapping of the \(H_{m}\) surface into \(\mathbb {R}^{3}\). Additionally, the neighborhood corresponding to each \(\mathbf {N}_{S}\) vector is associated with a similar six-neighbor connectivity. Then, a pdf analysis is carried out in this neighborhood to obtain the mean values with respect to the intensity level (depth or disparity) and the homogeneity feature. Finally, this allows a set of descriptors to be established for each directional surface, where each component of this set is a vector of \(1 \times m\) size. Therefore, the descriptors lie in the \(\mathbb {R}\) space, although they are immersed in the \(\mathbb {R}^{2}\) space.
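A possible sketch of the per-\(\mathbf {N}_{S}\) descriptor just described, taking the mean depth (or disparity) and mean homogeneity over a hypothetical six-pixel stencil; the exact six-neighbor layout on the 2D grid and the length m of the descriptor are not specified in the text and are assumptions here.

```python
import numpy as np

def ns_descriptor(range_img, Hm, rows, cols):
    """Sketch of the per-N_S descriptor: mean depth and mean homogeneity over
    the neighborhood of each sparse-normal position (rows, cols). The
    six-pixel stencil below is assumed, not taken from the paper."""
    K, L = Hm.shape
    offs = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (1, 1)]  # assumed stencil
    desc = []
    for r, c in zip(rows, cols):
        nb = [(r + dr, c + dc) for dr, dc in offs
              if 0 <= r + dr < K and 0 <= c + dc < L]
        rr = np.array([p[0] for p in nb])
        cc = np.array([p[1] for p in nb])
        desc.append([range_img[rr, cc].mean(), Hm[rr, cc].mean()])
    return np.asarray(desc)   # one 1 x m descriptor per sparse normal (m = 2 here)
```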

2.3 Keypoint Detection: SKD Technique

The Sparse Keypoint Detector technique (referred to as SKD) is shown in the last block of Fig. 1. From each directional surface, the interesting points are extracted through a pdf analysis. This is carried out for each neighborhood corresponding to one of the \(\mathbf {N}_{S}\) vectors contained in the \(\mathbf {M}_{x}\), \(\mathbf {M}_{y}\) or \(\mathbf {M}_{z}\) surfaces; additionally, a 95 % confidence interval is considered. Thereby, all sparse normal vectors form clusters according to the similarity of the normal values within the same neighborhood. The highlighted vectors belong to the contours of the objects in the scene.

Algorithm 1. Computation of the SND technique.

3 Algorithm of the SND Technique

Algorithm 1 explains the computation of the SND technique. The normal vectors \(\mathbf {N}_{D}\) are computed from the \(H_{m}\) surface (first and second steps of the algorithm). The vector \(\mathbf {N}_{D} = [\mathbf {N}_{Dx},\mathbf {N}_{Dy},\mathbf {N}_{Dz}]\) contains matrices of \( K \times L\) size, which hold the dense normal vectors in each direction. Moreover, the \(\mathbf {N}_{D}\) vector is not normalized and its values are in the range \([-1, 1]\). Since the surface in \(H_{m}\) is defined as a positive hemisphere (see Subsect. 2.1), only the outward-pointing normal vectors of \(\mathbf {N}_{D}\) are considered (step 3(a)). This first selection is used to start the sparse process (steps 3(b-c)).
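A minimal sketch of the dense-normal step, assuming the normals are taken from the image gradients of \(H_{m}\) viewed as a height field; scaling by the vector magnitude is only one way to keep the components within \([-1, 1]\) and is an assumption here, not the paper's exact procedure.

```python
import numpy as np

def dense_normals(Hm):
    """Sketch of the first steps of Algorithm 1: treat Hm as a height field
    z = Hm(x, y) and take one surface normal per pixel from the image
    gradients, giving N_D = [N_Dx, N_Dy, N_Dz] as three K x L matrices."""
    gy, gx = np.gradient(Hm.astype(np.float64))
    Nx, Ny, Nz = -gx, -gy, np.ones_like(Hm, dtype=np.float64)
    mag = np.sqrt(Nx ** 2 + Ny ** 2 + Nz ** 2)
    # dividing by the magnitude bounds every component in [-1, 1] (assumed choice)
    return Nx / mag, Ny / mag, Nz / mag
```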

The sparse normal vectors are formed by selecting the components of \(\mathbf {N}_{D}\) according to a Gaussian distribution represented by the \(\mathbf {t}\) vector (see step 3(d) of the algorithm). Here, the \(\mathbf {N}_{S}\) vector is defined as \(\mathbf {N}_{S} = [\mathbf {N}_{Sx},\mathbf {N}_{Sy},\mathbf {N}_{Sz}]\). It is built up from the \(\mathbf {N}_{D}\) elements that lie within the range (0, 1). Thus, the elements of \(\mathbf {N}_{S}\) are defined as all the normal vectors of \(\mathbf {N}_{D}\) selected by the condition \(\mathbf {N}_{Di} <\mathbf {t}\), where i is the i-th component of \(\mathbf {N}_{D}\). Additionally, \(\mathbf {t}\) is the vector consisting of three threshold levels in the range \(0<\mathbf {t} < 1\), called \(\mathbf {t}_{x}\), \(\mathbf {t}_{y}\) and \(\mathbf {t}_{z}\). As indicated in steps 3(b) and 3(c) of the algorithm, the values of \(\mathbf {t}\) are computed from statistical information, the mean and standard deviation, by analyzing each direction of the vector \(\mathbf {N}_{D}\). That is, \(\mathbf {t}\) is given by

$$\begin{aligned} t_{x}= {\left\{ \begin{array}{ll} \mu (N_{Dx}) + \sigma (N_{Dx}) &{} \text {if} \quad 0<N_{Dx} \le t_{u} \\ \mu (N_{Dx}) - \sigma (N_{Dx}) &{} \text {if} \quad t_{u}< N_{Dx}< 1 \end{array}\right. } \end{aligned}$$
(1)
$$\begin{aligned} t_{y}= {\left\{ \begin{array}{ll} \mu (N_{Dy}) + \sigma (N_{Dy}) &{} \text {if} \quad 0< N_{Dy} \le t_{u} \\ \mu (N_{Dy}) - \sigma (N_{Dy}) &{} \text {if} \quad t_{u}< N_{Dy}< 1 \end{array}\right. } \end{aligned}$$
(2)
$$\begin{aligned} t_{z}= {\left\{ \begin{array}{ll} \mu (N_{Dz}) &{} \text {if} \quad \max (N_{Dz}) < 1 \\ \max (N_{Dz}) - \sigma (N_{Dz}) &{} \text {if} \quad \max (N_{Dz}) = 1 \end{array}\right. } \end{aligned}$$
(3)

where \(t_{u}\) is a threshold level defined within the 95 % confidence interval corresponding to the mean of all the component vectors of \(\mathbf {N}_{D}\). Therefore, the computation of the thresholds in Eqs. (1) and (2) is determined according to the value of \(t_{u}\). The \(\mathbf {N}_{S}\) vector has enough information to represent the most significant normal components of the \(H_{m}\) surface in each x-, y- and z- direction, respectively (see the fourth step). Furthermore, one of the main contributions of this study is that the number of elements contained in the \(\mathbf {N}_{S}\) vector is, on average, at least 80 % lower than that of the vector \(\mathbf {N}_{D}\) in any of its three directions. Thus,

$$\begin{aligned} \text {If} \quad C_{Sx}&= card (N_{Sx}) \quad \text {and} \quad C_{Dx}= card (N_{Dx}), \quad \text {then} \quad C_{Sx}\ll C_{Dx}\\ \text {If} \quad C_{Sy}&= card (N_{Sy}) \quad \text {and} \quad C_{Dy}= card (N_{Dy}), \quad \text {then} \quad C_{Sy}\ll C_{Dy}\\ \text {If} \quad C_{Sz}&= card (N_{Sz}) \quad \text {and} \quad C_{Dz}= card (N_{Dz}), \quad \text {then} \quad C_{Sz}\ll C_{Dz} \end{aligned}$$
(4)
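A hedged sketch of Eqs. (1)-(3) and of the selection \(\mathbf {N}_{Di} < \mathbf {t}\). Since the conditions in Eqs. (1)-(2) are written on \(N_{Dx}\) and \(N_{Dy}\) themselves, they are read here on the mean of the outward-pointing components, which is only one plausible interpretation of the text.

```python
import numpy as np

def directional_thresholds(NDx, NDy, NDz, t_u):
    """Sketch of Eqs. (1)-(3); the conditions are evaluated on the mean of
    the outward-pointing (positive) components, an assumed reading."""
    def t_xy(Nd):
        pos = Nd[(Nd > 0) & (Nd < 1)]
        mu, sigma = pos.mean(), pos.std()
        return mu + sigma if mu <= t_u else mu - sigma                   # Eqs. (1), (2)
    def t_z(Nd):
        pos = Nd[Nd > 0]
        return pos.mean() if pos.max() < 1 else pos.max() - pos.std()    # Eq. (3)
    return t_xy(NDx), t_xy(NDy), t_z(NDz)

def sparse_normals(Nd, t):
    """Sparse selection for one direction: keep the dense normals in (0, 1)
    satisfying N_Di < t, returning the N_S values and their pixel mask."""
    mask = (Nd > 0) & (Nd < 1) & (Nd < t)
    return Nd[mask], mask
```

With these sketches, the ratio `NSx.size / NDx.size` (i.e., \(C_{Sx}/C_{Dx}\)) gives a direct check of the reduction stated in Eq. (4).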

Equation (4) indicates that the cardinality of \(\mathbf {N}_{Sx}\) and \(\mathbf {N}_{Sy}\) is close to 10 % of the cardinality of \(\mathbf {N}_{Dx}\) and \(\mathbf {N}_{Dy}\), respectively, and this percentage appears in virtually all the experimental results. In the z-direction, the cardinality of \(\mathbf {N}_{Sz}\) is close to 30 % of the cardinality of \(\mathbf {N}_{Dz}\). The difference between the percentages of the sparse normal vectors in the x- and y- directions with respect to the z- direction is explained by the homogeneity level present in the \(H_{m}\) image. Therefore, in each direction it is possible to form a new surface that represents the original \(H_{m}\) surface with a minimal number of descriptors. Finally, in the fifth step, the strategy used to detect the most significant keypoints is the Sparse Keypoint Detector (SKD). This strategy uses the data contained in the \(\mathbf {M}_{x}\), \(\mathbf {M}_{y}\) and \(\mathbf {M}_{z}\) surfaces to rank the sparse keypoints based on the statistically defined thresholds \(\mathbf {t}_{kx}\), \(\mathbf {t}_{ky}\), \(\mathbf {t}_{kz}\), corresponding to the x-, y- and z- directions, respectively. These thresholds are defined by:

$$\begin{aligned} \mathbf {t}_{kx} = \mu (N_{Sx}) +\sigma (N_{Sx});\quad \mathbf {t}_{ky} = \mu (N_{Sy}) +\sigma (N_{Sy}); \quad \mathbf {t}_{kz} = \mu (N_{Sz}) \end{aligned}$$
(5)

The last block of Fig. 1 shows a first example of the interesting points detected on each directional surface using the SKD technique.
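A minimal sketch of the SKD selection based on the thresholds of Eq. (5); whether the retained keypoints are the sparse normals above or below \(\mathbf {t}_{k}\) is not stated explicitly in the text, so "above" is assumed here.

```python
import numpy as np

def skd_keypoints(NS_vals, mask, axis='x'):
    """Sketch of the SKD step using the thresholds of Eq. (5). NS_vals are
    the sparse-normal values of one directional surface and mask marks their
    pixel positions; keeping values above t_k is an assumption."""
    mu, sigma = NS_vals.mean(), NS_vals.std()
    t_k = mu + sigma if axis in ('x', 'y') else mu     # Eq. (5)
    rows, cols = np.nonzero(mask)                      # positions of the N_S vectors
    keep = NS_vals > t_k
    return rows[keep], cols[keep]
```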

4 Experimental Results

The experimental tests have been performed using two datasets: the Middlebury benchmark provided in [3, 8] and our DSPLab dataset [9]. The importance of testing two different datasets is to establish the robustness and repeatability of our algorithm on offline images obtained under several non-controlled conditions (changes of intensity, exposure and noise in indoor environments). The most representative results for the DSPLab dataset are depicted in Fig. 3. Also, a qualitative analysis is carried out to compare the SKD technique with other common keypoint detectors. Finally, the effectiveness of the method is tested using a quantitative analysis of the keypoint repeatability.

Fig. 3.

Results of the homogeneity and contour extraction using the DSPLab dataset. First row, cube-rugby; second row, white-bottle; and last row, flower-cup. Notice that the \(H_{m}\) images (third column) show a slight loss of information. The last column shows the keypoints with which the contours of the objects in the scene are defined.

Figure 4 shows the most meaningful keypoints detected on the \(H_{m}\) surface using the approach presented in Sect. 3. Thus, the results of the SKD technique can be compared with those of other state-of-the-art techniques applied to Range images (third to sixth rows of Fig. 4). In particular, the applied techniques are NARF, Harris corner, SURF and FAST. The SKD shows a better repeatability index than the other techniques (see Fig. 5). Qualitatively, it can be seen that the keypoints retrieve the contours of all the objects in the scene using, e.g., the sparse descriptors of the \(\mathbf {M}_{y}\) surface.

Fig. 4.

Comparison between the proposed Sparse Keypoint Detector (SKD) and other techniques. First row: color image. Second row: SKD technique. Third row: NARF detector. Fourth row: Harris detector. Fifth row: SURF detector. Sixth row: FAST detector.

4.1 Quantitative Analysis

In order to evaluate the quality of the proposed keypoint detector, an analysis of its repeatability was carried out. This analysis was applied to the Middlebury and DSPLab datasets. A Gaussian filter and salt-and-pepper noise (S&P) were used to generate synthetic and diffuse images. For this purpose, the Gaussian filter was generated with a typical mask of \(3\times 3\) pixels; in addition, the average level of noise used for the S&P was 5 %. The test images contain four types of degradation: salt-and-pepper noise (S&P), diffuse image (D), salt and pepper combined with diffusion (S&P+D) and diffusion combined with salt and pepper (D+S&P). The repeatability rate is established as the number of interesting points found in the different synthetic images under the same process with respect to the total points initially detected. Thereby, each interesting point of SKD is associated once with an interesting point in the other image. These points are first validated by means of the Sum of Squared Differences (SSD) technique to avoid false positive detections. Figure 5(a, b) shows the results for the Middlebury and the DSPLab datasets, respectively. In general, the best performance of the detectors is obtained for the synthetic images with S&P noise. The worst results are obtained for the diffuse image and its different combinations. On average, the repeatability rate is near 0.5 for the SKD technique (in particular, for the \(\mathbf {M}_{x}\) and \(\mathbf {M}_{y}\) surfaces), 0.4 for the SURF and FAST techniques, and 0.2 for the other techniques. In the case of crowded scenes or similar textures under controlled indoor conditions, such as the images in Fig. 4(a, b), the SURF and FAST techniques present a low performance (close to zero). It is important to note that under uncontrolled environmental conditions (see the images in Fig. 4(c, d)), the NARF and Harris techniques show a low performance (below 0.2 in Fig. 5(b)). Figure 5(c) presents the relation between \(\mathbf {N}_{S}\) and \(\mathbf {N}_{D}\). For each technique and both datasets, the average computation time over 100 executions was obtained (see Fig. 5(d)). All of our data were obtained using non-optimized Matlab implementations on an ordinary Intel(R) Core(TM)2 Duo 3.16 GHz CPU with 4 GB of RAM.
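For reference, a small sketch of how such a repeatability test could be set up. The 5 % S&P level and the 3-pixel-wide Gaussian mask follow the text; the matching tolerance, the random seed and the omission of the SSD patch validation are assumptions of this sketch.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def salt_and_pepper(img, amount=0.05, seed=0):
    """5% salt-and-pepper noise, one of the degradations used for the tests."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    m = rng.random(img.shape)
    out[m < amount / 2] = 0
    out[m > 1 - amount / 2] = 255
    return out

def diffuse(img):
    """Approximate 3x3 Gaussian blur (sigma and truncation chosen so the
    kernel radius is one pixel)."""
    return gaussian_filter(img.astype(np.float64), sigma=1.0, truncate=1.0)

def repeatability(kp_ref, kp_test, tol=2.0):
    """Repeatability rate: fraction of keypoints of the original image that
    reappear within tol pixels in the degraded image. The SSD validation
    mentioned in the text is omitted here."""
    kp_ref = np.asarray(kp_ref, dtype=np.float64)
    kp_test = np.asarray(kp_test, dtype=np.float64)
    if kp_ref.size == 0 or kp_test.size == 0:
        return 0.0
    hits = sum(np.min(np.linalg.norm(kp_test - p, axis=1)) <= tol for p in kp_ref)
    return hits / len(kp_ref)
```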

Fig. 5.

Comparative analysis of the keypoint detectors depicted in Fig. 4. Graph (a) illustrates the results for the Middlebury images and (b) for the DSPLab dataset; (c) presents the relation between sparse and dense vectors (see Eq. (4)), and (d) shows the performance with respect to computation time.

5 Conclusions

This study presented a novel approach for contour detection by means of interesting points extracted from Range images. Even when Range images contain crowded or similar textures, the objects in the scene are separated from the background and the object contours are depicted using interesting points, with a minimal loss in the details of the scene. It is important to point out that the quantity of descriptors is less than 6 % and 8 % of the total dense descriptors in the x- and y- directions, respectively. This represents a reduction of nearly 90 % in almost all tested cases. Additionally, the comparison between similar keypoint detectors and the proposal presented here demonstrates that the SKD technique produces the best results in the detection of interesting points. Finally, the SND technique could be used in other disciplines that involve the processing of grey-level images.