1 Introduction

Shape analysis is one of the most important areas in image analysis, since it can describe an object while preserving its most relevant information. Shape classification is a fundamental problem in Computer Vision, with many applications such as object detection, classification, and image retrieval.

When analyzing an image, it is of utmost importance to consider that certain information only makes sense under certain viewing conditions, such as scale. Nevertheless, choosing the appropriate scales of observation is not a trivial task, which motivated the development of scale-space filters used to create multiscale image representations [5, 15]. Some multiscale shape descriptors found in the literature are Multiscale Fractal Dimension [2], Multiscale Hough Transform Statistics [10], Multiscale Fourier Descriptor [4], and Curvature Scale-Space [14].

It is well known that multiscale/multiresolution methods are regarded as consistent with plausible models of the human visual system; therefore, approaches based on this concept can be promising [9]. In this paper, we propose two new shape descriptors: SBAS, a scale-space version of BAS (Beam Angle Statistics), proposed by [1], and SMFD, a scale-space version of MFD (Multiscale Fractal Dimension), proposed by [2]. Experimental results obtained on two public shape image datasets showed that both SBAS and SMFD presented better recognition rates than many traditional shape description methods, such as: Zernike Moments [13], BAS [1], HTS (Hough Transform Statistics) [12], MFD [2], TS (Tensor Scale) [7], FD (Fourier Descriptors) [16] and CS (Contour Salience) [14].

2 Scale-Space BAS and MFD

In scale-space theory, the characteristics of interest describe a continuous path in the representation, allowing a consistent manipulation of structures present at different scales. One of the main reasons to represent information at multiple levels is that the successive simplification removes unwanted details, such as noise and insignificant structures. Also, the scale reduction is directly related to information reduction, which implies a reduction in processing time and an increase in computational efficiency [15]. An important property of scale-space theory is that the transformation to a coarser level should not introduce new structures; thus, structures present at a coarser scale should also be present at all finer scales [15].

In this paper, the image representation at coarser scales was obtained by convolving the image with a low-pass filter, the 2D Gaussian kernel, defined by Eq. 1.

$$\begin{aligned} G(x,y)=\frac{1}{2\pi \sigma ^2}e^{-\frac{x^2+y^2}{2\sigma ^2}}. \end{aligned}$$
(1)

where \(\sigma \) is the standard deviation of the Gaussian distribution. The \(\sigma \) value represents the 2D Gaussian kernel’s width, and is referred to here as the scale parameter. The convolution with the Gaussian kernel blurs the image’s borders, reducing the information and creating a coarser level of the image as the scale (\(\sigma \)) increases. For each scale i, the scale parameter is defined by \(\sigma =2^i\). Figure 1 illustrates the process of Gaussian kernel convolution. One can notice that the higher the value of \(\sigma \), the coarser the image and the blurrier the borders.
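The construction of this Gaussian scale-space can be sketched as follows. This is a minimal illustration, assuming SciPy's `gaussian_filter` is used for the 2D convolution; the number of scales and the rule \(\sigma = 2^i\) with \(i = 1, \ldots, 6\) follow the examples shown in Fig. 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, num_scales=6):
    """Blur a 2-D image with Gaussian kernels of width sigma = 2**i,
    producing one progressively coarser level per scale."""
    levels = []
    for i in range(1, num_scales + 1):
        sigma = 2 ** i                                   # scale parameter for level i
        levels.append(gaussian_filter(image.astype(float), sigma=sigma))
    return levels
```

Note that each level is computed directly from the original image rather than from the previous level; since the convolution of two Gaussians is again a Gaussian, either formulation yields a valid scale-space.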

Fig. 1. Examples of images obtained by the convolution with the 2D Gaussian kernel: (a) \(\sigma =2\), (b) \(\sigma =4\), (c) \(\sigma =8\), (d) \(\sigma =16\), (e) \(\sigma =32\), (f) \(\sigma =64\).

After the convolution, the resulting images are binarized, as shown in Fig. 2, so that their borders can be extracted. Once the borders are blurred and the images are binarized, the BAS and MFD methods are applied to the image at each scale, and the feature vectors of all scales are concatenated into a single feature vector, yielding the new SBAS and SMFD shape descriptors, respectively.
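The full pipeline of blurring, binarization, per-scale description and concatenation can be sketched as below. Here `compute_descriptor` is a hypothetical placeholder standing in for BAS or MFD, and the 0.5 binarization threshold is an assumption, since the text does not state one.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_descriptor(image, compute_descriptor, num_scales=6, threshold=0.5):
    """Blur, binarize, describe the shape at each scale, and concatenate
    the per-scale feature vectors into a single descriptor."""
    features = []
    for i in range(1, num_scales + 1):
        blurred = gaussian_filter(image.astype(float), sigma=2 ** i)
        binary = (blurred >= threshold).astype(np.uint8)   # recover a crisp silhouette
        features.append(np.asarray(compute_descriptor(binary), dtype=float))
    return np.concatenate(features)                        # one vector across all scales
```

With `compute_descriptor` set to BAS this sketch would yield SBAS, and with MFD it would yield SMFD; any other contour- or silhouette-based descriptor could be substituted in the same way.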

Fig. 2. Binarized images after the convolution with the 2D Gaussian kernel: (a) \(\sigma =2\), (b) \(\sigma =4\), (c) \(\sigma =8\), (d) \(\sigma =16\), (e) \(\sigma =32\), (f) \(\sigma =64\).

3 Experimental Results

In order to evaluate the performance of the proposed methods, SBAS and SMFD, they were applied to two public and well-known shape datasets: Kimia-216 [11] and MPEG-7 Part B [3]. Then, their results were compared with those obtained with some well-referenced shape description methods: Zernike Moments [13], BAS [1], HTS (Hough Transform Statistics) [12], MFD [2], TS (Tensor Scale) [7], FD (Fourier Descriptors) [16], and CS (Contour Salience) [14].

The performance comparisons were based on the following metrics:

Precision x Recall: The precision is the fraction of retrieved instances that are relevant, while the recall is the fraction of relevant instances that are retrieved [8].

Multiscale Separability: The Multiscale Separability indicates how the clusters of different classes are distributed in the feature space; the more separated the clusters, the better the descriptor [14].

Bulls-Eye Score: The Bulls-Eye score is calculated as follows: given a dataset \(S_{c,n}\), where c is the number of classes in S and n is the number of images per class, each image in \(S_{c,n}\) is used as a query and the number of correct images among the top 2n matches is counted. A perfect score is achieved when \(c \cdot n^2\) positive cases are found across the whole dataset [6].

 
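As a concrete illustration, the Precision x Recall and Bulls-Eye metrics above could be computed as follows. This is a sketch over hypothetical ranking data; whether a query counts itself among its own matches varies between implementations, and it is assumed here to be included.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieved list against the relevant set."""
    hits = sum(1 for item in retrieved if item in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def bulls_eye_score(rankings, labels, n):
    """Bulls-Eye: correct images found in the top 2n matches of every query,
    relative to the perfect count of c * n**2 (c*n queries, n relevant each)."""
    total = 0
    for query, ranked in enumerate(rankings):
        top = ranked[:2 * n]
        total += sum(1 for r in top if labels[r] == labels[query])
    perfect = len(rankings) * n            # len(rankings) = c*n, so perfect = c*n**2
    return total / perfect
```

For example, a ranking that always places a query's own class first reaches a Bulls-Eye score of 1.0, while mixing other classes into the top 2n matches lowers the score proportionally.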

3.1 Experiments on MPEG-7 Part B Shape Dataset

The MPEG-7 Part B dataset is composed of 1400 images divided into 70 classes of 20 images each, each showing a white silhouette on a black background.

The Precision x Recall curves for the MPEG-7 Part B dataset obtained with the proposed shape descriptors (SBAS and SMFD), and with the other shape descriptors (Zernike Moments, BAS, HTS, MFD, TS, FD, and CS), are presented in Fig. 3. One can observe that SBAS presented the best Precision x Recall results (highest curve), followed by its monoscale version BAS. Although SMFD did not present top results, its performance was better than that of its monoscale version (MFD).

Fig. 3. Precision x Recall curves for the MPEG-7 Part B dataset.

For the shape descriptors that presented the best Precision x Recall curves (SBAS, BAS, Zernike Moments, SMFD, HTS, and MFD), we also calculated their Multiscale Separability curves. Figure 4 presents such curves. One can observe that SBAS also presented the best result according to this measure. The SMFD and Zernike Moments presented very similar results, both outperforming BAS, MFD and HTS.

Fig. 4. Multiscale separability curves for the MPEG-7 Part B dataset [3].

Finally, the Bulls-Eye score was calculated for the methods that presented the best performances according to the Precision x Recall and Multiscale Separability results. Table 1 presents the Bulls-Eye score for each method. One can observe that, also for this measure, SBAS presented the best results, followed by its monoscale version BAS [1] and Zernike Moments [13]. The SMFD method showed better results than its monoscale version MFD [2] and HTS [12].

From all these results we conclude that the proposed scale-space approach for shape recognition significantly improved the accuracy of the monoscale shape recognition approach.

Table 1. Bulls-Eye scores for each method using the MPEG-7 Part B dataset [3].

3.2 Experiments on Kimia-216 Shape Dataset

The Kimia-216 shape dataset [11] is composed of 216 images divided into 18 classes of 12 images each. This dataset is simpler than MPEG-7 Part B, since it contains fewer classes and its classes do not exhibit as many transformations as those of MPEG-7 Part B.

The Precision x Recall curves for the Kimia-216 dataset obtained with the proposed shape descriptors (SBAS and SMFD), and with the other shape descriptors (Zernike Moments, BAS, HTS, MFD, TS, FD, and CS), are presented in Fig. 5. One can observe that SBAS presented the best Precision x Recall results for most of the curve, followed closely by its monoscale version BAS. Although SMFD did not present top results, its performance on this dataset was also improved when compared to its monoscale version MFD, as it was on the MPEG-7 Part B dataset.

Fig. 5. Precision x Recall curves for the Kimia-216 dataset [11].

For the shape descriptors that presented the best Precision x Recall curves (SBAS, BAS, Zernike Moments, HTS, SMFD and MFD), we also calculated their Multiscale Separability curves. Figure 6 presents such curves. One can observe that SBAS also presented the best results according to this measure, followed by Zernike Moments. The SMFD did not present the top results, but it was significantly better than BAS and MFD.

Fig. 6. Multiscale separability curves for the Kimia-216 dataset [11].

Finally, the Bulls-Eye score was calculated for the methods that presented the best performances according to the Precision x Recall and Multiscale Separability results. Table 2 presents the Bulls-Eye score for each method. One can observe that, also for this measure, SBAS presented the best results, followed by its monoscale version BAS. The SMFD method showed better results than its monoscale version MFD.

As with the MPEG-7 Part B dataset, from all the results obtained on the Kimia-216 dataset we conclude that the proposed scale-space approach for shape recognition significantly improved the accuracy of the monoscale shape recognition approach.

Table 2. Bulls-Eye scores for each method using the Kimia-216 dataset [11].

4 Discussion and Conclusion

In this paper we presented two new shape description methods, SBAS and SMFD. Experiments carried out on the MPEG-7 Part B dataset showed that SBAS presented the best results among several well-referenced shape description methods in the literature, namely Zernike Moments [13], BAS [1], HTS (Hough Transform Statistics) [12], MFD [2], TS (Tensor Scale) [7], FD (Fourier Descriptors) [16] and CS (Contour Salience) [14], for all three evaluation metrics used in this work (Precision x Recall, Multiscale Separability, and Bulls-Eye). While SMFD did not achieve top results, it performed better than its monoscale version, the MFD shape descriptor.

Regarding the results obtained with the Kimia-216 dataset, SBAS presented the best Multiscale Separability results and Bulls-Eye score. SMFD also presented better results than its monoscale version, but it did not outperform the other methods. It is important to notice that the Kimia-216 dataset is simpler than MPEG-7 Part B, and the results obtained by the methods on Kimia-216 are already very good, making relevant improvements harder to obtain.

From the obtained results, one can observe that the new descriptor SBAS showed better results than all methods compared in this paper, improving the accuracy of its monoscale version BAS [1] by approximately 5.3% according to the Bulls-Eye score on the MPEG-7 Part B dataset, and by 0.65% on the Kimia-216 dataset.

Therefore, the results obtained in this paper suggest that the proposed scale-space approach for shape recognition can significantly improve the accuracy of shape description methods already proposed in the literature that do not explore the scale space. In this work we assessed the BAS and MFD shape descriptors; however, since the proposed multiscale approach is applied in the pre-processing stage, it can be applied to any other shape description method.