Abstract
Not all interest points are equally interesting. The most valuable interest points are those that lead to optimal performance of the computer vision method in which they are employed, but such a measure depends on the chosen vision application. We propose a more general performance measure based on the spatial invariance of interest points under changing acquisition parameters, quantified by the spatial recall rate. The scope of this paper is to investigate the performance of a number of existing, well-established interest point detection methods. Automatic performance evaluation of interest points is hard because the true correspondence is generally unknown. We overcome this by providing an extensive data set with known spatial correspondence. The data are acquired with a camera mounted on a 6-axis industrial robot, providing very accurate camera positioning. Furthermore, the scene is scanned with a structured light scanner, yielding precise 3D surface information. In total, 60 scenes are depicted, including model houses, building materials, fruit and vegetables, fabric, printed media, and more. Each scene is depicted from 119 camera positions, and 19 individual LED illuminations are used at each position. The LED illumination makes it possible to artificially relight the scene from a range of light directions. This data set has allowed us to systematically evaluate the performance of a number of interest point detectors. The main conclusions are that the fixed-scale Harris corner detector performs best overall, followed by the Hessian-based detectors and the difference of Gaussians (DoG). Methods based on scale-space features perform better overall than the other methods, in particular when the distance to the scene varies, where the FAST corner detector, Edge Based Regions (EBR), and Intensity Based Regions (IBR) perform poorly. The performance of Maximally Stable Extremal Regions (MSER) is moderate. We observe a relatively large decline in performance under changes in both viewpoint and light direction. Some of our observations support previous findings, while others contradict them.
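The spatial recall rate described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names are hypothetical, and the ground-truth mapping (here a caller-supplied callable built from the known camera geometry and 3D surface) stands in for the robot and structured-light ground truth.

```python
import numpy as np

def spatial_recall(points_a, points_b, map_a_to_b, tol=2.5):
    """Fraction of interest points detected in view A that reappear in view B.

    points_a   : (N, 2) array of detections in view A (pixel coordinates).
    points_b   : (M, 2) array of detections in view B.
    map_a_to_b : callable taking an A-coordinate pair and returning its
                 ground-truth location in view B (hypothetical stand-in for
                 the known scene geometry).
    tol        : match radius in pixels; a point counts as recalled if some
                 detection in B lies within this distance of its mapped position.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    if len(points_a) == 0 or len(points_b) == 0:
        return 0.0
    # Project each A-detection into view B using the ground-truth mapping.
    mapped = np.array([map_a_to_b(p) for p in points_a])  # (N, 2)
    # Pairwise distances from mapped A-points to all B-detections.
    d = np.linalg.norm(mapped[:, None, :] - points_b[None, :, :], axis=2)
    # Recall = share of A-points whose nearest B-detection is within tol.
    return float(np.mean(d.min(axis=1) <= tol))
```

Under an identity mapping (same viewpoint), a detector that re-finds half of its points within the tolerance scores 0.5; the paper's evaluation varies viewpoint and illumination and reads the recall rate off curves like this per scene.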
Additional information
An erratum to this article is available at http://dx.doi.org/10.1007/s11263-014-0714-8.
Aanæs, H., Dahl, A. L., & Steenstrup Pedersen, K. (2012). Interesting interest points. International Journal of Computer Vision, 97, 18–35. https://doi.org/10.1007/s11263-011-0473-8.