Interesting Interest Points

A Comparative Study of Interest Point Performance on a Unique Data Set

Published in International Journal of Computer Vision

An Erratum to this article was published on 16 April 2014

Abstract

Not all interest points are equally interesting. The most valuable interest points lead to optimal performance of the computer vision method in which they are employed, but such a measure depends on the chosen vision application. We propose a more general performance measure based on the spatial invariance of interest points under changing acquisition parameters, quantified by the spatial recall rate. The scope of this paper is to investigate the performance of a number of existing, well-established interest point detection methods. Automatic performance evaluation of interest points is hard because the true correspondence is generally unknown. We overcome this by providing an extensive data set with known spatial correspondence. The data is acquired with a camera mounted on a six-axis industrial robot, providing very accurate camera positioning. Furthermore, each scene is scanned with a structured light scanner, yielding precise 3D surface information. In total, 60 scenes are depicted, ranging over model houses, building materials, fruit and vegetables, fabric, printed media, and more. Each scene is depicted from 119 camera positions, and 19 individual LED illuminations are used at each position. The LED illumination provides the option of artificially relighting the scene from a range of light directions. This data set has given us the ability to systematically evaluate the performance of a number of interest point detectors. The highlights of the conclusions are that the fixed-scale Harris corner detector performs best overall, followed by the Hessian-based detectors and the difference of Gaussians (DoG). The methods based on scale-space features have better overall performance than the other methods, especially when the distance to the scene varies, where the FAST corner detector, Edge Based Regions (EBR), and Intensity Based Regions (IBR) in particular perform poorly. The performance of Maximally Stable Extremal Regions (MSER) is moderate. We observe a relatively large decline in performance with changes in both viewpoint and light direction. Some of our observations support previous findings, while others contradict them.
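The spatial recall rate described above can be sketched in a few lines: given the interest points detected in a reference view, a ground-truth mapping into a second view (available here from the robot's camera positions and the structured light surface), and the points detected in that second view, recall is the fraction of visible reference points that are re-detected within a pixel tolerance. The function name, the `project` callback, and the tolerance value below are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def spatial_recall(ref_pts, test_pts, project, tol=2.5):
    # ref_pts:  (N, 2) interest points detected in the reference view
    # test_pts: (M, 2) interest points detected in the second view
    # project:  ground-truth mapping from the reference view into the
    #           second view; returns None for occluded / out-of-frame points
    # tol:      match tolerance in pixels (assumed value, for illustration)
    visible, matched = 0, 0
    for p in ref_pts:
        q = project(p)
        if q is None:
            continue  # no valid correspondence in the second view
        visible += 1
        # matched if any detection in the second view lies within tol pixels
        if len(test_pts) and np.min(np.linalg.norm(test_pts - q, axis=1)) <= tol:
            matched += 1
    return matched / visible if visible else 0.0
```

With the known correspondence supplied by the data set, this measure needs no descriptor matching at all, which is what makes the evaluation independent of any particular vision application.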



Author information

Corresponding author: Anders Lindbjerg Dahl.

Additional information

An erratum to this article is available at http://dx.doi.org/10.1007/s11263-014-0714-8.

Cite this article

Aanæs, H., Dahl, A.L. & Steenstrup Pedersen, K. Interesting Interest Points. Int J Comput Vis 97, 18–35 (2012). https://doi.org/10.1007/s11263-011-0473-8
