Evaluation of Local Features for Structure from Motion

Multimedia Tools and Applications

Abstract

Structure from motion (SFM) is an effective approach for reconstructing large-scale 3D scenes from multiple images. Many local feature methods have been proposed in this field to detect feature points and compute descriptors. When designing a robust SFM system, selecting a good feature from the existing methods is an important problem. In this paper, we aim to help users make this choice experimentally for large-scale 3D reconstruction, where many high-resolution images are captured. To this end, we comprehensively evaluate several local features on ground-truth datasets. Experimental results show that SIFT and SURF perform better than some binary features such as ORB and BRISK.
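
The abstract contrasts float-valued descriptors (SIFT, SURF) with binary descriptors (ORB, BRISK). As a rough illustration of how the two families are usually compared during matching, the sketch below uses Euclidean distance for float vectors and Hamming distance for bit strings, followed by a nearest-neighbour ratio test. It is not the evaluation protocol of the paper: the function names, the toy descriptors, and the 0.8 ratio threshold are all assumptions made for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two float descriptors (SIFT/SURF style)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a, b):
    """Count differing bits between two binary descriptors stored as ints
    (ORB/BRISK style)."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, distance, ratio=0.8):
    """Return (query_index, train_index) pairs whose best match is clearly
    closer than the second-best match. The ratio value is an assumption."""
    matches = []
    for qi, q in enumerate(query):
        # Rank all candidate descriptors by distance to the query descriptor.
        ranked = sorted((distance(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue
        (best, best_idx), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


# Hypothetical toy descriptors, far shorter than real ones.
float_query = [[0.1, 0.9, 0.0], [0.5, 0.5, 0.5]]
float_train = [[0.1, 0.8, 0.0], [0.9, 0.1, 0.3], [0.5, 0.5, 0.6]]
binary_query = [0b10110010, 0b01001101]
binary_train = [0b10110011, 0b01001101, 0b11111111]

float_matches = ratio_test_matches(float_query, float_train, l2_distance)
binary_matches = ratio_test_matches(binary_query, binary_train, hamming_distance)
print(float_matches, binary_matches)
```

Hamming distance reduces to an XOR followed by a bit count, which is why binary descriptors are cheap to match; the abstract reports that, in this evaluation, the float descriptors nevertheless performed better.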

Acknowledgments

This work is supported in part by the National Science Foundation of China (Grant Nos. 61673157 and 61402018), the Natural Science Foundation of Anhui Province (Grant Nos. KJ2014ZD27 and JZ2015AKZR0664), and the National Key Research and Development Plan (Grant No. 2016YFC0800100).

Author information

Corresponding author

Correspondence to Li Cao.

About this article

Cite this article

Cao, M., Cao, L., Jia, W. et al. Evaluation of Local Features for Structure from Motion. Multimed Tools Appl 77, 10979–10993 (2018). https://doi.org/10.1007/s11042-018-5864-1

  • DOI: https://doi.org/10.1007/s11042-018-5864-1
