Abstract
Camera motion estimation is a standard yet critical step in endoscopic visualization. It is affected by the locations and correspondences of the features detected in 2D images. Many feature detectors and descriptors exist, but SIFT remains one of the most widely used. Practitioners usually also adopt its feature matching strategy, which treats as inliers the feature pairs consistent with a single global affine transformation. For endoscopic videos, however, we ask whether it is more suitable to cluster the features into multiple groups, while still enforcing the same transformation as in SIFT within each group. Such a multi-model idea has recently been examined in the Multi-Affine work, which outperforms Lowe's SIFT in terms of re-projection error on minimally invasive endoscopic images with manually labelled ground-truth matches of SIFT features. Since the two approaches differ only in matching, the accuracy gain of the estimated motion is attributed to the holistic Multi-Affine feature matching algorithm. More concretely, however, the matching criterion and the point search can be the same as those built into SIFT; we argue that the real variation lies only in the motion model verification, where we either enforce a single global motion model or employ a group of multiple local ones. In this paper, we investigate how sensitive the estimated motion is to the number of motion models assumed in feature matching. While this sensitivity could be evaluated analytically, we instead present an empirical analysis in a leave-one-out cross-validation setting that requires no ground-truth match labels; the sensitivity is then characterized by the variance of a sequence of motion estimates. We report a series of quantitative comparisons, such as accuracy and variance, between the Multi-Affine motion models and the global affine model.
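The contrast described above, between a single global affine verification and a multi-model variant that verifies each group of matches with its own local affine transform, can be illustrated with a minimal sketch. This is not the authors' implementation; all names (affine_inliers, multi_affine_inliers, tol, n_grid) and the grid-based grouping are illustrative assumptions, standing in for whatever clustering the Multi-Affine algorithm actually uses.

```python
# Minimal sketch (assumption-laden, not the paper's method): global vs. multi-model
# affine verification of feature matches.
import numpy as np

def affine_inliers(src, dst, tol=3.0):
    """Least-squares affine fit src -> dst; return mask of matches within tol pixels."""
    X = np.hstack([src, np.ones((len(src), 1))])   # N x 3 homogeneous source coordinates
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)    # 3 x 2 affine parameters
    return np.linalg.norm(X @ A - dst, axis=1) < tol

def multi_affine_inliers(src, dst, n_grid=2, tol=3.0):
    """Split matches into an n_grid x n_grid spatial grid; verify each cell with its own affine model."""
    qx = np.digitize(src[:, 0], np.quantile(src[:, 0], np.linspace(0, 1, n_grid + 1)[1:-1]))
    qy = np.digitize(src[:, 1], np.quantile(src[:, 1], np.linspace(0, 1, n_grid + 1)[1:-1]))
    labels = qx * n_grid + qy
    mask = np.zeros(len(src), dtype=bool)
    for g in np.unique(labels):
        idx = np.where(labels == g)[0]
        if len(idx) >= 3:                          # an affine model needs at least 3 correspondences
            mask[idx] = affine_inliers(src[idx], dst[idx], tol)
    return mask

# Toy usage: simulated matches between consecutive frames with mild noise.
rng = np.random.default_rng(0)
src = rng.uniform(0, 640, size=(200, 2))
dst = src @ np.array([[1.0, 0.02], [-0.02, 1.0]]) + np.array([5.0, -3.0])
dst += rng.normal(scale=1.0, size=dst.shape)
print(affine_inliers(src, dst).sum(), multi_affine_inliers(src, dst).sum())
```

In the same spirit, the leave-one-out analysis in the paper can be pictured as repeatedly dropping one match, re-estimating the camera motion from the remaining inliers, and reporting the variance of the resulting sequence of estimates.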
References
Tokgozoglu, H.N., Meisner, E.M., Kazhdan, M., Hager, G.D.: Color-based hybrid reconstruction for endoscopy. In: CVPR Workshops (2012)
Mirota, D., Wang, H., Taylor, R.H., Ishii, M., Gallia, G.L., Hager, G.D.: A system for video-based navigation for endoscopic endonasal skull base surgery. IEEE T-MI 31, 963–976 (2012)
Mori, K., Deguchi, D., Akiyama, K., Kitasaka, T., Maurer, C.R., Suenaga, Y., Takabatake, H., Mori, M., Natori, H.: Hybrid bronchoscope tracking using a magnetic tracking sensor and image registration. In: MICCAI (2005)
Ma, Y., Soatto, S., Kosecka, J., Sastry, S.: An Invitation to 3-D Vision. Springer, Berlin (2004)
Wu, C.: VisualSFM: A Visual Structure from Motion System. http://ccwu.me/vsfm/ (2011)
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge (2004)
Collins, T., Bartoli, A.: Towards live monocular 3D laparoscopy using shading and specularity information. In: Abolmaesumi, P., Joskowicz, L., Navab, N., Jannin, P. (eds.) IPCAI 2012. LNCS, vol. 7330, pp. 11–21. Springer, Heidelberg (2012)
Nister, D.: An efficient solution to the five-point relative pose problem. IEEE Trans. Pattern Anal. Mach. Intell. 26, 756–770 (2004)
Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. In: BMVC (2002)
Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60, 91–110 (2004)
Puerto-Souza, G.A., Mariottini, G.L.: Adaptive multi-affine (AMA) feature-matching algorithm and its application to minimally-invasive surgery images. In: MICCAI (2012)
Puerto-Souza, G.A., Mariottini, G.L.: Hierarchical multi-affine (HMA) algorithm for fast and accurate feature matching in minimally-invasive surgical images. In: IEEE IROS (2012)
Puerto-Souza, G.A., Mariottini, G.L.: A fast and accurate feature-matching algorithm for minimally invasive endoscopic images. IEEE T-MI (2013)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, Berlin (2010)
Abretske, D., Mirota, D., Hager, G.D., Ishii, M.: Intelligent frame selection for anatomic reconstruction from endoscopic video. In: WACV (2009)
Mirota, D.: Video-based navigation with application to endoscopic skull base surgery. Ph.D. dissertation, Johns Hopkins University Computer Science (2012)
Wikipedia: Quaternion. http://en.wikipedia.org/wiki/Quaternion
Wikipedia: Euler Angles. http://en.wikipedia.org/wiki/Euler_angles
Wikipedia: Rotation Formalisms in Three Dimensions. http://en.wikipedia.org/wiki/Rotation_formalisms_in_three_dimensions
Wikipedia: Euler's Rotation Theorem. http://en.wikipedia.org/wiki/Euler%27s_rotation_theorem
Caltech Vision Lab: Camera Calibration Toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/ (2010)
Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms. http://www.vlfeat.org/ (2008)
Puerto, G.A., Mariottini, G.L.: HMA feature-matching toolbox. http://ranger.uta.edu/~gianluca/feature_matching/ (2012)
Torr, P.: Structure and motion toolkit. http://www.mathworks.com (2004)
Acknowledgements
This work is supported by the NIH of the USA under grant R01 EB015530. The first author is grateful for a fellowship from the China Scholarship Council.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Xiang, X., Mirota, D., Reiter, A., Hager, G.D. (2014). Is Multi-model Feature Matching Better for Endoscopic Motion Estimation? In: Luo, X., Reichl, T., Mirota, D., Soper, T. (eds) Computer-Assisted and Robotic Endoscopy. CARE 2014. Lecture Notes in Computer Science, vol 8899. Springer, Cham. https://doi.org/10.1007/978-3-319-13410-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13409-3
Online ISBN: 978-3-319-13410-9