Abstract
Camera focal length estimation from a single image is important for many computer vision tasks, yet previous methods have not achieved satisfactory accuracy. This paper proposes a focal length estimation approach based on the distances among four points in the scene of a single image. The problem is cast as a nonlinear optimization by adapting the standard pinhole camera model under distance constraints. Multiple algorithms are employed to solve the optimization, and the best solution is taken as the final estimate. Experimental results show that our method obtains a more accurate focal length than several state-of-the-art methods in the single-image setting. In addition, we provide simple application examples and illustrate the intuitive effects of focal length estimation errors. We also demonstrate experimentally that distance information improves the estimation of the focal length.
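The formulation described above can be sketched as a small nonlinear least-squares problem: back-project the four image points through a pinhole model with unknown focal length \(f\) and unknown depths \(z_i\), and require the resulting 3D points to satisfy the known pairwise distances. This is a minimal illustrative sketch, not the paper's exact formulation; the function names and the choice of `scipy.optimize.least_squares` are assumptions.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_focal(pix, dists, img_center, f0, z0):
    """Recover the focal length f (and depths z) of a pinhole camera from
    four image points whose pairwise 3D distances are known.
    Hypothetical sketch of a distance-constrained formulation.

    pix        : (4, 2) array of pixel coordinates (u, v)
    dists      : dict {(i, j): metric distance between points i and j}
    img_center : (cx, cy), assumed principal point
    f0, z0     : initial guesses for the focal length and the four depths
    """
    cx, cy = img_center
    pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]

    def backproject(params):
        f, z = params[0], params[1:]
        # Pinhole model: X = z*(u-cx)/f, Y = z*(v-cy)/f, Z = z
        X = z * (pix[:, 0] - cx) / f
        Y = z * (pix[:, 1] - cy) / f
        return np.stack([X, Y, z], axis=1)

    def residuals(params):
        P = backproject(params)
        # One residual per point pair: reconstructed vs. known distance
        return np.array([np.linalg.norm(P[i] - P[j]) - dists[(i, j)]
                         for i, j in pairs])

    x0 = np.concatenate([[f0], z0])
    # Positivity bounds keep f and the depths physically meaningful
    sol = least_squares(residuals, x0, bounds=(1e-3, np.inf))
    return sol.x[0]
```

With four points there are six pairwise distances and five unknowns (\(f\) and four depths), so the system is overdetermined and a least-squares solver is a natural fit.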
References
Abbas, S.A., Zisserman, A.: A Geometric Approach to Obtain a Bird’s Eye View From an Image. In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), pp. 4095–4104. IEEE, Seoul, Korea (South) (2019). https://doi.org/10.1109/ICCVW.2019.00504
Barreto, J.P.: A unifying geometric representation for central projection systems. Computer Vision Image Understand. 103(3), 208–217 (2006). https://doi.org/10.1016/j.cviu.2006.06.003
Bogdan, O., Eckstein, V., Rameau, F., Bazin, J.C.: DeepCalib: A deep learning approach for automatic intrinsic calibration of wide field-of-view cameras. In: Proceedings of the 15th ACM SIGGRAPH European Conference on Visual Media Production, CVMP ’18, pp. 1–10. Association for Computing Machinery, London, United Kingdom (2018). https://doi.org/10.1145/3278471.3278479
Cao, Y.T., Wang, J.M., Sun, Y.K., Duan, X.J.: Circle Marker Based Distance Measurement Using a Single Camera. LNSE 1, 376–380 (2013). https://doi.org/10.7763/LNSE.2013.V1.80
Caprile, B., Torre, V.: Using vanishing points for camera calibration. Int. J. Comput. Vision 4(2), 127–139 (1990). https://doi.org/10.1007/BF00127813
Chen, H.T.: Geometry-based camera calibration using five-point correspondences from a single image. IEEE Trans. Circuits Syst. Video Technol. 27(12), 2555–2566 (2017). https://doi.org/10.1109/TCSVT.2016.2595319
Chen, Q., Wu, H., Wada, T.: Camera calibration with two arbitrary coplanar circles. In: Pajdla, T., Matas, J. (eds.) Computer Vision – ECCV 2004. Springer, Berlin Heidelberg (2004)
Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust region methods. Soc. Indus. Appl. Math. (2000). https://doi.org/10.1137/1.9780898719857
Coughlan, J., Yuille, A.: Manhattan World: Compass direction from a single image by Bayesian inference. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 941–947. IEEE, Kerkyra, Greece (1999). https://doi.org/10.1109/ICCV.1999.790349
Deutscher, J., Isard, M., MacCormick, J.: Automatic camera calibration from a single Manhattan image. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) Computer vision – ECCV, 2002. Springer, Berlin Heidelberg (2002)
Godard, C., Aodha, O.M., Firman, M., Brostow, G.: Digging Into Self-Supervised Monocular Depth Estimation. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3827–3837. IEEE, Seoul, Korea (South) (2019). https://doi.org/10.1109/ICCV.2019.00393
Gopalan, R., Li, R., Chellappa, R.: Domain adaptation for object recognition: An unsupervised approach. In: 2011 International Conference on Computer Vision, pp. 999–1006. IEEE, Barcelona, Spain (2011). https://doi.org/10.1109/ICCV.2011.6126344
Gordon, A., Li, H., Jonschkowski, R., Angelova, A.: Depth From Videos in the Wild: Unsupervised Monocular Depth Learning From Unknown Cameras. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8976–8985. IEEE, Seoul, Korea (South) (2019). https://doi.org/10.1109/ICCV.2019.00907
Hestenes, M., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stan. 49(6), 409 (1952). https://doi.org/10.6028/jres.049.044
Hold-Geoffroy, Y., Sunkavalli, K., Eisenmann, J., Fisher, M., Gambaretto, E., Hadap, S., Lalonde, J.F.: A Perceptual Measure for Deep Single Image Camera Calibration. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2354–2363. IEEE, Salt Lake City, UT, USA (2018). https://doi.org/10.1109/CVPR.2018.00250
Jancosek, M., Pajdla, T.: Multi-view reconstruction preserving weakly-supported surfaces. In: CVPR 2011, pp. 3121–3128. IEEE, Colorado Springs, CO, USA (2011). https://doi.org/10.1109/CVPR.2011.5995693
Jiang, G., Quan, L.: Detection of concentric circles for camera calibration. In: Tenth IEEE International Conference on Computer Vision (ICCV’05), vol. 1, pp. 333–340. IEEE, Beijing, China (2005). https://doi.org/10.1109/ICCV.2005.73
Levenberg, K.: A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math. 2(2), 164–168 (1944). https://doi.org/10.1090/qam/10666
Li, B., Peng, K., Ying, X., Zha, H.: Simultaneous vanishing point detection and camera calibration from single images. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Chung, R., Hammound, R., Hussain, M., Kar-Han, T., Crawfis, R., Thalmann, D., Kao, D., Avila, L. (eds.) Advances in visual computing. Springer, Berlin Heidelberg (2010)
Miyagawa, I., Arai, H., Koike, H.: Simple camera calibration from a single image using five points on two orthogonal 1-D objects. IEEE Trans. on Image Process. 19(6), 1528–1538 (2010). https://doi.org/10.1109/TIP.2010.2042118
Moulon, P., Monasse, P., Marlet, R.: Adaptive structure from motion with a contrario model estimation. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) Computer vision - ACCV. Springer, Berlin Heidelberg (2013)
Ricolfe-Viala, C., Sánchez-Salmerón, A.J.: Robust metric calibration of non-linear camera lens distortion. Pattern Recogn. 43(4), 1688–1699 (2010). https://doi.org/10.1016/j.patcog.2009.10.003
Virtanen, P., et al.: SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
Song, W., Wang, Y., Li, H.X., Cai, Z.: Locating multiple optimal solutions of nonlinear equation systems based on multiobjective optimization. IEEE Trans. Evol. Computat. 19(3), 414–431 (2015). https://doi.org/10.1109/TEVC.2014.2336865
Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997). https://doi.org/10.1023/A:1008202821328
Torralba, A., Murphy, K.P., Freeman, W.T., Rubin, M.A.: Context-based vision system for place and object recognition. In: Proceedings Ninth IEEE International Conference on Computer Vision, vol. 1, pp. 273–280. IEEE, Nice, France (2003). https://doi.org/10.1109/ICCV.2003.1238354
Tsai, R.: A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robot. Automat. 3(4), 323–344 (1987). https://doi.org/10.1109/JRA.1987.1087109
Wales, D.J., Doye, J.P.K.: Global optimization by basin-hopping and the lowest energy structures of lennard-jones clusters containing up to 110 atoms. J. Phys. Chem. A 101(28), 5111–5116 (1997). https://doi.org/10.1021/jp970984n
Wildenauer, H., Hanbury, A.: Robust camera self-calibration from monocular images of Manhattan worlds. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2831–2838. IEEE, Providence, RI (2012). https://doi.org/10.1109/CVPR.2012.6248008
Workman, S., Greenwell, C., Zhai, M., Baltenberger, R., Jacobs, N.: DEEPFOCAL: A method for direct focal length estimation. In: 2015 IEEE International Conference on Image Processing (ICIP), pp. 1369–1373. IEEE, Quebec City, QC, Canada (2015). https://doi.org/10.1109/ICIP.2015.7351024
Yan, H., Zhang, Y., Zhang, S., Zhao, S., Zhang, L.: Focal length estimation guided with object distribution on FocaLens dataset. J. Electron Imag. 26(3), 033018 (2017). https://doi.org/10.1117/1.JEI.26.3.033018
Yin, W., Zhang, J., Wang, O., Niklaus, S., Mai, L., Chen, S., Shen, C.: Learning to Recover 3D Scene Shape from a Single Image. arXiv:2012.09365 [cs] (2020). http://arxiv.org/abs/2012.09365
Zhang, C., Rameau, F., Kim, J., Argaw, D.M., Bazin, J.C., Kweon, I.S.: DeepPTZ: Deep Self-Calibration for PTZ Cameras. In: 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1030–1038. IEEE, Snowmass Village, CO, USA (2020). https://doi.org/10.1109/WACV45572.2020.9093629
Zhang, Z.: A flexible new technique for camera calibration. IEEE Trans. Pattern Anal. Machine Intell. 22(11), 1330–1334 (2000). https://doi.org/10.1109/34.888718
Zhang, Z.: Camera calibration with one-dimensional objects. IEEE Trans. Pattern Anal. Machine Intell. 26(7), 892–899 (2004). https://doi.org/10.1109/TPAMI.2004.21
Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: Algorithm 778: L-BFGS-B: fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 23(4), 550–560 (1997). https://doi.org/10.1145/279232.279236
Funding
The research in this paper is funded by the National Natural Science Foundation of China (NSFC Nos. 51978271 and 61972160), the Guangdong Basic and Applied Basic Research Foundation under Grants 2020A1515010699 and 2021A1515012301, the Natural Science Fund of Guangdong Province under Grants 2019A1515011793 and 2021A1515011849, and the Fundamental Research Funds for the Central Universities (2020ZYGXZR042).
Appendix
1.1 Improvement of DeepCalib using distances
To use the distance information to improve the DeepCalib [3] results, we first estimate the focal length of all 189 images with DeepCalib and use these estimates as initial values. In addition, to start the optimization of Eq. (8), we need to initialize the depth z. Here we have two options: estimate the depths with a depth estimation method, for example MonoDepth2 [11], and scale them; or initialize the depths randomly several times and optimize from each initialization. Since MonoDepth2 produces depths without true scale information and its accuracy is still questionable, the second option is more likely to provide suitable initial depth values. Therefore, we randomly generate the initial depth values. Initializing CFLD in this way, we obtain the results in Fig. 11. We set the number of random initializations to 5, that is, \(k=5\).
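The multi-start scheme described above can be sketched as follows: keep the network-predicted focal length as the initial \(f\), draw \(k\) random depth initializations, run the optimization from each, and keep the lowest-cost solution. This is an illustrative sketch only; the function names, the depth range, and the use of `scipy.optimize.least_squares` are assumptions, not the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def multi_start_refine(residual_fn, f_init, n_depths=4, k=5,
                       depth_range=(1.0, 10.0), seed=0):
    """Refine a network-predicted focal length by re-running a
    distance-constrained optimization from k random depth initializations
    and keeping the lowest-cost solution (hypothetical sketch).

    residual_fn : maps params [f, z1..zn] to a residual vector
    f_init      : focal length predicted by a method such as DeepCalib
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(k):
        # Random positive depths; the assumed range is arbitrary here
        z0 = rng.uniform(*depth_range, size=n_depths)
        sol = least_squares(residual_fn,
                            np.concatenate([[f_init], z0]),
                            bounds=(1e-3, np.inf))
        if best is None or sol.cost < best.cost:
            best = sol
    return best.x[0], best.cost
```

Keeping the best of several restarts is what makes random depth initialization viable despite the non-convexity of the objective.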
Figure 11 shows the results after optimization using the distance information. As the left panel shows, the distance information does improve the focal length estimates over the entire dataset. However, as the right panel shows, not every image is improved. Two factors cause this. On the one hand, our method still suffers from defects related to the scale problem. On the other hand, the initial focal length and depth values estimated by the other methods are not precise enough for the optimization to reach the proper minimum. Overall, distance information improves the accuracy of the other methods, which demonstrates its value for focal length estimation.
1.2 What factors influence the obtained performance?
To investigate the impact of camera quality on our method, we conducted a set of experiments on several major camera quality factors, including lighting conditions (which may vary with external lighting or internal exposure parameters), resolution, and image distortion. We find that low resolution and distortion have a relatively strong impact on the accuracy of our algorithm.
1.2.1 Illumination
Illumination is a crucial factor for many deep-learning-based algorithms because it changes the RGB values of pixels. We took photos of the same scene under multiple lighting conditions from a fixed camera perspective and found that, with the same marked points, the lighting does not affect the results (see Fig. 12). We can infer that our algorithm is not affected by color changes, because its input does not involve the RGB values of pixels.
1.2.2 Resolution
Resolution is also an important factor for the performance of image processing algorithms. We use downsampling to obtain images at different resolutions while keeping the input marker points consistent: at each downsampling step, the pixel coordinates of the marker points are scaled by half, while the physical distances between them remain unchanged. We ran the experiment 10 times for each of 5 resolutions. As Fig. 13 shows, the algorithm becomes less accurate and more unstable as the resolution decreases.
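The marker-point bookkeeping for this experiment can be sketched as follows: each halving of the resolution halves the pixel coordinates of the markers, while the metric distances between the underlying 3D points stay fixed. The helper name and signature are illustrative assumptions, not the paper's code.

```python
def downsample_markers(points, image_size, levels):
    """Simulate `levels` successive 2x downsamplings of an image.

    points     : list of (u, v) marker pixel coordinates
    image_size : (width, height) of the original image
    levels     : number of halving steps

    Returns the scaled marker coordinates and the new image size.
    The physical inter-point distances fed to the solver are unchanged.
    """
    s = 2 ** levels
    scaled = [(u / s, v / s) for u, v in points]
    new_size = (image_size[0] // s, image_size[1] // s)
    return scaled, new_size
```

For example, one downsampling step maps a marker at (640, 480) in a 1280x960 image to (320, 240) in a 640x480 image.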
1.2.3 Distortion
We use the camera's fisheye function to simulate distortion, adjusting the distortion degree step by step inside the camera. Since our method does not model camera distortion, it is unsuitable for images with severe distortion (see Fig. 14). Extending the model to handle more complex situations, including camera distortion, is one direction of future work.
Xiong, Y., Lin, Z., Li, G. et al. Camera focal length from distances in a single image. Vis Comput 37, 2869–2881 (2021). https://doi.org/10.1007/s00371-021-02233-z