Investigation of Single Image Depth Prediction Under Different Lighting Conditions: A Case Study of Ancient Roman Coins

Published: 16 July 2021

Abstract

This article investigates the limitations of single image depth prediction (SIDP) under different lighting conditions and offers a new approach for obtaining an ideal condition for SIDP. To satisfy the data requirement, we exploit a photometric stereo dataset consisting of several images of an object under different light properties. In this work, we used a dataset of ancient Roman coins captured under 54 different lighting conditions to illustrate how the approach is affected by them. This dataset emulates many lighting variations, with the different states of shading and reflectance common in natural environments. The ground-truth depth data in the dataset was obtained using the photometric stereo method and used as training data. We investigated the capabilities of three state-of-the-art methods to reconstruct ancient Roman coins under different lighting scenarios. The first investigation compares the performance of each network using previously trained data to check cross-domain performance. Second, each model is fine-tuned from its pre-trained weights using 70% of the ancient Roman coin dataset. Both models are tested on the remaining 30% of the data. Root mean square error (RMSE) and visual inspection are used as evaluation metrics. The methods show different characteristics depending on the lighting condition of the test data. Overall, they perform best at light angles of 51° and 71°, hereafter called the ideal condition. They perform worse at 13° and 32° because of the high density of shadows, and they also fail to reach their best performance at 82° because of reflections appearing in the image. Based on these findings, we propose a new approach that reduces shadows and reflections using intrinsic image decomposition to achieve a synthetic ideal condition. The results on synthetic images show that this approach can enhance the performance of SIDP; for some state-of-the-art methods, it even achieves better results than the original RGB images.
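To make the evaluation protocol concrete, the sketch below is a minimal Python illustration, not the article's exact pipeline. It computes the RMSE metric mentioned above and derives a synthetic ideal-condition image by keeping only the reflectance (albedo) layer of an intrinsic image decomposition; the `decompose` routine is a hypothetical placeholder for any such method.

    import numpy as np

    def rmse(pred_depth, gt_depth, mask=None):
        """Root mean square error between predicted and ground-truth depth maps."""
        if mask is None:
            mask = np.isfinite(gt_depth)  # ignore invalid ground-truth pixels
        diff = pred_depth[mask] - gt_depth[mask]
        return float(np.sqrt(np.mean(diff ** 2)))

    def synthetic_ideal_condition(rgb, decompose):
        """Build a synthetic ideal-condition input for a depth network.

        `decompose` stands in for any intrinsic image decomposition method
        that factors an image as rgb ~= albedo * shading. Keeping only the
        albedo (reflectance) layer suppresses the cast shadows that dominate
        at the 13deg and 32deg light angles and the specular reflections
        that appear at 82deg.
        """
        albedo, _shading = decompose(rgb)  # hypothetical decomposition routine
        return np.clip(albedo, 0.0, 1.0)

Under the 70/30 protocol described above, each pre-trained model would be fine-tuned on 70% of the coin images (or their decomposed counterparts) and scored with rmse on the held-out 30%.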



• Published in

  Journal on Computing and Cultural Heritage, Volume 14, Issue 4
  December 2021, 328 pages
  ISSN: 1556-4673
  EISSN: 1556-4711
  DOI: 10.1145/3476246

        Copyright © 2021 ACM

        © 2021 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 July 2021
        • Accepted: 1 May 2021
        • Received: 1 November 2020


        Qualifiers

        • research-article
        • Research
        • Refereed
