Abstract
This article investigates the limitations of single image depth prediction (SIDP) under different lighting conditions and proposes a new approach for obtaining an ideal condition for SIDP. To satisfy the data requirement, we exploit a photometric stereo dataset consisting of several images of an object under varying light properties. Specifically, we use a dataset of ancient Roman coins captured under 54 different lighting conditions to illustrate how SIDP is affected by them. The dataset emulates the many lighting variations, with different states of shading and reflectance, that are common in natural environments. The ground-truth depth in the dataset was obtained using the photometric stereo method and serves as training data. We investigate the ability of three state-of-the-art methods to reconstruct ancient Roman coins under different lighting scenarios. The first experiment evaluates each network with its previously published pre-trained weights to assess cross-domain performance. In the second, each model is fine-tuned from the pre-trained weights on 70% of the ancient Roman coin dataset. Both variants are tested on the remaining 30% of the data. Root mean square error (RMSE) and visual inspection are used as evaluation metrics. The methods show different characteristics depending on the lighting condition of the test data. Overall, they perform best at light angles of 51° and 71°, referred to hereafter as the ideal condition. They perform worst at 13° and 32° because of the high density of shadows, and they also fail to reach peak performance at 82° because of specular reflections in the images. Based on these findings, we propose a new approach that reduces shadows and reflections in the image using intrinsic image decomposition to achieve a synthetic ideal condition. Results on the synthetic images show that this approach can enhance SIDP performance, and for some state-of-the-art methods it even outperforms the original RGB images.
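For readers unfamiliar with how the ground-truth depth is acquired, the following is a minimal sketch of classical Lambertian photometric stereo under known light directions. The array shapes, the least-squares solver, and the omitted normal-integration step are illustrative assumptions, not the authors' actual acquisition pipeline:

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Recover per-pixel albedo and surface normals (Lambertian model).

    images:     (K, H, W) grayscale intensities, one per light source
                (the coin dataset would have K = 54).
    light_dirs: (K, 3) unit light direction for each image.
    """
    K, H, W = images.shape
    I = images.reshape(K, -1)                             # (K, H*W) intensity matrix
    # Least-squares solve of L @ G = I, where G = albedo * normal
    G, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)    # (3, H*W)
    albedo = np.linalg.norm(G, axis=0)                    # (H*W,)
    normals = G / np.maximum(albedo, 1e-8)                # unit normals
    # Depth is then obtained by integrating the normal field,
    # e.g. with a Poisson solver -- omitted here for brevity.
    return albedo.reshape(H, W), normals.reshape(3, H, W)
```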
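The proposed pre-processing rests on the intrinsic image model I = R · S (reflectance times shading): shadows and specular highlights live mostly in S, so recomposing R with a flat shading term approximates the "synthetic ideal condition" described above. The sketch below assumes a `decompose` function (any intrinsic decomposition method would do) is supplied by the caller; it is a hypothetical placeholder, not the authors' implementation. The RMSE helper matches the evaluation metric named in the abstract:

```python
import numpy as np

def synthesize_ideal(image, decompose):
    """image: (H, W, 3) RGB in [0, 1]; decompose: callable returning (R, S)."""
    reflectance, shading = decompose(image)              # I = R * S
    flat_shading = np.full_like(shading, shading.mean()) # remove shadow/highlight structure
    return np.clip(reflectance * flat_shading, 0.0, 1.0)

def rmse(pred_depth, gt_depth):
    """Root mean square error between predicted and ground-truth depth maps."""
    return np.sqrt(np.mean((pred_depth - gt_depth) ** 2))
```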