
MCCNet: Multi-Color Cascade Network with Weight Transfer for Single Image Depth Prediction on Outdoor Relief Images

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12667)

Abstract

Single image depth prediction is considerably difficult since depth cannot be estimated from pixel correspondences. Thus, prior knowledge, such as registered pixel and depth information from the user, is required. Another problem arises when targeting a specific domain, as the number of freely available training datasets is limited. To address the color problems of relief images, we present a new outdoor Registered Relief Depth (RRD) Prambanan dataset, consisting of outdoor images of Prambanan temple reliefs with registered depth information, supervised by archaeologists and computer scientists. To solve the prediction problem, we also propose a new depth predictor, called Multi-Color Cascade Network (MCCNet), with weight transfer. Applied to the new RRD Prambanan dataset, our method outperforms the baseline across different materials with an RMSE of 2.53 mm. On the NYU Depth V2 dataset, our method performs better than the baselines and is in line with other state-of-the-art works.
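The abstract does not spell out the MCCNet architecture, but its name suggests a cascade of branches, one per color-space representation of the same image, each refining the depth estimate of the previous stage. The following PyTorch sketch is only a hypothetical illustration of that idea, assuming three color-space views (e.g. RGB, HSV, YCbCr), small convolutional branches, and residual refinement; none of these choices are confirmed by the paper, and the weight-transfer step is not modeled here.

```python
# Hypothetical sketch of a multi-color cascade depth predictor (not the
# authors' implementation): each color-space view of the image drives one
# branch, and each branch refines the depth map from the previous branch.
from typing import List

import torch
import torch.nn as nn


class DepthBranch(nn.Module):
    """One cascade stage: color-space image + previous depth -> refined depth."""

    def __init__(self, in_channels: int = 4):  # 3 color channels + 1 depth channel
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # single-channel depth
        )

    def forward(self, color: torch.Tensor, prev_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([color, prev_depth], dim=1)
        # Predict a residual correction on top of the previous stage's estimate.
        return prev_depth + self.net(x)


class MultiColorCascade(nn.Module):
    """Cascade one DepthBranch per color-space representation of the image."""

    def __init__(self, num_color_spaces: int = 3):
        super().__init__()
        self.branches = nn.ModuleList([DepthBranch() for _ in range(num_color_spaces)])

    def forward(self, color_spaces: List[torch.Tensor]) -> torch.Tensor:
        b, _, h, w = color_spaces[0].shape
        depth = torch.zeros(b, 1, h, w, device=color_spaces[0].device)
        for branch, color in zip(self.branches, color_spaces):
            depth = branch(color, depth)
        return depth


if __name__ == "__main__":
    # Dummy tensors standing in for RGB, HSV and YCbCr versions of one image.
    views = [torch.rand(1, 3, 64, 64) for _ in range(3)]
    print(MultiColorCascade()(views).shape)  # torch.Size([1, 1, 64, 64])
```

In a layout like this, each branch sees both its color-space view and the running depth estimate, so later stages can correct errors that a single color representation alone would miss.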


Notes

  1. The dataset can be obtained by sending an email to aufaclav@ugm.ac.id or aufaclav@cvl.tuwien.ac.at.


Acknowledgment

This work was funded through a collaboration scheme between the Ministry of Research and Technology of the Republic of Indonesia and OeAD-GmbH within the Indonesian-Austrian Scholarship Program (IASP). It was also supported by the Ministry of Education and Culture of the Republic of Indonesia and the Institute for Preservation of Cultural Heritage (BPCB) D.I. Yogyakarta, which granted permission to capture the relief dataset.

Author information

Corresponding author

Correspondence to Aufaclav Zatu Kusuma Frisky.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Frisky, A.Z.K., Putranto, A., Zambanini, S., Sablatnig, R. (2021). MCCNet: Multi-Color Cascade Network with Weight Transfer for Single Image Depth Prediction on Outdoor Relief Images. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12667. Springer, Cham. https://doi.org/10.1007/978-3-030-68787-8_19

  • DOI: https://doi.org/10.1007/978-3-030-68787-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68786-1

  • Online ISBN: 978-3-030-68787-8

  • eBook Packages: Computer Science, Computer Science (R0)
