
MCCNet: Multi-Color Cascade Network with Weight Transfer for Single Image Depth Prediction on Outdoor Relief Images

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12667)

Abstract

Single image depth prediction is considerably difficult since depth cannot be estimated from pixel correspondences. Thus, prior knowledge, such as registered pixel and depth information from the user, is required. Another problem arises when targeting a specific domain, as the number of freely available training datasets is limited. To address the color problems of relief images, we present a new outdoor Registered Relief Depth (RRD) Prambanan dataset, consisting of outdoor images of Prambanan temple reliefs with registered depth information, supervised by archaeologists and computer scientists. To solve the prediction problem, we also propose a new depth predictor, called Multi-Color Cascade Network (MCCNet), with weight transfer. Applied to the new RRD Prambanan dataset, our method outperforms the baseline across different materials with an RMSE of 2.53 mm. On the NYU Depth V2 dataset, our method performs better than the baselines and is in line with other state-of-the-art works.
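The abstract does not spell out the MCCNet architecture, but its name suggests a cascade of branches, one per color-space representation of the same image, each refining the depth estimate of the previous stage. The following PyTorch sketch is only a hypothetical illustration of that idea, assuming three color-space views (e.g. RGB, HSV, YCbCr), small convolutional branches, and residual refinement; none of these choices are confirmed by the paper, and the weight-transfer step is not modeled here.

```python
# Hypothetical sketch of a multi-color cascade depth predictor (not the
# authors' implementation): each color-space view of the image drives one
# branch, and each branch refines the depth map from the previous branch.
from typing import List

import torch
import torch.nn as nn


class DepthBranch(nn.Module):
    """One cascade stage: color-space image + previous depth -> refined depth."""

    def __init__(self, in_channels: int = 4):  # 3 color channels + 1 depth channel
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),  # single-channel depth
        )

    def forward(self, color: torch.Tensor, prev_depth: torch.Tensor) -> torch.Tensor:
        x = torch.cat([color, prev_depth], dim=1)
        # Predict a residual correction on top of the previous stage's estimate.
        return prev_depth + self.net(x)


class MultiColorCascade(nn.Module):
    """Cascade one DepthBranch per color-space representation of the image."""

    def __init__(self, num_color_spaces: int = 3):
        super().__init__()
        self.branches = nn.ModuleList([DepthBranch() for _ in range(num_color_spaces)])

    def forward(self, color_spaces: List[torch.Tensor]) -> torch.Tensor:
        b, _, h, w = color_spaces[0].shape
        depth = torch.zeros(b, 1, h, w, device=color_spaces[0].device)
        for branch, color in zip(self.branches, color_spaces):
            depth = branch(color, depth)
        return depth


if __name__ == "__main__":
    # Dummy tensors standing in for RGB, HSV and YCbCr versions of one image.
    views = [torch.rand(1, 3, 64, 64) for _ in range(3)]
    print(MultiColorCascade()(views).shape)  # torch.Size([1, 1, 64, 64])
```

In a layout like this, each branch sees both its color-space view and the running depth estimate, so later stages can correct errors that a single color representation alone would miss.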


Notes

  1. The dataset can be obtained by sending an email to aufaclav@ugm.ac.id or aufaclav@cvl.tuwien.ac.at.


Acknowledgment

This work was funded through a collaboration scheme between the Ministry of Research and Technology of the Republic of Indonesia and OeAD-GmbH within the Indonesian-Austrian Scholarship Program (IASP). It was also supported by the Ministry of Education and Culture of the Republic of Indonesia and the Institute for Preservation of Cultural Heritage (BPCB) D.I. Yogyakarta, which granted permission to capture the relief dataset.

Author information

Corresponding author

Correspondence to Aufaclav Zatu Kusuma Frisky.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Frisky, A.Z.K., Putranto, A., Zambanini, S., Sablatnig, R. (2021). MCCNet: Multi-Color Cascade Network with Weight Transfer for Single Image Depth Prediction on Outdoor Relief Images. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12667. Springer, Cham. https://doi.org/10.1007/978-3-030-68787-8_19

  • DOI: https://doi.org/10.1007/978-3-030-68787-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68786-1

  • Online ISBN: 978-3-030-68787-8

  • eBook Packages: Computer Science, Computer Science (R0)
