Deep Learning for Automated Occlusion Edge Detection in RGB-D Frames

Sarkar, Soumik; Venugopalan, Vivek; Reddy, Kishore; Ryde, Julian; Jaitly, Navdeep; Giering, Michael

doi:10.1007/s11265-016-1209-3

Published: 08 December 2016

Journal of Signal Processing Systems, Volume 88, pages 205–217 (2017)

Soumik Sarkar¹,
Vivek Venugopalan¹,
Kishore Reddy¹,
Julian Ryde²,
Navdeep Jaitly³ &
Michael Giering¹

Abstract

Occlusion edges correspond to range discontinuities in a scene from the point of view of the observer. Detecting occlusion edges is an important prerequisite for many machine vision and mobile robotics tasks. Although they can be extracted from range data, extracting them from images and videos would be extremely beneficial. We trained a deep convolutional neural network (CNN) to identify occlusion edges in images and videos with just RGB, RGB-D, and RGB-D-UV inputs, where D stands for depth and U and V stand for the horizontal and vertical components of the optical flow field, respectively. Using a CNN avoids hand-crafting features for automatically isolating occlusion edges and distinguishing them from appearance edges. In addition to quantitative occlusion edge detection results, qualitative results are provided to evaluate input data requirements and to demonstrate the trade-off between high-resolution analysis and frame-level computation time, which is critical for real-time robotics applications.
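
As a rough illustration of the input variants named in the abstract, the sketch below stacks RGB, depth (D), and optical flow (U, V) into a single multi-channel array, and flags range discontinuities in a depth map with a simple neighbour-difference test. The function names, the neighbour test, and the 0.1 threshold are assumptions made for illustration only; they are not the authors' labelling procedure or network input pipeline.

```python
import numpy as np

def stack_channels(rgb, depth=None, flow=None):
    """Concatenate the available modalities along the channel axis.

    rgb:   (H, W, 3) colour image
    depth: (H, W) range image, optional (gives RGB-D)
    flow:  (H, W, 2) optical flow with U and V components, optional
    """
    channels = [rgb.astype(np.float32)]
    if depth is not None:
        channels.append(depth.astype(np.float32)[..., None])
    if flow is not None:
        channels.append(flow.astype(np.float32))
    return np.concatenate(channels, axis=-1)

def range_discontinuity_mask(depth, threshold=0.1):
    """Mark pixels on either side of a depth jump larger than `threshold`.

    Hypothetical stand-in for an occlusion-edge label: compares each pixel
    with its horizontal and vertical neighbours.
    """
    depth = depth.astype(np.float32)
    mask = np.zeros(depth.shape, dtype=bool)
    dx = np.abs(np.diff(depth, axis=1)) > threshold  # shape (H, W-1)
    dy = np.abs(np.diff(depth, axis=0)) > threshold  # shape (H-1, W)
    mask[:, :-1] |= dx   # pixel to the left of the jump
    mask[:, 1:] |= dx    # pixel to the right of the jump
    mask[:-1, :] |= dy   # pixel above the jump
    mask[1:, :] |= dy    # pixel below the jump
    return mask
```

Calling `stack_channels(rgb, depth, flow)` on a frame yields a six-channel array corresponding to the RGB-D-UV configuration, while omitting `flow` or `depth` gives the RGB-D and RGB variants.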

Author information

Soumik Sarkar
Present address: Iowa State University, Ames, IA, USA
Navdeep Jaitly
Present address: Google Inc., Mountain View, CA, USA

Authors and Affiliations

Decision Support & Machine Intelligence, United Technologies Research Center, East Hartford, CT, 06108, USA
Soumik Sarkar, Vivek Venugopalan, Kishore Reddy & Michael Giering
Embedded Systems, United Technologies Research Center, Berkeley, CA, USA
Julian Ryde
Department of Computer Science, University of Toronto, Ontario, Canada
Navdeep Jaitly

Corresponding author

Correspondence to Soumik Sarkar.

About this article

Cite this article

Sarkar, S., Venugopalan, V., Reddy, K. et al. Deep Learning for Automated Occlusion Edge Detection in RGB-D Frames. J Sign Process Syst 88, 205–217 (2017). https://doi.org/10.1007/s11265-016-1209-3

Received: 02 September 2015
Revised: 03 November 2016
Accepted: 20 November 2016
Published: 08 December 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s11265-016-1209-3
