
Joint Object Detection and Depth Estimation in Multiplexed Image

  • Conference paper
Intelligence Science and Big Data Engineering. Visual Data Engineering (IScIDE 2019)

Abstract

This paper presents an object detection method that simultaneously estimates the positions and depth of objects from multiplexed images. A multiplexed image [28] is produced by a new type of imaging device that collects light from different fields of view on a single image sensor; the device was originally designed for stereo, 3D reconstruction, and wide-view generation using computational imaging. Intuitively, a multiplexed image is a blend of the images of multiple views, so both the appearance and the disparities of objects are implicitly encoded in a single image, which makes reliable object detection and depth/disparity estimation possible. Motivated by the recent success of CNN-based detectors, a multi-anchor detection method is proposed, which detects all the views of the same object as a clique and uses the disparity between the views to estimate the depth of the object. The method is interesting in the following aspects: firstly, both the locations and the depth of objects can be estimated simultaneously from a single multiplexed image; secondly, there is almost no increase in computational load compared with popular object detectors; thirdly, even on blended multiplexed images, the detection and depth estimation results are very competitive. Since there is no public multiplexed image dataset yet, the evaluation is based on multiplexed images simulated from KITTI stereo pairs, and very encouraging results have been obtained.
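As a rough illustration of the idea described above (a sketch, not the authors' implementation), a multiplexed image can be simulated by additively blending a stereo pair, as in the KITTI-based evaluation, and the depth of an object follows from the standard stereo relation depth = f·B/d once the disparity between two detected views of the same object is known. The blending weight, focal length, and baseline below are illustrative, KITTI-like values, not parameters from the paper:

```python
import numpy as np

def multiplex(left, right, alpha=0.5):
    """Simulate a multiplexed image by additively blending two views.

    alpha is an assumed blending weight; a real multiplexing device
    combines the light of the views optically on one sensor.
    """
    return alpha * left.astype(np.float32) + (1 - alpha) * right.astype(np.float32)

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Standard stereo relation: depth = focal * baseline / disparity."""
    return focal_px * baseline_m / disparity_px

# Toy stereo pair (random pixels stand in for real KITTI frames).
rng = np.random.default_rng(0)
left = rng.integers(0, 256, (4, 6)).astype(np.uint8)
right = rng.integers(0, 256, (4, 6)).astype(np.uint8)
mux = multiplex(left, right)  # single blended image, same size as one view

# Illustrative KITTI-like camera: ~721 px focal length, 0.54 m baseline.
# A 30 px disparity between the two detected views of one object then
# corresponds to a depth of roughly 13 m.
print(round(depth_from_disparity(30.0, 721.0, 0.54), 2))
```

The clique detection itself (grouping the views of one object so their disparity can be measured) is the contribution of the paper and is not sketched here.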



References

  1. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. arXiv preprint arXiv:1712.00726 (2017)

  2. Cao, S., Liu, Y., Lasang, P., Shen, S.: Detecting the objects on the road using modular lightweight network. arXiv preprint arXiv:1811.06641 (2018)

  3. Chang, J.R., Chen, Y.S.: Pyramid stereo matching network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5410–5418 (2018)

  4. Chen, X., Kundu, K., Zhang, Z., Ma, H., Fidler, S., Urtasun, R.: Monocular 3D object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2147–2156 (2016)

  5. Chen, X., Ma, H., Wan, J., Li, B., Xia, T.: Multi-view 3D object detection network for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1907–1915 (2017)

  6. Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)

  7. Fu, C.Y., Liu, W., Ranga, A., Tyagi, A., Berg, A.C.: DSSD: deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 (2017)

  8. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  9. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  10. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  11. Godard, C., Mac Aodha, O., Brostow, G.J.: Unsupervised monocular depth estimation with left-right consistency. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6602–6611. IEEE (2017)

  12. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  13. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  14. Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y.: Relation networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3588–3597 (2018)

  15. Kendall, A., et al.: End-to-end learning of geometry and context for deep stereo regression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 66–75 (2017)

  16. Li, P., Chen, X., Shen, S.: Stereo R-CNN based 3D object detection for autonomous driving. In: CVPR (2019)

  17. Liang, M., Yang, B., Wang, S., Urtasun, R.: Deep continuous fusion for multi-sensor 3D object detection. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 641–656 (2018)

  18. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  19. Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48

  20. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

  21. Mayer, N., et al.: A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4040–4048 (2016)

  22. Menze, M., Geiger, A.: Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3061–3070 (2015)

  23. Mousavian, A., Anguelov, D., Flynn, J., Kosecka, J.: 3D bounding box estimation using deep learning and geometry. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7074–7082 (2017)

  24. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  25. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

  26. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  27. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  28. Shepard, R.H., Rachlin, Y.: Devices and methods for optically multiplexed imaging. US Patent App. 14/668,214 (2018)

  29. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 761–769 (2016)

  30. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)

  31. Wang, Y., et al.: Anytime stereo image depth estimation on mobile devices. arXiv preprint arXiv:1810.11408 (2018)

  32. Zbontar, J., LeCun, Y.: Stereo matching by training a convolutional neural network to compare image patches. J. Mach. Learn. Res. 17(1–32), 2 (2016)

  33. Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. arXiv preprint arXiv:1903.00621 (2019)


Author information

Correspondence to Yazhou Liu.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Zhou, C., Liu, Y. (2019). Joint Object Detection and Depth Estimation in Multiplexed Image. In: Cui, Z., Pan, J., Zhang, S., Xiao, L., Yang, J. (eds) Intelligence Science and Big Data Engineering. Visual Data Engineering. IScIDE 2019. Lecture Notes in Computer Science, vol 11935. Springer, Cham. https://doi.org/10.1007/978-3-030-36189-1_26


  • DOI: https://doi.org/10.1007/978-3-030-36189-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-36188-4

  • Online ISBN: 978-3-030-36189-1

  • eBook Packages: Computer Science, Computer Science (R0)
