Amodal Layout Completion in Complex Outdoor Scenes

  • Conference paper
Artificial Intelligence (CICAI 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Abstract

A layout is a group of labelled bounding boxes annotating the objects in a complex scene. However, manually labelled layouts often annotate only the visible parts of objects (modal layout) instead of the whole body, including both visible and invisible parts (amodal layout). Modal layouts arise from occlusion in scenes, whereas amodal layouts carry more accurate information about objects’ relative positions and sizes. In this paper, we investigate the influence of modal layouts on layout-to-image generation. Specifically, to recover an amodal layout from a modal layout and improve generation quality, we propose the Amodal Layout Completion Network (ALCN), which regresses amodal bounding boxes from potentially occluded boxes. Following a divide-and-conquer strategy, we divide the modal layout of a scene into occlusion groups of bounding boxes, which ALCN processes individually. Furthermore, we propose four challenging IoU variants to measure completion performance under different completion conditions. Experimental results show that ALCN achieves state-of-the-art layout completion performance in most cases and improves layout-to-image generation performance.
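
The two geometric ingredients named in the abstract, IoU between bounding boxes and the division of a layout into occlusion groups, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not define the four IoU variants or the grouping criterion, so the code uses the standard IoU and assumes, purely for illustration, that two boxes belong to the same occlusion group when they are connected through a chain of overlapping boxes. The names `Box`, `iou`, and `occlusion_groups` are hypothetical.

```python
from typing import Dict, List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); this format is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def occlusion_groups(boxes: List[Box]) -> List[List[int]]:
    """Group box indices into connected components of overlapping boxes.

    Assumption: any positive overlap links two boxes; the paper may use
    a different criterion.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > 0:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    layout = [(0, 0, 10, 10), (5, 5, 15, 15), (12, 12, 20, 20), (30, 30, 40, 40)]
    print(occlusion_groups(layout))
```

In this sketch each group would then be passed to a completion model on its own, which mirrors the divide-and-conquer strategy described above; an isolated box forms a group of size one.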

Acknowledgement

This paper is funded by the National Key R&D Program of China (2018AAA0100703) and the National Natural Science Foundation of China (No. 62006208 and No. 62107035).

Author information

Corresponding author

Correspondence to Zejian Li.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, J., Li, Z., Zhang, S., Sun, L. (2022). Amodal Layout Completion in Complex Outdoor Scenes. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_3

  • DOI: https://doi.org/10.1007/978-3-031-20497-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20496-8

  • Online ISBN: 978-3-031-20497-5

  • eBook Packages: Computer Science, Computer Science (R0)
