Amodal Layout Completion in Complex Outdoor Scenes

Wu, Jingyu; Li, Zejian; Zhang, Shengyuan; Sun, Lingyun

doi:10.1007/978-3-031-20497-5_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13604)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence

Abstract

A layout is a group of labelled bounding boxes annotating the objects in a complex scene. However, manually labelled layouts often annotate only the visible parts of objects (modal layout) rather than the whole body, including both visible and invisible parts (amodal layout). Modal layouts arise from occlusion in scenes, while amodal layouts carry more accurate information about objects' relative positions and sizes. In this paper, we investigate the influence of modal layouts on layout-to-image generation. Specifically, to recover an amodal layout from a modal layout and improve generation quality, we propose the Amodal Layout Completion Network (ALCN), which regresses amodal bounding boxes from potentially occluded boxes. Following a divide-and-conquer strategy, we divide the modal layout of a scene into occlusion groups of bounding boxes, each of which is processed by ALCN individually. Furthermore, we propose four challenging IoU variants to measure completion performance under different completion conditions. Experimental results show that ALCN achieves state-of-the-art layout completion performance in most cases and improves layout-to-image generation performance.
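
To make the terms in the abstract concrete, the following is a minimal Python sketch of two standard building blocks: the plain intersection-over-union (IoU) of two boxes, and a simple way to split a layout into groups of mutually overlapping boxes. It rests on assumptions that do not come from the paper: the abstract does not define the four IoU variants or the exact criterion for forming occlusion groups, so the overlap-based grouping, the `(x_min, y_min, x_max, y_max)` box format, and the names `iou` and `occlusion_groups` are all hypothetical and only illustrate the general idea.

```python
from typing import Dict, List, Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def occlusion_groups(boxes: List[Box]) -> List[List[int]]:
    """Group box indices into connected components of overlapping boxes.

    Assumption: two boxes belong to the same group if their IoU is
    positive. The paper may use a different criterion.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > 0.0:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    layout = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 12, 12)]
    print(occlusion_groups(layout))  # two overlapping boxes, one isolated box
```

Under this assumed criterion, each group could then be passed to a completion model on its own, which is the divide-and-conquer pattern the abstract describes.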

Acknowledgement

This paper is funded by the National Key R&D Program of China (2018AAA0100703) and the National Natural Science Foundation of China (No. 62006208 and No. 62107035).

Author information

Authors and Affiliations

Alibaba-Zhejiang University Joint Institute of Frontier Technologies, Zhejiang University, Hangzhou, 310027, China
Jingyu Wu, Zejian Li, Shengyuan Zhang & Lingyun Sun
Singapore Innovation and AI Joint Research Lab, Zhejiang, China
Lingyun Sun

Corresponding author

Correspondence to Zejian Li.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Xiaomi Inc., Beijing, China
Daniel Povey
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
JD Explore Academy, Beijing, China
Tao Mei
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Wu, J., Li, Z., Zhang, S., Sun, L. (2022). Amodal Layout Completion in Complex Outdoor Scenes. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_3

DOI: https://doi.org/10.1007/978-3-031-20497-5_3
Published: 17 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)
