Abstract
A layout is a set of labelled bounding boxes that annotate the objects in a complex scene. Manually labelled layouts, however, often cover only the visible part of each object (a modal layout) rather than the whole object, including both its visible and occluded parts (an amodal layout). Modal layouts arise from occlusion in the scene, whereas amodal layouts carry more accurate information about objects’ relative positions and sizes. In this paper, we investigate how modal layouts affect layout-to-image generation. To recover an amodal layout from a modal one and improve generation quality, we propose the Amodal Layout Completion Network (ALCN), which regresses amodal bounding boxes from potentially occluded boxes. Following a divide-and-conquer strategy, we divide the modal layout of a scene into occlusion groups of bounding boxes, and ALCN processes each group individually. Furthermore, we propose four challenging IoU variants to measure completion performance under different completion conditions. Experimental results show that ALCN achieves state-of-the-art layout completion performance in most cases and improves layout-to-image generation performance.
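The grouping and evaluation ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes boxes are given as `(x1, y1, x2, y2)` tuples, uses the standard IoU rather than any of the four proposed variants (which the abstract does not define), and assumes that an occlusion group is a connected component of mutually overlapping boxes. The names `iou` and `occlusion_groups` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection-over-union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # boxes do not overlap
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def occlusion_groups(boxes: List[Box]) -> List[List[int]]:
    """Group box indices into connected components of overlapping boxes.

    Assumption: two boxes belong to the same group when they overlap,
    directly or through a chain of overlapping boxes.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if iou(boxes[i], boxes[j]) > 0.0:
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    layout = [(0, 0, 10, 10), (5, 5, 15, 15), (14, 14, 20, 20), (100, 100, 110, 110)]
    # Each group would then be passed to a completion model separately.
    print(occlusion_groups(layout))
```

Under these assumptions, each returned group is a list of box indices that a completion model could process independently, and `iou` can compare a predicted amodal box with its ground-truth counterpart.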
Acknowledgement
This paper is funded by the National Key R&D Program of China (2018AAA0100703) and the National Natural Science Foundation of China (No. 62006208 and No. 62107035).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Li, Z., Zhang, S., Sun, L. (2022). Amodal Layout Completion in Complex Outdoor Scenes. In: Fang, L., Povey, D., Zhai, G., Mei, T., Wang, R. (eds) Artificial Intelligence. CICAI 2022. Lecture Notes in Computer Science, vol 13604. Springer, Cham. https://doi.org/10.1007/978-3-031-20497-5_3
DOI: https://doi.org/10.1007/978-3-031-20497-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20496-8
Online ISBN: 978-3-031-20497-5
eBook Packages: Computer Science, Computer Science (R0)