
Semantic Synthesis of Pedestrian Locomotion

  • Conference paper
  • First Online:
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12623)


Abstract

We present a model for generating 3d articulated pedestrian locomotion in urban scenarios, with synthesis capabilities informed by the 3d scene semantics and geometry. We reformulate pedestrian trajectory forecasting as a structured reinforcement learning (RL) problem. This allows us to naturally combine prior knowledge on collision avoidance, 3d human motion capture and the motion of pedestrians as observed, e.g., in Cityscapes, Waymo or simulation environments like Carla. Our proposed RL-based model allows pedestrians to accelerate and slow down to avoid imminent danger (e.g. cars), while obeying human dynamics learnt from in-lab motion capture datasets. Specifically, we propose a hierarchical model consisting of a semantic trajectory policy network that provides a distribution over possible movements, and a human locomotion network that generates 3d human poses in each step. The RL formulation allows the model to learn even from states that are seldom exhibited in the dataset, utilizing all of the available prior and scene information. Extensive evaluations using both real and simulated data illustrate that the proposed model is on par with recent models such as S-GAN, ST-GAT and S-STGCNN in pedestrian forecasting, while outperforming these in collision avoidance. We also show that our model can be used to plan goal-reaching trajectories in urban scenes with dynamic actors.
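To make the hierarchical decomposition in the abstract concrete, the sketch below shows one forward rollout step: a semantic trajectory policy network (STPN) maps scene features to a categorical distribution over 2d movements, and a human locomotion network (HLN) turns the sampled movement plus context into a 3d pose. This is a minimal, hypothetical illustration, not the authors' code; TensorFlow is used only because it appears among the paper's references, and the layer sizes, the 9-way movement set and the 17-joint pose are assumptions made for the example.

# A minimal, hypothetical sketch of the two-level structure described in the
# abstract (not the authors' code): the STPN outputs a distribution over 2d
# movements, and the HLN turns the sampled movement plus context into a 3d pose.
# Layer sizes, the 9-way movement set and the 17-joint pose are assumptions.
import tensorflow as tf

NUM_MOVES = 9      # assumed: 8 compass directions + standing still
NUM_JOINTS = 17    # assumed joint count for the articulated 3d pose


class STPN(tf.keras.Model):
    """Maps per-step scene/agent features to movement logits."""
    def __init__(self):
        super().__init__()
        self.encoder = tf.keras.layers.Dense(128, activation="relu")
        self.logits = tf.keras.layers.Dense(NUM_MOVES)

    def call(self, scene_features):
        return self.logits(self.encoder(scene_features))


class HLN(tf.keras.Model):
    """Maps the chosen movement (and context) to 3d joint positions."""
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(256, activation="relu")
        self.pose = tf.keras.layers.Dense(NUM_JOINTS * 3)

    def call(self, movement_onehot, scene_features):
        x = tf.concat([movement_onehot, scene_features], axis=-1)
        return tf.reshape(self.pose(self.hidden(x)), (-1, NUM_JOINTS, 3))


# One rollout step: sample a movement from the STPN, then synthesize a pose.
stpn, hln = STPN(), HLN()
scene = tf.random.normal((1, 64))                         # placeholder features
move = tf.random.categorical(stpn(scene), num_samples=1)  # sampled action
pose = hln(tf.one_hot(tf.squeeze(move, axis=-1), NUM_MOVES), scene)
print(pose.shape)  # (1, 17, 3)

Joint training of the two networks (the reference list includes the REINFORCE policy-gradient estimator) is beyond this forward-pass sketch.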


Notes

  1. The \(\boldsymbol{x}_t\) are 2d locations in the movement plane.

  2. See details of the alternating training in the supplement.

  3. This is comparable to a pedestrian being aware of cars in its vicinity.

  4. The goal-directed agent additionally includes the direction to the goal at this stage.

  5. The previous hidden state is used, as the HLN is executed after the STPN.

  6. Each term is weighted by its respective weight, \(\lambda_v=1\), \(\lambda_p=0.1\), \(\lambda_s=0.02\), \(\lambda_{k}=0.01\), \(\lambda_{d}=0.01\), \(\lambda_{\phi}=0.001\); the combined weighted sum is written out after these notes.

  7. We set \(\epsilon=20\sqrt{2}\) cm, i.e. the agent must overlap the goal area.

  8. The weights are the same except for \(\lambda_v=2\) and \(\lambda_g=1\). The fraction term of \(R_g\) is weighted by 0.001.
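For readability, note 6 can be written as a single weighted sum. The reward-term symbols \(R_v, R_p, R_s, R_k, R_d, R_{\phi}\) (and \(R_g\) in the goal-directed variant of note 8) simply mirror the weight subscripts; they are placeholders, not notation taken from the paper:

\[
R = \lambda_v R_v + \lambda_p R_p + \lambda_s R_s + \lambda_k R_k + \lambda_d R_d + \lambda_{\phi} R_{\phi},
\qquad
\lambda_v = 1,\ \lambda_p = 0.1,\ \lambda_s = 0.02,\ \lambda_k = 0.01,\ \lambda_d = 0.01,\ \lambda_{\phi} = 0.001.
\]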

References

  1. Chang, M.F., et al.: Argoverse: 3d tracking and forecasting with rich maps. In: CVPR (2019)
  2. Sun, P., et al.: Scalability in perception for autonomous driving: Waymo open dataset (2019)
  3. Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR (2020)
  4. Behley, J., et al.: SemanticKITTI: a dataset for semantic scene understanding of lidar sequences. In: ICCV (2019)
  5. Huang, X., et al.: The ApolloScape dataset for autonomous driving. In: CVPR Workshops (2018)
  6. Kesten, R., et al.: Lyft level 5 AV dataset 2019, vol. 2, p. 5 (2019). https://level5.lyft.com/dataset
  7. Mangalam, K., Adeli, E., Lee, K.H., Gaidon, A., Niebles, J.C.: Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision. In: The IEEE Winter Conference on Applications of Computer Vision, pp. 2784–2793 (2020)
  8. Mínguez, R.Q., Alonso, I.P., Fernández-Llorca, D., Sotelo, M.Á.: Pedestrian path, pose, and intention prediction through Gaussian process dynamical models and pedestrian activity recognition. IEEE Trans. Intell. Transp. Syst. 20, 1803–1814 (2018)
  9. Rasouli, A., Kotseruba, I., Tsotsos, J.K.: Pedestrian action anticipation using contextual feature fusion in stacked RNNs. arXiv preprint arXiv:2005.06582 (2020)
  10. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: automatic estimation of 3D human pose and shape from a single image. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9909, pp. 561–578. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46454-1_34
  11. Zanfir, M., Oneata, E., Popa, A.I., Zanfir, A., Sminchisescu, C.: Human synthesis and scene compositing. In: AAAI, pp. 12749–12756 (2020)
  12. Wang, M., et al.: Example-guided style-consistent image synthesis from semantic labeling. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
  13. Cheng, S., et al.: Improving 3d object detection through progressive population based augmentation. arXiv preprint arXiv:2004.00831 (2020)
  14. Ho, J., Ermon, S.: Generative adversarial imitation learning. In: NIPS (2016)
  15. Rhinehart, N., Kitani, K.M., Vernaza, P.: R2P2: a reparameterized pushforward policy for diverse, precise generative path forecasting. In: ECCV (2018)
  16. Li, Y.: Which way are you going? Imitative decision learning for path forecasting in dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
  17. van der Heiden, T., Nagaraja, N.S., Weiss, C., Gavves, E.: SafeCritic: collision-aware trajectory prediction. In: British Machine Vision Conference Workshop (2019)
  18. Zou, H., Su, H., Song, S., Zhu, J.: Understanding human behaviors in crowds by imitating the decision-making process. arXiv preprint arXiv:1801.08391 (2018)
  19. Gupta, A., Johnson, J., Fei-Fei, L., Savarese, S., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)
  20. Kosaraju, V., Sadeghian, A., Martín-Martín, R., Reid, I., Rezatofighi, H., Savarese, S.: Social-BiGAT: multimodal trajectory forecasting using bicycle-GAN and graph attention networks. In: NeurIPS (2019)
  21. Zhang, L., She, Q., Guo, P.: Stochastic trajectory prediction with social graph network. arXiv preprint arXiv:1907.10233 (2019)
  22. Huang, Y., Bi, H., Li, Z., Mao, T., Wang, Z.: STGAT: modeling spatial-temporal interactions for human trajectory prediction. In: The IEEE International Conference on Computer Vision (ICCV) (2019)
  23. Mohamed, A., Qian, K., Elhoseiny, M., Claudel, C.: Social-STGCNN: a social spatio-temporal graph convolutional neural network for human trajectory prediction. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
  24. Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Li, F., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: CVPR (2016)
  25. Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H., Chandraker, M.: DESIRE: distant future prediction in dynamic scenes with interacting agents. In: CVPR (2017)
  26. Luo, W., Yang, B., Urtasun, R.: Fast and furious: real time end-to-end 3d detection, tracking and motion forecasting with a single convolutional net. In: CVPR (2018)
  27. Zhao, T., et al.: Multi-agent tensor fusion for contextual trajectory prediction. In: CVPR (2019)
  28. Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Rezatofighi, H., Savarese, S.: SoPhie: an attentive GAN for predicting paths compliant to social and physical constraints. In: CVPR (2019)
  29. Malla, S., Dariush, B., Choi, C.: TITAN: future forecast using action priors. In: CVPR (2020)
  30. Tanke, J., Weber, A., Gall, J.: Human motion anticipation with symbolic label. arXiv preprint arXiv:1912.06079 (2019)
  31. Liang, J., Jiang, L., Niebles, J.C., Hauptmann, A.G., Fei-Fei, L.: Peeking into the future: predicting future person activities and locations in videos. In: CVPR (2019)
  32. Liang, J., Jiang, L., Murphy, K., Yu, T., Hauptmann, A.: The garden of forking paths: towards multi-future trajectory prediction. In: CVPR (2020)
  33. Liang, J., Jiang, L., Hauptmann, A.: SimAug: learning robust representations from 3d simulation for pedestrian trajectory prediction in unseen cameras. arXiv preprint arXiv:2004.02022 (2020)
  34. Makansi, O., Cicek, O., Buchicchio, K., Brox, T.: Multimodal future localization and emergence prediction for objects in egocentric view with a reachability prior. In: CVPR (2020)
  35. Zhang, Y., Hassan, M., Neumann, H., Black, M.J., Tang, S.: Generating 3d people in scenes without people. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6194–6204 (2020)
  36. Hong, S., Yan, X., Huang, T.S., Lee, H.: Learning hierarchical semantic image manipulation through structured representations. In: Advances in Neural Information Processing Systems, pp. 2708–2718 (2018)
  37. Chien, J.T., Chou, C.J., Chen, D.J., Chen, H.T.: Detecting nonexistent pedestrians. In: CVPR (2017)
  38. Li, X., Liu, S., Kim, K., Wang, X., Yang, M.H., Kautz, J.: Putting humans in a scene: learning affordance in 3d indoor environments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12368–12376 (2019)
  39. Lee, D., Pfister, T., Yang, M.H.: Inserting videos into videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10061–10070 (2019)
  40. Wang, B., Adeli, E., Chiu, H.K., Huang, D.A., Niebles, J.C.: Imitation learning for human pose prediction. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7123–7132 (2019)
  41. Mao, W., Liu, M., Salzmann, M., Li, H.: Learning trajectory dependencies for human motion prediction. In: ICCV (2019)
  42. Du, X., Vasudevan, R., Johnson-Roberson, M.: Bio-LSTM: a biomechanically inspired recurrent neural network for 3-d pedestrian pose and gait prediction. IEEE Robot. Autom. Lett. 4, 1501–1508 (2019)
  43. Cao, Z., Gao, H., Mangalam, K., Cai, Q.-Z., Vo, M., Malik, J.: Long-term human motion prediction with scene context. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 387–404. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_23
  44. Adeli, V., Adeli, E., Reid, I., Niebles, J.C., Rezatofighi, H.: Socially and contextually aware human motion and pose forecasting. IEEE Robot. Autom. Lett. 5, 6033–6040 (2020)
  45. Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
  46. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229–256 (1992)
  47. Hodgins, J.: CMU graphics lab motion capture database (2015)
  48. Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6M: large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1325–1339 (2013)
  49. Joo, H., et al.: Panoptic studio: a massively multiview system for social motion capture. In: ICCV (2015)
  50. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
  51. Holden, D., Komura, T., Saito, J.: Phase-functioned neural networks for character control. ACM Trans. Graph. 36, 42:1–42:13 (2017)
  52. Clevert, D.A., Unterthiner, T., Hochreiter, S.: Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289 (2015)
  53. Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2016, Savannah, GA, USA, November 2–4, 2016 (2016)
  54. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
  55. Schöller, C., Aravantinos, V., Lay, F., Knoll, A.: What the constant velocity model can teach us about pedestrian motion prediction. IEEE Robot. Autom. Lett. 5, 1696–1703 (2020)
  56. Chandra, S., Bharti, A.K.: Speed distribution curves for pedestrians during walking and crossing. Procedia-Soc. Behav. Sci. 104, 660–667 (2013)
  57. Everett, M., Chen, Y.F., How, J.P.: Motion planning among dynamic, decision-making agents with deep reinforcement learning. In: IROS (2018)
  58. Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. In: CoRL (2017)
  59. Cordts, M., et al.: The cityscapes dataset for semantic urban scene understanding. In: CVPR (2016)
  60. Nilsson, D., Sminchisescu, C.: Semantic video segmentation by gated recurrent flow propagation. In: CVPR (2018)
  61. Schonberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: CVPR (2016)
  62. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: CVPR (2018)
  63. Zhou, B., et al.: Semantic understanding of scenes through the ADE20K dataset. IJCV 127, 302–321 (2018)


Acknowledgments

This work was supported by the European Research Council Consolidator grant SEED, CNCS-UEFISCDI PN-III-P4-ID-PCE-2016-0535 and PCCF-2016-0180, the EU Horizon 2020 Grant DE-ENIGMA, and the Swedish Foundation for Strategic Research (SSF) Smart Systems Program.

Author information


Corresponding author

Correspondence to Maria Priisalu.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 45796 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Priisalu, M., Paduraru, C., Pirinen, A., Sminchisescu, C. (2021). Semantic Synthesis of Pedestrian Locomotion. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_29


  • DOI: https://doi.org/10.1007/978-3-030-69532-3_29

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science, Computer Science (R0)
