RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2023)

Abstract

Properly training LSTMs requires a long time and an extensive amount of data. To improve the training of these models, this paper proposes a novel residual and recurrent neural network, ResNet-LSTM (RLSTM), for spatio-temporal pedestrian action recognition from image sequences. The model includes a novel layer, called MapGrad, whose goal is to improve the stationarity of the feature map sequences processed by the ConvLSTM. The paper demonstrates the effectiveness of the proposed model and of the MapGrad layer in the spatio-temporal classification of pedestrian actions through an ablation study and a comparison with state-of-the-art methods. Overall, RLSTM achieves an accuracy of 88% and an average precision of 94% on the JAAD dataset, a widely used benchmark in the field. Finally, the paper empirically analyzes the effect of increasing the input sequence length on standing action recognition, showing that the proposed method yields a recall of 93%.
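The abstract does not say how MapGrad is computed, so the following is not the authors' layer. It is only a minimal sketch of what "improving stationarity" of a sequence can mean, using first-order temporal differencing, a standard technique from time-series analysis. The function name `temporal_difference`, the single-channel 2x2 feature maps, and the toy data are all assumptions made for illustration.

```python
from typing import List

# One single-channel H x W feature map (hypothetical toy representation).
FeatureMap = List[List[float]]


def temporal_difference(seq: List[FeatureMap]) -> List[FeatureMap]:
    """First-order difference along time: out[t] = seq[t + 1] - seq[t].

    Illustrative only; this is an assumed stand-in, not the MapGrad layer.
    """
    return [
        [
            [b - a for a, b in zip(row_prev, row_next)]
            for row_prev, row_next in zip(prev, nxt)
        ]
        for prev, nxt in zip(seq, seq[1:])
    ]


# Toy sequence of four 2x2 maps whose values grow linearly with time t.
frames = [[[1.0 * t, 2.0 * t], [3.0 * t, 4.0 * t]] for t in range(4)]

diffs = temporal_difference(frames)
print(len(diffs))  # one fewer map than the input sequence
print(diffs[0])    # the per-step change between the first two maps
```

In this toy case the input maps drift upward over time, while every differenced map is identical, so the linear trend has been removed. Whether MapGrad does anything similar cannot be determined from the abstract alone.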

Author information

Corresponding author

Correspondence to Soulayma Gazzeh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gazzeh, S., Lo Presti, L., Douik, A., La Cascia, M. (2023). RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification. In: Tsapatsoulis, N., et al. Computer Analysis of Images and Patterns. CAIP 2023. Lecture Notes in Computer Science, vol 14185. Springer, Cham. https://doi.org/10.1007/978-3-031-44240-7_6

  • DOI: https://doi.org/10.1007/978-3-031-44240-7_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44239-1

  • Online ISBN: 978-3-031-44240-7

  • eBook Packages: Computer Science, Computer Science (R0)
