Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation

  • Original Paper
Machine Vision and Applications

Abstract

Despite a large volume of research on action recognition, little attention has been given to recognizing individual actions in poor-quality spectator crowd scenes. This scenario is important because most surveillance systems generate poor-quality videos, to which current state-of-the-art methods may not be effectively applicable. Recognizing actions performed by individuals in poor-quality spectator crowd scenes therefore remains an unsolved problem. The main challenge in such cases is localizing a person proposal for each actor in the crowd, and it becomes more difficult when occlusion is severe. In this work, we propose a novel approach to finding person proposals in poor-quality spectator crowds using crowd-based constraints. First, we locate persons in the crowd using efficient person head detectors. We exploit the head size to estimate each person's bounding box using linear regression. Then, we use the distribution of heads in the crowd image to estimate more accurate person proposals. Motion trajectories are computed over the video independently of the persons and then assigned to each person based on a novel distance measure computed between the trajectory and the person proposal. The set of trajectories and the associated motion and texture-based features in overlapping time windows are used to compute the final feature vector. For each time window, cumulative feature vectors encoding action information are computed using early information fusion in the bag-of-visual-words framework. Experiments are performed on a publicly available real-world spectator crowd dataset containing as many as 150 actors performing multiple actions at the same time. Our experiments demonstrate excellent performance of the proposed technique.
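The first and third steps of the pipeline (head size to person box by linear regression, then trajectory-to-person assignment) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the training pairs are toy numbers, the box is simply centred under the head, and the point-to-box Euclidean distance is a stand-in for the paper's own distance measure, which the abstract does not specify. The head-distribution refinement step is not shown.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for scalar x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def person_box(head, w_model, h_model):
    """Estimate a person box (x, y, w, h) from a head box (x, y, w, h).

    Assumption: the person box is horizontally centred on the head and
    starts at the top of the head; both sizes are regressed on head width.
    """
    hx, hy, hw, _hh = head
    w = w_model[0] * hw + w_model[1]
    h = h_model[0] * hw + h_model[1]
    cx = hx + hw / 2
    return (cx - w / 2, hy, w, h)

def box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    px, py = point
    x, y, w, h = box
    dx = max(x - px, 0.0, px - (x + w))
    dy = max(y - py, 0.0, py - (y + h))
    return (dx * dx + dy * dy) ** 0.5

def assign_trajectory(traj, boxes):
    """Index of the box with the smallest mean point-to-box distance.

    This is a hypothetical stand-in for the paper's distance measure.
    """
    costs = [mean(box_distance(p, b) for p in traj) for b in boxes]
    return min(range(len(boxes)), key=costs.__getitem__)

# Toy training pairs (head width -> person width / height), invented here.
head_widths = [10, 20, 30]
w_model = fit_line(head_widths, [30, 60, 90])
h_model = fit_line(head_widths, [80, 160, 240])

# Two detected heads, given as (x, y, w, h) in pixels.
heads = [(100, 50, 20, 20), (200, 50, 20, 20)]
boxes = [person_box(h, w_model, h_model) for h in heads]

# Two short point trajectories, each a list of (x, y) positions.
traj_a = [(105, 100), (108, 102), (111, 101)]
traj_b = [(215, 120), (220, 118)]
print(assign_trajectory(traj_a, boxes), assign_trajectory(traj_b, boxes))
```

Computing trajectories first and assigning them afterwards keeps tracking independent of detection quality, which matters when person boxes are only rough estimates derived from heads.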

Acknowledgements

This work was made possible by NPRP Grant number 7-1711-1-312 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Corresponding author

Correspondence to Arif Mahmood.

About this article

Cite this article

Mahmood, A., Al-Maadeed, S. Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation. Machine Vision and Applications 30, 1083–1096 (2019). https://doi.org/10.1007/s00138-019-01039-3

  • DOI: https://doi.org/10.1007/s00138-019-01039-3
