Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation

Mahmood, Arif; Al-Maadeed, Somaya

doi:10.1007/s00138-019-01039-3

Original Paper
Published: 15 June 2019

Volume 30, pages 1083–1096 (2019)
Machine Vision and Applications

Abstract

Despite a large volume of research on action recognition, little attention has been given to recognizing individual actions in poor-quality spectator crowd scenes. This is an important scenario because most surveillance systems produce poor-quality videos, to which current state-of-the-art methods may not be effectively applicable. Recognizing actions performed by individuals in poor-quality spectator crowd scenes therefore remains an unsolved problem. The main challenge in such cases is localizing a person proposal for each actor in the crowd, and it becomes harder when occlusion is severe. In this work, we propose a novel approach to finding person proposals in poor-quality spectator crowds using crowd-based constraints. First, we identify the persons in the crowd using efficient person head detectors. We exploit head size to estimate each person's bounding box using linear regression. Then, we use the distribution of heads in the crowd image to estimate more accurate person proposals. Motion trajectories are computed independently over the video, without reference to persons, and are then assigned to each person using a novel distance measure between the trajectory and the person proposal. The trajectories and their associated motion- and texture-based features in overlapping time windows are used to compute the final feature vector. For each time window, cumulative feature vectors encoding action information are computed using early information fusion in the bag-of-visual-words framework. Experiments are performed on a publicly available real-world spectator crowd dataset containing as many as 150 actors performing multiple actions at the same time. Our experiments demonstrate excellent performance of the proposed technique.
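The abstract states only that head size is mapped to a person bounding box by linear regression. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it fits a separate ordinary least-squares line per dimension (person width from head width, person height from head height) and assumes the head is horizontally centered at the top of the person box. The names `fit_line`, `LinearFit`, and `head_to_person_box` and the training pairs are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# (x, y, width, height) in pixels, with y growing downward
Box = Tuple[float, float, float, float]


@dataclass
class LinearFit:
    slope: float
    intercept: float

    def __call__(self, x: float) -> float:
        return self.slope * x + self.intercept


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> LinearFit:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("head sizes must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return LinearFit(slope, my - slope * mx)


def head_to_person_box(head: Box, fit_w: LinearFit, fit_h: LinearFit) -> Box:
    """Predict a person proposal from a detected head box.

    Assumption (not from the paper): the head is horizontally centered
    at the top edge of the person box.
    """
    x, y, w, h = head
    pw, ph = fit_w(w), fit_h(h)
    cx = x + w / 2
    return (cx - pw / 2, y, pw, ph)


# Hypothetical training pairs: head sizes vs. annotated person-box sizes.
fit_w = fit_line([10, 12, 15, 20], [31, 37, 46, 61])
fit_h = fit_line([12, 14, 18, 24], [72, 84, 108, 144])
print(head_to_person_box((100, 50, 10, 12), fit_w, fit_h))
```

With training pairs taken from annotated heads and full-body boxes, each detected head then yields one initial person proposal, which the head-distribution step would subsequently refine.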
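The trajectory-to-person distance used in the paper is not given in the abstract. The sketch below substitutes a simple, hypothetical measure (the mean distance from trajectory points to the proposal box, zero for points inside it), assumes each proposal stays fixed over a trajectory's time span, and leaves trajectories beyond a hypothetical `max_dist` threshold unassigned. All function names are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x, y, width, height)


def point_box_distance(p: Point, box: Box) -> float:
    """Euclidean distance from a point to an axis-aligned box (0 if inside)."""
    px, py = p
    x, y, w, h = box
    dx = max(x - px, 0.0, px - (x + w))
    dy = max(y - py, 0.0, py - (y + h))
    return math.hypot(dx, dy)


def trajectory_box_distance(traj: Sequence[Point], box: Box) -> float:
    """Hypothetical stand-in measure: mean point-to-box distance along the trajectory."""
    if not traj:
        return math.inf
    return sum(point_box_distance(p, box) for p in traj) / len(traj)


def assign_trajectories(
    trajectories: Sequence[Sequence[Point]],
    boxes: Sequence[Box],
    max_dist: float = 5.0,
) -> Dict[int, List[int]]:
    """Assign each trajectory to its closest person proposal.

    Returns a mapping from box index to the indices of its trajectories;
    trajectories farther than max_dist from every box are dropped.
    """
    assignment: Dict[int, List[int]] = {i: [] for i in range(len(boxes))}
    for t_idx, traj in enumerate(trajectories):
        best: Optional[int] = None
        best_d = math.inf
        for b_idx, box in enumerate(boxes):
            d = trajectory_box_distance(traj, box)
            if d < best_d:
                best, best_d = b_idx, d
        if best is not None and best_d <= max_dist:
            assignment[best].append(t_idx)
    return assignment


boxes = [(0, 0, 10, 20), (30, 0, 10, 20)]
trajs = [[(5, 5), (6, 6)], [(32, 10), (35, 12)], [(100, 100), (101, 101)]]
print(assign_trajectories(trajs, boxes))  # {0: [0], 1: [1]}
```

Because trajectories are computed without reference to persons, this assignment step is what links independent motion evidence to individual proposals; in a dense crowd, a better measure would also need to resolve overlapping proposals rather than taking the first minimum.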
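The windowed bag-of-visual-words step can be sketched as follows. This assumes "early fusion" means concatenating each trajectory's motion and texture descriptors before quantization against a single, already learned codebook, and reads "cumulative" as summing codeword counts over all trajectories of a person within a window. Window length, stride, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def early_fuse(motion: Sequence[float], texture: Sequence[float]) -> Vector:
    """Early fusion: concatenate descriptors before quantization."""
    return list(motion) + list(texture)


def nearest_codeword(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to x."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k])),
    )


def windowed_bovw(
    samples: Sequence[Tuple[int, Sequence[float]]],  # (frame index, fused descriptor)
    codebook: Sequence[Sequence[float]],
    num_frames: int,
    window: int,
    stride: int,
) -> List[List[float]]:
    """One L1-normalized bag-of-visual-words histogram per time window.

    Windows cover frames [start, start + window) for start = 0, stride, ...;
    a stride smaller than the window length makes the windows overlap.
    """
    histograms = []
    for start in range(0, max(num_frames - window, 0) + 1, stride):
        counts = [0.0] * len(codebook)
        for frame, desc in samples:
            if start <= frame < start + window:
                counts[nearest_codeword(desc, codebook)] += 1.0
        total = sum(counts)
        histograms.append([c / total for c in counts] if total else counts)
    return histograms


codebook = [[0.0, 0.0], [10.0, 10.0]]
samples = [
    (0, early_fuse([0.0], [0.0])),
    (1, early_fuse([10.0], [10.0])),
    (3, early_fuse([9.0], [9.0])),
]
print(windowed_bovw(samples, codebook, num_frames=4, window=2, stride=1))
# [[0.5, 0.5], [0.0, 1.0], [0.0, 1.0]]
```

With overlapping windows, a trajectory sample near a window boundary contributes to more than one histogram, which smooths the per-window action descriptor over time.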

Acknowledgements

This work was made possible by NPRP Grant number 7-1711-1-312 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Author information

Authors and Affiliations

Department of Computer Science, Information Technology University (ITU), Lahore, Pakistan
Arif Mahmood
Department of Computer Science and Engineering, College of Engineering, Qatar University, Doha, Qatar
Somaya Al-Maadeed

Corresponding author

Correspondence to Arif Mahmood.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mahmood, A., Al-Maadeed, S. Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation. Machine Vision and Applications 30, 1083–1096 (2019). https://doi.org/10.1007/s00138-019-01039-3

Received: 02 April 2018
Revised: 27 February 2019
Accepted: 05 June 2019
Published: 15 June 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00138-019-01039-3
