Abstract
Despite a large body of research on action recognition, little attention has been given to recognizing the actions of individuals in poor-quality spectator crowd scenes. This is an important scenario because most surveillance systems produce poor-quality video, and current state-of-the-art methods may not be effectively applicable to it. Recognizing actions performed by individuals in poor-quality spectator crowd scenes therefore remains an unsolved problem. In such cases, the main challenge is localizing a person proposal for each actor in the crowd, and this challenge becomes harder when occlusion is severe. In this work, we propose a novel approach to finding person proposals in poor-quality spectator crowds using crowd-based constraints. First, we localize persons in the crowd using efficient person head detectors. We exploit head size to estimate each person's bounding box using linear regression. Then, we use the distribution of heads in the crowd image to estimate more accurate person proposals. Motion trajectories are computed independently over the video, without reference to persons, and are then assigned to each person based on a novel distance measure between the trajectory and the person proposal. The set of trajectories and their associated motion- and texture-based features in overlapping time windows are used to compute the final feature vector. For each time window, cumulative feature vectors encoding action information are computed using early information fusion in the bag-of-visual-words framework. Experiments are performed on a publicly available real-world spectator crowd dataset containing as many as 150 actors performing multiple actions at the same time. Our experiments demonstrate excellent performance of the proposed technique.
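The main steps summarized above (head-size regression to a person box, assignment of independently computed trajectories to proposals, and early-fusion bag-of-visual-words encoding) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. Each head is assumed to be given as a centre (cx, cy) and a single scalar size s in pixels, and the box width and height are fit as linear functions of s. The mean point-to-box distance with a hypothetical threshold `max_dist` stands in for the paper's novel distance measure, which is not specified here. The refinement based on the head distribution is omitted, and the codebook is assumed to be given (for example, from k-means). All function names are hypothetical.

```python
import numpy as np


def fit_head_to_box(head_sizes, box_sizes):
    """Least-squares linear fit mapping head size s to person box (width, height).

    box_sizes has shape (N, 2); returns a (2, 2) matrix W such that
    [s, 1] @ W approximates [width, height].
    """
    s = np.asarray(head_sizes, dtype=float)
    A = np.column_stack([s, np.ones_like(s)])
    W, *_ = np.linalg.lstsq(A, np.asarray(box_sizes, dtype=float), rcond=None)
    return W


def person_proposals(heads, W):
    """heads: (N, 3) array of (cx, cy, s) head centres and sizes in pixels.

    Returns (N, 4) boxes (x0, y0, x1, y1) that hang down from the top of the head.
    """
    heads = np.asarray(heads, dtype=float)
    cx, cy, s = heads[:, 0], heads[:, 1], heads[:, 2]
    wh = np.column_stack([s, np.ones_like(s)]) @ W  # predicted (width, height)
    x0 = cx - wh[:, 0] / 2
    y0 = cy - s / 2  # assumption: the person box starts at the top of the head
    return np.column_stack([x0, y0, x0 + wh[:, 0], y0 + wh[:, 1]])


def trajectory_box_distance(traj, box):
    """Hypothetical measure: mean Euclidean distance from trajectory points to the box.

    Points inside the box contribute 0.
    """
    px, py = traj[:, 0], traj[:, 1]
    x0, y0, x1, y1 = box
    dx = np.maximum(np.maximum(x0 - px, px - x1), 0.0)
    dy = np.maximum(np.maximum(y0 - py, py - y1), 0.0)
    return float(np.mean(np.hypot(dx, dy)))


def assign_trajectories(trajs, boxes, max_dist=5.0):
    """Assign each (T, 2) trajectory to the nearest proposal, or -1 if none is close enough."""
    labels = []
    for traj in trajs:
        d = [trajectory_box_distance(np.asarray(traj, dtype=float), b) for b in boxes]
        k = int(np.argmin(d))
        labels.append(k if d[k] <= max_dist else -1)
    return labels


def bovw_histogram(motion, texture, codebook):
    """Early fusion: concatenate per-trajectory motion and texture descriptors,
    quantize each to its nearest codeword, and return an L1-normalized histogram."""
    fused = np.hstack([motion, texture])
    d2 = ((fused[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

In the described method, such histograms would be accumulated per person over overlapping time windows; the windowing is left out of this sketch for brevity.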
Acknowledgements
This work was made possible by NPRP Grant number 7-1711-1-312 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.
About this article
Cite this article
Mahmood, A., Al-Maadeed, S. Action recognition in poor-quality spectator crowd videos using head distribution-based person segmentation. Machine Vision and Applications 30, 1083–1096 (2019). https://doi.org/10.1007/s00138-019-01039-3