Actions Recognition in Crowd Based on Coarse-to-Fine Multi-object Tracking

Gong, Sixue; Han, Hu; Shan, Shiguang; Chen, Xilin

doi:10.1007/978-3-319-54526-4_35

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

Action recognition has wide applications, from video surveillance and scene understanding to forensic investigation. While recent methods typically focus on recognizing a single action from a video clip, we investigate the problem of action recognition in crowd, which better replicates real video surveillance scenarios. We propose to perform action recognition in crowd based on an efficient coarse-to-fine multi-object tracking algorithm. With Faster R-CNN as our human detector, we utilize a coarse-to-fine strategy for multi-object tracking in crowd, consisting of multi-object fast tracking and per-object fine tracking. The tracking results are used to extract action cuboids, from which spatial-temporal features are computed for action classification. We evaluate the proposed approach on a self-collected actions-in-crowd dataset and two public-domain databases (CMU and MOT2015). The results show the effectiveness of the proposed approach for multi-action recognition in crowd.
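
The pipeline described in the abstract (per-frame detections, then tracks, then action cuboids) can be illustrated with a minimal sketch. The abstract does not specify the association rule or how a cuboid is delimited, so everything here is an assumption made for illustration: the coarse stage is stood in for by greedy IoU matching between consecutive frames, the cuboid is taken as the union of a track's boxes over its frame span, and the names `iou`, `track_by_iou`, `cuboid` and the `min_iou` threshold are hypothetical. The per-object fine tracking stage and the feature extraction are not shown.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Track = List[Tuple[int, Box]]            # (frame index, box) observations


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_by_iou(detections: List[List[Box]], min_iou: float = 0.3) -> Dict[int, Track]:
    """Assumed coarse stage: greedily link detections to tracks seen in the previous frame."""
    tracks: Dict[int, Track] = {}
    next_id = 0
    for t, boxes in enumerate(detections):
        # Tracks whose last observation is in frame t - 1 are candidates for extension.
        live = {tid: obs[-1][1] for tid, obs in tracks.items() if obs[-1][0] == t - 1}
        # All (overlap, track id, detection index) pairs, best overlap first.
        pairs = sorted(
            ((iou(last, box), tid, j) for tid, last in live.items() for j, box in enumerate(boxes)),
            reverse=True,
        )
        used_tracks, used_dets = set(), set()
        for score, tid, j in pairs:
            if score < min_iou:
                break  # remaining pairs overlap even less
            if tid in used_tracks or j in used_dets:
                continue
            tracks[tid].append((t, boxes[j]))
            used_tracks.add(tid)
            used_dets.add(j)
        # Unmatched detections start new tracks.
        for j, box in enumerate(boxes):
            if j not in used_dets:
                tracks[next_id] = [(t, box)]
                next_id += 1
    return tracks


def cuboid(track: Track) -> Tuple[float, float, float, float, int, int]:
    """Assumed cuboid: union box of the track plus its first and last frame index."""
    boxes = [b for _, b in track]
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
        track[0][0],
        track[-1][0],
    )


# Toy input: two people in frames 0 and 1, only the first still detected in frame 2.
detections = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10)],
]
tracks = track_by_iou(detections)
cuboids = {tid: cuboid(obs) for tid, obs in tracks.items()}
```

Each entry of `cuboids` gives the spatial extent and frame range from which a video volume could be cropped before computing spatial-temporal features. A greedy matcher is used only to keep the sketch short; an optimal assignment method could be substituted without changing the cuboid step.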

Notes

1. https://motchallenge.net/results/2D_MOT_2015/

Acknowledgement

This research was partially supported by the 973 Program (grant No. 2015CB351802) and the Natural Science Foundation of China (grant No. 61672496). The authors would like to thank Xiaoyan Li for her proofreading of this paper. H. Han gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan X GPU used for his research.

Author information

Authors and Affiliations

Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, 100190, China
Sixue Gong, Hu Han, Shiguang Shan & Xilin Chen

Corresponding author

Correspondence to Hu Han.

Editor information

Editors and Affiliations

Institute of Information Science, Academia Sinica, Taipei, Taiwan
Chu-Song Chen
Tsinghua University, Beijing, China
Jiwen Lu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Kai-Kuang Ma

About this paper

Cite this paper

Gong, S., Han, H., Shan, S., Chen, X. (2017). Actions Recognition in Crowd Based on Coarse-to-Fine Multi-object Tracking. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_35

DOI: https://doi.org/10.1007/978-3-319-54526-4_35
Published: 16 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54525-7
Online ISBN: 978-3-319-54526-4
eBook Packages: Computer Science, Computer Science (R0)
