Actions Recognition in Crowd Based on Coarse-to-Fine Multi-object Tracking

  • Conference paper
Computer Vision – ACCV 2016 Workshops (ACCV 2016)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10118)

Abstract

Action recognition has wide applications, ranging from video surveillance and scene understanding to forensic investigation. While recent methods typically focus on recognizing a single action from a video clip, we investigate the problem of action recognition in crowds, which better reflects real video surveillance scenarios. We propose to perform action recognition in crowds based on an efficient coarse-to-fine multi-object tracking algorithm. With Faster R-CNN as our human detector, we use a coarse-to-fine strategy for multi-object tracking in crowds, consisting of multi-object fast tracking followed by per-object fine tracking. The tracking results are used to extract action cuboids, from which spatial-temporal features are computed for action classification. We evaluate the proposed approach on a self-collected actions-in-crowd dataset and two public-domain databases (CMU and MOT2015). The results show the effectiveness of the proposed approach for multi-action recognition in crowds.
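
To make the tracking-to-cuboid step more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how detections are linked to tracks or how a cuboid is cut out, so two assumptions are made here: the coarse stage is approximated by greedy matching on bounding-box overlap (IoU), and the spatial extent of a cuboid is taken as the smallest box enclosing a track's per-frame boxes. The names `iou`, `associate`, `cuboid`, and the `min_iou` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Box], detections: List[Box],
              min_iou: float = 0.3) -> Dict[int, int]:
    """Greedy coarse association (an assumption, not the paper's method).

    Returns {track_id: detection_index} for pairs whose IoU is at least
    min_iou; each track and each detection is used at most once.
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    return matches


def cuboid(track_boxes: List[Box]) -> Box:
    """Smallest box enclosing all per-frame boxes of one track."""
    xs1, ys1, xs2, ys2 = zip(*track_boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
    print(associate(tracks, detections))
    print(cuboid([(0, 0, 10, 10), (1, 1, 11, 11)]))
```

In this sketch, tracks left unmatched after the coarse pass would be the natural candidates for a slower per-object tracker, which is one plausible reading of the coarse-to-fine split described above.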

Notes

  1. https://motchallenge.net/results/2D_MOT_2015/

Acknowledgement

This research was partially supported by the 973 Program (grant No. 2015CB351802) and the Natural Science Foundation of China (grant No. 61672496). The authors would like to thank Xiaoyan Li for proofreading this paper. H. Han gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan X GPU used for his research.

Author information

Corresponding author

Correspondence to Hu Han.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Gong, S., Han, H., Shan, S., Chen, X. (2017). Actions Recognition in Crowd Based on Coarse-to-Fine Multi-object Tracking. In: Chen, CS., Lu, J., Ma, KK. (eds) Computer Vision – ACCV 2016 Workshops. ACCV 2016. Lecture Notes in Computer Science, vol 10118. Springer, Cham. https://doi.org/10.1007/978-3-319-54526-4_35

  • DOI: https://doi.org/10.1007/978-3-319-54526-4_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54525-7

  • Online ISBN: 978-3-319-54526-4

  • eBook Packages: Computer Science, Computer Science (R0)
