Abstract
In recent decades, the action recognition task has evolved from single-view recordings to unconstrained environments, and multi-view action recognition has become a hot topic in computer vision. However, we notice that only a few works have focused on open-view action recognition, which is a common problem in the real world. Open-view action recognition aims to recognize actions from an unseen view without using any information from that view. To address this issue, we first introduce a novel multi-view surveillance action dataset and benchmark several state-of-the-art algorithms on it. From the results, we observe that the performance of state-of-the-art algorithms drops substantially under open-view constraints. We then propose a novel open-view action recognition method based on linear discriminant analysis. The method learns a common space for action samples from different views using only their category information, and achieves good results in open-view action recognition.
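The core idea stated in the abstract, learning a view-shared discriminative space from category labels alone and then classifying samples from a view never seen in training, can be sketched with scikit-learn on synthetic data. This is a minimal illustration, not the authors' pipeline: the feature dimensions, view offsets, and per-view centering step are all illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
N_PER_CLASS, N_CLASSES, DIM = 20, 5, 50

def make_view(offset):
    """Synthetic clip descriptors for one camera view (hypothetical data)."""
    y = np.repeat(np.arange(N_CLASSES), N_PER_CLASS)
    X = rng.normal(size=(len(y), DIM))
    X[np.arange(len(y)), y] += 3.0   # class-dependent signal
    X += offset                      # constant view-dependent bias
    return X - X.mean(axis=0), y     # simple per-view centering (assumption)

# Two seen views for training; category labels are shared across views.
X1, y1 = make_view(0.0)
X2, y2 = make_view(1.5)
X_train, y_train = np.vstack([X1, X2]), np.concatenate([y1, y2])

# LDA learns a common discriminative space using category information only.
lda = LinearDiscriminantAnalysis(n_components=N_CLASSES - 1).fit(X_train, y_train)
clf = LinearSVC().fit(lda.transform(X_train), y_train)

# Evaluate on a view never observed during training (the open-view setting).
X_unseen, y_unseen = make_view(3.0)
acc = clf.score(lda.transform(X_unseen), y_unseen)
print(f"open-view accuracy: {acc:.2f}")
```

Because the class-discriminative directions are shared across views while the view bias is not, projecting into the LDA space lets a classifier trained on seen views transfer to the unseen one.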



Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under Grants 61772359, 61472275, and 61572356, and in part by the Tianjin Research Program of Application Foundation and Advanced Technology under Grant 15JCYBJC16200.
Cite this article
Su, Y., Li, Y. & Liu, A. Open-view human action recognition based on linear discriminant analysis. Multimed Tools Appl 78, 767–782 (2019). https://doi.org/10.1007/s11042-018-5657-6