A Comparative Study of HMMs and LSTMs on Action Classification with Limited Training Data

Alp, Elit Cenk; Yalim Keles, Hacer

doi:10.1007/978-3-030-01054-6_76


Conference paper
First Online: 09 November 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Abstract

Action classification from video streams is a challenging problem, especially when only a limited amount of training data is available for the different actions. Recent developments in deep learning based methods have enabled high classification accuracies for many problems in different domains, yet these methods still perform poorly when the dataset is small. In this work, we examine the performance of Hidden Markov Models (HMMs) and long short-term memory (LSTM) based recurrent neural network models within the same sequence classification framework on the well-known KTH action dataset. KTH contains few training examples and hence challenges deep learning based techniques even when transfer learning is applied for feature extraction. Our experiments show that, when a pre-trained convolutional network (SqueezeNet) is fine-tuned for feature extraction, an HMM performs better in sequence modeling than an LSTM based model. Using the same fine-tuned SqueezeNet features, we obtained 99.30% accuracy with an HMM, which is the best classification accuracy reported so far on this dataset, compared with 81.92% accuracy for the best performing LSTM configuration.
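The abstract does not give the HMM topology, emission model, or number of states, so the following is only a minimal sketch of HMM-based sequence classification in general: one HMM per action class, with a test sequence assigned to the class whose model gives the highest log-likelihood. It assumes 1-D Gaussian emissions and hand-set toy parameters; in a setup like the one described, the observations would instead be per-frame feature vectors from the fine-tuned network and the parameters would be learned (for example with Baum-Welch). All names here (`sequence_log_likelihood`, `classify`, `models`) are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed emission model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_log_likelihood(seq, hmm):
    """Forward algorithm in log space: returns log p(seq | hmm)."""
    n = len(hmm["means"])
    # Initialisation: start probability times emission of the first frame.
    alpha = [
        safe_log(hmm["start"][i]) + log_gauss(seq[0], hmm["means"][i], hmm["vars"][i])
        for i in range(n)
    ]
    # Recursion: sum over predecessor states, then emit the next frame.
    for x in seq[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(hmm["trans"][i][j]) for i in range(n)])
            + log_gauss(x, hmm["means"][j], hmm["vars"][j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def classify(seq, models):
    """Pick the class whose HMM assigns the sequence the highest likelihood."""
    return max(models, key=lambda label: sequence_log_likelihood(seq, models[label]))


# Toy two-state left-to-right models; every number is made up for illustration.
models = {
    "walking": {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "means": [0.0, 1.0],
        "vars": [0.25, 0.25],
    },
    "boxing": {
        "start": [1.0, 0.0],
        "trans": [[0.7, 0.3], [0.0, 1.0]],
        "means": [3.0, 4.0],
        "vars": [0.25, 0.25],
    },
}

if __name__ == "__main__":
    print(classify([0.1, 0.2, 0.9, 1.1], models))
    print(classify([3.1, 2.9, 4.2, 3.8], models))
```

Scoring in log space avoids the numerical underflow that products of many small per-frame probabilities would otherwise cause on longer video sequences.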


Notes

1. A comprehensive review of different action representation approaches is provided in [4].

References

1. Thurau, C., Hlavac, V.: Pose primitive based human action recognition in videos or still images. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2008
2. Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 257–267 (2001)
3. Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)
4. Sargano, A., Angelov, P., Habib, Z.: A comprehensive review on handcrafted and learning-based action representation approaches for human activity recognition. Appl. Sci. 7(1), 110 (2017)
5. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, June 2015. doi.ieeecomputersociety.org/10.1109/CVPR.2015.7298594
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (2014)
8. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS 2014, pp. 3104–3112. MIT Press, Cambridge (2014). http://dl.acm.org/citation.cfm?id=2969033.2969173
9. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, February 2016. arXiv:1602.07360 [cs], http://arxiv.org/abs/1602.07360
10. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36, August 2004
11. Alp, E., Keles, H.: Action recognition using MHI based HU moments with HMMs. In: IEEE EUROCON 2017, 17th International Conference on Smart Technologies (2017)
12. Vezzani, R., Baltieri, D., Cucchiara, R.: HMM based action recognition with projection histogram features. In: Proceedings of the 20th International Conference on Recognizing Patterns in Signals, Speech, Images, and Videos, ser. ICPR 2010, pp. 286–293. Springer, Heidelberg (2010)
13. Chen, C.C., Ryoo, M., Aggarwal, J.: UT-tower dataset: aerial view activity classification challenge (2010). http://cvrc.ece.utexas.edu/SDHA2010
14. Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: CVPR 2011, pp. 3361–3368, June 2011
15. Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)
16. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, ser. NIPS 2014, pp. 568–576. MIT Press, Cambridge (2014)
17. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. CoRR, vol. abs/1212.0402 (2012)
18. Kuhne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: IEEE International Conference on Computer Vision (ICCV) (2011)
19. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732, June 2014
20. Donahue, J., Hendricks, L.A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 677–691 (2017)
21. Ng, J., Hausknecht, M.J., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4694–4702 (2015)
22. Lei, J., Li, G., Zhang, J., Guo, Q., Tu, D.: Continuous action segmentation and recognition using hybrid convolutional neural network-hidden markov model model. IET Comput. Vis. 10(6), 537–544 (2016)
23. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)
24. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR, vol. abs/1409.1556 (2014)
26. Baum, L.: An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)
27. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735
28. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (2015)
29. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2008
30. Hasan, M., Roy-Chowdhury, A.K.: Continuous learning of human activity models using deep nets. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision - ECCV 2014, pp. 705–720. Springer, Cham (2014)

Author information

Authors and Affiliations

Department of Computer Engineering, Ankara University, Ankara, Turkey
Elit Cenk Alp & Hacer Yalim Keles

Corresponding author

Correspondence to Elit Cenk Alp.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, UK
Rahul Bhatia

About this paper

Cite this paper

Alp, E.C., Yalim Keles, H. (2019). A Comparative Study of HMMs and LSTMs on Action Classification with Limited Training Data. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_76

DOI: https://doi.org/10.1007/978-3-030-01054-6_76
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01053-9
Online ISBN: 978-3-030-01054-6
eBook Packages: Intelligent Technologies and Robotics (R0)