A Comparative Study of HMMs and LSTMs on Action Classification with Limited Training Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 868)

Abstract

Action classification from video streams is a challenging problem, especially when only a limited amount of training data is available for each action. Recent deep learning methods have achieved high classification accuracy on many problems in different domains, yet they still perform poorly when the dataset is small. In this work, we examine the performance of Hidden Markov Models (HMMs) and long short-term memory (LSTM) based recurrent neural networks within the same sequence classification framework on the well-known KTH action dataset. KTH contains few training examples and therefore challenges deep learning based techniques, even when transfer learning is applied for feature extraction. Our experiments show that, with features extracted by a pre-trained and fine-tuned convolutional network (SqueezeNet), an HMM models the sequences better than an LSTM based model. Using the same fine-tuned SqueezeNet features, we obtained 99.30% accuracy with an HMM, which is the best classification accuracy reported so far on this dataset, compared with 81.92% accuracy for the best performing LSTM configuration.
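The abstract does not give implementation details, so the following is only a minimal sketch of one common way to classify sequences with HMMs: keep one HMM per action class and assign a sequence to the class whose model gives it the highest likelihood. Everything here is an assumption made for illustration. The observations are discrete symbols rather than SqueezeNet feature vectors, the parameters are set by hand rather than trained, and the names `sequence_log_likelihood`, `classify`, and `MODELS` are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_log_likelihood(obs, start, trans, emit):
    """Forward algorithm in log space for a discrete-observation HMM.

    obs is a list of symbol indices; start[i], trans[i][j] and emit[i][k]
    are the initial, transition and emission probabilities.
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for symbol in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][symbol])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the label whose HMM assigns obs the highest log-likelihood."""
    return max(models, key=lambda label: sequence_log_likelihood(obs, *models[label]))


# Hypothetical toy models (start, trans, emit), hand-set for illustration only.
# "walking" tends to stay in its current state; "waving" tends to alternate.
MODELS = {
    "walking": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]),
    "waving": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], [[0.9, 0.1], [0.1, 0.9]]),
}

if __name__ == "__main__":
    print(classify([0, 0, 0, 0, 1, 1, 1, 1], MODELS))
    print(classify([0, 1, 0, 1, 0, 1, 0, 1], MODELS))
```

In a setting like the one the abstract describes, the per-class models would instead be estimated from training sequences, for example with the Baum-Welch procedure [26], and the emissions would model continuous CNN feature vectors. The sketch covers only the decision rule.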

Notes

  1. A comprehensive review of different action representation approaches is provided in [4].

References

  1. Thurau, C., Hlavac, V.: Pose primitive based human action recognition in videos or still images. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2008

  2. Bobick, A.F., Davis, J.W.: The recognition of human movement using temporal templates. IEEE Trans. Pattern Anal. Mach. Intell. 23(3), 257–267 (2001)

  3. Gorelick, L., Blank, M., Shechtman, E., Irani, M., Basri, R.: Actions as space-time shapes. IEEE Trans. Pattern Anal. Mach. Intell. 29(12), 2247–2253 (2007)

  4. Sargano, A., Angelov, P., Habib, Z.: A comprehensive review on handcrafted and learning-based action representation approaches for human activity recognition. Appl. Sci. 7(1), 110 (2017)

  5. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, June 2015. doi.ieeecomputersociety.org/10.1109/CVPR.2015.7298594

  6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012). http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: British Machine Vision Conference (2014)

  8. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, ser. NIPS 2014, pp. 3104–3112. MIT Press, Cambridge (2014). http://dl.acm.org/citation.cfm?id=2969033.2969173

  9. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, February 2016. arXiv:1602.07360 [cs], http://arxiv.org/abs/1602.07360

  10. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36, August 2004

  11. Alp, E., Keles, H.: Action recognition using MHI based HU moments with HMMs. In: IEEE EUROCON 2017, 17th International Conference on Smart Technologies (2017)

  12. Vezzani, R., Baltieri, D., Cucchiara, R.: HMM based action recognition with projection histogram features. In: Proceedings of the 20th International Conference on Recognizing Patterns in Signals, Speech, Images, and Videos, ser. ICPR 2010, pp. 286–293. Springer, Heidelberg (2010)

  13. Chen, C.C., Ryoo, M., Aggarwal, J.: UT-tower dataset: aerial view activity classification challenge (2010). http://cvrc.ece.utexas.edu/SDHA2010

  14. Le, Q.V., Zou, W.Y., Yeung, S.Y., Ng, A.Y.: Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: CVPR 2011, pp. 3361–3368, June 2011

  15. Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2013)

  16. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1, ser. NIPS 2014, pp. 568–576. MIT Press, Cambridge (2014)

  17. Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. CoRR, vol. abs/1212.0402 (2012)

  18. Kuhne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: IEEE International Conference on Computer Vision (ICCV) (2011)

  19. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732, June 2014

  20. Donahue, J., Hendricks, L.A., Rohrbach, M., Venugopalan, S., Guadarrama, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 677–691 (2017)

  21. Ng, J., Hausknecht, M.J., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4694–4702 (2015)

  22. Lei, J., Li, G., Zhang, J., Guo, Q., Tu, D.: Continuous action segmentation and recognition using hybrid convolutional neural network-hidden markov model model. IET Comput. Vis. 10(6), 537–544 (2016)

  23. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)

  24. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)

  25. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR, vol. abs/1409.1556 (2014)

  26. Baum, L.: An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3, 1–8 (1972)

  27. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  28. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (2015)

  29. Fathi, A., Mori, G.: Action recognition by learning mid-level motion features. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, June 2008

  30. Hasan, M., Roy-Chowdhury, A.K.: Continuous learning of human activity models using deep nets. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision - ECCV 2014, pp. 705–720. Springer, Cham (2014)

Author information

Corresponding author

Correspondence to Elit Cenk Alp.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Alp, E.C., Yalim Keles, H. (2019). A Comparative Study of HMMs and LSTMs on Action Classification with Limited Training Data. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Systems and Applications. IntelliSys 2018. Advances in Intelligent Systems and Computing, vol 868. Springer, Cham. https://doi.org/10.1007/978-3-030-01054-6_76
