Abstract
Human behavior understanding techniques underpin a wide range of applications, such as object recognition, face detection, emotion detection, action detection, fingerprint identification, gait recognition, and voice recognition. Among these, emotion and action recognition are the most widely studied. This chapter presents an analysis of recently developed deep learning techniques for emotion and activity recognition, discussing existing approaches that use deep learning as their core component. Experimental results are reported on benchmark datasets: CK+ and SFEW for emotion recognition, and Skoda and UCF101 for activity recognition. The experiments show that deep learning methods outperform the other existing techniques in the literature.
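To make the kind of classifier the abstract refers to concrete, the sketch below shows a minimal feed-forward neural network for emotion recognition from flattened face crops, written in plain NumPy. It is an illustration only, not the chapter's method: the input size (48x48 grayscale crops), hidden width, and 7-class output (matching the basic-emotion categories used in CK+-style setups) are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical sizes: 48x48 grayscale face crops, 7 basic emotion classes.
IN, HID, OUT = 48 * 48, 128, 7

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.01, (IN, HID))
b1 = np.zeros(HID)
W2 = rng.normal(0, 0.01, (HID, OUT))
b2 = np.zeros(OUT)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    """Return hidden activations and per-class probabilities."""
    h = np.maximum(0, X @ W1 + b1)          # ReLU hidden layer
    return h, softmax(h @ W2 + b2)

def train_step(X, y, lr=0.1):
    """One gradient-descent step on softmax cross-entropy loss."""
    global W1, b1, W2, b2
    n = len(X)
    h, p = forward(X)
    grad_logits = p.copy()
    grad_logits[np.arange(n), y] -= 1       # dL/dlogits for cross-entropy
    grad_logits /= n
    gW2 = h.T @ grad_logits
    gb2 = grad_logits.sum(0)
    gh = grad_logits @ W2.T
    gh[h <= 0] = 0                          # ReLU gradient gate
    gW1 = X.T @ gh
    gb1 = gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Toy usage with random "images" and labels, just to show the shapes.
X = rng.normal(size=(4, IN))
y = np.array([0, 3, 6, 1])
_, probs = forward(X)
print(probs.shape)  # (4, 7)
```

The deep architectures surveyed in the chapter (CNNs, RNNs, DBNs) replace the single hidden layer here with convolutional or recurrent feature extractors, but the softmax output and cross-entropy training loop follow the same pattern.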
Acknowledgments
This study is sponsored by Science and Engineering Research Board, Department of Science and Technology, Government of India via grant no. PDF/2016/003644.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Singh, R., Nigam, S. (2019). Deep Neural Networks for Human Behavior Understanding. In: Singh, A., Mohan, A. (eds) Handbook of Multimedia Information Security: Techniques and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-15887-3_32
DOI: https://doi.org/10.1007/978-3-030-15887-3_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15886-6
Online ISBN: 978-3-030-15887-3
eBook Packages: Computer Science, Computer Science (R0)