Abstract
Extracting robust video features has long been a challenge in video classification. Although research on video feature extraction is active and extensive, classification results based on traditional hand-crafted features are often neither flexible nor satisfactory. Recently, deep learning has shown excellent performance in video feature extraction. In this paper, we propose an improved deep learning architecture, ELU-3DCNN, to extract deep video features for video classification. First, ELU-3DCNN is trained with exponential linear units (ELUs). Then each video is split into 16-frame clips with an 8-frame overlap between consecutive clips. These clips are passed through ELU-3DCNN to extract fc7 activations, which are averaged and L2-normalized to form a 4096-dimensional video feature. Experimental results on the UCF-101 dataset show that ELU-3DCNN improves video classification performance compared with state-of-the-art video feature extraction methods.
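The clip-splitting and feature-aggregation steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `extract_fc7` callable stands in for a forward pass through ELU-3DCNN up to the fc7 layer, which is not reproduced here.

```python
import numpy as np

# Constants taken from the paper's pipeline:
# 16-frame clips, 8-frame overlap (stride 8), 4096-dim fc7 activations.
CLIP_LEN, STRIDE, FEAT_DIM = 16, 8, 4096

def split_into_clips(frames, clip_len=CLIP_LEN, stride=STRIDE):
    """Split a video (array of frames) into overlapping fixed-length clips."""
    return [frames[s:s + clip_len]
            for s in range(0, len(frames) - clip_len + 1, stride)]

def video_feature(frames, extract_fc7):
    """Average per-clip fc7 activations and L2-normalize them
    into a single video-level descriptor, as described in the abstract."""
    feats = np.stack([extract_fc7(clip) for clip in split_into_clips(frames)])
    mean = feats.mean(axis=0)
    return mean / (np.linalg.norm(mean) + 1e-12)  # guard against zero norm
```

For example, a 64-frame video with these settings yields 7 overlapping clips; `video_feature` then reduces the 7 fc7 vectors to one unit-norm 4096-dimensional descriptor.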
Acknowledgements
The work in this paper is supported by the National Natural Science Foundation of China (No. 61531006, No. 61602018), the Science and Technology Development Program of Beijing Education Committee (No. KM201510005004), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD20150311), and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, J., Zhang, J., Zhang, H., Liang, X., Zhuo, L. (2018). Extracting Deep Video Feature for Mobile Video Classification with ELU-3DCNN. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_15
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7