Abstract
Extracting robust video features has long been a challenge in video classification. Although research on video feature extraction is active and extensive, classification results based on traditional hand-crafted features are often neither flexible nor satisfactory. Recently, deep learning has shown excellent performance in video feature extraction. In this paper, we propose an improved deep learning architecture, ELU-3DCNN, to extract deep video features for video classification. First, ELU-3DCNN is trained with exponential linear units (ELUs). Then each video is split into 16-frame clips with an 8-frame overlap between consecutive clips. These clips are passed through ELU-3DCNN to extract fc7 activations, which are averaged and L2-normalized to form a 4096-dimensional video feature. Experimental results on the UCF-101 dataset show that ELU-3DCNN improves video classification performance compared with state-of-the-art video feature extraction methods.
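The clip-splitting and feature-aggregation steps described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `extract_fc7` callable stands in for a forward pass through ELU-3DCNN up to the fc7 layer, which is not reproduced here.

```python
import numpy as np

# Constants taken from the paper's pipeline:
# 16-frame clips, 8-frame overlap (stride 8), 4096-dim fc7 activations.
CLIP_LEN, STRIDE, FEAT_DIM = 16, 8, 4096

def split_into_clips(frames, clip_len=CLIP_LEN, stride=STRIDE):
    """Split a video (array of frames) into overlapping fixed-length clips."""
    return [frames[s:s + clip_len]
            for s in range(0, len(frames) - clip_len + 1, stride)]

def video_feature(frames, extract_fc7):
    """Average per-clip fc7 activations and L2-normalize them
    into a single video-level descriptor, as described in the abstract."""
    feats = np.stack([extract_fc7(clip) for clip in split_into_clips(frames)])
    mean = feats.mean(axis=0)
    return mean / (np.linalg.norm(mean) + 1e-12)  # guard against zero norm
```

For example, a 64-frame video with these settings yields 7 overlapping clips; `video_feature` then reduces the 7 fc7 vectors to one unit-norm 4096-dimensional descriptor.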
Acknowledgements
The work in this paper is supported by the National Natural Science Foundation of China (No. 61531006, No. 61602018), the Science and Technology Development Program of Beijing Education Committee (No. KM201510005004), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (No. CIT&TCD20150311), and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning Under the Jurisdiction of Beijing Municipality.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Liu, J., Zhang, J., Zhang, H., Liang, X., Zhuo, L. (2018). Extracting Deep Video Feature for Mobile Video Classification with ELU-3DCNN. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_15
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7