Compact CNN Based Video Representation for Efficient Video Copy Detection

Wang, Ling; Bao, Yu; Li, Haojie; Fan, Xin; Luo, Zhongxuan

doi:10.1007/978-3-319-51811-4_47


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Included in the following conference series:

International Conference on Multimedia Modeling


Abstract

Many content-based video copy detection (CCD) systems have been proposed to identify copies of a copyrighted video. Because of storage costs and retrieval response-time requirements, most CCD systems represent video content using sparsely sampled features. These features tend to lose some information and therefore yield unsatisfactory performance. In this paper, we propose a compact video representation based on a convolutional neural network (CNN) and sparse coding (SC) for video copy detection. We first extract CNN features from densely sampled video frames and then encode them into a fixed-length vector with SC. The proposed representation has two advantages. First, it is compact and independent of the sampling frame rate. Second, it is discriminative for video copy detection because it encodes the CNN features of densely sampled frames. We evaluate the proposed representation for video copy detection on a real, complex video dataset, and it achieves a marginal performance improvement over state-of-the-art CCD systems.
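The abstract does not spell out how the frame features are sparse-coded or combined. The sketch below is only an illustration of how a variable number of frame features can be mapped to one fixed-length vector, and it rests on several assumptions. It assumes a pre-learned dictionary `D` of unit-norm atoms. It assumes per-frame sparse codes computed with orthogonal matching pursuit (OMP), with sparsity level `k`. It also assumes max pooling of absolute code values across frames, followed by L2 normalization and cosine similarity for matching. All function names (`omp`, `encode_video`, `similarity`), the pooling rule, and the toy 4-D "features" are hypothetical stand-ins and do not reproduce the authors' published pipeline.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(A: List[Vector], b: Vector) -> Vector:
    """Solve the small square system A c = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def omp(x: Vector, D: List[Vector], k: int) -> Vector:
    """Orthogonal matching pursuit: approximate x with at most k unit-norm atoms of D.

    Returns a code of length len(D) with at most k non-zero entries.
    """
    support: List[int] = []
    coeffs: Vector = []
    residual = list(x)
    for _ in range(min(k, len(D))):
        # Greedy step: pick the unused atom most correlated with the residual.
        j = max((i for i in range(len(D)) if i not in support),
                key=lambda i: abs(dot(residual, D[i])))
        if abs(dot(residual, D[j])) < 1e-12:
            break  # x is already fully explained by the chosen atoms
        support.append(j)
        # Orthogonal step: re-fit all selected coefficients jointly by least
        # squares, via the normal equations (D_S^T D_S) c = D_S^T x.
        G = [[dot(D[a], D[b]) for b in support] for a in support]
        coeffs = solve(G, [dot(D[a], x) for a in support])
        residual = [x[t] - sum(c * D[a][t] for c, a in zip(coeffs, support))
                    for t in range(len(x))]
    code = [0.0] * len(D)
    for c, a in zip(coeffs, support):
        code[a] = c
    return code


def encode_video(frames: List[Vector], D: List[Vector], k: int) -> Vector:
    """Hypothetical pooling: max of |code| over all frames, then L2-normalize.

    The output length is len(D), no matter how many frames were sampled.
    """
    pooled = [0.0] * len(D)
    for f in frames:
        for i, c in enumerate(omp(f, D, k)):
            pooled[i] = max(pooled[i], abs(c))
    norm = math.sqrt(dot(pooled, pooled))
    return [p / norm for p in pooled] if norm > 0 else pooled


def similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two L2-normalized video vectors."""
    return dot(a, b)


if __name__ == "__main__":
    # Toy 4-D stand-ins for per-frame CNN features and a trivial dictionary.
    D = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    f1, f2 = [3.0, 0.0, -1.0, 0.0], [0.0, 4.0, 0.0, 0.5]
    v = encode_video([f1, f2], D, k=2)
    v_dense = encode_video([f1, f1, f2, f2], D, k=2)  # every frame repeated
    print(len(v), similarity(v, v_dense))
```

Under this assumed max pooling, repeating frames (roughly what a higher sampling rate does for near-static content) leaves the vector unchanged. The vector length always equals the dictionary size. This is one plausible way to obtain a representation that is compact and independent of the sampling frame rate. The paper itself may use a different coding or pooling scheme.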



Acknowledgements

This work is supported by National Natural Science Funds of China (61472059, 61428202).

Author information

Authors and Affiliations

Dalian University of Technology, Dalian, China
Ling Wang, Yu Bao, Haojie Li, Xin Fan & Zhongxuan Luo


Corresponding author

Correspondence to Haojie Li.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
Reykjavík University, Reykjavik, Iceland
Gylfi Þór Guðmundsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
Reykjavík University, Reykjavik, Iceland
Björn Þór Jónsson
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh


About this paper

Cite this paper

Wang, L., Bao, Y., Li, H., Fan, X., Luo, Z. (2017). Compact CNN Based Video Representation for Efficient Video Copy Detection. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_47


DOI: https://doi.org/10.1007/978-3-319-51811-4_47
Published: 31 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4
eBook Packages: Computer Science, Computer Science (R0)
