A Temporal-Compress and Shorter SIFT Research on Web Videos

  • Conference paper

Knowledge Science, Engineering and Management (KSEM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9403)

Abstract

The large-scale video data on the web contain rich semantics, which are an important part of the semantic web. Video descriptors, such as the scale-invariant feature transform (SIFT), can represent these semantics to some extent and therefore play an important role in web multimedia content analysis. In this paper, we propose a new video descriptor, the temporal-compress and shorter SIFT (TC-S-SIFT), which efficiently and effectively represents the semantics of web videos. By omitting the orientations with the least discriminability in three stages of standard SIFT on every representative frame, the shorter SIFT descriptor is reduced from 128 to 96 dimensions, saving storage space. The SIFT features are then compressed by tracking them in the video's temporal domain, which greatly reduces the number of local features and hence the visual redundancy, while essentially preserving robustness and discriminability. Experimental results show that our method yields comparable accuracy with a compact storage size.
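As a rough illustration of the two ideas in the abstract, the sketch below assumes the standard SIFT descriptor layout (a 4×4 grid of cells with 8 orientation bins each, 128 dimensions in total). Which orientation bins count as least discriminative (`drop_bins`), the nearest-neighbour matching threshold, and the mean-per-track representative are all hypothetical stand-ins for illustration, not the authors' actual procedure:

```python
import numpy as np

def shorter_sift(desc128, drop_bins=(0, 4)):
    """Sketch of the 'shorter SIFT' reduction: view the 128-d SIFT
    descriptor as 16 cells x 8 orientation bins and drop two
    (assumed) least-discriminative bins per cell, giving a 96-d
    descriptor (16 cells x 6 bins)."""
    cells = desc128.reshape(16, 8)
    keep = [b for b in range(8) if b not in drop_bins]
    return cells[:, keep].reshape(-1)

def temporal_compress(frame_descriptors, match_thresh=0.7):
    """Sketch of temporal compression: greedily link each descriptor
    to the nearest track from earlier frames and keep only one
    representative descriptor per track, removing visual redundancy
    between consecutive frames."""
    tracks = [[d] for d in frame_descriptors[0]]
    for frame in frame_descriptors[1:]:
        for d in frame:
            # nearest existing track (by distance to its latest descriptor)
            dists = [np.linalg.norm(d - t[-1]) for t in tracks]
            i = int(np.argmin(dists))
            if dists[i] < match_thresh:
                tracks[i].append(d)   # same feature tracked over time
            else:
                tracks.append([d])    # a new feature appears
    # one representative (mean descriptor) per track
    return [np.mean(t, axis=0) for t in tracks]
```

With descriptors that repeat across frames, `temporal_compress` returns far fewer vectors than the total number of detected features, which is the storage saving the paper targets.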



Author information

Correspondence to Shenghua Zhong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhu, Y., Jiang, C., Huang, X., Xiao, Z., Zhong, S. (2015). A Temporal-Compress and Shorter SIFT Research on Web Videos. In: Zhang, S., Wirsing, M., Zhang, Z. (eds.) Knowledge Science, Engineering and Management. KSEM 2015. Lecture Notes in Computer Science, vol. 9403. Springer, Cham. https://doi.org/10.1007/978-3-319-25159-2_78

  • DOI: https://doi.org/10.1007/978-3-319-25159-2_78

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25158-5

  • Online ISBN: 978-3-319-25159-2
