Binary feature representation learning for scene retrieval in micro-video

Guo, Jie; Nie, Xiushan; Jian, Muwei; Yin, Yilong

doi:10.1007/s11042-018-6999-9

Published: 16 April 2019

Volume 78, pages 24539–24552 (2019)

Multimedia Tools and Applications

Abstract

Micro-video is a popular new form of social media, and scene retrieval is a useful application for it. Few studies to date address scene retrieval in micro-video, and a large gap remains between scene features and semantics. To extract features with stronger semantics, we propose a combined fusion method that pairs a multi-layer neural network with a supervised hash learning method. The multi-layer neural network fuses multiple modalities through nonlinear transformation, and the supervised hash learning method maps the fused feature to binary codes by linear projection, preserving semantics and similarity. We evaluate the proposed method on a real micro-video dataset crawled from Vine. The experimental results show that it outperforms both standalone multi-modal fusion methods and standalone hash learning methods.
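
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the modality names (visual, audio, text), the feature dimensions, the tanh activation, the fusion by concatenation, and the random untrained weights are all assumptions made for illustration, and no supervised training of the hash projection is shown. The sketch only demonstrates the data flow: nonlinear fusion of several modalities, a linear projection thresholded into binary codes, and retrieval by Hamming distance.

```python
import numpy as np

def fuse_modalities(features, weights, biases):
    """Concatenate per-modality features and pass them through tanh layers.

    Concatenation and tanh are illustrative assumptions, not the paper's design.
    """
    h = np.concatenate(features, axis=1)
    for W, b in zip(weights, biases):
        h = np.tanh(h @ W + b)
    return h

def to_binary_codes(fused, projection):
    """Apply a linear projection, then threshold at zero to get codes in {0, 1}."""
    return (fused @ projection >= 0).astype(np.uint8)

def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable"), distances

# Hypothetical toy data: 6 micro-videos with three modalities each.
rng = np.random.default_rng(0)
n_videos, d_visual, d_audio, d_text = 6, 8, 4, 5
visual = rng.standard_normal((n_videos, d_visual))
audio = rng.standard_normal((n_videos, d_audio))
text = rng.standard_normal((n_videos, d_text))

# Random, untrained parameters (assumption); a real system would learn these.
d_in, d_hidden, d_fused, n_bits = d_visual + d_audio + d_text, 16, 12, 8
weights = [rng.standard_normal((d_in, d_hidden)) * 0.1,
           rng.standard_normal((d_hidden, d_fused)) * 0.1]
biases = [np.zeros(d_hidden), np.zeros(d_fused)]
projection = rng.standard_normal((d_fused, n_bits))

fused = fuse_modalities([visual, audio, text], weights, biases)
codes = to_binary_codes(fused, projection)

# Retrieve with the first video as the query; it matches itself at distance 0.
order, distances = hamming_rank(codes[0], codes)
print(codes)
print(order, distances[order])
```

Binary codes make retrieval cheap because comparing two videos reduces to counting differing bits. In a supervised setting, the projection would be learned so that videos from the same scene category receive nearby codes.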

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61671274, 61573219, 61876098), China Postdoctoral Science Foundation (2016M592190), Shandong Provincial Key Research and Development Plan (2017CXGC1504), Shandong Provincial High College Science and Technology Plan (J17KB161) and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.

Author information

Authors and Affiliations

School of Computer Science and Technology, Shandong University, Jinan, 250101, Shandong, China
Jie Guo
Shandong University of Finance and Economics, Jinan, 250014, Shandong, China
Xiushan Nie & Muwei Jian
School of Software, Shandong University, Jinan, 250101, Shandong, China
Yilong Yin

Corresponding author

Correspondence to Yilong Yin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Guo, J., Nie, X., Jian, M. et al. Binary feature representation learning for scene retrieval in micro-video. Multimed Tools Appl 78, 24539–24552 (2019). https://doi.org/10.1007/s11042-018-6999-9

Received: 26 June 2018
Revised: 09 November 2018
Accepted: 28 November 2018
Published: 16 April 2019
Issue Date: 15 September 2019
DOI: https://doi.org/10.1007/s11042-018-6999-9
