Abstract
Micro-video is a popular new form of social media, and scene retrieval is a useful application for it. Few studies to date have addressed scene retrieval in micro-video, and a large gap remains between scene features and semantics. To extract better semantic features, we propose a combinational fusion method that combines a multi-layer neural network with a supervised hash learning method. The multi-layer neural network fuses multiple modalities through a nonlinear transformation, and the supervised hash learning method then maps the fused feature to binary codes through a linear projection, preserving semantics and similarity. We evaluate the proposed method on a real micro-video dataset crawled from Vine. The experimental results show that it outperforms both standalone multi-modal fusion methods and standalone hash learning methods.
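The two-stage pipeline described in the abstract (nonlinear fusion of modalities, then a linear projection to binary codes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the modality names and dimensions, the untrained random tanh layers, the ridge regression onto random per-class target codes, and all function names (`fuse`, `fit_hash_projection`, `to_binary`, `hamming_rank`) are assumptions chosen only to make the idea concrete on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(modalities, weights):
    """Concatenate modality features and pass them through tanh layers.

    Stand-in for the multi-layer network; the weights here are random
    and untrained (an assumption for illustration only).
    """
    h = np.concatenate(modalities, axis=1)
    for W, b in weights:
        h = np.tanh(h @ W + b)
    return h

def fit_hash_projection(fused, labels, n_bits, reg=1.0):
    """Learn a linear projection by ridge regression onto +/-1 class codes.

    A simplified supervised hashing objective (assumed, not the paper's):
    items of the same class share one random target code.
    """
    classes = np.unique(labels)
    class_codes = rng.choice([-1.0, 1.0], size=(len(classes), n_bits))
    targets = class_codes[np.searchsorted(classes, labels)]
    d = fused.shape[1]
    return np.linalg.solve(fused.T @ fused + reg * np.eye(d), fused.T @ targets)

def to_binary(fused, P):
    """Project linearly, then threshold at zero to get 0/1 codes."""
    return (fused @ P >= 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Return database indices sorted by Hamming distance to the query."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable")

# Synthetic data: 200 items, 4 scene classes, three hypothetical modalities.
n, n_bits = 200, 32
labels = rng.integers(0, 4, size=n)
visual = rng.normal(size=(n, 16)) + labels[:, None]
audio = rng.normal(size=(n, 8)) + labels[:, None]
text = rng.normal(size=(n, 4)) + labels[:, None]

# Two random tanh layers: 28 -> 24 -> 20.
weights = [
    (rng.normal(scale=0.3, size=(28, 24)), np.zeros(24)),
    (rng.normal(scale=0.3, size=(24, 20)), np.zeros(20)),
]

fused = fuse([visual, audio, text], weights)
P = fit_hash_projection(fused, labels, n_bits)
codes = to_binary(fused, P)
ranking = hamming_rank(codes[0], codes)
```

Retrieval then reduces to comparing short binary codes by Hamming distance, which is the practical appeal of hashing for large collections. In a real system the fusion layers would be trained rather than random, and the hashing objective would follow the paper's formulation.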
Acknowledgements
This work is supported by the National Natural Science Foundation of China (61671274, 61573219, 61876098), China Postdoctoral Science Foundation (2016M592190), Shandong Provincial Key Research and Development Plan (2017CXGC1504), Shandong Provincial High College Science and Technology Plan (J17KB161) and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.
About this article
Cite this article
Guo, J., Nie, X., Jian, M. et al. Binary feature representation learning for scene retrieval in micro-video. Multimed Tools Appl 78, 24539–24552 (2019). https://doi.org/10.1007/s11042-018-6999-9
DOI: https://doi.org/10.1007/s11042-018-6999-9