Binary feature representation learning for scene retrieval in micro-video

Multimedia Tools and Applications

Abstract

Micro-videos have become a popular form of social media, and scene retrieval is a useful application for them. At present, few studies focus on scene retrieval in micro-videos, and there is a large gap between scene features and their semantics. To extract features that better capture semantics, we propose a combinational fusion method that combines a multi-layer neural network with a supervised hash learning method. The multi-layer neural network acts as a nonlinear projection that fuses multiple modalities, and the supervised hash learning method then maps the fused feature through a linear projection to binary codes that preserve semantics and similarity. We evaluate the proposed method on a real micro-video dataset crawled from Vine. The experimental results show that it outperforms methods that use multi-modal fusion alone or hash learning alone.
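The two-stage structure described in the abstract (nonlinear fusion of modalities, then a supervised linear projection to binary codes compared by Hamming distance) can be illustrated with a small NumPy sketch. It rests on stated assumptions: the fusion network's weights are random rather than trained, and the supervised hashing step is a simple stand-in (ridge regression onto per-class random ±1 target codes), not the authors' objective, which this page does not give. The function names (`fuse_modalities`, `learn_hash_projection`, `encode`, `hamming_rank`) and the synthetic "visual" and "acoustic" features are hypothetical.

```python
import numpy as np


def fuse_modalities(features, layers):
    # Concatenate per-modality features, then apply tanh layers (nonlinear fusion).
    # ASSUMPTION: weights are supplied by the caller; here they are not trained.
    h = np.concatenate(features, axis=1)
    for W, b in layers:
        h = np.tanh(h @ W + b)
    return h


def learn_hash_projection(fused, labels, n_bits, reg=1.0, seed=0):
    # ASSUMED stand-in for supervised hashing: each class gets a random +/-1
    # target code, and a linear projection P is fit to those targets by ridge regression.
    rng = np.random.default_rng(seed)
    n_classes = int(labels.max()) + 1
    class_codes = np.where(rng.standard_normal((n_classes, n_bits)) >= 0, 1.0, -1.0)
    target = class_codes[labels]
    d = fused.shape[1]
    return np.linalg.solve(fused.T @ fused + reg * np.eye(d), fused.T @ target)


def encode(fused, P):
    # Linear projection followed by sign thresholding yields +/-1 binary codes.
    return np.where(fused @ P >= 0, 1, -1).astype(np.int8)


def hamming_rank(query_code, db_codes):
    # For +/-1 codes, Hamming distance = (n_bits - inner product) / 2.
    dot = db_codes.astype(np.int64) @ query_code.astype(np.int64)
    dists = (db_codes.shape[1] - dot) // 2
    return np.argsort(dists, kind="stable"), dists


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    labels = np.repeat(np.arange(3), 20)  # 3 hypothetical scene classes, 20 items each
    visual = rng.standard_normal((60, 8)) + labels[:, None] * 2.0
    acoustic = rng.standard_normal((60, 4)) - labels[:, None] * 1.5
    layers = [(rng.standard_normal((12, 16)) * 0.3, np.zeros(16)),
              (rng.standard_normal((16, 16)) * 0.3, np.zeros(16))]
    fused = fuse_modalities([visual, acoustic], layers)
    P = learn_hash_projection(fused, labels, n_bits=16)
    codes = encode(fused, P)
    order, dists = hamming_rank(codes[0], codes)
    print("top-5 neighbours of item 0:", order[:5], "labels:", labels[order[:5]])
```

Running the script ranks all 60 synthetic items against item 0; because items of the same class share a target code, the nearest neighbours are typically drawn from item 0's class. The sign-after-linear-projection step is what makes the codes compact: for ±1 codes the Hamming distance reduces to a single inner product.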

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61671274, 61573219, 61876098), China Postdoctoral Science Foundation (2016M592190), Shandong Provincial Key Research and Development Plan (2017CXGC1504), Shandong Provincial High College Science and Technology Plan (J17KB161) and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education Institutions.

Author information

Corresponding author

Correspondence to Yilong Yin.

About this article

Cite this article

Guo, J., Nie, X., Jian, M. et al. Binary feature representation learning for scene retrieval in micro-video. Multimed Tools Appl 78, 24539–24552 (2019). https://doi.org/10.1007/s11042-018-6999-9

  • DOI: https://doi.org/10.1007/s11042-018-6999-9