Attention-based deep supervised hashing for near duplicate video retrieval

  • Original Article
  • Neural Computing and Applications

Abstract

With the explosive growth of video data on the Internet, near duplicate video retrieval (NDVR) has become an important and challenging problem in information retrieval. Hashing is typically employed to tackle this problem owing to its low memory footprint and fast retrieval speed. Most existing video hashing methods either directly adopt image hashing methods or apply a frame-pooling strategy, and therefore fail to fully exploit the spatio-temporal information of videos. In this paper, we propose an attention-based deep supervised video hashing (ADVH) network for NDVR. To capture richer perceptions and acquire more comprehensive video representations, we use a residual network as the backbone and incorporate an attention module to extract spatio-temporal features of videos and motion information between adjacent frames. Moreover, we design a novel pairwise constraint that uses supervised information to learn compact and discriminative video hash codes. Experimental results on three benchmark video datasets demonstrate that the proposed model outperforms other state-of-the-art hashing methods in retrieval precision.
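To illustrate why binary hash codes give low memory use and fast retrieval, the following is a minimal, generic sketch of Hamming-distance ranking. It is not the ADVH network or its pairwise constraint: the sign-based `binarize` step, the helper names, and the toy feature vectors are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def binarize(features: Sequence[float]) -> int:
    """Pack the signs of a real-valued vector into one integer hash code.

    Assumption: a simple sign threshold stands in for a learned hash layer.
    """
    code = 0
    for value in features:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs sorted by increasing Hamming distance."""
    distances = [(i, hamming(query, code)) for i, code in enumerate(database)]
    return sorted(distances, key=lambda pair: pair[1])


# Toy 8-bit example with made-up feature vectors (hypothetical data).
database = [
    binarize([0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8]),
    binarize([-0.9, 0.2, -0.4, 0.7, -0.1, -0.3, 0.5, -0.8]),
    binarize([0.8, -0.1, 0.5, -0.6, -0.2, 0.3, -0.4, 0.7]),
]
query = binarize([0.7, -0.3, 0.6, -0.5, 0.2, 0.1, -0.6, 0.9])

ranking = rank_by_hamming(query, database)
print(ranking)  # [(0, 0), (2, 1), (1, 8)]
```

Each video is stored as a single short bit string, and comparing two videos reduces to an XOR followed by a bit count, which is the source of the memory and speed advantages mentioned above.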

Data availability

This study is supported by several datasets, which are publicly available at the hyperlinks in the dataset section or at the locations cited in the reference section.

Notes

  1. https://www.crcv.ucf.edu/research/data-sets/ucf101/.

  2. http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/.

  3. http://bigvid.fudan.edu.cn/FCVID/.

References

  1. Li M, Monga V (2015) Twofold video hashing with automatic synchronization. IEEE Trans Inf Forens Secur 10(8):1727–1738

  2. Nie X, Jing W, Cui C, Zhang J, Yin Y (2019) Joint multi-view hashing for large-scale near-duplicate video retrieval. IEEE Trans Knowl Data Eng 32(10):1951–1965

  3. Zheng L, Lei Y, Qiu G, Huang J (2012) Near-duplicate image detection in a visually salient Riemannian space. IEEE Trans Inf Forens Secur 7(5):1578–1593

  4. Khelifi F, Bouridane A (2019) Perceptual video hashing for content identification and authentication. IEEE Trans Circuits Syst Video Technol 29(1):50–67

  5. Liu X, Nie X, Dai Q, Huang Y, Lian L, Yin Y (2021) Reinforced short-length hashing. IEEE Trans Circuits Syst Video Technol 31(9):3655–3668

  6. Nie X, Zhou X, Shi Y, Sun J, Yin Y (2021) Classification-enhancement deep hashing for large-scale video retrieval. Appl Soft Comput 109:107467

  7. Masci J, Bronstein MM, Bronstein AM, Schmidhuber J (2014) Multimodal similarity-preserving hashing. IEEE Trans Pattern Anal Mach Intell 36(4):824–830

  8. Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3864–3872

  9. Hu W, Xie N, Li L, Zeng X, Maybank S (2011) A survey on visual content-based video indexing and retrieval. IEEE Trans Syst, Man, Cybern, Part C (Appl Rev) 41(6):797–819

  10. Lew MS, Sebe N, Djeraba C, Jain R (2006) Content-based multimedia information retrieval: state of the art and challenges. ACM Trans Multimedia Comput Commun Appl 2(1):1–19

  11. Snoek C, Worring M (2009) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322

  12. Song J, Yang Y, Huang Z, Shen HT, Hong R (2011) Multiple feature hashing for real-time large scale near-duplicate video retrieval. In: Proceedings of the 19th ACM international conference on Multimedia, pp 423–432

  13. Datar M, Immorlica N, Indyk P, Mirrokni V (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the twentieth annual symposium on Computational geometry, pp 253–262

  14. Gong Y, Lazebnik S, Gordo A, Perronnin F (2013) Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell 35(12):2916–2929

  15. Wu G, Liu L, Guo Y, Ding G, Han J, Shen J, Shao L (2017) Unsupervised deep video hashing with balanced rotation. In: Proceedings of the 26th international joint conference on artificial intelligence, pp 3076–3082

  16. Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Advances in neural information processing systems 21

  17. Tang J, Li Z (2018) Weakly supervised multimodal hashing for scalable social image retrieval. IEEE Trans Circuits Syst Video Technol 28(10):2730–2741

  18. Yue C, Long M, Wang J, Han Z, Wen Q (2016) Deep quantization network for efficient image retrieval. In: Proceedings of the AAAI conference on artificial intelligence, pp 3457–3463

  19. Cao Z, Long M, Wang J, Yu PS (2017) Hashnet: deep learning to hash by continuation. In: Proceedings of the IEEE international conference on computer vision, pp 5608–5617

  20. Liong VE, Lu J, Gang W, Moulin P, Jie Z (2015) Deep hashing for compact binary codes learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2475–2483

  21. Shan P, Wang Y, Fu C, Song W, Chen J (2020) Automatic skin lesion segmentation based on FC-DPN. Comput Biol Med 123:103762

  22. Zhao T, Fu C, Tian Y, Song W, Sham CW (2023) GSN-HVNET: a lightweight, multi-task deep learning framework for nuclei segmentation and classification. Bioengineering 10(3):393

  23. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  24. Xia R, Pan Y, Lai H, Liu C, Yan S (2014) Supervised hashing for image retrieval via image representation learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 2156–2162

  25. Liong VE, Lu J, Tan YP, Zhou J (2017) Deep video hashing. IEEE Trans Multimedia 19(6):1209–1219

  26. Chen H, Hu C, Lee F, Lin C, Chen Q (2021) A supervised video hashing method based on a deep 3D convolutional neural network for large-scale video retrieval. Sensors 21(9):3094

  27. Song J, Zhang H, Li X, Gao L, Wang M, Hong R (2018) Self-supervised video hashing with hierarchical binary auto-encoder. IEEE Trans Image Process 27(7):3210–3221

  28. Li WJ, Wang S, Kang WC (2015) Feature learning based deep supervised hashing with pairwise labels. arXiv preprint arXiv:1511.03855

  29. Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2064–2072

  30. Han Z, Long M, Wang J, Yue C (2016) Deep hashing network for efficient similarity retrieval. In: Proceedings of the AAAI conference on Artificial Intelligence, pp 2415–2421

  31. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems 25

  32. Li P, Wang M, Cheng J, Xu C, Lu H (2013) Spectral hashing with semantically consistent graph for image indexing. IEEE Trans Multimedia 15(1):141–152

  33. Shen F, Shen C, Liu W, Shen HT (2015) Supervised discrete hashing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 37–45

  34. Wei L, Wang J, Ji R, Jiang YG, Chang SF (2012) Supervised hashing with kernels. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2074–2081

  35. Yang E, Cheng D, Liu T, Wei L, Tao D (2018) Semantic structure-based unsupervised deep hashing. In: Proceedings of the 27th international joint conference on artificial intelligence, pp 1064–1070

  36. Shen F, Yan X, Li L, Yang Y, Shen HT (2018) Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Trans Pattern Anal Mach Intell 40(12):3034–3044

  37. Jiang QY, Cui X, Li WJ (2018) Deep discrete supervised hashing. IEEE Trans Image Process 27(12):5996–6009

  38. Ye G, Dong L, Wang J, Chang SF (2013) Large-scale video hashing via structure learning. In: Proceedings of the IEEE international conference on computer vision, pp 2272–2279

  39. Chen Z, Lu J, Feng J, Zhou J (2018) Nonlinear structural hashing for scalable video search. IEEE Trans Circuits Syst Video Technol 28(6):1421–1433

  40. Wu G, Li L, Guo Y, Ding G, Ling S (2017) Unsupervised deep video hashing with balanced rotation. In: Proceedings of the 26th international joint conference on artificial intelligence, pp 3076–3082

  41. Wu G, Han J, Guo Y, Li L, Ding G (2018) Unsupervised deep video hashing via balanced code for large-scale video retrieval. IEEE Trans Image Process 28(4):1993–2007

  42. Wang L, Xiong Y, Zhe W, Yu Q, Gool LV (2019) Temporal segment networks for action recognition in videos. IEEE Trans Pattern Anal Mach Intell 41(11):2740–2755

  43. Li S, Li X, Lu J, Zhou J (2021) Self-supervised video hashing via bidirectional transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 13549–13558

  44. Li Y, Ji B, Shi X, Zhang J, Kang B, Wang L (2020) Tea: Temporal excitation and aggregation for action recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 909–918

  45. Jiang B, Wang M, Gan W, Wu W, Yan J (2019) STM: Spatiotemporal and motion encoding for action recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 2000–2009

  46. Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402

  47. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: IEEE international conference on computer vision

  48. Jiang YG, Wu Z, Wang J, Xue X, Chang SF (2018) Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):352–364

  49. Anuranji R, Srimathi H (2020) A supervised deep convolutional based bidirectional long short term memory video hashing for large scale video retrieval applications. Digit Signal Process 102:102729

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 62032013), and the Fundamental Research Funds for the Central Universities (No. N2324004-12).

Author information

Corresponding author

Correspondence to Chong Fu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shi, N., Fu, C., Tie, M. et al. Attention-based deep supervised hashing for near duplicate video retrieval. Neural Comput & Applic 36, 5217–5230 (2024). https://doi.org/10.1007/s00521-023-09342-x

  • DOI: https://doi.org/10.1007/s00521-023-09342-x
