Learn from Unlabeled Videos for Near-duplicate Video Retrieval

Authors:

Yuxin Peng

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 1002 - 1011

https://doi.org/10.1145/3477495.3532010

Published: 07 July 2022

Abstract

Near-duplicate video retrieval (NDVR) aims to find copies or transformations of a query video in a massive video database. It plays an important role in many video-related applications, including copyright protection, tracing, and filtering. Video representation and similarity search are crucial to any video retrieval system. To derive effective video representations, most video retrieval systems require a large amount of manually annotated training data, which is costly and inefficient. In addition, most retrieval systems rely on frame-level features for video similarity search, which is expensive in both storage and search. To address these issues, we propose a video representation learning (VRL) approach. It first learns video representations from unlabeled videos via contrastive learning, avoiding the expensive cost of manual annotation. It then uses a transformer structure to aggregate frame-level features into clip-level features, reducing both storage space and search complexity. The approach learns complementary and discriminative information from the interactions among clip frames, and it acquires invariance to frame permutation and to missing frames, which supports more flexible retrieval. Comprehensive experiments on two challenging near-duplicate video retrieval datasets, FIVR-200K and SVD, verify the effectiveness of the proposed VRL approach, which achieves the best performance in both retrieval accuracy and efficiency.
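
The two ideas named in the abstract, attention-based aggregation of frame features into one clip descriptor and contrastive training on unlabeled clips, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single attention layer, the mean pooling, the InfoNCE loss form, the `augment` helper, the temperature of 0.07, and all sizes are assumptions chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_clip(frames, w_q, w_k, w_v):
    """Assumed aggregator: one self-attention layer with no positional
    encoding, followed by mean pooling over frames.
    frames: (num_frames, dim) -> one unit-length clip descriptor."""
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    return l2_normalize((attn @ v).mean(axis=0))


def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE loss: row i of `anchors` should match row i of `positives`;
    every other row in the batch acts as a negative."""
    logits = l2_normalize(anchors) @ l2_normalize(positives).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def augment(clip, rng, keep=6):
    """Hypothetical augmentation: shuffle frames, drop some, add small noise."""
    idx = rng.permutation(len(clip))[:keep]
    return clip[idx] + 0.01 * rng.normal(size=(keep, clip.shape[1]))


rng = np.random.default_rng(0)
dim = 16
# Random projection weights stand in for learned parameters.
w_q, w_k, w_v = (rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

# Stand-in frame features: 4 unlabeled clips, 8 frames each.
clips = rng.normal(size=(4, 8, dim))

# Without positional encoding, reordering the frames leaves the descriptor unchanged.
desc = aggregate_clip(clips[0], w_q, w_k, w_v)
desc_shuffled = aggregate_clip(clips[0][rng.permutation(8)], w_q, w_k, w_v)
print("permutation invariant:", np.allclose(desc, desc_shuffled))

# Two views of every clip form the positive pairs; no labels are needed.
view_a = np.stack([aggregate_clip(c, w_q, w_k, w_v) for c in clips])
view_b = np.stack([aggregate_clip(augment(c, rng), w_q, w_k, w_v) for c in clips])
loss = info_nce(view_a, view_b)
print("contrastive loss:", loss)
```

In this sketch each clip is stored as a single 16-dimensional vector instead of eight frame vectors, which is the kind of storage and search saving the abstract attributes to clip-level features. A trained system would learn the projection weights by minimizing the contrastive loss rather than drawing them at random.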

Index Terms

Information systems
  Information retrieval
    Specialized information retrieval
      Multimedia and multimodal retrieval
        Video search

Information & Contributors

Information

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN: 9781450387323

DOI: 10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
National Key R&D Program of China

Conference

SIGIR '22

Sponsor:

SIGIR

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 13

Total Downloads: 539

Downloads (Last 12 months): 60

Downloads (Last 6 weeks): 12

Reflects downloads up to 30 Jan 2025

Citations

Cited By

Fojcik K, Syga P, Klonowski M (2025). Extremely compact video representation for efficient near-duplicates detection. Pattern Recognition 158 (111016). Online publication date: Feb-2025.
https://doi.org/10.1016/j.patcog.2024.111016
Chen X, Satoh S (2025). Balancing Efficiency and Accuracy: An Analysis of Sampling for Video Copy Detection. MultiMedia Modeling (111-124). Online publication date: 3-Jan-2025.
https://doi.org/10.1007/978-981-96-2054-8_9
Liu Y, Xu Q, Wen P, Dai S, Huang Q, Cai J, Kankanhalli M, Prabhakaran B, Boll S, Subramanian R, Zheng L, Singh V, Cesar P, Xie L, Xu D (2024). Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval. Proceedings of the 32nd ACM International Conference on Multimedia (3828-3837). Online publication date: 28-Oct-2024.
https://dl.acm.org/doi/10.1145/3664647.3681110
Gui J, Chen T, Zhang J, Cao Q, Sun Z, Luo H, Tao D (2024). A Survey on Self-Supervised Learning: Algorithms, Applications, and Future Trends. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:12 (9052-9071). Online publication date: Dec-2024.
https://doi.org/10.1109/TPAMI.2024.3415112
Li W, Lu Y, Hsiao W, Tseng Y, Wang M (2024). DRM-SN: Detecting Reused Multimedia Content on Social Networks. 2024 IEEE 7th International Conference on Multimedia Information Processing and Retrieval (MIPR) (169-175). Online publication date: 7-Aug-2024.
https://doi.org/10.1109/MIPR62202.2024.00033
Deng R, Wu Q, Li Y, Fu H (2024). Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (3200-3204). Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10446442
Pizzi E, Kordopatis-Zilos G, Patel H, Postelnicu G, Nagavara Ravindra S, Gupta A, Papadopoulos S, Tolias G, Douze M (2024). The 2023 video similarity dataset and challenge. Computer Vision and Image Understanding 243:C. Online publication date: 1-Jun-2024.
https://dl.acm.org/doi/10.1016/j.cviu.2024.103997
Mendes H, Seixas P (2024). Similarity-based ranking of videos from fixed-size one-dimensional video signature. Discover Computing 27:1. Online publication date: 14-Aug-2024.
https://doi.org/10.1007/s10791-024-09459-0
Zhang S, Zhang J, Zhang H, Zhuo L (2024). RaSTFormer: region-aware spatiotemporal transformer for visual homogenization recognition in short videos. Neural Computing and Applications 36:18 (10713-10732). Online publication date: 27-Mar-2024.
https://doi.org/10.1007/s00521-024-09633-x
Fu Y, Duan R, Ye O (2023). A Near-Duplicate Video Cleaning Method Based on AFENet Adaptive Clustering. 2023 8th International Conference on Intelligent Computing and Signal Processing (ICSP) (689-695). Online publication date: 21-Apr-2023.
https://doi.org/10.1109/ICSP58490.2023.10248727
