Abstract
This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case, in which a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time archivists spend on manual annotation. This use case differs from laboratory settings because the deployment dataset is available in advance, which allows the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). Our approach involves pre-training a deep neural network on a large dataset and then fine-tuning it on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods on near-duplicate image detection on the UKBench dataset and an in-house private dataset.
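As a rough illustration of the detection step only, the sketch below flags candidate near-duplicate pairs by thresholding the cosine similarity between image embeddings. It assumes the embeddings have already been produced by some encoder (for instance, one fine-tuned on the target collection as described above). The function names, the toy vectors, and the 0.9 threshold are hypothetical and are not taken from the paper, which may use a different matching procedure.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.9):
    """Return (id_a, id_b, similarity) for every pair at or above threshold.

    `embeddings` maps an image id to its feature vector. The all-pairs
    comparison is quadratic in the number of images, so this is only a
    sketch for small collections.
    """
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    # Most similar pairs first, so a reviewer sees the likeliest duplicates first.
    return sorted(pairs, key=lambda p: -p[2])


# Toy embeddings (hypothetical values standing in for encoder outputs).
toy_embeddings = {
    "scan_001": [1.0, 0.0, 0.0],
    "scan_002": [0.98, 0.05, 0.0],
    "scan_003": [0.0, 1.0, 0.0],
}
candidates = find_near_duplicates(toy_embeddings, threshold=0.9)
```

In this toy example only `scan_001` and `scan_002` are flagged as a candidate pair. In practice the threshold would need to be tuned on the collection at hand.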
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Net, F., Folia, M., Casals, P., Gómez, L. (2023). Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_1
DOI: https://doi.org/10.1007/978-3-031-41734-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer Science (R0)