A Novel Copy-Move Forgery Detection Algorithm via Feature Label Matching and Hierarchical Segmentation Filtering
Introduction
Copy-move forgery is one of the popular tampering techniques in digital image forgery: one or more regions of an image are copied and pasted elsewhere in the same image to hide or cover regions of interest [Korus, 2017]. Existing copy-move forgery detection (CMFD) methods are generally divided into classic methods and deep neural network (DNN) models.
Classic CMFD methods are generally categorized into block-based methods [Fridrich et al., 2003, Zhong et al., 2016, Emam, Han, & Niu, 2016, Cozzolino et al., 2015, Pun and Chung, 2018, Ryu et al., 2013, Bi and Pun, 2017, Bi and Pun, 2018] and keypoint-based methods [Zhong and Pun, 2019, Shivakumar and Baboo, 2011, Pun et al., 2015, Li and Zhou, 2019, Silva et al., 2015, Tsai and Leou, 2021], both of which follow four procedures: feature extraction, feature matching, filtering, and post-processing. In the literature, most attacks on CMFD (e.g., JPEG compression, rotation, additive noise) have been well addressed. The exception is the large-scaling attack on homogeneous region(s), in which a scaling factor of at least ±20% is applied to image region(s) to highlight or cover target objects in the same image. This kind of attack is easy to implement but difficult to detect because the image regions are homogeneous. As an illustration, existing methods in the literature achieve F1 scores of only 0.75 and 0.65 under scaling attacks of 120% and 80%, respectively, on the popular benchmark dataset IMD [Christlein et al., 2012]. Under severe scaling of 50% and 200%, all the block-based and keypoint-based methods achieve an F1 score of at most 0.37. Even worse, when only forged images with suspicious homogeneous regions are considered, the F1 score deteriorates further under such scaling attacks, as shown in the experimental comparison in Fig. 11-(a3).
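For readers unfamiliar with the metric, the F1 scores quoted above can be illustrated with a minimal sketch. It assumes pixel-level evaluation of a binary detection mask against a binary ground-truth mask; the evaluation protocol is not spelled out in this section, and the function name `f1_score` and the toy masks are hypothetical.

```python
def f1_score(predicted, ground_truth):
    """Pixel-level F1 for two binary masks given as equal-sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1          # forged pixel correctly detected
            elif p and not t:
                fp += 1          # original pixel wrongly flagged
            elif t and not p:
                fn += 1          # forged pixel missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy masks (assumed data): 1 marks a forged pixel.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
detected = [[0, 1, 0, 0],
            [0, 1, 1, 1]]
score = f1_score(detected, truth)
```

In this toy case three forged pixels are found, one is missed, and one original pixel is flagged, so precision and recall are both 0.75.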
DNN models, e.g., Convolutional Neural Networks (CNN) [Rao and Ni, 2016], end-to-end DNN [Wu et al., 2018], and BusterNet [Wu et al., 2018], have been presented to address copy-move forgery detection. These methods extract deep-level features using CNNs, e.g., VGG16 [Simonyan and Zisserman, 2014] and Inception [Szegedy et al., 2016]. The features of 512 or even deeper layers can represent more inherent information. However, it is impossible to train DNN models on all possible forged objects in advance [Rao and Ni, 2016, Wu et al., 2018, Wu et al., 2018, Liu et al., 2018]. For this reason, DNN models cannot handle the numerous untrained forgery types encountered in the test stage, because they lack the corresponding prior training information.
Currently, some DNN methods (e.g., Deep-learning [Rao and Ni, 2016], BusterNet [Wu et al., 2018], semantic reinforcement network [Chen et al., 2021], Multi-Branch (MB) CNNs [Barni et al., 2021], AR-Net [Zhu et al., 2020]) have been successfully applied to CMFD. For instance, Barni et al. (2021) derived an improved multi-branch CNN architecture from BusterNet to learn a set of features that reveal the presence of pasted forgeries and boundary inconsistencies in the copy-move regions. However, this DNN scheme only addresses the foreground and fails to process homogeneous or smooth objects or backgrounds. This detection defect also exists in the other CMFD DNNs. To date, DNN models for copy-move forgery detection are still in their infancy, so classic methods remain the main trend.
Derived applications also use the technique of image copy-move forgery detection, e.g., the detection of copied program code in 5G-IoT systems. Farhan et al. [Ullah et al., 2021] investigate only whether clone attacks exist in modified versions of program code, based on differences in logic flow from the previous code version. They propose a hybrid approach that uses Control Flow Graph features extracted from the source code to capture its logic flow. These features are then fed to a purpose-designed RNN model for efficient classification of code clones. This scheme does not belong to CMFD, but its RNN network offers a good reference for future CMFD work. In addition, ZZa [Zza et al., 2021] proposed a novel residual visualization scheme over the residual image to reveal the correlation between the original and test images. However, this scheme only identifies whether the images are original copies or similar images. It does not address CMFD because it cannot find any forgery traces in a detected image.
Generally, existing CMFD methods suffer from the following issues:
Inherent defects of feature extraction: Block-based detection methods cannot handle large-scaling attacks because, with a single fixed block size, the features extracted from the copied region and from its scaled counterpart differ. Moreover, because the scaling factor is unknown, it is difficult to choose suitable multi-scale blocks for effective block feature extraction. Furthermore, using multi-scale blocks for feature extraction may result in prohibitive computational cost. Keypoint-based methods, on the other hand, can handle large-scaling attacks to a certain extent. They generally search the whole image for extrema in local high-entropy regions and extract them as candidate keypoints. However, owing to this inherent structure, most keypoint-based methods that use local descriptor algorithms cannot detect forgeries in homogeneous region(s). Therefore, existing block-based and keypoint-based methods cannot provide effective feature extraction for homogeneous region(s) that are simultaneously subject to large-scaling attack(s).
Inefficient feature matching: Block-based methods need to search and match every pixel in an image with multiple block features, which always incurs an expensive computational cost. In keypoint-based methods, every keypoint is matched on high-dimensional features (e.g., SIFT uses 128-dimensional descriptors, as does extended SURF) for accurate matching. Besides, keypoint-based methods may extract a considerable number of candidate keypoints for more accurate detection, which further degrades matching efficiency, especially for homogeneous regions or large images.
Ineffective and inefficient segmentation filtering: Segmentation filtering mainly relies on keypoint-affine and segmentation-keypoint-filtering algorithms. Generally, keypoint-affine algorithms [Ramu and Babu, 2017] require iterative matching analysis of a huge number of keypoints to filter out the outliers, resulting in low matching efficiency. Besides, the keypoint-affine algorithm cannot handle multiple clusters and therefore fails to detect multiple forgery regions. Existing segmentation-keypoint-filtering algorithms [Pun et al., 2015] also require dozens of iterations for effective snippet segmentation. Furthermore, these algorithms can only locate coarse segmentation matches and fail to indicate the forged pixels.
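The matching cost described in the second issue can be made concrete with a small sketch. It is not taken from any of the cited methods: it is a plain brute-force matcher with a 2-nearest-neighbour ratio test, written only to show that every keypoint is compared with every other keypoint on the full descriptor. The function name `brute_force_2nn`, the ratio of 0.6, and the random descriptors are assumptions for illustration.

```python
import math
import random


def brute_force_2nn(descriptors, ratio=0.6):
    """Match each descriptor against all others with a 2-NN ratio test.

    Returns (matches, number_of_distance_evaluations).
    """
    matches = []
    evaluations = 0
    for i, d_i in enumerate(descriptors):
        best = second = (math.inf, -1)   # (distance, index) of the two nearest
        for j, d_j in enumerate(descriptors):
            if i == j:
                continue
            dist = math.dist(d_i, d_j)   # full high-dimensional distance
            evaluations += 1
            if dist < best[0]:
                best, second = (dist, j), best
            elif dist < second[0]:
                second = (dist, j)
        # Accept only if the nearest neighbour is clearly closer than the second.
        if second[0] > 0 and best[0] / second[0] < ratio:
            matches.append((i, best[1]))
    return matches, evaluations


# Assumed toy data: 50 random 128-dimensional descriptors, one duplicated
# to imitate a copied keypoint.
random.seed(0)
descs = [[random.random() for _ in range(128)] for _ in range(50)]
descs[1] = list(descs[0])
found, n_evals = brute_force_2nn(descs)
```

With n keypoints the loop performs n(n-1) distance evaluations, each over all descriptor dimensions, which is why a large keypoint set quickly becomes expensive.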
In view of the above issues, a new CMFD scheme, shown in Fig. 1, is proposed. It includes i) an improved SIFT [Li et al., 2019], ii) a Feature Label Matching (FLM) algorithm, and iii) Hierarchical Segmentation Filtering (HSF). The new scheme aims to achieve high accuracy and high efficiency under various attacks, especially large-scaling attacks in homogeneous region(s). In our improved SIFT, the structure is modified by removing the contrast threshold so that all effective keypoints in both rough and homogeneous regions are extracted (detailed in Section II.A). However, matching with high-dimensional features becomes computationally prohibitive for this massive number of effective keypoints. Therefore, a new Feature Label Matching (FLM) algorithm is proposed to reduce the computational cost of keypoint matching. (Remark: FLM can also be applied to resolve the same matching problem in other existing CMFD methods.) In FLM, a feature label is newly introduced to represent the corresponding feature of a keypoint. All keypoints are then assigned to feature label groups according to their feature label summation, so that the massive set of keypoints is broken down into different label groups, each of which contains only a small number of keypoints. Successive feature label groups are then concatenated into a new overlapping label cluster (OLC). As a result, FLM only needs to match a small number of keypoints within small label groups, which significantly improves matching efficiency and effectiveness (detailed in Section II.B). Effectiveness and efficiency can be improved further by removing the outlier keypoints among all candidate keypoints. For this purpose, the proposed HSF adaptively divides the image into three coarse-to-fine segmentations. Then, metrics are applied to filter out the outlier pairs in every segmentation. The inlier (true keypoint) pairs in the three hierarchical segmentations accurately indicate the corresponding forgery regions (detailed in Section II.C). Finally, the three hierarchical forgery segmentations are fused to fill the forgery regions (detailed in Section II.D).
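The grouping idea behind FLM can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of Section II.B: the label is assumed to be the descriptor's component sum quantised into bins of width `bin_width`, and an overlapping cluster is assumed to be a label group joined with its successor. The names `feature_label`, `label_groups`, `match_in_overlapping_clusters`, and all parameter values are hypothetical.

```python
import math
import random
from collections import defaultdict


def feature_label(descriptor, bin_width=0.5):
    """Assumed label: the descriptor's component sum, quantised into bins."""
    return int(sum(descriptor) // bin_width)


def label_groups(descriptors, bin_width=0.5):
    """Map each label to the indices of the keypoints carrying it."""
    groups = defaultdict(list)
    for idx, d in enumerate(descriptors):
        groups[feature_label(d, bin_width)].append(idx)
    return groups


def match_in_overlapping_clusters(descriptors, bin_width=0.5, max_dist=0.1):
    """Compare keypoints only within a label group and its successor group."""
    groups = label_groups(descriptors, bin_width)
    matches = set()
    evaluations = 0
    for label in sorted(groups):
        current = groups[label]
        successor = groups.get(label + 1, [])   # overlap with the next group
        for pos, a in enumerate(current):
            for b in current[pos + 1:] + successor:
                evaluations += 1
                if math.dist(descriptors[a], descriptors[b]) <= max_dist:
                    matches.add((min(a, b), max(a, b)))
    return matches, evaluations


# Assumed toy data: 200 random 8-dimensional descriptors; keypoint 150 is a
# slightly perturbed copy of keypoint 3.
random.seed(1)
descs = [[random.random() for _ in range(8)] for _ in range(200)]
descs[150] = [v + 0.01 for v in descs[3]]
pairs, n_evals = match_in_overlapping_clusters(descs)
```

Because similar descriptors have similar sums, a true pair lands in the same or an adjacent group as long as `bin_width` is large relative to the allowed descriptor difference; the overlap with the successor group keeps pairs that straddle a bin boundary from being missed, while most of the all-pairs comparisons are skipped.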
In summary, the contributions of the proposed method are as follows:
- i) The improved structure of SIFT can extract as many interesting and effective points as possible in the homogeneous region(s), enabling CMFD in homogeneous regions that are simultaneously under large-scaling attacks.
- ii) The proposed FLM algorithm makes use of a newly designed feature label and OLC for significantly reduced computation. As a result, matching efficiency is improved dramatically.
- iii) The proposed HSF algorithm can effectively filter out suspicious outliers. Furthermore, the fusion of the three-hierarchical filtering segmentations can indicate forgery regions precisely.
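The coarse-to-fine filtering and fusion idea in contribution iii) can be sketched as below. The sketch makes several assumptions that are not from the paper: regular grids of three cell sizes stand in for the adaptive segmentations, a matched pair is kept at a level when enough pairs link the same source cell to the same target cell, and fusion is a simple vote across levels. All names (`cell_of`, `filter_level`, `hierarchical_filter`) and thresholds are hypothetical.

```python
from collections import Counter


def cell_of(point, cell_size):
    """Grid cell containing an (x, y) point; grids stand in for segments."""
    x, y = point
    return (x // cell_size, y // cell_size)


def filter_level(pairs, cell_size, min_support):
    """Keep pairs whose (source cell, target cell) link has enough support."""
    links = Counter((cell_of(p, cell_size), cell_of(q, cell_size))
                    for p, q in pairs)
    return {(p, q) for p, q in pairs
            if links[(cell_of(p, cell_size), cell_of(q, cell_size))] >= min_support}


def hierarchical_filter(pairs, cell_sizes=(64, 32, 16), min_support=3, min_votes=2):
    """Filter at three coarse-to-fine levels, then fuse by voting."""
    votes = Counter()
    for size in cell_sizes:
        for pair in filter_level(pairs, size, min_support):
            votes[pair] += 1
    return {pair for pair, n in votes.items() if n >= min_votes}


# Assumed toy data: five consistent pairs (a region shifted by 100 pixels in x)
# and two isolated outlier pairs.
inliers = [((10, 10), (110, 10)), ((12, 11), (112, 11)), ((14, 13), (114, 13)),
           ((11, 15), (111, 15)), ((13, 12), (113, 12))]
outliers = [((5, 200), (180, 40)), ((220, 30), (60, 210))]
kept = hierarchical_filter(inliers + outliers)
```

In this toy case the finest grid splits the target region across two cells, so two true pairs lose support at that level, yet they survive the vote because the two coarser levels still confirm them; the isolated pairs never gain support at any level.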
Notably, our contributions ii) and iii) can also be integrated into other CMFD methods. The rest of the paper is organized as follows. Section 2 presents the related work and the proposed method. Sections 3 and 4 present the experimental results and the conclusion, respectively.
The Proposed Scheme
This section describes three main elements of the proposed CMFD scheme: improved SIFT for effective feature extraction, Feature Label Matching (FLM), and Hierarchical Segmentation Filtering (HSF).
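To illustrate why removing the contrast threshold matters for the first element, the sketch below detects local extrema in a small response map with and without such a threshold. It is a simplification under stated assumptions: a single 2D map stands in for a difference-of-Gaussians layer, extrema are checked in a 3x3 neighbourhood only (not across scales), and the threshold value 0.03 is the one commonly associated with the original SIFT for intensities in [0, 1]. The function name `local_extrema` and the data are hypothetical.

```python
def local_extrema(response, contrast_threshold):
    """Interior pixels that are strict 3x3 extrema with |value| >= threshold."""
    keypoints = []
    for r in range(1, len(response) - 1):
        for c in range(1, len(response[0]) - 1):
            centre = response[r][c]
            neighbours = [response[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            is_extremum = centre > max(neighbours) or centre < min(neighbours)
            if is_extremum and abs(centre) >= contrast_threshold:
                keypoints.append((r, c))
    return keypoints


# Assumed toy response map: a strong extremum in a textured area at (1, 1)
# and a weak one, as found in a homogeneous area, at (3, 5).
response = [
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.20, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.01, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
]
with_threshold = local_extrema(response, contrast_threshold=0.03)
without_threshold = local_extrema(response, contrast_threshold=0.0)
```

With the threshold in place only the strong extremum is kept; with the threshold removed the weak extremum in the flat area is also retained, at the price of more keypoints to match.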
Experiments and Analysis
In this section, a series of experiments on the benchmark datasets are carried out to evaluate the effectiveness and efficiency of the proposed method. In Section III-A, we present the test dataset and evaluation metrics. Section III-B presents the experimental results and analysis of the image manipulation dataset (IMD) [Christlein et al., 2012]. Section III-C presents the experimental results and analysis on the CMH [Silva et al., 2015] and the GRIP [Cozzolino et al., 2015] dataset. Section
Conclusion
This paper proposes a new CMFD scheme consisting of i) an improved structure of SIFT, ii) a novel Feature Label Matching (FLM), and iii) a novel Hierarchical Segmentation Filtering (HSF). The improved structure of SIFT enhances the capability of extracting effective keypoints in the homogeneous region. FLM breaks down the massive keypoints into different small label groups for significantly improved matching effectiveness and efficiency. The HSF algorithm is proposed to filter out suspicious
CRediT authorship contribution statement
Yanfen Gan: Conceptualization, Methodology, Validation, Writing – original draft. Junliu Zhong: Writing – original draft. Chiman Vong: Writing – review & editing.
Acknowledgment
This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020A151501783 (2020A1515010700), the 2021 Innovation Team of Scientific Research Platform of Universities in Guangdong Province under Grant 2021KCXTD053, and the Young Creative Talent Projects of Universities in Guangdong Province (Natural Science) under Grant 2021KQNCX164.
References (40)
- et al., Fast reflective offset-guided searching method for copy-move forgery detection, Information Sciences (2017)
- et al., Fast copy-move forgery detection using local bidirectional coherency error refinement, Pattern Recognition (2018)
- Digital image integrity–a survey of protection and verification techniques, Digital Signal Processing (2017)
- et al., Efficient parallel optimizations of a high-performance SIFT on GPUs (2019)
- et al., A Two-stage Localization for Copy-Move Forgery Detection, Information Sciences (2018)
- et al., Going deeper into copy-move forgery detection: Exploring image telltales via multi-scale analysis and voting processes, Journal of Visual Communication and Image Representation (2015)
- et al., Radon odd radial harmonic Fourier moments in detecting cloned forgery image, Chaos, Solitons & Fractals (2016)
- et al., AR-Net: Adaptive Attention and Residual Refinement Network for Copy-Move Forgery Detection, IEEE Transactions on Industrial Informatics (2020)
- et al., SLIC superpixels compared to state-of-the-art superpixel methods, IEEE Transactions on Pattern Analysis and Machine Intelligence (2012)
- et al., Freak: Fast retina keypoint
- A SIFT-based forensic method for copy–move attack detection and transformation recovery, IEEE Transactions on Information Forensics and Security
- Copy Move Source-Target Disambiguation through Multi-Branch CNNs, IEEE Transactions on Information Forensics and Security
- Building a balanced k-d tree in O(kn log n) time, Computer Science
- Hybrid features and semantic reinforcement network for image forgery detection, Multimedia Systems
- Global contrast based salient region detection, IEEE Transactions on Pattern Analysis and Machine Intelligence
- An evaluation of popular copy-move forgery detection approaches, IEEE Transactions on Information Forensics and Security
- Efficient dense-field copy–move forgery detection, IEEE Transactions on Information Forensics and Security
- PCET based copy-move forgery detection in images under geometric transforms, Multimedia Tools and Applications
- Behavior Knowledge Space-Based Fusion for Copy–Move Forgery Detection, IEEE Transactions on Image Processing
- Detection of copy-move forgery in digital images
Yanfen Gan received the B.S. and M.S. degrees from the Guangdong University of Technology, Guangzhou, China, in 2004 and 2007, respectively. She is currently an Associate Professor with the Department of Information Science and Technology, South China Business College, Guangdong University of Foreign Studies. She is currently pursuing the Ph.D. degree with the Department of Computer and Information Science, University of Macau, Macau, China. Her current research interests include deep learning methods and information security.
Junliu Zhong received the M.Sc. degree in detection technology and automation device from Sun Yat-sen University, China, in 2006, and the Ph.D. degree in computer science from University of Macau, China, in 2020. He is currently an Associate Professor with the Department of Information and Communication Engineering, Guangzhou Maritime University, China. His current research interests include digital image/video processing, deep learning, and information security.
Chiman Vong received the M.S. and Ph.D. degrees in software engineering from the University of Macau, Macau, China, in 2000 and 2005, respectively. He is currently an Associate Professor with the Department of Computer and Information Science, University of Macau. His current research interests include machine learning methods and intelligent systems.
† Yan-Fen Gan and Jun-Liu Zhong contributed equally to this work.