A Novel Copy-Move Forgery Detection Algorithm via Feature Label Matching and Hierarchical Segmentation Filtering

https://doi.org/10.1016/j.ipm.2021.102783

Highlights

  • The improved structure of SIFT can extract as many interesting and effective points as possible in the homogeneous region(s), which enables CMFD in homogeneous regions that are simultaneously under large-scaling attacks.

  • The proposed FLM algorithm makes use of a newly designed feature label and OLC for significantly reduced computation. As a result, matching efficiency is significantly improved.

  • The proposed HSF algorithm can effectively filter out suspicious outliers. Furthermore, the fusion of the three-hierarchical filtering segmentations can indicate forgery regions precisely.

Abstract

Recently, some copy-move forgeries have made use of homogeneous region(s) in an image together with large-scaling attack(s) to highlight or cover the target objects, which is easy to implement but difficult to detect. Unfortunately, existing Copy-Move Forgery Detection (CMFD) methods fail to detect such forgeries because they cannot extract a sufficient number of effective keypoints in the homogeneous region(s), leading to inaccuracy and inefficiency in detection. In this work, a new CMFD scheme is proposed: 1) An improved SIFT structure with inherent scaling invariance is designed to enhance the capability of extracting effective keypoints in the homogeneous region. 2) Extracting a massive number of keypoints in the homogeneous region incurs a heavy computational burden in feature matching (note that this is a common issue in all CMFD methods). For this reason, a new Feature Label Matching (FLM) method is proposed to break down the massive keypoints into different small label groups, each of which contains only a small number of keypoints, for significantly improved matching effectiveness and efficiency. 3) Identifying true keypoints for matching is critical for performance. In our work, the Hierarchical Segmentation Filtering (HSF) algorithm is newly proposed to filter out suspicious outliers, based on statistics over the coarse-to-fine hierarchical segmentations. 4) Finally, the fusion of the coarse-to-fine hierarchical segmentation maps fills the forgery regions precisely. In our experiments, the proposed scheme achieves excellent detection performance under various attacks, especially for homogeneous region(s) under large-scaling attack(s). Extensive experimental results demonstrate that the proposed scheme achieves the best F1 scores and the lowest computational cost in addressing geometrical attacks on the IMD dataset (a comprehensive dataset) and the CMH datasets (in which most forgery samples are under geometric attacks). Compared to existing state-of-the-art methods, the proposed scheme improves F1 scores by at least 20% and 25% under scaling factors of 50% and 200%, respectively, in the large-scaling sub-datasets of IMD.

Introduction

Copy-move image forgery is one of the popular tampering techniques in digital image forgery, where the copied region(s) of an image are pasted onto the same image to hide or cover the regions of concern [Korus, 2017]. Existing copy-move forgery detection (CMFD) methods are generally divided into classic methods and deep neural network (DNN) models.

Classic CMFD methods are generally categorized into block-based methods [Fridrich et al., 2003, Zhong et al., 2016, Emam, Han, & Niu, 2016, Cozzolino et al., 2015, Pun and Chung, 2018, Ryu et al., 2013, Bi and Pun, 2017, Bi and Pun, 2018] or keypoint-based methods [Zhong and Pun, 2019, Shivakumar and Baboo, 2011, Pun et al., 2015, Li and Zhou, 2019, Silva et al., 2015, Tsai and Leou, 2021], both of which follow four procedures: feature extraction, feature matching, filtering, and post-processing. In the literature, most attacks on CMFD (e.g., JPEG compression, rotation, additional noise) have been well addressed, except for homogeneous region(s) under large-scaling attack(s), in which a scaling factor of at least ±20% is applied to the image region(s) to highlight or cover the target objects in the same image. This kind of attack is easy to implement but difficult to detect because of the homogeneity of the image regions. As an illustration, existing methods in the literature achieve F1 scores of only 0.75 and 0.65 under scaling attacks of 120% and 80%, respectively, on the popular benchmark dataset IMD [Christlein et al., 2012]. Under severe large-scaling of 50% and 200%, all the block-based and keypoint-based methods achieve an F1 score of at most 0.37. Even worse, if the detection methods are applied only to forgery images with suspicious homogeneous regions, the F1 score deteriorates further under such scaling attacks, as shown in the experimental comparison in Fig. 11-(a3).
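Since the comparisons above are stated in terms of F1 scores, the following minimal sketch shows how such a score can be computed from a predicted forgery mask and a ground-truth mask. It assumes a pixel-level evaluation on binary masks; the evaluation protocol actually used in the paper is not given in this excerpt, and the function name `f1_score` and the toy masks are illustrative only.

```python
def f1_score(pred, truth):
    """Pixel-level precision, recall and F1 for two equally sized binary masks.

    Assumption: masks are lists of rows holding 0 (authentic) or 1 (forged).
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # forged pixel correctly detected
            elif p and not t:
                fp += 1          # authentic pixel wrongly flagged
            elif t and not p:
                fn += 1          # forged pixel missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy 2x4 masks (hypothetical data, not taken from any dataset).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 1]]
precision, recall, f1 = f1_score(pred, truth)
print(precision, recall, f1)
```

Because F1 is the harmonic mean of precision and recall, a detector that misses most of a scaled homogeneous region (low recall) scores poorly even when everything it does flag is correct.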

DNN models, e.g., Convolutional Neural Networks (CNN) [Rao and Ni, 2016], end-to-end DNN [Wu et al., 2018], and BusterNet [Wu et al., 2018], have been presented to address copy-move forgery detection. These methods extract deep-level features using CNNs, e.g., VGG16 [Simonyan and Zisserman, 2014] and Inception [Szegedy et al., 2016]. The features of 512 or even deeper layers can represent more inherent information. However, it is impossible to pre-train DNN models on all forgery objects [Rao and Ni, 2016, Wu et al., 2018, Wu et al., 2018, Liu et al., 2018]. For this reason, DNN models cannot address the numerous untrained forgery types encountered in the test stage, due to the lack of prior training information.

Currently, some DNN methods (e.g., Deep-learning [Rao and Ni, 2016], BusterNet [Wu et al., 2018], semantic reinforcement network [Chen et al., 2021], Multi-Branch (MB) CNNs [Barni et al., 2021], AR-Net [Zhu et al., 2020]) have been successfully applied in CMFD. For instance, Barni et al. (2021) improved a multi-branch CNN architecture from BusterNet to learn a set of features for revealing the presence of pasted forgeries and boundary inconsistencies in the copy-move regions. However, this DNN scheme only addresses the foreground and fails to process homogeneous or smooth objects or backgrounds. This detection defect also exists in the other CMFD DNNs. To date, DNN models for copy-move forgery detection are still in their infancy. Therefore, the main trend still lies with the classic methods.

Derivative applications also use techniques related to image copy-move forgery, e.g., the detection of copied program code in 5G-IoT systems. Ullah et al. (2021) investigate only whether clone attacks exist in modified versions of program code, based on differences in logic flow from the previous code version. They propose a hybrid approach that uses Control Flow Graph features from the source code to view the logic flow of the code. These features are then given to a designed RNN model for efficient classification of code clones. This scheme does not belong to CMFD, but its RNN network is a good reference for future CMFD work. In addition, Zza et al. (2021) proposed a novel residual visualization scheme over the residual image to reveal the correlation between the original and test images. However, this scheme only identifies whether the images are original copies or similar images. It does not address CMFD because it cannot find any forgery traces in a detected image.

Generally, existing CMFD methods suffer from the following issues:

Inherent defects of feature extraction: Block-based detection methods cannot handle large-scaling attacks because, with a single fixed block size, the features extracted from the copied region and from its scaled counterpart differ. Moreover, because the scaling factor is unknown, it is difficult to choose suitable multi-scale blocks to extract block features effectively. Furthermore, using multi-scale blocks for feature extraction may result in prohibitive computation costs. On the other hand, keypoint-based methods can handle large-scaling attacks to a certain extent. Keypoint-based methods generally search the whole image for extrema in local high-entropy regions and extract them as candidate keypoints. However, because of their inherent structures, most keypoint-based methods using local descriptor algorithms cannot detect forgeries in the homogeneous region(s). Therefore, existing block-based and keypoint-based methods cannot provide effective feature extraction for homogeneous region(s) that are simultaneously under large-scaling attack(s).
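One structural reason for this limitation is that SIFT-style detectors discard low-contrast extrema, and the scheme introduced later in this introduction removes exactly this contrast threshold. The toy sketch below illustrates the effect on a 1-D stand-in for a detector response: a "textured" half with strong responses and a "homogeneous" half with weak ones. The signal, the threshold value 0.03, and the function names are assumptions for illustration; this is not the paper's improved SIFT.

```python
def local_extrema(response):
    """Indices of strict local maxima or minima of a 1-D response."""
    return [
        i for i in range(1, len(response) - 1)
        if (response[i] > response[i - 1] and response[i] > response[i + 1])
        or (response[i] < response[i - 1] and response[i] < response[i + 1])
    ]


def keypoints(response, contrast_threshold=0.0):
    """Keep only extrema whose absolute response reaches the threshold."""
    return [i for i in local_extrema(response)
            if abs(response[i]) >= contrast_threshold]


# Hypothetical response: indices 0-9 are textured, 10-19 are homogeneous.
textured = [0.5 * (-1) ** i for i in range(10)]
homogeneous = [0.01 * (-1) ** i for i in range(10)]
response = textured + homogeneous

with_threshold = keypoints(response, contrast_threshold=0.03)
without_threshold = keypoints(response, contrast_threshold=0.0)

# Count the keypoints that fall in the homogeneous half in each setting.
homogeneous_kept = [i for i in with_threshold if i >= 10]
homogeneous_all = [i for i in without_threshold if i >= 10]
print(len(homogeneous_kept), len(homogeneous_all))
```

With the threshold in place, no keypoint survives in the weak-response half, so a copied homogeneous region offers nothing to match; without it, the same region yields candidates, at the price of many more keypoints to match and filter.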

Inefficient feature matching: Block-based methods need to search and match every pixel in an image with multiple block features, so an expensive computational cost is always incurred. In keypoint-based methods, every keypoint is matched on high-dimensional features (e.g., SIFT uses 128-dimensional features) for accurate matching. Besides, keypoint-based methods may extract a considerable number of candidate keypoints for more accurate detection, which further degrades matching efficiency, especially in homogeneous regions or in large images.
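The growth in matching cost can be made concrete with a little arithmetic. The sketch below assumes exhaustive (brute-force) pairwise matching and counts one elementary operation per descriptor dimension per pair; real matchers may use index structures, so the figures are an upper-bound illustration rather than measurements from the paper.

```python
def brute_force_cost(num_keypoints, dims=128):
    """Return (descriptor pairs, per-dimension operations) for exhaustive matching."""
    pairs = num_keypoints * (num_keypoints - 1) // 2
    return pairs, pairs * dims


for n in (1_000, 10_000):
    pairs, ops = brute_force_cost(n)
    print(f"{n} keypoints: {pairs} pairs, {ops} operations")
```

Multiplying the number of keypoints by ten multiplies the number of pairs by roughly one hundred, which is why extracting many more keypoints in homogeneous regions calls for a cheaper matching strategy.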

Ineffective and inefficient segmentation filtering: Keypoint-affine and segmentation-keypoint-filtering algorithms are the main approaches to segmentation filtering. Generally, keypoint-affine algorithms [Ramu and Babu, 2017] require iterative matching analysis of a huge number of keypoints to filter out the outliers, resulting in low matching efficiency. Besides, the keypoint-affine algorithm cannot handle multiple clusters and fails to detect multiple forgery regions. Existing segmentation-keypoint-filtering algorithms [Pun et al., 2015] also require dozens of iterations for effective snippet segmentation. Furthermore, these algorithms can only locate the coarse segmentation matches and fail to indicate the forged pixels.

In view of the above issues, a new CMFD scheme, shown in Fig. 1, is proposed that includes i) an improved SIFT [Li et al., 2019], ii) a Feature Label Matching (FLM) algorithm, and iii) Hierarchical Segmentation Filtering (HSF). The new scheme aims to achieve high accuracy and high efficiency under various attacks, especially large-scaling attacks in the homogeneous region(s). In our improved SIFT, the structure is modified by removing the contrast threshold so that all effective keypoints are extracted in both rough and homogeneous regions (detailed in Section II.A). However, matching with high-dimensional features becomes computationally prohibitive with this massive number of effective keypoints. Therefore, a new Feature Label Matching (FLM) algorithm is proposed to reduce the computational cost of keypoint matching. (Remark: FLM can also be applied to resolve the same matching problem in other existing CMFD methods.) In FLM, a feature label is newly introduced to represent the corresponding feature of a keypoint. Subsequently, all keypoints are assigned to feature label groups according to their feature label summation, so that the massive keypoints are broken down into different label groups, each of which contains only a small number of keypoints. The successive feature label groups are then concatenated into a new overlapping label cluster (OLC). As a result, FLM only requires matching a small number of keypoints in small label groups, which significantly improves matching efficiency and effectiveness (detailed in Section II.B). Effectiveness and efficiency also benefit from removing the outlier keypoints among all candidate keypoints. For this purpose, the image is adaptively divided into three coarse-to-fine segmentations by the proposed HSF. Then, metrics are applied to filter out the outlier pairs in every segmentation. The inlier (true keypoint) pairs in the three hierarchical segmentations accurately indicate the corresponding forgery regions (detailed in Section II.C). Finally, the three hierarchical forgery segmentations are fused to fill the forgery regions (detailed in Section II.D).
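The grouping idea behind FLM can be sketched as follows. The exact definition of the feature label is given in Section II.B and is not reproduced in this excerpt, so the sketch makes loud assumptions: the label is taken to be a quantized descriptor component, the label summation is the sum of those quantized values, groups are fixed-width bins of that sum, and each overlapping cluster joins a group with its successor. All names, widths, and thresholds are hypothetical, and 4-dimensional descriptors stand in for 128-dimensional ones.

```python
import math
from collections import defaultdict


def label_sum(descriptor, levels=8):
    """Assumed feature label: quantize each component, then sum the labels."""
    return sum(int(v * levels) for v in descriptor)


def build_groups(descriptors, group_width=4):
    """Assign keypoint indices to label groups by their label summation."""
    groups = defaultdict(list)
    for idx, desc in enumerate(descriptors):
        groups[label_sum(desc) // group_width].append(idx)
    return groups


def overlapping_clusters(groups):
    """Concatenate each group with its successor (assumed OLC construction)."""
    return [groups[g] + groups.get(g + 1, []) for g in sorted(groups)]


def match(points, descriptors, max_feat_dist=0.1, min_pixel_dist=10.0):
    """Match keypoints only inside each overlapping cluster."""
    pairs, comparisons = set(), 0
    for cluster in overlapping_clusters(build_groups(descriptors)):
        for a in range(len(cluster)):
            for b in range(a + 1, len(cluster)):
                i, j = cluster[a], cluster[b]
                comparisons += 1
                similar = math.dist(descriptors[i], descriptors[j]) <= max_feat_dist
                apart = math.dist(points[i], points[j]) >= min_pixel_dist
                if similar and apart:
                    pairs.add((min(i, j), max(i, j)))
    return pairs, comparisons


# Hypothetical keypoints: 0 and 1 are a copy-move pair whose label sums
# (13 and 16) land in neighbouring groups; 2 and 3 are unrelated.
points = [(10, 10), (200, 50), (15, 90), (120, 120)]
descriptors = [
    [0.12, 0.50, 0.37, 0.99],
    [0.13, 0.51, 0.38, 1.00],
    [0.90, 0.90, 0.80, 0.70],
    [0.20, 0.10, 0.05, 0.10],
]
pairs, comparisons = match(points, descriptors)
print(pairs, comparisons)
```

In this toy case a single descriptor comparison recovers the copy-move pair, whereas exhaustive matching of four keypoints needs six. The overlap matters because the two matching keypoints fall into adjacent groups and would be missed if groups were matched in isolation.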

In summary, the contributions of the proposed method are as follows:

  • i) The improved structure of SIFT can extract as many interesting and effective points as possible in the homogeneous region(s), enabling CMFD in homogeneous regions that are simultaneously under large-scaling attacks.

  • ii) The proposed FLM algorithm makes use of a newly designed feature label and OLC for significantly reduced computation. As a result, matching efficiency is improved dramatically.

  • iii) The proposed HSF algorithm can effectively filter out suspicious outliers. Furthermore, the fusion of the three-hierarchical filtering segmentations can indicate forgery regions precisely.
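As a rough illustration of contribution iii), the sketch below filters matched keypoint pairs coarse-to-fine and then fuses the three levels by voting. It is a sketch under stated assumptions, not the paper's HSF: square grids of 64, 32, and 16 pixels stand in for the adaptive segmentations, the "statistic" is simply the number of pairs linking the same two segments, and the fusion rule (at least two of three levels agree) is hypothetical.

```python
from collections import Counter


def segment_id(point, cell):
    """Assumed segmentation: a square grid with the given cell size."""
    x, y = point
    return (x // cell, y // cell)


def filter_level(pairs, cell, min_support):
    """Keep pairs whose (source segment, target segment) has enough matches."""
    def key(pair):
        return (segment_id(pair[0], cell), segment_id(pair[1], cell))
    support = Counter(key(pair) for pair in pairs)
    return [pair for pair in pairs if support[key(pair)] >= min_support]


def hierarchical_filter(pairs, cells=(64, 32, 16), min_support=2):
    """Filter coarse-to-fine; return the surviving pairs at each level."""
    levels = []
    for cell in cells:
        pairs = filter_level(pairs, cell, min_support)
        levels.append(pairs)
    return levels


def is_forged(point, levels, cells=(64, 32, 16), votes_needed=2):
    """Fuse the levels: a point is flagged if enough levels mark its segment."""
    votes = 0
    for pairs, cell in zip(levels, cells):
        flagged = {segment_id(p, cell) for pair in pairs for p in pair}
        votes += segment_id(point, cell) in flagged
    return votes >= votes_needed


# Hypothetical matches: three consistent pairs and one isolated outlier.
pairs = [
    ((10, 10), (210, 210)),
    ((14, 12), (214, 212)),
    ((12, 15), (212, 215)),
    ((100, 30), (40, 180)),
]
levels = hierarchical_filter(pairs)
print(len(levels[-1]), is_forged((12, 12), levels), is_forged((40, 40), levels))
```

The outlier pair is dropped at the coarsest level because no other match links the same two segments. The point (40, 40) lies inside a flagged coarse segment but not inside any flagged finer one, so the vote rejects it, which shows how combining levels can tighten the region compared with the coarse segmentation alone.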

Notably, contributions ii) and iii) can also be integrated into other CMFD methods. The rest of the paper is organized as follows. Section 2 presents the related work and the proposed method. Sections 3 and 4 present the experimental results and the conclusion, respectively.

Section snippets

The Proposed Scheme

This section describes three main elements of the proposed CMFD scheme: improved SIFT for effective feature extraction, Feature Label Matching (FLM), and Hierarchical Segmentation Filtering (HSF).

Experiments and Analysis

In this section, a series of experiments on the benchmark datasets are carried out to evaluate the effectiveness and efficiency of the proposed method. In Section III-A, we present the test dataset and evaluation metrics. Section III-B presents the experimental results and analysis of the image manipulation dataset (IMD) [Christlein et al., 2012]. Section III-C presents the experimental results and analysis on the CMH [Silva et al., 2015] and the GRIP [Cozzolino et al., 2015] dataset. Section

Conclusion

This paper proposes a new CMFD scheme consisting of i) an improved structure of SIFT, ii) a novel Feature Label Matching (FLM), and iii) a novel Hierarchical Segmentation Filtering (HSF). The improved structure of SIFT enhances the capability of extracting effective keypoints in the homogeneous region. FLM breaks down the massive keypoints into different small label groups for significantly improved matching effectiveness and efficiency. The HSF algorithm is proposed to filter out suspicious

CRediT authorship contribution statement

Yanfen Gan: Conceptualization, Methodology, Validation, Writing – original draft. Junliu Zhong: Writing – original draft. Chiman Vong: Writing – review & editing.

Acknowledgment

This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020A151501783 (2020A1515010700), the 2021 Innovation Team of Scientific Research Platform of Universities in Guangdong Province under Grant 2021KCXTD053, and the Young Creative Talent Projects of Universities in Guangdong Province (Natural Science) under Grant 2021KQNCX164.


References (40)

  • I. Amerini et al. A SIFT-based forensic method for copy–move attack detection and transformation recovery. IEEE Transactions on Information Forensics and Security (2011)
  • M. Barni et al. Copy move source-target disambiguation through multi-branch CNNs. IEEE Transactions on Information Forensics and Security (2021)
  • R.A. Brown. Building a balanced k-d tree in O(kn log n) time. Computer Science (2014)
  • H. Chen et al. Hybrid features and semantic reinforcement network for image forgery detection. Multimedia Systems (2021)
  • M.-M. Cheng et al. Global contrast based salient region detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (2014)
  • V. Christlein et al. An evaluation of popular copy-move forgery detection approaches. IEEE Transactions on Information Forensics and Security (2012)
  • D. Cozzolino et al. Efficient dense-field copy–move forgery detection. IEEE Transactions on Information Forensics and Security (2015)
  • M. Emam et al. PCET based copy-move forgery detection in images under geometric transforms. Multimedia Tools and Applications (2016)
  • A. Ferreira et al. Behavior knowledge space-based fusion for copy–move forgery detection. IEEE Transactions on Image Processing (2016)
  • A.J. Fridrich et al. Detection of copy-move forgery in digital images


Yanfen Gan received the B.S. and M.S. degrees from the Guangdong University of Technology, Guangzhou, China, in 2004 and 2007, respectively. She is currently an Associate Professor with the Department of Information Science and Technology, South China Business College, Guangdong University of Foreign Studies. She is currently pursuing the Ph.D. degree with the Department of Computer and Information Science, University of Macau, Macau, China. Her current research interests include deep learning methods and information security.

Junliu Zhong received the M.Sc. degree in detection technology and automation devices from Sun Yat-sen University, China, in 2006, and the Ph.D. degree in computer science from the University of Macau, China, in 2020. He is currently an Associate Professor with the Department of Information and Communication Engineering, Guangzhou Maritime University, China. His current research interests include digital image/video processing, deep learning, and information security.

Chiman Vong received the M.S. and Ph.D. degrees in software engineering from the University of Macau, Macau, China, in 2000 and 2005, respectively. He is currently an Associate Professor with the Department of Computer and Information Science, University of Macau. His current research interests include machine learning methods and intelligent systems.

Yan-Fen Gan and Jun-Liu Zhong contributed equally to this work.
