Neurocomputing

Volume 512, 1 November 2022, Pages 25-39

Learning to localize image forgery using end-to-end attention network

https://doi.org/10.1016/j.neucom.2022.09.060

Abstract

Recent advancements have increased the prevalence of digital image tampering. Anyone can manipulate multimedia content using editing software to alter the semantic meaning of images and deceive viewers. Since such manipulations appear realistic, both humans and machines find it challenging to detect forgeries. In this work, we propose a novel algorithm for authenticating visual content by localizing forged regions. The proposed algorithm employs channel attention convolutional blocks in an end-to-end learning framework. The channel attention infers forged regions in an image by extracting attention-aware multi-resolution features in the spatial domain and complementary features in the frequency domain. Accordingly, the proposed network is divided into two subnetworks, one extracting attention-aware multi-resolution features in the spatial domain and the other in the frequency domain. To predict the resulting mask, we concatenate the features of both subnetworks. The proposed channel attention network focuses exclusively on the forged region and increases the network's generalization to unseen manipulations. Rigorous experiments demonstrate that the proposed algorithm outperforms state-of-the-art methods on five benchmark datasets for localizing a wide range of manipulations.

Introduction

As technology advances, smartphones make digital cameras more accessible, millions of images are shared every day on the internet, and a variety of tools are available for editing these images. While such tools are designed to enhance images, some individuals use them to manipulate images and disseminate false information [1]. As a result, forged images pose a severe threat, and the damage they inflict is frequently irreversible [2]. In the literature, image signatures are used to authenticate an image based on embedded information such as watermarking [3]. However, not all imaging devices support this facility, and embedded information is not guaranteed to be resilient and unalterable in adversarial environments. Therefore, many studies have focused on image inconsistencies such as color, illumination, camera features, and blocking artifacts to detect forgeries when any of these features vary significantly across image regions [4], [5].

Splicing and copy-move are the two most common image forgeries [6], [7], [8]. Splicing copies a portion of one image into another, so the forgery is a composite of two or more images. Copy-move copies a portion of an image to another location within the same image; this type of forgery usually aims to conceal items in an image.

Although a forgery may be neither obvious nor discernible to the human eye, it leaves distinct artifacts and modifies the underlying statistical properties of an image. For example, a forged image generated by fusing a piece of one image into another, given that the two images originated from different sources [9], can alter the statistical features of the resulting image and introduce lighting inconsistencies, blocking, and re-sampling artifacts. Traditionally, these statistical features are used to detect forgeries in a given image; however, they are ineffective for high-resolution images and for unseen manipulations [4], [10], [11]. Furthermore, the majority of these methods focus solely on detection [5], [11], [12], with only a few addressing forgery localization [9], [13], [14], [15], [16]. This necessitates the development of a robust system that detects forgeries and their precise locations in a given forged image.

In recent years, deep learning methods have demonstrated excellent performance in many computer vision applications such as semantic segmentation, object tracking, and image recognition [17]. Inspired by these breakthroughs, several deep learning techniques have been proposed in the literature to detect tampering artifacts in a forged image [18], [19], [20], [21]. These techniques show encouraging performance on forgery localization tasks [22], and many of them concentrate on a single type of forgery, such as copy-move [23], [24], [25] or splicing [26], [27], [28]. However, limiting real-world manipulations to a single type of forgery is impractical. Deep learning techniques for localizing forgeries typically rely on a pre-trained classification network as the backbone, which works well when forgery localization is framed as a classification task such as anomaly detection. ManTra-Net, a well-known approach for localizing multiple forgeries based on anomaly detection, uses VGG-19 [29] as its backbone architecture [19]. In general, networks designed for classification produce low-resolution features; therefore, using such networks as the backbone may be insufficient to localize the forged region at the pixel level.

A few other works utilize semantic segmentation architectures such as U-Net [30] to address forgery localization [31], [32]. U-Net, one of the most popular segmentation networks, reduces the localization loss by combining upsampled output features with high-resolution features through a symmetric expanding path [32]. Inspired by its segmentation capabilities, a modified version of U-Net was introduced for the forgery localization task in [33]. However, approaches based on semantic segmentation struggle to localize only the forged regions. For example, consider an image that contains both a forged and a genuine instance of the same object. If semantic segmentation is used to localize that object, it may localize both the forged and the genuine regions, regardless of which one is tampered. Although semantic segmentation features are effective for segmentation, they lack the discriminative cues needed to distinguish forged from unforged regions. Other segmentation networks, such as Mask R-CNN [34] and Mask Refined R-CNN [35], could also be used; however, they face similar challenges in localizing forged regions. Likewise, object detection networks that are effective for small and occluded objects can be extended to forgery detection, but they may struggle to localize the exact boundaries of forged objects [36].

Resampling is a frequent operation in image forgery that can be detected using multi-resolution features; as a result, techniques that rely on such features yield better results than techniques that depend solely on low-resolution features. Similarly, double compression is another common forgery operation. Its artifacts may be challenging to identify in the spatial domain, yet they are readily detectable in the frequency domain [37], [38], [39]. Supporting this, a method that combines spatial and frequency domain features using a long short-term memory (LSTM) network with an encoder and a decoder has shown improved performance [21]. However, the features extracted by the encoder and decoder are low resolution, which limits pixel-level classification accuracy. Thus, techniques using multi-resolution features, including low- and high-resolution features in both the spatial and frequency domains, can localize image forgeries more accurately. Kwon et al. addressed these issues by extracting multi-resolution features from both the spatial and frequency domains [40]. However, because these features do not contribute equally to localizing forgeries and the method lacks a mechanism to identify the dominant ones, it struggles in some scenarios, such as detecting large forged regions in an image.
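
To illustrate the frequency-domain argument, the sketch below builds per-frequency histograms of quantized 8x8 block-DCT coefficients of a luminance channel; double JPEG compression leaves characteristic periodic peaks and gaps in exactly these histograms, which is why such cues are easier to read in the frequency domain than in pixel space. This is an illustration only, not the input representation of the method discussed here; the function name and bin count are choices made for this example.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct_histograms(y_channel, n_bins=21):
    """Per-frequency histograms of quantized 8x8 block-DCT coefficients.

    Double JPEG compression introduces periodic peaks/gaps in these
    histograms, a classic frequency-domain forgery cue (illustration only).
    """
    h, w = y_channel.shape
    h8, w8 = h - h % 8, w - w % 8                      # crop to whole 8x8 blocks
    blocks = (y_channel[:h8, :w8]
              .reshape(h8 // 8, 8, w8 // 8, 8)
              .transpose(0, 2, 1, 3))                  # (block rows, block cols, 8, 8)
    coeffs = dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')
    coeffs = np.rint(coeffs).reshape(-1, 64)           # one row per block
    edges = np.arange(-(n_bins // 2) - 0.5, n_bins // 2 + 1.5)  # low-amplitude bins
    return np.stack([np.histogram(coeffs[:, k], bins=edges)[0] for k in range(64)])
```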

Our primary objective is to learn the artifacts arising from the differences between authentic and tampered regions. Let I ∈ ℝ^(h×w×c) be an image, where h, w, and c denote the height, width, and number of channels. The RGB and YCbCr representations of the image are used to extract features in this work. Since forgery localization requires pixel-level categorization, capturing the dominant features among the extracted features is critical for improved performance. To address this, we present a channel attention network that efficiently captures the essential features related to forgeries. The proposed technique comprises two subnetworks, a Channel Attention High-Resolution Network (CA-HRNet) and a Channel Attention Discrete Cosine Transform Network (CA-DCTNet), covering the spatial and frequency domain feature requirements, respectively. The High-Resolution Network (HRNet) [43] serves as the backbone of both subnetworks, and its basic block is replaced with the proposed channel attention block (both architectures are detailed in Section 3). The channel attention block consists of convolutional blocks, batch normalization, ReLU activation, and a skip connection (a code-level sketch is given after the contribution list below). The outline of the proposed framework is shown in Fig. 2. The input image is passed through both subnetworks to extract multi-resolution features, with CA-HRNet taking the RGB color channels and CA-DCTNet the YCbCr channels. A fusion unit then stacks the features obtained from the two subnetworks and performs binary classification of the forged image at the pixel level. The end-to-end network is trained using a binary cross-entropy loss function and a stochastic gradient descent optimizer to learn a set of parameters that localize forged and non-forged regions in a test image. A few examples of forged region localization on tampered images are shown in Fig. 1. The main contributions of the current work are as follows:

  • 1.

    A channel attention network that avoids dimensionality reduction and exploits inter-channel interaction to detect and localize image forgeries.

  • 2.

    A unified framework for localizing both splicing and copy-move image forgeries, unlike other techniques that focus solely on copy-move or splicing localization.

  • 3.

    The proposed technique demonstrates improved generalization to unseen manipulations using attention-aware features.
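
A minimal PyTorch sketch of the channel attention block referenced above is given here. It is an illustration under stated assumptions, not the authors' exact implementation: the attention gate is assumed to be a 1D-convolution channel gate (consistent with contribution 1, which avoids dimensionality reduction and models inter-channel interaction), and the layer count, kernel sizes, and class names (ChannelAttention, CABlock) are placeholders introduced here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel gate without dimensionality reduction (assumed form): a 1D
    convolution over globally pooled channel descriptors models local
    inter-channel interaction and produces per-channel weights."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                                    # x: (B, C, H, W)
        w = self.pool(x)                                     # (B, C, 1, 1)
        w = self.conv(w.squeeze(-1).transpose(1, 2))         # (B, 1, C)
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)   # (B, C, 1, 1)
        return x * w                                         # re-weight channels

class CABlock(nn.Module):
    """Channel attention block sketch: convolution, batch normalization, and
    ReLU, followed by the channel gate and a skip connection; the number of
    convolutional layers is assumed."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.attn = ChannelAttention()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.attn(self.body(x)))        # skip connection
```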

The remainder of this article is organized as follows. In Section 2, we review related works and in Section 3, we describe the proposed approach followed by the implementation details in Section 4. The experimental evaluations and discussions are presented in Section 5. Finally, the conclusions are drawn in Section 6.

Section snippets

Related Works

Numerous classical and learning-based approaches to image forgery detection and localization have been proposed in the literature [4], [44], [13], [20], [45], [23], [46], [47]. Both families detect forgeries by extracting information from the spatial domain, the frequency domain, or both. Inconsistencies in forged images, such as noise and illumination, blocking artifacts, and camera metadata, are utilized as spatial features, while resampling and double compression artifacts are used as

Proposed Image Forgery Localization Algorithm

The proposed framework consists of two subnetworks, one for extracting multi-resolution spatial domain features and another for extracting multi-resolution frequency domain features. The architecture of the proposed algorithm is shown in Fig. 3. Multi-resolution features are employed to capture diverse image manipulations. CA-HRNet extracts the spatial domain features, and CA-DCTNet extracts the frequency domain features, both using HRNet as the backbone. Although the original HRNet performs
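
As an illustration of how the stacked features from the two streams could be turned into a pixel-level prediction, a minimal PyTorch sketch of a fusion head follows. The channel widths (270 corresponds to HRNet-W18's concatenated branch widths), the 1x1-convolution classifier, and the bilinear upsampling are assumptions for this sketch, not the paper's exact fusion unit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Concatenates the spatial (CA-HRNet) and frequency (CA-DCTNet) feature
    maps and predicts a one-channel forgery-mask logit per pixel (sketch)."""
    def __init__(self, spatial_ch=270, freq_ch=270):         # placeholder widths
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Conv2d(spatial_ch + freq_ch, 128, kernel_size=1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),                 # binary mask logits
        )

    def forward(self, f_spatial, f_freq, out_size):
        x = torch.cat([f_spatial, f_freq], dim=1)             # stack the two streams
        x = self.classifier(x)
        # upsample back to the input resolution for pixel-level classification
        return F.interpolate(x, size=out_size, mode='bilinear', align_corners=False)
```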

Implementation details

The proposed network is implemented using PyTorch [70]. Training is carried out on the datasets without any pre- or post-processing of images. The HRNet is initialized with weights pretrained on ImageNet, and the DCT stream is initialized with weights pretrained on single- and double-JPEG-compressed images. The binary cross-entropy loss function is minimized using the stochastic gradient descent optimizer, starting with a learning rate of 0.005, a weight decay of 0.0005, and a momentum
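
A minimal training-loop sketch matching the stated setup (PyTorch, SGD with learning rate 0.005 and weight decay 0.0005, pixel-wise binary cross-entropy) is given below. The helper names and the momentum value of 0.9 are assumptions introduced for illustration; the snippet above truncates before stating the momentum.

```python
import torch
import torch.nn as nn

def build_optimizer(model):
    """SGD per the stated setup; momentum = 0.9 is an assumed value."""
    return torch.optim.SGD(model.parameters(), lr=0.005,
                           momentum=0.9, weight_decay=0.0005)

def train_one_epoch(model, loader, optimizer, device='cuda'):
    """One pass over (image, ground-truth mask) pairs with pixel-wise BCE."""
    criterion = nn.BCEWithLogitsLoss()               # binary cross-entropy on logits
    model.train()
    for images, masks in loader:                     # masks: (B, 1, H, W) in {0, 1}
        images, masks = images.to(device), masks.float().to(device)
        optimizer.zero_grad()
        logits = model(images)                       # (B, 1, H, W) forgery-mask logits
        loss = criterion(logits, masks)
        loss.backward()
        optimizer.step()
```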

Experimental evaluation

Performance of the proposed algorithm is evaluated on five datasets: CASIA v2 [71], NIST [21], FantasticReality [65], Carvalho [41], and Columbia [72]. A detailed description of the datasets is given in Table 3. CASIA v2 is a popular dataset created from various sources. It contains tampered and authentic images, with ground truth masks obtained from [54]. The NIST dataset is created by splicing objects obtained from the MS-COCO dataset into images from the DRESDEN dataset.

Conclusion

This article presented an end-to-end deep neural network approach for image forgery localization using two subnetworks to extract multi-resolution spatial and frequency domain features. We introduced channel attention and combined it with the subnetworks to learn attention-aware features on forged regions. CA-HRNet derives multi-resolution features in the spatial domain, whereas CA-DCTNet derives features in the frequency domain. These two feature sets are combined to localize the forged

CRediT authorship contribution statement

Iyyakutti Iyappan Ganapathi: Conceptualization, Methodology, Software, Writing - original draft. Sajid Javed: Conceptualization, Supervision, Writing - review & editing. Syed Sadaf Ali: Methodology, Software, Writing - review & editing. Arif Mahmood: Supervision, Writing - review & editing. Ngoc-Son Vu: Supervision, Writing - review & editing. Naoufel Werghi: Conceptualization, Supervision, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by research funds from Khalifa University, Ref: CIRA-2019-047.

Iyyakutti Iyappan Ganapathi is currently a postdoctoral fellow at the Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, UAE. He previously worked as an Assistant Professor at Woosong University in South Korea. He earned his PhD degree from the Indian Institute of Technology Indore, India. His research interests include 3D image processing, biometrics, computer vision, and machine learning.

References (74)

  • M. Zampoglou et al., Large-scale evaluation of splicing localization algorithms for web images, Multimedia Tools and Applications (2017)
  • S.S. Ali et al., Image forgery localization using image patches and deep learning
  • S.S. Ali et al., Image forgery detection using deep learning by recompressing images, Electronics (2022)
  • I. Amerini et al., Splicing forgeries localization through the use of first digit features
  • T. Bianchi et al., Image forgery localization via block-grained analysis of JPEG artifacts, IEEE Transactions on Information Forensics and Security (2012)
  • W. Wang et al., Effective image splicing detection based on image chroma
  • P. Ferrara et al., Image forgery localization via fine-grained analysis of CFA artifacts, IEEE Transactions on Information Forensics and Security (2012)
  • X. Wang et al., A visual model-based perceptual image hash for content authentication, IEEE Transactions on Information Forensics and Security (2015)
  • D. Cozzolino et al., A new blind image splicing detector
  • K. He et al., Deep residual learning for image recognition
  • X. Bi et al., Reality transform adversarial generators for image splicing forgery detection and localization
  • Y. Wu et al., Manipulation tracing network for detection and localization of image forgeries with anomalous features
  • X. Bi et al., The ringed residual U-Net for image splicing forgery detection
  • J.H. Bappy et al., Hybrid LSTM and encoder-decoder architecture for detection of image forgeries, IEEE Transactions on Image Processing (2019)
  • L. Verdoliva, Media forensics and deepfakes: An overview, IEEE Journal of Selected Topics in Signal Processing (2020)
  • Y. Zhu et al., AR-Net: Adaptive attention and residual refinement network for copy-move forgery detection, IEEE Transactions on Industrial Informatics (2020)
  • Y. Li et al., Fast and effective image copy-move forgery detection via hierarchical feature point matching, IEEE Transactions on Information Forensics and Security (2018)
  • J.-L. Zhong et al., An end-to-end dense-inceptionnet for image copy-move forgery detection, IEEE Transactions on Information Forensics and Security (2019)
  • B. Liu, C.-M. Pun, Deep fusion network for splicing forgery localization, in: Proc. of the European Conference on...
  • Y. Wei et al., Controlling neural learning network with multiple scales for image splicing forgery detection, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) (2020)
  • X. Bi et al., Multi-task wavelet corrected network for image splicing forgery detection and localization
  • K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint...
  • O. Ronneberger et al., Convolutional networks for biomedical image segmentation
  • N. Alipour et al., Semantic segmentation of JPEG blocks using a deep CNN for non-aligned JPEG forgery detection and localization, Multimedia Tools and Applications (2020)
  • R. Zhang et al., A dense U-Net with cross-layer intersection for detection and localization of image forgery
  • X. Bi, Y. Liu, B. Xiao, W. Li, C.-M. Pun, G. Wang, X. Gao, D-Unet: A dual-encoder U-Net for image splicing forgery...
  • K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: Proceedings of the IEEE international conference on...

Sajid Javed received his B.Sc. degree in computer science from the University of Hertfordshire, Hatfield, U.K., in 2010, and the combined master’s and Ph.D. degree in computer science from Kyungpook National University, Daegu, South Korea, in 2017. He is an Assistant Professor of Computer Vision with the Electrical and Computer Engineering Department, Khalifa University of Science and Technology, Abu Dhabi, UAE. Prior to that, he was a Research Scientist with Khalifa University Center for Autonomous Robotics System, Abu Dhabi, from 2019 to 2021. Before joining Khalifa University, Abu Dhabi, he was a Research Fellow with the University of Warwick, Coventry, U.K., from 2017 to 2018, where he worked on histopathological landscapes for better cancer grading and prognostication. His research interests include visual object tracking in the wild, multi-object tracking, background-foreground modeling from video sequences, moving object detection from complex scenes, and cancer image analytics, including tissue phenotyping, nucleus detection, and nucleus classification problems. His research themes involve developing deep neural networks, subspace learning models, and graph neural networks.

Syed Sadaf Ali is a postdoctoral researcher at ENSEA, France. He received his Bachelor of Technology degree and PhD degree from the Indian Institute of Technology Indore, India. His research interests include image processing, computer vision, pattern recognition, and biometric template security.

Arif Mahmood is a Professor and Chairperson of the Computer Science Department at Information Technology University and Director of the Computer Vision Lab. His current research directions in computer vision are person pose detection and segmentation, crowd counting and flow detection, background-foreground modeling in complex scenes, object detection, human-object interaction detection, and abnormal event detection. He is also actively working on diverse machine learning applications, including cancer grading and prognostication using histology images, predictive auto-scaling of services hosted on cloud and fog infrastructures, and environmental monitoring using remote sensing. He has also worked as a Research Assistant Professor with the School of Mathematics and Statistics, University of Western Australia (UWA), where he worked on complex network analysis. Before that, he was a Research Assistant Professor with the School of Computer Science and Software Engineering, UWA, where he performed research on face recognition, object classification, and action recognition.

Ngoc-Son Vu has been an associate professor at the ETIS lab - CY Cergy Paris University, ENSEA, CNRS, France, since 2013. He earned his PhD in 2010 from Grenoble-INP. He has held postdoctoral positions at INSA de Lyon and Grenoble-INP. His research interests include computer vision and machine learning. He received two best paper awards, at IEEE CBMI 2016 and IEEE IJCB 2011.

Naoufel Werghi is currently a Professor at the Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, UAE. He received the Ph.D. degree in computer vision from the University of Strasbourg, Strasbourg, France, in 1996. His main area of research is image analysis and interpretation, where he has been leading several funded projects in the areas of biometrics, medical imaging, and intelligent systems.
