Source camera model identification based on convolutional neural networks with local binary patterns coding

https://doi.org/10.1016/j.image.2018.08.001

Highlights

  • A novel source camera model identification method using CNN is proposed.

  • Local binary patterns coding is used to reduce the influence of scene details.

  • Extensive experiments demonstrate the effectiveness of the proposed method.

Abstract

Source camera model identification has always been one of the main fields of digital image forensics since it is the foundation for solving a wide range of forensic problems. Several effective camera model identification algorithms have been developed to meet practical needs. However, they are mostly based on traditional machine learning methods and rely on well-designed features or models. Since deep learning has made great progress in computer vision tasks, significant interest has arisen in applying deep learning to image forensics. In this paper, we present a deep learning approach to the source camera model identification problem. We adopt a convolutional neural network (CNN) structure similar to AlexNet and equip it with a simple local binary patterns (LBP) preprocessing operation. The identification accuracy on the public “Dresden Image Database” reaches 98.78% over 12 camera models without any other sophisticated procedures such as an extra classifier or majority voting.

Introduction

With the development of information technology and social media networks, digital images are found everywhere in our daily life and have nearly become the most pervasive information carrier. Besides playing an important role in spreading information, digital images, as a kind of visual data, are usually regarded as a certification of truth or as evidence in a court of law, since we traditionally believe in the integrity of what we see. However, with the popularity of inexpensive cameras and cell phones, people can acquire images easily and, more importantly, can also manipulate them, from the source information to the content, and even create new ones at will, thanks to the development of image manipulation software and social networks over the years. This situation highlights the necessity of verifying the source and authenticity of digital images, a key task in the field of digital image forensics [1].

As one main field of image forensics, source camera identification has two branches. One is to match an image with one individual camera, and the other is to match it with a specific camera model. For these tasks, researchers have devoted themselves to studying the pipeline of the image acquisition process and exploiting traces or artifacts introduced into the images to capture source information. As shown in Fig. 1, the image acquisition process involves several stages, each of which can be implemented differently in different cameras. Consequently, some unique traces or artifacts are introduced into the final images. Based on these traces from given cameras, a variety of camera identification approaches have been proposed; they can be categorized into two groups.

The first group of methods computes a hypothesized analytical model of certain stages of image acquisition and then evaluates the correlation between the model and the tested image. Lukáš et al. [2] used a sensor pattern noise (SPN) model to identify the source camera sensors. Choi et al. [3] chose a lens radial distortion pattern as a fingerprint. Dirik et al. [4] utilized dust-spot characteristics to identify the source digital single lens reflex camera. In order to extract reliable photo-response non-uniformity (PRNU), Amerini et al. [5] introduced a minimum mean square error (MMSE) filter in the un-decimated wavelet domain to estimate the PRNU noise. Li [6] suggested that an enhanced fingerprint can be obtained by assigning weighting factors according to the magnitude of scene details to eliminate the influence of image content. Furthermore, Tomioka et al. [7] proposed a method based on the pairwise relationships of pixel clusters to suppress the effects of noise contamination. Recently, Li et al. [8] proposed the use of principal component analysis (PCA) to formulate a compact SPN representation. In addition, a training set construction procedure was also proposed in [8] to enhance the de-noising effect.
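The common thread in this first group is a fingerprint estimated from noise residuals and a correlation test against the query image. The following is a minimal sketch of that idea, assuming grayscale float images of identical size; the wavelet denoiser and the plain correlation score stand in for the more elaborate estimators of [2], [5], [6], [7], [8], and all function names are illustrative.

```python
import numpy as np
from skimage.restoration import denoise_wavelet

def noise_residual(img):
    """Crude PRNU estimate: image minus its wavelet-denoised version."""
    return img - denoise_wavelet(img, rescale_sigma=True)

def camera_fingerprint(images):
    """Average the residuals of several images known to come from one camera."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def identify(query, fingerprints):
    """Attribute the query to the camera whose fingerprint correlates best
    with the query's own noise residual."""
    r = noise_residual(query).ravel()
    scores = {cam: np.corrcoef(r, fp.ravel())[0, 1]
              for cam, fp in fingerprints.items()}
    return max(scores, key=scores.get)
```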

The other group of methods relies on well-designed feature vector extraction and machine learning classifiers. Swaminathan et al. [9] constructed an efficient camera identifier by estimating the interpolation coefficients of the color filter array (CFA). Xu et al. [10] extracted 354-dimensional features based on local binary patterns (LBP) to distinguish camera models. Hu et al. [11] developed an improved algorithm using inter-channel demosaicing traces for camera model identification. Tuama et al. [12] proposed a method extracting high-order statistics consisting of a co-occurrence matrix, color dependency features related to the CFA interpolation arrangement, and conditional probability statistics. A better identification result compared with the correlation-based method was reported in their paper. Different from [10], the recent work [13] also investigated the discriminative ability of local phase quantization (LPQ), an LBP-like texture descriptor, to distinguish imaging devices. The combined texture features of LBP and LPQ resulted in higher identification accuracy compared with [10]. Chen et al. [14] built a rich model of 1372-dimensional features to identify the model of an image’s source camera. They utilized two co-occurrence matrices to capture the reconstruction error between the original image and the reconstructed version. An average identification accuracy of 99.2% over 12 camera models was reported in their paper.
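As an illustration of this feature-plus-classifier pipeline, the sketch below extracts a simple uniform LBP histogram, loosely in the spirit of [10] but not reproducing its 354-dimensional layout, and feeds it to an SVM; the neighbourhood parameters and kernel choice are assumptions for illustration only.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

P, R = 8, 1  # 8 neighbours on a radius-1 circle

def lbp_histogram(gray_image):
    """Uniform LBP codes summarised as a normalised histogram (P + 2 bins)."""
    codes = local_binary_pattern(gray_image, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def train_classifier(images, labels):
    """Fit an SVM on LBP histogram features of the training images."""
    features = np.array([lbp_histogram(im) for im in images])
    return SVC(kernel='rbf').fit(features, labels)
```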

On the other hand, convolutional neural networks (CNNs), which are strong in feature learning, have achieved great success in computer vision tasks and have developed rapidly since 2012 [[15], [16], [17], [18]]. These achievements have attracted attention from the digital image forensics community, and several works have explored suitable approaches to applying CNNs to forensic problems. Baroffio et al. first applied CNNs to identify source cameras, but the poor performance indicated that CNNs designed for computer vision (CV) cannot be applied to camera identification directly. On this basis, they proposed a new approach to camera identification in [19]: they simply regarded the CNNs as a feature extractor and combined the networks with SVM classifiers to complete the classification tasks. Chen et al. [20] noticed that the difference among classes in image forensics problems is subtle and added a preprocessing layer before the CNNs for median filtering forensics. This preprocessing achieved a significant boost in performance. Bayar et al. [21] proposed a new convolutional layer, which acts like a preprocessing layer within the CNNs, to detect image manipulation. Tuama et al. [22] presented a CNNs structure similar to AlexNet and equipped it with a high-pass filter (HPF) layer [23] to cope with camera model identification. Their experimental results showed the important role that the preprocessing part plays in classification accuracy and indicated that trying bigger networks such as GoogLeNet or ResNet might be promising. The experimental results in [24] also confirmed these points. Recently, Yang et al. [25] proposed content-adaptive fusion residual networks to detect image origin and achieved satisfactory performance on small query images for camera brand identification. For camera model identification, however, the detection accuracy is only 87.55%, so there is still much room for improvement.
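To make the role of such a preprocessing layer concrete, the following is a minimal PyTorch sketch in the spirit of [20], [21], [22]: a fixed high-pass residual filter suppresses scene content before a small trainable CNN. The kernel (a standard 5 × 5 residual filter) and the tiny network are illustrative assumptions, not the exact architectures of those papers.

```python
import torch
import torch.nn as nn

# A common 5x5 high-pass residual kernel (normalized); it removes most of the
# scene content and leaves high-frequency residuals for the trainable layers.
HPF_KERNEL = torch.tensor([[-1.,  2.,  -2.,  2., -1.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-2.,  8., -12.,  8., -2.],
                           [ 2., -6.,   8., -6.,  2.],
                           [-1.,  2.,  -2.,  2., -1.]]) / 12.0

class HPFNet(nn.Module):
    """Fixed high-pass preprocessing layer followed by a small trainable CNN."""
    def __init__(self, num_models=12):
        super().__init__()
        self.hpf = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)
        self.hpf.weight.data.copy_(HPF_KERNEL.view(1, 1, 5, 5))
        self.hpf.weight.requires_grad = False  # the preprocessing filter is not learned
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, num_models)

    def forward(self, x):  # x: (N, 1, H, W) grayscale patches
        x = self.features(self.hpf(x))
        return self.classifier(x.flatten(1))
```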

The CNN-based approach is promising but not yet effective enough for camera model identification. First, the available databases are not as large as ImageNet, so the CNNs cannot be very deep; otherwise the effectiveness of training may suffer. Second, the CNNs tailored for CV are sensitive to the primary visual information of an image rather than to the intrinsic source information, which is invisible to the naked eye. Judging from the current state of related research, the intrinsic source information is mostly hidden in the noise of images or in other statistical features. It may not be realistic to expect CNNs to capture signals with such a low signal-to-noise ratio. However, we can make “glasses” for the CNNs so that the effective source information is enlarged in the “eyes” of the CNNs, or exists in a form that is easier for the CNNs to recognize. That is to say, we can give some hints or guidance to the CNNs. Therefore, we propose that CNNs can capture information that human eyes cannot perceive and accomplish camera model identification with the help of research results from experts in related fields. In particular, we modify a shallow CNNs architecture similar to AlexNet and use the simplest LBP operator to help the CNNs extract the source camera information of images. Experimental studies illustrate that our method achieves better classification accuracy compared with state-of-the-art classical and CNN-based methods.

The rest of the paper is organized as follows. Section 2 explains the details of the proposed method. In Section 3, extensive experiments are carried out and comparisons with the state of the art are presented to show the superiority of the proposed method. Finally, conclusions are drawn and some perspectives are given in Section 4.


Proposed method

In this section, we first describe the coding preprocessing operation based on local binary patterns, which is applied before the CNNs architecture. Then the CNNs architecture, as well as its design considerations, is introduced in detail. Finally, the implementation details of the training and testing processes are presented, including our strategies for selecting or adjusting some key parameters. The framework of the proposed method is illustrated in Fig. 2.
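As a rough illustration of the preprocessing stage described here, the sketch below replaces an input patch with its LBP-coded version before it is fed to the AlexNet-like CNN. The neighbourhood parameters (8 neighbours, radius 1) and the per-channel coding are assumptions for illustration; the authors' exact design is given in Section 2 and Fig. 2.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_code_patch(patch_rgb):
    """Replace each colour channel of a uint8 HxWx3 patch by its basic
    8-neighbour, radius-1 LBP code, rescaled to [0, 1] for the CNN input."""
    coded = np.stack(
        [local_binary_pattern(patch_rgb[..., c], P=8, R=1, method='default')
         for c in range(patch_rgb.shape[-1])],
        axis=-1)
    return coded / 255.0  # default LBP codes lie in [0, 255]
```

The network then trains on the coded patch rather than the raw pixels, so scene details contribute far less to the learned features.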

Experimental results and discussion

In this section, we conduct two different experiments to evaluate the proposed method and also provide the results of the state-of-the-art classical method [14] and CNN-based method [22] for comparison. We use images from the well-known “Dresden Image Database” and cut them into patches as described above. As for images from various devices of the same model, we focus only on their source camera model and mix them together so that we are able to obtain a relatively larger amount of samples, and …
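For concreteness, a minimal sketch of the patch-based evaluation protocol is given below; the 256 × 256 patch size is an assumed value for illustration, since the excerpt above does not state the size used.

```python
import numpy as np

def extract_patches(image, patch=256):
    """Cut a full-resolution image into non-overlapping patch x patch blocks,
    discarding incomplete border blocks."""
    h, w = image.shape[:2]
    return [image[r:r + patch, c:c + patch]
            for r in range(0, h - patch + 1, patch)
            for c in range(0, w - patch + 1, patch)]
```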

Conclusion

In this paper, we investigate a promising approach based on convolutional neural networks with local binary patterns for source camera model identification, which achieves better performance compared with state-of-the-art methods tested on the same dataset. In particular, the LBP coding operation has a significant effect on improving performance, which indicates that a well-designed CNNs structure with a smart preprocessing hint can be an excellent tool for image forensics problems.

Acknowledgments

This work is supported by the National Science Foundation of China (No. 61502076) and the Scientific Research Project of Liaoning Provincial Education Department, China (No. L2015114).

References (35)

  • Y. Tomioka et al., Robust digital camera identification based on pairwise magnitude relations of clustered sensor pattern noise, IEEE Trans. Inf. Forensics Secur. (2013)

  • A. Swaminathan et al., Nonintrusive component forensics of visual sensors using output images, IEEE Trans. Inf. Forensics Secur. (2007)

  • G. Xu et al., Camera model identification using local binary patterns

  • Y. Hu et al., An improved algorithm for camera model identification using inter-channel demosaicking traces

  • A. Tuama et al., Camera model identification based machine learning approach with high order statistics features

  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks

  • O. Russakovsky et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vis. (2015)