Using decision fusion of feature selection in digital forensics for camera source model identification

https://doi.org/10.1016/j.csi.2011.10.006

Abstract

Digital forensics, which identifies the characteristics and origin of a digital device, has become a new field of research. If digital content is to serve as evidence in court, as its non-digital counterparts do, digital forensics can play a crucial role in identifying the source model or device. To this end, the relationship between an image and its camera model is explored. The proposed model uses various image-related and hardware-related features with a support vector machine approach and decision fusion techniques. The optimum feature subset for achieving the highest accuracy rate is also explored.

Introduction

The Internet has changed the way people acquire and use information, especially in the realm of digital images. For everyday use, digital cameras have replaced their film-based counterparts, and a large part of their popularity can be attributed to a dramatic drop in prices. These digital cameras do not sacrifice quality for value: they still capture high-quality images, are easy to use, and support various image display formats. However, a digital image is vulnerable: it is susceptible to replication or modification because so many powerful image editing software packages are readily available. If a digital image is to serve as evidence in court like its traditional counterpart, verifying its authenticity, detecting forged regions, and identifying its digital source all need to be addressed. Digital forensics can be defined as the collection of scientific techniques for the preservation, validation, identification, analysis, interpretation, documentation, and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events, usually of a criminal nature. Although representing information in digital form has many compelling technical and economic advantages, it has led to new issues and significant challenges in the forensic analysis of digital evidence. Overcoming these challenges therefore requires the cooperation of information technology and forensic science [30]. The former provides a platform of basic knowledge and skill; the latter offers the consistent and well-defined forensic procedures that must be employed in court [34] for processing digital evidence. Combining the two ensures the validity and credibility of the digital evidence and makes it admissible in court. Hence, this study provides a pioneering standardized forensic procedure for identifying the source device.
When digital evidence is required in court, for example when voyeuristic photos are found, the procedure is executed to extract features from a photo and identify its source device. We fully understand that current techniques cannot achieve 100% accuracy, whereas legal matters demand infallible methods. Nevertheless, this study provides a stepping-stone toward more rigorous techniques, which can be reached only through continuous research that catalyzes scientific progress and eventually arrives at the best results.

Studies on watermarks have helped determine whether an image has been altered [14]. However, watermarks need to be inserted during the creation of an image, which increases the production cost of digital cameras and complicates the design of the internal circuitry. As a result, watermarks are of limited help in establishing the source of an image, let alone its brand or model. The file header of most images taken by digital cameras does contain the camera model/brand and photograph information, but this information can be easily falsified. Such ease of alteration disqualifies the photograph from being used as evidence in court.

Since [19] addressed the problem of identifying the camera source from an image in “Methods for identification of images acquired with digital cameras”, several papers (summarized in Table 1) have tackled this issue using features based either on intrinsic hardware artifacts caused by imperfections or on software-related fingerprints left during image formation. When intrinsic hardware artifacts are used, artifacts such as pattern noise [18], [21], [22], [26], lens radial distortion [11], chromatic aberration [44] or sensor dust [16] serve as the fingerprint or biometric to identify either the source brand/model or the source device.

Besides hardware imperfections, research has also explored software-related fingerprints such as image-related features or artifacts introduced by color filter array (CFA) interpolation. Building on the work of [32], the authors of [3], [4] argued that most proprietary interpolation algorithms exhibit a largely linear characteristic when applied to smooth image regions and can therefore be used to classify images. [25] proposed a quadratic pixel correlation model under the assumption that demosaiced images should exhibit spatially periodic inter-pixel correlation. In [36], [37], a nonintrusive component forensic model is devised to estimate the interpolation coefficients, and an efficient camera identifier is constructed to determine the source brand and model of an image. From a different standpoint, [24] argued that an output image is affected by the CFA configuration/demosaicing algorithm and by color processing/transformation. Therefore, based on the steganalysis research of [2], they proposed using color-related features and Image Quality Metrics (IQM) to extract the characteristics of an image and then an SVM-based classifier to identify the source camera brand or model of the image. Their study has been adopted in [39], [40] and further modified to identify not only digital cameras but also the digital cameras found in cell phones. In [6], [7], the authors applied a feature-selection algorithm to a feature set that includes not only IQM but also binary similarity measures and higher-order wavelet statistics to identify the source cell-phone model.

Before we analyze other techniques for digital camera source model identification, an interesting finding from [40] should be mentioned. The experimental results in that paper show that camera source model identification does not rely on image content: four different cameras were identified from images of the same or similar scenes, and [40] obtained a 100% accuracy rate. These results indicate that camera source model identification is not a function of image content. Since such controlled conditions rarely hold in practice, this study focuses on camera source model identification with varied image content, which is the more general application.

Although the model that uses IQM features to classify the camera source model achieves good results, [20] points out that device identification is generally not possible using IQM-based features alone. Hence, in addition to IQM-based features, a photo-response non-uniformity noise (PRNU)-based feature set is investigated in this study. To achieve higher accuracy rates, recent research in [5], [38] proposed combining hardware artifacts and software-related fingerprints, whereas [7] uses a feature-selection algorithm to choose the important features from a feature set of binary similarity measures, image quality metrics, and higher-order wavelet statistics. Although SFFS (Sequential Forward Floating Search) [33] is used in [7] to reduce the number of features, that paper does not state which features were chosen or why. Moreover, [41] proposed using a set of feature selection algorithms and the majority voting rule to identify the 20 most important of the 34 features proposed in [24], and achieved an accuracy rate 5% higher than that of the method in [24]. To verify the 34 software features in [24] and the 9 PRNU-related hardware artifacts, which are categorized as pattern noise in [18], [42] applied the feature selection model from their previous study [41] to identify the 18 most important of the 43 features and obtained a 94.95% accuracy rate with 20 camera models. Compared with [42], this study systematically details which features are chosen and explains why they are selected, while the number of cameras is increased to 25. Note, however, that identifying the camera source model in this paper means that the proposed method predicts only the camera model, not the individual device that actually took the image.
Even though the proposed method has successfully identified the same model across different cameras, we fully understand that identifying the source device requires more detailed, device-specific information than source model or brand identification does. Nevertheless, we expect to build on our findings by exploring these topics in future research.
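
The idea of fusing several feature-selection algorithms by majority voting, as discussed above, can be sketched as follows. This is a minimal illustration and not the procedure of [41] or [42]: the function name `fuse_feature_rankings`, the `top_k` vote cutoff, and the tie-breaking rule are assumptions introduced here, since the text does not specify them.

```python
from collections import Counter


def fuse_feature_rankings(rankings, top_k, num_selected):
    """Fuse several feature rankings by majority voting (assumed scheme).

    rankings:     list of rankings, one per feature-selection algorithm;
                  each ranking lists feature indices, best first.
    top_k:        number of top-ranked features each algorithm votes for.
    num_selected: size of the fused feature subset to return.
    """
    votes = Counter()
    for ranking in rankings:
        # Each algorithm casts one vote for each of its top_k features.
        for feature in ranking[:top_k]:
            votes[feature] += 1
    # Most votes first; ties are broken by the lower feature index
    # so that the result is deterministic (an assumption of this sketch).
    ordered = sorted(votes, key=lambda f: (-votes[f], f))
    return ordered[:num_selected]


# Toy example: three hypothetical algorithms ranking five features.
example_rankings = [
    [0, 1, 2, 3, 4],
    [1, 0, 4, 2, 3],
    [4, 1, 0, 3, 2],
]
selected = fuse_feature_rankings(example_rankings, top_k=3, num_selected=2)
```

In the toy example, features 0 and 1 each receive three votes, feature 4 receives two, and feature 2 receives one, so the fused subset of size two is `[0, 1]`. The same pattern would apply to selecting, say, 20 features out of 43, given the rankings produced by real feature-selection algorithms.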

This paper is organized as follows. Section 2 explains the details of the theoretical approach. Section 3 documents the experiments and discusses the experimental results, and Section 4 presents the conclusion.


The image formation process

Although the color image formation process is different among different manufacturers, the output image is greatly influenced by the following:

  1. The Color Filter Array (CFA) configuration and demosaicing algorithm.
  2. The color processing and transformation.

As illustrated in Fig. 1(a), light from a scene passes through a lens and different optical filters, and that light is subsequently captured by an array of sensors. Most digital cameras adopt a CFA as shown in Fig. 1(b) to sample real-world scenes

Experiments and discussion

Before we discuss the experimental settings, we must first decide on the number of training and test images. [24] used 60 images for training and 90 images for testing. [44] used 30 images for training and 60 images for testing. Both [5] and [7] used 100 images for training and 100 images for testing. [18] used 45 images for training and 56 to 605 images for testing. Since every technique has different requirements for achieving the best accuracy with low false positive rates, we have adopted the setting of [24]

Conclusion and future research

This study analyzed the relationship between digital cameras and their photographs with the help of support vector machines and decision fusion. The proposed approach uses feature selection algorithms to choose the top λ most important features (λ = 20 based on the experimental results) from 43 image features. We found that the best results were obtained when features were extracted from the central 90% of the image area. Due to the influence of optical
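
As an illustration of restricting feature extraction to the central 90% of the image area, the following sketch computes a centered crop box. It is a sketch under stated assumptions, not the authors' implementation: the helper name `central_crop_box` is hypothetical, and it assumes that the 90% refers to area, so each side is scaled by the square root of 0.9. If the 90% instead refers to each dimension, the scale factor would simply be 0.9.

```python
import math


def central_crop_box(width, height, area_fraction=0.9):
    """Return (left, top, right, bottom) of a centered crop.

    The crop keeps the aspect ratio and covers roughly `area_fraction`
    of the original image area (assumption: the fraction refers to area).
    """
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    # Scaling both sides by sqrt(f) scales the area by f.
    scale = math.sqrt(area_fraction)
    crop_w = round(width * scale)
    crop_h = round(height * scale)
    # Center the crop; integer division keeps pixel coordinates whole.
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Example: a hypothetical 1000 x 1000 pixel image.
box = central_crop_box(1000, 1000)
```

The returned box uses the (left, top, right, bottom) convention common to image libraries, so it could be passed to a cropping routine before the features are computed.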

References (46)

  • M. Chen et al. Determining image origin and integrity using sensor noise. IEEE Transactions on Information Forensics and Security (2008)
  • K.S. Choi et al. Source camera identification using footprints from lens aberration
  • J.H. Choi et al. Color laser printer identification by analyzing statistical features on discrete wavelet transform...
  • C. Cortes et al. Support-vector networks. Machine Learning (1995)
  • I. Cox et al. Digital Watermarking and Steganography
  • I. Daubechies. Ten Lectures on Wavelets (1992)
  • A.E. Dirik et al. Digital single lens reflex camera identification from traces of sensor dust. IEEE Transactions on Information Forensics and Security (2008)
  • D. Donoho. De-noising by soft-thresholding. IEEE Transactions on Information Theory (1995)
  • T. Filler et al. Using Sensor Pattern Noise for Camera Model Identification
  • Z. Geradts et al. Methods for identification of images acquired with digital cameras
  • T. Gloe et al. Feature-based camera model identification works in practice: results of a comprehensive evaluation study. Lecture Notes in Computer Science (2009)
  • M. Goljan et al. Camera identification from cropped and scaled images
  • M. Goljan et al. Large scale test of sensor fingerprint camera identification
