1 Introduction

As social platforms and online media bring great convenience to people, digital images, as one kind of multimedia, play an increasingly important role in daily life. At the same time, image processing software and applications have become widespread, so that image tampering no longer requires professional skills; tampering and forging images has become easy and convenient. In recent years, several incidents of digital image tampering and forgery have had negative effects in fields such as justice and science, eroding public confidence in the news and in social integrity. More seriously, when a tampered image is admitted as evidence in court, it can cause serious problems for the justice system.

Driven by the increasing demand for digital media integrity and authenticity analysis, digital image forensics has developed rapidly in the last decade [1]. As one of the most important branches of digital image forensics, source camera identification aims to identify the device type, device model, or individual device that captured an image. Correspondingly, source camera identification is divided into three categories [2]. Device-based identification, which aims to determine the type of device that captured the image [3], usually focuses on distinguishing computer-generated images, camera images, cell-phone images, and scanned images. Model-based identification concentrates on identifying the device model that generated the image [4], for instance the camera model. Camera-based approaches seek to pinpoint the individual camera. In this paper, only camera model identification approaches are investigated and discussed.

In recent years, several algorithms for model-based camera source identification have been proposed. Most of them follow a common framework that treats camera model identification as a classification problem, which can therefore be solved by machine learning approaches. Indeed, numerous existing methods within this framework achieve detection accuracies above 90%. A typical approach proposed by Xu et al. extracts 354-dimensional features based on local binary patterns (LBP) to distinguish camera models; over 13 brands and 18 camera models, the average accuracy of the LBP method reaches 98% [5]. Another outstanding work by Swaminathan et al. constructs an efficient linear model to estimate the interpolation coefficients of the color filter array (CFA) [6]. Using the CFA coefficients as the feature vector, a discriminative classifier is built for camera source identification, achieving a high average accuracy of 90% on a dataset of 19 camera models [4]. Despite these high detection accuracies, all of these algorithms assume a default scenario in which the training and test samples are raw, unprocessed images, in other words a laboratory environment.

Consider a more practical scenario: a digital image, captured by a camera or cell phone, is uploaded to the internet via popular social media applications, for instance Twitter and Facebook in the U.S., or QQ, WeChat, and Microblog in China, and is then spread and forwarded. By the time such an image is adopted as evidence in court, it has usually been manipulated and processed on the internet. Given the popularity of social media applications, this scenario is far more realistic and significant for the real world, and it is the scenario explored and discussed in this paper.

In this social-media-based scenario, digital images may undergo retouching, geometric transformations such as resizing, re-compression, and even D/A and A/D conversion, owing to the limitations of the social media platforms and communication channels [2]. These operations and manipulations mean that the test images used for camera source identification are no longer the raw ones produced by the devices; their statistical characteristics and feature distributions deviate from those of the raw training images. This deviation between the raw training samples and the manipulated test samples makes practical camera source identification a challenging task.

The manipulations applied by social media applications and the internet are typically complicated and manifold. To simplify the model, we focus on the re-compression operation in the image pipeline. Despite the restrictions of terminal screen resolution, an image re-compressed by social media applications shows few visual differences from its raw copy, and JPEG compression is an almost universal processing step in social networks, which makes this simplification reasonable and feasible. However, as mentioned above, the statistical characteristics and feature distributions vary substantially after re-compression, so the identification accuracy obtained in the laboratory environment is no longer reliable.

Wang et al. analyzed the influence of JPEG compression on typical camera model identification approaches [7]: the classification accuracies of existing algorithms decrease rapidly as the JPEG compression quality decreases. An intuitive solution would be an online training system that generates training samples and trains the classification model on the fly, according to the re-compression quality factor estimated from the test samples. However, the computational cost of this scheme is huge and online training is time-consuming; such a system could work in principle but is unrealistic in practice.

In this paper, we focus on camera source identification for re-compressed images. Instead of an online training system, we propose algorithms based on cross-class alignment (CCA) and inter-class alignment (ICA), inspired by transfer learning.

The paper is organized as follows. Section 2 first introduces the motivation from transfer learning and then proposes the CCA and ICA based methods, together with a description of the LBP features used. The experiments are demonstrated and discussed in Sect. 3. Finally, the paper is concluded in Sect. 4.

2 Proposed Algorithms

In this section, we first give a brief introduction to transfer learning, which inspired the design of our algorithms. Subsequently, the cross-class and inter-class alignment based algorithms are introduced in detail.

2.1 Transfer Learning

In the practical scenario of camera source identification, the main reason for the sharp drop in identification accuracy is the variation in statistical characteristics and feature distributions between the training set and the test samples, caused by the re-compression applied by social media applications and the internet [7]. This variation violates the assumption underlying existing methods that the training and test sets follow the same distribution. Deprived of this foundation, machine-learning-based source camera identification methods inevitably degrade in performance.

To address this distribution mismatch, an approach named transfer learning has been proposed. It relaxes two basic assumptions of traditional machine learning: first, that the training data and test data are identically distributed, and second, that there are enough labeled samples to train a good classification model. Transfer learning aims to use the knowledge learned from the training data to solve problems where the test data contain few or even no labeled samples [8]. It suggests that if the training data and test samples are inherently correlated, the training of the classifier still contributes to the classification of the test samples, even though the distribution of the training data differs from that of the test samples.

The ‘domain’ consists of the feature space χ and the marginal probability distribution P(x), where χ denotes the eigenvector space. The ‘task’ consists of the label space y and the target prediction function f(·), where f(·) predicts the label of a test sample and can be regarded as P(y|x). In other words, the ‘task’, comprising the source task T s and the target task T t in camera source identification, is classification, as Fig. 1 illustrates. In this scenario, transfer learning tries to solve the problem described as follows.

Fig. 1.
figure 1

Transfer learning.

$$ \left\{ {\begin{array}{*{20}l} {D_{s} \ne D_{t} } \hfill \\ {T_{s} = T_{t} } \hfill \\ \end{array} } \right. $$
(1)

The key to transfer learning is to construct a projection, or transformation, that minimizes the distribution difference between the training set and the test set.
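As an illustration, the mismatch that such a transformation must cancel can be quantified by comparing per-feature statistics of the two domains. The following is a minimal sketch, not part of the method itself; the function name and the choice of mean and standard-deviation gaps as the measure are our own assumptions:

```python
import numpy as np

def domain_gap(source, target):
    """Illustrative measure of source/target mismatch: the average
    per-feature gap in mean and standard deviation, i.e. exactly the
    quantities the alignment transformation aims to cancel."""
    d_mu = np.abs(source.mean(axis=0) - target.mean(axis=0))
    d_sd = np.abs(source.std(axis=0) - target.std(axis=0))
    return d_mu.mean() + d_sd.mean()
```

A gap of zero means the first two moments of every feature already agree between the domains.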

2.2 Cross-Class Alignment Based and Inter-class Alignment Based Approaches

Inspired by transfer learning, we design transformations that minimize the distribution deviation, leading to two independent approaches. In our work, the re-compression manipulation is the only factor that makes the distributions of the source domain and the target domain deviate. Considering the characteristics of JPEG compression, a Gaussian model is used to evaluate the deviation between these two domains.

Cross-class alignment based approach:

Assuming that the test image samples are all re-compressed, we propose a cross-class alignment based approach, i.e., a global alignment between the source domain (training set) and the target domain (test set). Suppose we have a set of n s samples S = {p 1, p 2, ···, p ns } ∈ R d in the source domain and a set of n t samples T = {q 1, q 2, ···, q nt } ∈ R d in the target domain, where d is the dimension of the feature vector. To align these two Gaussian-modeled domains, we match the expectation and the standard deviation of each feature, as Eqs. (2) and (3) show.

$$ E(\varphi (s^{j} )) = E(t^{j} ) $$
(2)
$$ \sigma (\varphi (s^{j} )) = \sigma (t^{j} ) $$
(3)

where s j denotes the j-th feature of the training samples and t j denotes the j-th feature of the test samples, j = 1, 2,···, d. E(s j) and σ(s j) represent the expectation and the standard deviation of the j-th feature of the samples in the source domain. The transformation φ(·) for each feature is defined as follows [9].

$$ \varphi (s_{i}^{j} ) = (s_{i}^{j} - E(s^{j} ))\frac{{\sigma (t^{j} )}}{{\sigma (s^{j} )}} + E(t^{j} ) $$
(4)

where j = 1, 2,···, d, and i = 1, 2,···, n s .
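The transformation of Eq. (4) amounts to standardising each source feature and re-scaling it to the target-domain statistics. A minimal NumPy sketch follows; the function name and the zero-variance guard are our own additions, not part of the original formulation:

```python
import numpy as np

def cca_align(source, target):
    """Cross-class alignment (Eq. 4): shift and scale every source
    feature so that its mean and standard deviation match those of
    the target domain. source: (n_s, d), target: (n_t, d)."""
    mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
    sd_s, sd_t = source.std(axis=0), target.std(axis=0)
    sd_s[sd_s == 0] = 1.0  # guard against constant features
    return (source - mu_s) * (sd_t / sd_s) + mu_t
```

After alignment, the classifier trained on the transformed source features is applied to the target features unchanged.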

Inter-class alignment based approach:

The global cross-class alignment addresses the re-compression effect between the training set and the test set as a whole. Taking the class labels into account, we further present an inter-class alignment based approach, which aims to match the per-class expectation and standard deviation, as Eqs. (5) and (6) show.

$$ E(\varphi (s^{j} ),y) = E(t^{j} ,y) $$
(5)
$$ \sigma (\varphi (s^{j} ),y) = \sigma (t^{j} ,y) $$
(6)

where j = 1, 2···, d.

Similar to the cross-class alignment, the transformation φ(·) can be described as follows, conditioned on the label of each class [10].

$$ \varphi (s_{i}^{j} ) = (s_{i}^{j} - E(s^{j} ,y_{i} ))\frac{{\sigma (t^{j} ,y_{i} )}}{{\sigma (s^{j} ,y_{i} )}} + E(t^{j} ,y_{i} ) $$
(7)

where j = 1, 2···, d, i = 1, 2···, n s .

The problem with Eq. (7) is that the labels of the target domain are unavailable, which means E(t j ,y) and σ(t j ,y) cannot be computed directly. Instead, we use the posterior estimate p(y|t i ) in place of the hard label y. To obtain p(y|t i ), we first train a classifier on the training samples and use it to predict the test samples. The approximate E(t j ,y) and σ(t j ,y) can then be computed as follows:

$$ E(t^{j} ,y) \approx \frac{1}{{\sum\nolimits_{i = 1}^{{n_{t} }} {p(y|t_{i} )} }}\sum\limits_{i = 1}^{{n_{t} }} {t_{i}^{j} p(y|t_{i} )} $$
(8)
$$ \sigma (t^{j} ,y) \approx \sqrt {\frac{1}{{\sum\nolimits_{i = 1}^{{n_{t} }} {p(y|t_{i} )} }}\sum\limits_{i = 1}^{{n_{t} }} {(t_{i}^{j} - E(t^{j} ,y))^{2} p(y|t_{i} )} } $$
(9)

where j = 1, 2,···, d.
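Eqs. (7)–(9) can be sketched as follows, with the unknown target labels replaced by soft posteriors from a preliminary classifier. This is an illustrative implementation under our own naming; `post` is assumed to hold p(y|t i ) for each test sample, one column per class:

```python
import numpy as np

def ica_align(source, target, y_source, post):
    """Inter-class alignment (Eqs. 7-9): per-class standardisation of
    the source features. post has shape (n_t, n_classes) and holds the
    posteriors p(y|t_i) predicted by a preliminary classifier."""
    aligned = np.empty_like(source, dtype=float)
    for c in range(post.shape[1]):
        w = post[:, c]                                   # p(y=c | t_i)
        # weighted target statistics for class c (Eqs. 8 and 9)
        mu_t = (target * w[:, None]).sum(axis=0) / w.sum()
        var_t = (((target - mu_t) ** 2) * w[:, None]).sum(axis=0) / w.sum()
        sd_t = np.sqrt(var_t)
        # source statistics restricted to class c, then Eq. (7)
        mask = (y_source == c)
        mu_s, sd_s = source[mask].mean(axis=0), source[mask].std(axis=0)
        sd_s[sd_s == 0] = 1.0  # guard against constant features
        aligned[mask] = (source[mask] - mu_s) * (sd_t / sd_s) + mu_t
    return aligned
```

With perfectly confident (one-hot) posteriors this reduces to a per-class version of the cross-class alignment; noisy posteriors blur the class statistics, which is consistent with the degradation discussed in Sect. 3.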

Furthermore, the cross-class and inter-class alignments can be combined to further reduce the influence of re-compression.

2.3 LBP Features

To verify the proposed alignment based algorithms, effective features are required. LBP [5], CFA [6], and IQM [11] are acknowledged as outstanding feature vectors for model-based camera source identification. In this work the LBP features are adopted; verifying that the other feature vectors are also effective is left for future work.

The LBP features, proposed by Xu et al., are designed based on uniform gray-scale invariant local binary patterns [3], which can be described as:

$$ LBP_{P,R}^{u2} = \sum\nolimits_{p = 0}^{P - 1} {s(g_{p} - g_{c} )2^{p} } $$
(10)

where R is the radius of the circularly symmetric neighborhood used for the local binary patterns and P is the number of samples around the circle; we set R = 1 and P = 8. g c and g p represent the gray levels of the center pixel and its neighbor pixels, respectively, as shown in Fig. 2.

Fig. 2.
figure 2

(Left) Constellation of neighborhood. (Right) Example of ‘uniform’ and ‘non-uniform’ local binary patterns.

Function s is defined as:

$$ s(x) = \left\{ {\begin{array}{*{20}l} {1,x \ge 0} \hfill \\ {0,x < 0} \hfill \\ \end{array} } \right. $$
(11)

The differences between the central pixel and its neighborhood pixels are first calculated, then binary-quantized and coded according to the function s, forming a histogram with a total of 2^8 = 256 bins of local binary patterns. Inspired by [12], both ‘uniform’ and ‘non-uniform’ local binary patterns are included in [5]. Since ‘uniform’ patterns account for the majority of all patterns, the 58 ‘uniform’ patterns are kept as individual bins and all ‘non-uniform’ patterns are merged into a single bin, giving 59 effective patterns. For each color channel, the LBP features are extracted from (i) the original image, (ii) its prediction-error counterpart, and (iii) its 1st-level diagonal wavelet subband, resulting in 59 × 3 = 177 features, as Fig. 3 illustrates. Because the Bayer CFA applies the same processing strategy to the red and blue channels, we extract LBP features only from the red and green channels to reduce the feature dimension, yielding a total of 177 × 2 = 354 features.
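The 59-bin histogram described above can be sketched as follows. This is a naive, unoptimised illustration for P = 8 and R = 1; the function name and looping strategy are our own, not the reference implementation of [5]:

```python
import numpy as np

def lbp_u2_hist(img):
    """Uniform LBP histogram (P=8, R=1) per Eqs. (10)-(11): threshold
    the 8 neighbours against the centre pixel, keep the 58 'uniform'
    codes as separate bins and merge all 'non-uniform' codes into one
    extra bin, giving a normalised 59-bin histogram."""
    # 8 neighbour offsets (dy, dx) on the R=1 circle, clockwise
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    c = img[1:-1, 1:-1].astype(int)
    codes = np.zeros_like(c)
    for p, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(int)
        codes += ((nb - c) >= 0).astype(int) << p   # s(g_p - g_c) * 2^p

    # a code is 'uniform' if its circular bit string has <= 2 transitions
    def transitions(v):
        bits = [(v >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

    uniform = sorted(v for v in range(256) if transitions(v) <= 2)
    index = {v: i for i, v in enumerate(uniform)}   # 58 uniform bins
    hist = np.zeros(len(uniform) + 1)               # +1 merged bin
    for v in codes.ravel():
        hist[index.get(int(v), len(uniform))] += 1
    return hist / hist.sum()
```

Applying this to the original image, its prediction-error image, and its diagonal wavelet subband, for the red and green channels, reproduces the 59 × 3 × 2 = 354-dimensional feature layout described above.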

Fig. 3.
figure 3

Feature extraction framework for one color channel

3 Experimental Results

3.1 Experimental Setup and Parameters

To evaluate the performance of the proposed algorithms, four camera models from the ‘Dresden Image Database’ are used in our experiments. For each camera model, 350 images are selected randomly and each image is cropped into 6 non-overlapping sub-images, giving 2100 image samples per camera model, as Table 1 shows.

Table 1. Image dataset.

In all experiments, LibSVM [13] is used as the classifier to train the classification model and classify the test samples. For each camera, 1500 images are randomly selected as the training set and the remaining 600 images are used as test samples. To simulate the re-compression manipulations, five typical JPEG qualities are investigated: the original JPEG, i.e., the initial JPEG quality factor used in the camera, and the standard quality factors of 100, 90, 80, and 70.

3.2 Experimental Result and Analysis

The LBP method is used as the baseline in our experiments. Table 2 shows the average identification accuracies for models trained on images of various qualities and evaluated on test images of different qualities. The classifier reaches its highest accuracy when the training model matches the test images in quality, as the diagonal elements indicate. For instance, with the model trained on original JPEG images, a high accuracy of 94.04% is achieved on the original test samples. In the mismatched case with a re-compression quality factor of 100, the average accuracy decreases to 90.63%, while for quality factors of 80 and 70 the classifier can be considered to fail, as the accuracies drop to 47.29% and 33.92% (Table 2).

Table 2. Average accuracies of different quality images for the baseline of LBP.

In the following experiments, we use the original (raw) images as the training set, which is widely considered the best strategy in the practical scenario. With the CCA based algorithm, the confusion matrices for re-compression quality factors of 100, 90, 80 and 70 are shown in Tables 3, 4, 5 and 6.

Table 3. Confusion matrix of CCA method for quality factor of 100.
Table 4. Confusion matrix of CCA method for quality factor of 90.
Table 5. Confusion matrix of CCA method for quality factor of 80.
Table 6. Confusion matrix of CCA method for quality factor of 70.

Similarly, Tables 7, 8, 9 and 10 show the detailed experimental results of the ICA based algorithm for the various JPEG quality factors.

Table 7. Confusion matrix of ICA method for quality factor of 100.
Table 8. Confusion matrix of ICA method for quality factor of 90.
Table 9. Confusion matrix of ICA method for quality factor of 80.
Table 10. Confusion matrix of ICA method for quality factor of 70.
Table 11. Confusion matrix of CCA + ICA method for quality factor of 100.

For comparison, the combined CCA + ICA algorithm is also evaluated with the same image dataset and experimental parameters. The confusion matrices are shown in Tables 11, 12, 13 and 14.

Table 12. Confusion matrix of CCA + ICA method for quality factor of 90.
Table 13. Confusion matrix of CCA + ICA method for quality factor of 80.
Table 14. Confusion matrix of CCA + ICA method for quality factor of 70.

Examining all the confusion matrices of CCA, ICA, and the combination of CCA and ICA, we find that the classification performance is improved to different degrees. For instance, for the quality factor of 100, CCA, ICA, and the combination reach 92.83%, 92.29%, and 94.64% respectively, compared with the baseline accuracy of 90.63%, as Table 15 shows. For the quality factors of 100 and 90, the combination of CCA and ICA obtains the best results of 94.64% and 87.54%. However, for the low quality factors of 80 and 70, the CCA based algorithm achieves the highest accuracies of 58.79% and 48.13%. Further analysis indicates that the growing number of inaccurate labels assigned in the ICA step has a negative effect on the combination of CCA and ICA at low qualities.

Table 15. Comparison of the proposed algorithms and the baseline.

4 Conclusion

This paper focused on identifying the camera source of images that have undergone JPEG re-compression at different qualities. Inspired by transfer learning, cross-class alignment and inter-class alignment based algorithms were presented. Experiments indicate that the proposed CCA, ICA, and combined methods outperform the baseline. For re-compression quality factors of 100 and 90, average accuracies of 94.64% and 87.54% are achieved by the combined algorithm, while for quality factors of 80 and 70, accuracies of 58.79% and 48.13% are obtained. Although the algorithms perform well for the quality factors of 100 and 90, the accuracies achieved for the quality factors of 80 and 70 show that further improvement is needed.