Steganalysis classifier training via minimizing sensitivity for different imaging sources

doi:10.1016/j.ins.2014.05.028

Information Sciences

Volume 281, 10 October 2014, Pages 211-224

https://doi.org/10.1016/j.ins.2014.05.028

Abstract

Owing to the continuing proliferation of digital cameras and image editing software, a large variety of JPEG quantization tables are used to compress JPEG images. As a result, the performance of learning-based steganalysis methods that use a pre-selected quantization table for the training images degrades significantly when the quantization table of the testing images differs from the one used for training. Recognizing that it would be undesirable and impractical to train a steganalysis classifier with all possible quantization tables, we propose an approach in which the differences in features extracted from images with different quantization tables are formulated as perturbations of those features. We then define a stochastic sensitivity as the expected square of the change in classifier output with respect to these feature perturbations, and use it to measure the robustness of a classifier to such perturbations. A Radial Basis Function Neural Network based steganalysis classifier trained by minimizing this sensitivity is proposed. Experimental results show that the proposed method outperforms learning methods such as the Support Vector Machine and the Radial Basis Function Neural Network trained without considering feature perturbations.

Introduction

Steganography presents a potential security threat to society in general and to corporations in particular. Embedded messages, called stego messages, are hidden in digital media such as images [1], [2], [3], audio [4] and video [5] for secret communication. Steganalysis is a technique used to determine whether a digital medium contains a stego message. Current learning-based steganalysis methods consist of two major components: a feature extractor and a classifier. Steganalysis classifiers are trained on a set of images that contains both clean and stego images. Among the different types of digital media, JPEG images are the most widely used on the Internet, which makes JPEG a favorable carrier for steganography. In most JPEG steganalysis work, however, both the training and testing datasets use JPEG images compressed with the same quantization table. When different quantization tables are used for the training and testing image sets, the performance of the steganalysis classifier degrades significantly [6], [7].

With the continuing proliferation of digital cameras and image editing software, JPEG images on the Internet are compressed with many different quantization tables [8]. Moreover, a growing number of digital camera manufacturers, such as Sony, Nikon and Pentax, adopt variable quantization tables that are computed dynamically from the image content. Re-compressing an image that was compressed with an unseen quantization table, using the quantization table of the training images, cannot directly recover the image, because the extra quantization step changes the steganalysis features of the JPEG image [9]. It is therefore unreasonable or impractical to assume prior knowledge of the quantization table of an unseen image that is being examined for a stego message.
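The effect of the extra quantization step can be illustrated with a toy example. The sketch below is not from the paper: the coefficient values and both quantization tables are hypothetical, and only six coefficients of a block are shown. It compares a single compression with the training table against compression with an unseen table followed by re-compression with the training table.

```python
# Illustrative only: coefficient values and quantization steps are made up.

def quantize(coeffs, steps):
    """Quantize DCT coefficients: divide by the step, round to the nearest integer."""
    return [round(c / q) for c, q in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Reconstruct approximate DCT coefficients from quantized levels."""
    return [level * q for level, q in zip(levels, steps)]

# A few unquantized DCT coefficients of a hypothetical image block.
dct = [-93.0, 38.0, 19.0, -8.0, 7.0, 3.0]

q_unseen = [3, 5, 7, 9, 11, 13]   # table of the unseen image (hypothetical)
q_train = [4, 6, 8, 10, 12, 14]   # table of the training images (hypothetical)

# What the classifier saw during training: one compression with q_train.
direct = quantize(dct, q_train)

# What re-compression yields: q_unseen first, then q_train.
decoded = dequantize(quantize(dct, q_unseen), q_unseen)
recompressed = quantize(decoded, q_train)

# Positions where the double-compressed coefficients differ.
changed = [i for i, (a, b) in enumerate(zip(direct, recompressed)) if a != b]
print(direct)        # [-23, 6, 2, -1, 1, 0]
print(recompressed)  # [-23, 7, 3, -1, 1, 0]
print(changed)       # [1, 2]
```

Any feature computed from the quantized coefficients, such as a histogram, would shift accordingly, even though the underlying image content is the same.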

In real-world situations, both the images and the quantization tables can differ from those used for training the classifier. In addition to differences in quantization tables, differences in image content also affect the performance of steganalysis [10]. As a result, the steganalysis feature values extracted from testing images differ from those of the training images [7]. These differences are unavoidable and can be treated as perturbations of the features. The robustness of a steganalysis classifier with respect to feature perturbations is therefore essential to its performance.
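One way to write this down, using notation assumed here for illustration rather than taken from the paper: let x be the feature vector of a training image, Δx the perturbation caused by a change of quantization table or image content, and f the classifier. A testing feature vector is then x + Δx, and the expected squared change in the classifier output, which the Abstract calls the stochastic sensitivity, is:

```latex
x_{\text{test}} = x + \Delta x,
\qquad
\mathrm{SS}(x) = \mathbb{E}_{\Delta x}\!\left[\bigl(f(x + \Delta x) - f(x)\bigr)^{2}\right]
```

A smaller value indicates a classifier whose output changes less under the perturbation.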

Current steganalysis methods make use of off-the-shelf classification methods such as neural networks [11], [12], the Support Vector Machine (SVM) [13], [14], the dynamic evolving neural fuzzy inference system [15] and ensembles of classifiers [16], [17]. However, none of them addresses the issue of feature perturbation between training and testing images.

In this work, we propose a Localized Generalization Error Steganalysis classifier (LG-Steganalyzer) that is robust to images compressed with quantization tables different from that of the training images, a situation that is unavoidable in real-world applications. A Radial Basis Function Neural Network (RBFNN) is trained by minimizing both the training error and a stochastic sensitivity. The RBFNN is selected because of its fast training speed on large datasets, which is important for network security problems. The stochastic sensitivity is proposed to capture the influence on the RBFNN classification of the feature perturbations created by changes in the JPEG quantization table of the testing images. With the proposed training method, the LG-Steganalyzer provides more robust steganalysis in real-world applications. The major contributions of the LG-Steganalyzer to steganalysis are:

1. The RBFNN trained by the LG-Steganalyzer is robust to real-world situations, e.g. differences in quantization tables and in content between training and testing images.
2. The proposed LG-Steganalyzer can be used with any compression quantization table and any steganalysis feature extraction technique.
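The training idea described above, minimizing a training error together with a stochastic sensitivity, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the Gaussian hidden units, the uniform perturbation of half-width `q`, the Monte Carlo estimate, and the equal weighting of the two terms are all choices made here, and every function name is hypothetical.

```python
import math
import random

def rbfnn_output(x, centers, widths, weights):
    """Output of an RBFNN with Gaussian hidden units and a linear output layer."""
    total = 0.0
    for c, v, w in zip(centers, widths, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * math.exp(-sq_dist / (2.0 * v ** 2))
    return total

def stochastic_sensitivity(x, q, centers, widths, weights, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2].

    Assumption: each component of dx is drawn uniformly from [-q, q].
    """
    rng = random.Random(seed)
    base = rbfnn_output(x, centers, widths, weights)
    acc = 0.0
    for _ in range(n_samples):
        perturbed = [xi + rng.uniform(-q, q) for xi in x]
        acc += (rbfnn_output(perturbed, centers, widths, weights) - base) ** 2
    return acc / n_samples

def objective(samples, targets, q, centers, widths, weights):
    """Training MSE plus mean sensitivity (the equal weighting is an assumption)."""
    n = len(samples)
    mse = 0.0
    sens = 0.0
    for x, t in zip(samples, targets):
        mse += (rbfnn_output(x, centers, widths, weights) - t) ** 2
        sens += stochastic_sensitivity(x, q, centers, widths, weights, n_samples=200)
    return mse / n + sens / n

# Toy usage: one hidden unit centered at the sample itself.
centers, widths, weights = [[0.0, 0.0]], [1.0], [1.0]
small = stochastic_sensitivity([0.0, 0.0], 0.1, centers, widths, weights)
large = stochastic_sensitivity([0.0, 0.0], 0.5, centers, widths, weights)
print(small < large)  # larger perturbations change the output more
```

A training procedure would then choose the RBFNN parameters that make `objective` small, so that the classifier both fits the training images and changes little when the features are perturbed.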

This paper is organized as follows: Section 2 provides a brief introduction to steganalysis and JPEG quantization tables. The LG-Steganalyzer is described in Section 3. Experimental results are presented in Section 4. Section 5 concludes this work.

Section snippets

Steganalysis and quantization table

We first provide a brief introduction to current steganalysis methods in Section 2.1. Section 2.2 discusses the importance of quantization tables in steganalysis. Section 2.3 demonstrates input perturbations caused by changes in quantization tables and image contents.

LG-Steganalyzer

Fig. 6 shows the functional blocks of the LG-Steganalyzer. The two-phase training component is used only when the LG-Steganalyzer is first trained or whenever the RBFNN needs to be updated. The steganalysis feature extraction method is selected by the user and is transparent to the LG-Steganalyzer. The binary RBFNN classifier is trained by minimizing the Localized Generalization Error Model (L-GEM) [22] to learn the two-class classification of stego and clean images, with a given set of
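For orientation, the L-GEM bound is commonly stated in the L-GEM literature in the form below. It is recalled from that literature, not from the truncated snippet above, so the symbols are assumptions with respect to this paper: R_emp is the training mean squared error, E_{S_Q}[(Δy)^2] is the stochastic sensitivity over a Q-neighborhood S_Q of the training samples, A is the range of the target outputs, and ε is a confidence term that shrinks as the number of training samples grows.

```latex
R_{SM}(Q) \le \left(\sqrt{R_{\mathrm{emp}}} + \sqrt{\mathbb{E}_{S_Q}\!\left[(\Delta y)^{2}\right]} + A\right)^{2} + \varepsilon
```

Under this form, lowering either the training error or the sensitivity lowers the bound, which is consistent with the training objective described in the Introduction.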

Experimental results

As mentioned above, it is impossible to restrict the imaging source of the JPEG images investigated by a trained steganalysis system. Therefore, the quantization tables used to compress unseen images are rarely the same as those used to compress the training images. In the experiments, we first compare the testing accuracies when both training and testing images are compressed with the same quantization table. To simulate the diversity of possible quantization tables in the

Conclusion

In real-world applications of a steganalysis system, we cannot restrict the imaging source of the JPEG images transmitted over the Internet. This work focuses on the different quantization tables used by different software and cameras. The difference in steganalysis features created by different quantization tables is formulated as a feature perturbation. A sensitivity is defined to measure the effect of the feature perturbation on a classifier. A two-phase training algorithm for

Acknowledgement

This work is supported by the National Natural Science Foundation of China (61272201) and the Program for New Century Excellent Talents in University of China (NCET-11-0162).

References (28)

D. Yan et al., Steganography for MP3 audio by exploiting the rule of window switching, Comput. Security (2012)
O. Cetin et al., A new steganography algorithm based on color histograms for data embedding into raw video streams, Comput. Security (2009)
Q. Liu et al., Image complexity and feature mining for steganalysis of least significant bit matching steganography, Inform. Sci. (2008)
V. Sabeti et al., Steganalysis and payload estimation of embedding in pixel differences using neural networks, Pattern Recogn. (2010)
Q. Liu et al., Feature mining and pattern classification for steganalysis of LSB matching steganography in grayscale images, Pattern Recogn. (2008)
Q. Liu et al., An improved approach to steganalysis of JPEG images, Inform. Sci. (2010)
P.P.K. Chan et al., Dynamic fusion method using localized generalization error model, Inform. Sci. (2012)
W.W.Y. Ng et al., Image classification with the use of radial basis function neural networks and the minimization of the localized generalization error, Pattern Recogn. (2007)
D.S. Yeung et al., Radial basis function network learning using localized generalization error bound, Inform. Sci. (2009)
P. Sallee, Model-based methods for steganography and steganalysis, Int. J. Image Graph. (2005)
J. Fridrich et al., Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities
P. Sallee, Model based steganography
Z. Khan, A.B. Mansoor, An analysis of quality factor on image steganalysis, in: The 7th International Conference on...
I. Lubenko et al., Steganalysis with mismatched covers: do simple classifiers help?

Cited by (19)

RCDD: Contrastive domain discrepancy with reliable steganalysis labeling for cover source mismatch
2024, Expert Systems with Applications
The cover source mismatch (CSM) can be very challenging for steganalysis because different distribution between source and target inevitably leads to poor performance of the steganalyzer on the target domain. In general, some methods from unsupervised domain adaptation, such as contrastive domain discrepancy (CDD), can be directly applied to steganalysis for addressing the CSM problem, but they cannot achieve satisfactory detection accuracy due to the neglect of steganographic characteristics. To solve this problem, reliable steganalysis labeling (RSL)-based CDD (RCDD) taking steganographic characteristics in account is proposed in this paper, which relies on RSL to generate reliable labels for extended target images, rather than utilizing clustering in CDD to obtain unreliable pseudo labels for target images. Through detailed deduction process, we know that RCDD draws closer the distribution of source and target classwisely so as to enhance the classification performance on the target domain. Simultaneously, a corresponding steganalysis network RCDD-Net is yielded by incorporating some backbone into RCDD. A large number of experiments verify that RCDD-Net is an innovation in steganalysis that effectively alleviates performance degradation when CSM occurs. Moreover, RCDD-Net provides better detection performance than several advanced steganalysis networks.
Sensitivity based robust learning for stacked autoencoder against evasion attack
2017, Neurocomputing
Citation Excerpt :
The sensitivity measure is defined as the change of learner’s outputs when the input has a small fluctuation. It has been shown that methods with sensitivity measure achieve good performances in many applications, for example, classifier training [25], dynamic fusion [26] and model selection [27]. The algorithm can be applied to the unsupervised representation learning and the supervised fine-tuning phase for the stacked autoencoder.
Although deep learning has achieved excellent performance in many applications, some studies indicate that deep learning algorithms are vulnerable in an adversarial environment. A small distortion on a sample leads to misclassification easily. Until now, the vulnerability issue of stacked autoencoder, which is one of the most popular deep learning algorithms, has not been investigated. In this paper, we firstly investigate the existing evasion attack to stacked autoencoder in an effort to understand whether, and to what extent, they can work efficiently. A robust learning algorithm which minimizes both its error and sensitivity is then proposed for stacked autoencoder. The sensitivity is defined as the change of the output due to a small fluctuation on the input. As the proposed algorithm considers not only accuracy but also stability, a more robust stacked autoencoder against evasion attack is expected. The performance of our methods is then evaluated and compared with conventional stacked autoencoder and denoising autoencoder experimentally in terms of accuracy, robustness and time complexity. Moreover, the experimental results also suggest that the proposed learning method is more robust than others when a training set is contaminated.
An adaptive secret image sharing with a new bitwise steganographic property
2016, Information Sciences
Citation Excerpt :
Authors of [50,51] surveyed state of the art data-based methods to earn embedded data in text-type data sets. Similarly, efforts to extract concealed data in steganography are named steganalysis [33–35]. The aforesaid schemes and similar ones, such as [4,7,46], used Least Significant Bit (LSB), which cannot withstand even simple steganalysis algorithms.
Recently, numerous studies have been conducted in the area of secret image sharing and steganography. The main objective of these schemes is to produce high-quality tampering-resistant stego images. A number of these methods are based on least significant bit embedding and hence, the presence of hidden data can be detected by well-designed steganalysis algorithms.
This paper proposes a new sharing scheme for critical images so that stego images are obtained with better visual quality, and at the same time, the authentication ability can be adjusted to any desired level (adaptive). In order to achieve these objectives, the construction of cellular automata is modified so that visual quality is improved, although authentication bits are hidden in cover images. Furthermore, authentication bits are computed in such a way that any tampering with one block affects more than one block. For steganographic purposes, least significant bit replacement is substituted with a new blockwise XOR operation, so that the scheme can withstand steganalysis attacks. The other interesting property of the proposed method is that the size of blocks is determined dynamically, therefore our approach can be adapted with secret/cover images of different sizes.
New framework for unsupervised universal steganalysis via SRISP-aided outlier detection
2016, Signal Processing: Image Communication
Citation Excerpt :
However, the performance of these approaches is still inferior to that of the matched scenario. Ng et al. [31] considered feature perturbations caused by the difference in JPEG quantization tables and proposed a steganalysis classifier based on radial basis function neural network by minimizing the defined sensitivity. This classifier is different from the aforementioned ones and has been verified to outperform other learning methods such as SVM and radial basis function neural network without considering feature perturbation.
Formulating steganalysis as a binary classification problem has been highly successful. However, the existing detection algorithms are difficult to obtain high detection accuracy when applied in real-world circumstances. Because so-called model mismatch problem often occurs owing to unknown cover source and embedding parameters. To avoid the mess of model mismatch, we propose a new unsupervised universal steganalysis framework to detect individual stego images. First, cover images with statistical properties similar to those of the given test image are searched from a retrieval cover database to establish an aided cover sample set. Second, unsupervised outlier detection is performed on a test set composed of the given test image and its aided cover sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called Similarity Retrieval of Image Statistical Properties (SRISP)-aided unsupervised outlier detection, requires no training, and thus it does not suffer from model mismatch. The framework employs standard steganalysis features and detects each test image individually. Experimental results illustrate that the framework substantially outperforms one-class support vector machine and the traditional unsupervised outlier detectors without considering SRISP; its detection performance is independent of the proportion of stego images in the test samples.
LG-Trader: Stock trading decision support based on feature selection by weighted localized generalization error model
2014, Neurocomputing
Stock trading is an important financial activity of human society. Machine learning techniques are adopted to provide trading decision support by predicting the stock price or trading signals of the next day. Decisions are made by analyzing technical indices and fundamental analysis of companies. There are two major machine learning research problems for stock trading decision support: classifier architecture selection and feature selection. In this work, we propose the LG-Trader which will deal with these two problems simultaneously using a genetic algorithm minimizing a new Weighted Localized Generalization Error (wL-GEM). An issue being ignored in current machine learning based stock trading researches is the imbalance among buy, hold and sell decisions. Usually hold decision is the majority in comparison to both buy and sell decisions. So, the wL-GEM is proposed to balance classes by penalizing heavier for generalization error being made in minority classes. The feature selection based on wL-GEM helps to select most useful technical indices among choices for each stock. Experimental results demonstrate that the LG-Trader yields higher profits and rates of return in both stock and index trading.
Cover-Source Mismatch in Steganalysis: Systematic Review
2024, Research Square
