Abstract
We propose the integral prior classification approach for binary steganalysis, which implies that several detectors are trained, each intended for processing only images with a certain compression rate. In particular, the training set is split into several parts according to the images' compression rate; then a corresponding number of detectors are trained, each using only the subset ascribed to it. The testing images are likewise distributed between the detectors according to their compression rate. We utilize BOSSbase 1.01 as benchmark data along with HUGO, WOW and S-UNIWARD as benchmark embedding algorithms. Comparison with state-of-the-art results demonstrates that, depending on the case, the integral prior classification decreases the detection error by 0.05–0.16.
Keywords
- Information hiding
- Steganalysis
- Support vector machine
- Compression
- HUGO
- UNIWARD
- WOW
- Prior classification
- SRM
- PSRM
1 Introduction
The classic problem of steganalysis consists in distinguishing between empty and stego images via a bare detector, with all images subject to processing. Recently, an approach exploiting prior classification in steganalysis was introduced [10]; within it, three possible methods were proposed for selecting a portion of the testing set such that the detection error calculated over this subset may be lower than that calculated over the whole set. In their paper, the authors also discussed the possibility of splitting the testing set into several subsets containing images with common (in some sense) properties and training an individual detector for each subset in order to decrease the detection error calculated over the whole testing set.
In this paper, we propose a compression-based method of putting this idea into practice. We suggest splitting the training image set into several subsets according to the images' compression rate, then obtaining a corresponding number of untrained detectors and training each of them on a separate subset. During the testing phase, an image with a certain compression rate is sent to the detector trained on images with a close compression rate. The idea of using the compression rate as an indicator for the integral prior classification comes from the well-known fact that noisy images are harder to steganalyze than plain ones; noisiness is usually tightly correlated with entropy, and therefore with the compression rate. We thus conjectured that detectors intended for noisy images are better trained on noisy images, and those intended for plain images on plain images.
The main hypothesis motivating our work is that a detector preceded by the integral prior classification will be more accurate than the detector alone. It is worth mentioning that compression is a universal tool which has already been used in steganalysis, see e.g. [2, 8]; however, those papers are devoted either to detecting basic LSB steganography or to creating quantitative steganalyzers, while the current paper focuses on binary detection of content-adaptive embedding. Moreover, in the earlier papers data compression was exploited for developing the stego detectors themselves, while here we do not touch the detectors and use compression only to perform the integral prior classification.
The principal difference between the single prior classification (introduced in [10]) and the integral prior classification (introduced in this paper) is that the single prior classification only selects a part of the testing set which provides higher accuracy, while the integral prior classification enhances the accuracy estimated over the whole testing set. Thus, although the single prior classification may obtain a rather large subset to be sent to the steganalyzer, it still discards the remaining images. The integral prior classification, in contrast, assumes that all testing images are processed by a detector, keeping us in the traditional scenario.
Using BOSSbase 1.01 [1] as benchmark data along with the content-adaptive embedding methods HUGO [12], S-UNIWARD [5] and WOW [4] as benchmark embedding algorithms, we compare our results against the state-of-the-art due to Holub and Fridrich [6]. Our experiments have confirmed the above hypothesis and demonstrated that prepending the integral prior classification decreases the detection error of the bare detector by 0.05–0.16. For the sake of clarity, we emphasize that the results of the current paper are compared against the accuracy of the best bare detectors, and not against detectors accompanied by the single prior classification, because the latter deals only with a part of the testing set, while the integral prior classification computes the detection error over the whole testing set.
2 Description of Integral Prior Classification
2.1 General Scheme
A general scheme of how to train the detectors using the integral prior classification is presented in Fig. 1. First, we split the training set into subsets according to the compression rate, so that images with close compression rates end up in the same subset. For instance, the first class contains the most compressed images, and the last class contains the least compressed ones. The number of subsets defines the number of detectors to be trained. After the splitting, each detector is trained on the images from its own subset: the first detector on the first subset, the second detector on the second subset, etc.
During the testing phase, images are distributed between the detectors according to their compression rate, and each image is processed by the corresponding detector. Thus, unlike the traditional bare detection scheme, which employs a single detector, the integral prior classification sends each image to the detector which, as we expect, will provide the least detection error. Certainly, there may be other ways of distributing the images, not necessarily based on the compression rate.
2.2 Detailed Description
At a high level, detection with the integral prior classification is described in Algorithm 1. The algorithm takes as input the training set \(\mathcal {X}\) and the testing set \(\mathcal {Y}\). Then a compression method \(\texttt {Compress(}\cdot \texttt {)}\), a number of subsets L, along with their sizes \(size_1, \ldots ,\ size_{L}\) (\(size_1+\ldots +size_L=|\mathcal {X}|\)) and \(size'_1, \ldots ,\ size'_{L}\) (\(size'_1+\ldots +size'_L=|\mathcal {Y}|\)), are chosen.
The Split-Set function (see Algorithm 2) returns L disjoint subsets which constitute a partition of the set \(\mathcal {Z}\). As Algorithm 1 shows, this function is called twice: for splitting the training set \(\mathcal {X}\) and for splitting the testing set \(\mathcal {Y}\). The first step of this function is compressing every image \(z\in \mathcal {Z}\) in order to obtain its size after compression, \(|\texttt {Compress}(z)|\). At the second step, the images are sorted according to this value (we denote the sorted sequence \(z_{(1)},~z_{(2)},~\ldots , z_{(|\mathcal {Z}|)})\). Finally, the subsets are formed: the first \(size_1\) images are ascribed to the first subset, the next \(size_2\) images to the second subset, and so on; the last \(size_L\) images are ascribed to the last subset.
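The Split-Set step can be sketched as follows. This is a minimal illustration rather than the paper's exact Algorithm 2: the standard `lzma` module stands in for the Compress(·) routine, and images are represented as raw byte strings.

```python
import lzma

def split_set(images, sizes):
    """Partition `images` (raw byte strings) into len(sizes) subsets
    ordered by compressed size: most compressible images first."""
    assert sum(sizes) == len(images)
    # Compress every image once and sort by the resulting size,
    # mirroring |Compress(z)| in Algorithm 2.
    ordered = sorted(images, key=lambda img: len(lzma.compress(img, preset=9)))
    subsets, start = [], 0
    for size in sizes:
        subsets.append(ordered[start:start + size])
        start += size
    return subsets
```

A highly redundant (plain) image lands in an early subset, while a high-entropy (noisy) one lands in a late subset.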
Then the Train-Detectors function (see Algorithm 3) receives the L subsets and trains L detectors to distinguish between empty and stego images. The detectors may be of different types; in steganalysis, the support vector machine or the ensemble classifier is usually employed. They utilize image features such as SRM [3], PSRM [6], SPAM [11], etc.
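Training one detector per subset can be sketched as below. The `(feature_vector, label)` pair representation and the `make_detector` factory are our illustration of the Train-Detectors idea, not the paper's exact interface; any classifier exposing `fit` works.

```python
def train_detectors(labelled_subsets, make_detector):
    """Train one detector per subset (the Train-Detectors idea).
    Each subset is a list of (feature_vector, label) pairs;
    `make_detector` returns a fresh untrained classifier."""
    detectors = []
    for subset in labelled_subsets:
        X = [features for features, _ in subset]
        y = [label for _, label in subset]
        clf = make_detector()
        clf.fit(X, y)          # each detector sees only its own subset
        detectors.append(clf)
    return detectors
```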
The testing phase is now divided into two stages: the prior classification stage and the detection stage. The prior classification stage consists in calling the Detector-Number function (see Algorithm 4), which returns the detector number for each image from the testing set \(\mathcal {Y}\). This number is obtained according to the testing set splitting (see step 7 of Algorithm 1), but there may be other ways of implementing it.
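The two-stage testing can be sketched as follows: each test image is routed to the detector whose index matches the subset the image fell into, then classified. The scikit-learn-style `predict` interface on the detector objects is our assumption.

```python
def detect_all(test_subsets, detectors, extract_features):
    """Two-stage testing: route each image to the detector trained on
    images with a similar compression rate, then classify it."""
    predictions = {}
    for idx, subset in enumerate(test_subsets):
        detector = detectors[idx]
        for image in subset:
            # The detector number is simply the index of the subset the
            # image fell into during Split-Set (step 7 of Algorithm 1).
            predictions[image] = detector.predict([extract_features(image)])[0]
    return predictions
```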
3 Experimental Results
We performed two types of experiments. Adjustment experiments were aimed at searching for better parameters, while benchmark experiments were performed with the parameters chosen during the adjustment experiments and were intended for comparing our results with the state-of-the-art ones.
3.1 Common Core of the Experiments
Images. During the adjustment experiments, the image set from the Break Our Watermarking System 2 (BOWS2) contest [15] was utilized, and during the benchmark experiments, BOSSbase 1.01 from the Break Our Steganographic System (BOSS) contest [1]. The BOWS2 image set consists of 10000 grayscale images in PGM format of size \(512\,\times \,512\). The well-known benchmark database BOSSbase 1.01 contains 10000 images captured by seven different cameras in RAW format. These images were converted into 8-bit grayscale, resized and cropped to \(512\,\times \,512\) pixels.
Preparing the Training and the Testing Sets. Both databases, BOWS2 and BOSSbase, include 10000 images, so we prepared the corresponding training and testing sets in the same way. In order to prepare the training set \(\mathcal {X}^p\) and the testing set \(\mathcal {Y}^p\), where p identifies the embedding rate in bpp, the whole database was divided into two subsets \(\mathcal {X}_0\) and \(\mathcal {Y}_0\), where \(|\mathcal {X}_0|=7500\) and \(|\mathcal {Y}_0|=2500\). Then, by randomly embedding p bpp into all the images from \(\mathcal {X}_0\) and \(\mathcal {Y}_0\), we obtained \(\mathcal {X}^p_1\) and \(\mathcal {Y}^p_1\) correspondingly. The training set was \(\mathcal {X}^p=\mathcal {X}_0\cup \mathcal {X}^p_1\) and the testing set \(\mathcal {Y}^p=\mathcal {Y}_0\cup \mathcal {Y}^p_1\); thus, \(|\mathcal {X}^p|=15000\) and \(|\mathcal {Y}^p|=5000\). Both sets contain half empty and half stego images. Further in the paper we omit the payload index p (this will not confuse the reader) and designate the training set as \(\mathcal {X}\) and the testing set as \(\mathcal {Y}\).
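The set preparation described above can be sketched like this. Here `embed` is a placeholder for an embedding routine (e.g. HUGO at p bpp), which we do not implement, and the fixed seed is only for reproducibility of the sketch.

```python
import random

def prepare_sets(database, embed, train_covers=7500, seed=0):
    """Split the database into cover pools X0 (training) and Y0 (testing),
    then add a stego version of every cover, as described above."""
    images = list(database)
    random.Random(seed).shuffle(images)
    x0, y0 = images[:train_covers], images[train_covers:]
    training = x0 + [embed(img) for img in x0]   # X = X0 ∪ X1^p
    testing = y0 + [embed(img) for img in y0]    # Y = Y0 ∪ Y1^p
    return training, testing
```

With a 10000-image database this yields 15000 training and 5000 testing images, half covers and half stego in each set.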
Compression Methods. We employed the well-known lossless compression methods LZMA and PAQ. LZMA (Lempel–Ziv–Markov chain algorithm) uses a dictionary compression scheme [13]; we launched this archiver with the script “lzma -k -c -9”. PAQ is based on context mixing and prediction by partial matching [14]; the launching script in our experiments was “paq -11”.
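A single image's compressed size can be measured with Python's standard `lzma` module as a stand-in for the "lzma -k -c -9" invocation above (PAQ has no standard Python binding, so it is not shown):

```python
import lzma

def compressed_size(path):
    """Size in bytes of the file after LZMA compression at preset 9,
    analogous to the size produced by "lzma -k -c -9"."""
    with open(path, "rb") as f:
        data = f.read()
    return len(lzma.compress(data, preset=9))
```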
Detector. We employed a support vector machine as the steganography detector. The Python implementation was taken from [16] with the default parameters, except for the following: the linear kernel, shrinking turned on, and the penalty parameter \(C=20000\).
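In scikit-learn terms, the configuration described above corresponds to something like the following sketch (the exact wrapper used in [16] may differ):

```python
from sklearn.svm import SVC

def make_detector():
    # Linear kernel, shrinking heuristic on, penalty parameter C = 20000,
    # as specified above; all other parameters keep their defaults.
    return SVC(kernel="linear", shrinking=True, C=20000)
```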
Embedding Algorithms. In the benchmark experiments we employed three embedding algorithms, HUGO, WOW and S-UNIWARD, because exactly these algorithms were used by Holub and Fridrich in their state-of-the-art paper [6]. HUGO (Highly Undetectable steGO) is a content-adaptive algorithm based on so-called syndrome-trellis codes [12]. WOW (Wavelet Obtained Weights) uses wavelet-based distortion [4], and S-UNIWARD [5] is a simplified modification of WOW. In the adjustment experiments only HUGO was used.
Feature Set. We utilize Spatial Rich Model (SRM) features [3], one of the most popular instruments for steganalysis. The newer Projection Spatial Rich Model (PSRM) features [6] only slightly decrease the detection error but significantly increase complexity. SRM features have a total dimension of 34,671.
Detection Error. We measured detection accuracy in the standard manner, by calculating the detection error \(P_E=\frac{1}{2}(P_{FA}+P_{MD}),\) where \(P_{FA}\) is the probability of false alarms, and \(P_{MD}\) is the probability of missed detections (see e.g. [3, 7, 9, 11]).
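For concreteness, this error measure can be computed from predicted labels as follows (a straightforward sketch, with label 0 denoting a cover image and 1 a stego image):

```python
def detection_error(labels, preds):
    """P_E = (P_FA + P_MD) / 2, with label 0 = cover and 1 = stego."""
    covers = [p for l, p in zip(labels, preds) if l == 0]
    stegos = [p for l, p in zip(labels, preds) if l == 1]
    p_fa = sum(p == 1 for p in covers) / len(covers)   # false alarms
    p_md = sum(p == 0 for p in stegos) / len(stegos)   # missed detections
    return 0.5 * (p_fa + p_md)
```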
3.2 Adjustment Experiments
The goal of this experimental phase is to choose the parameters to be used in the benchmark experiments for comparison with the state-of-the-art. The task consists in choosing the following: a compression method; the number of splitting classes L; and the sizes of the subsets.
Due to the long SVM training process, it was infeasible to sweep many possible values of the prior classification parameters in search of the very best ones. That is why we chose three reasonable numbers of subsets: 2, 3 and 5. The thresholds for the compression rate are determined by the subset sizes. Here the 5 subsets are of equal size, and the 2 or 3 subsets were formed by aggregating the least compressed subsets. Trying \(L=2\) and \(L=3\), we had hoped that training the detector on images which are harder to steganalyze would provide better accuracy. However, Table 1 demonstrates that the best accuracy is provided by \(L=5\).
3.3 Benchmark Experiments
The goal of this section is to demonstrate that prepending the prior classification stage, aimed at choosing the appropriate detector for each image, enhances the stego detector's accuracy. We compare the detection error with the state-of-the-art data provided by Holub and Fridrich in [6]. Unlike us, they employed the ensemble classifier [7], which is known to be faster but slightly less accurate than the support vector machine. To be more persuasive, we also calculated the detection errors for our support vector machine implementation without prior classification and show that they are close to those of the ensemble classifier. In any case, the integral prior classification outperforms both.
See Tables 2, 3 and 4, where this comparison is provided for the HUGO, WOW and S-UNIWARD embedding algorithms, respectively. The prior classification parameters, chosen during the adjustment experiments, are as follows: the PAQ compression method; \(L=5;~size_1=size_2=size_3=size_4=size_5=3000;~size'_1=size'_2=size'_3=size'_4=size'_5=1000\).
The results demonstrate that, depending on the case, the integral prior classification substantially increases the accuracy. The most impressive results (see HUGO 0.1 bpp, WOW 0.1 bpp, WOW 0.2 bpp, S-UNIWARD 0.1 bpp, S-UNIWARD 0.2 bpp, S-UNIWARD 0.4 bpp) provide a detection error decrease of more than 0.1. In Tables 2, 3 and 4, we type in bold those values which are compared against each other. In particular, we compare the detection errors obtained with the integral prior classification against the least of the errors of our support vector machine (SVM) implementation and the two Holub and Fridrich results. For example, in Table 2 for HUGO 0.1 bpp we compare 0.24 against 0.35, and in Table 3 for WOW 0.4 bpp we compare 0.08 against 0.17. If two implementations provide the same error, both are marked.
4 Conclusion
In this paper we have proposed the integral prior classification approach aimed at increasing stego detectors' accuracy. Although the basic idea of this approach is rather definite, it admits many possible implementations. For instance, it may be interesting (and, more importantly, it might lead to even more accurate detectors) to classify images not according to their compression rates but in some other way.
In the adjustment experiments we considered only three variants of splitting the training image set. Nevertheless, this was enough to reach the goal of our research and to demonstrate that prepending the integral prior classification to detection makes it possible to exceed the accuracy of the state-of-the-art detectors. Thus, one possible direction of future work is theoretical research elaborating recommendations on how to choose the number of subsets and their sizes so as to provide better accuracy without heavy adjustment experiments.
It is worth noting that, in order to determine which detector processes which image, in the current implementation the testing set was split into several equal-size parts. However, this is not convenient if the testing images arrive one by one, unless we can wait until a sufficient quantity accumulates. In such a case, the detector number can instead be established according to compression rate thresholds.
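Such threshold-based routing for images arriving one by one can be sketched with a binary search over compressed-size thresholds derived from the training split; the threshold values below are purely illustrative.

```python
import bisect

def detector_by_threshold(thresholds, compressed_size):
    """Map a compressed size to a detector index via fixed thresholds
    (sorted ascending), so a single arriving image can be routed
    without re-splitting the whole testing set."""
    return bisect.bisect_right(thresholds, compressed_size)
```

For example, with thresholds `[100, 200]`, sizes up to 100 bytes go to detector 0, sizes in (100, 200] to detector 1, and larger sizes to detector 2.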
The integral prior classification approach extends the single prior classification approach [10], which only selects images that can be reliably detected and discards the others, though the selected images may constitute a rather large subset. The main idea of our extension is employing several detectors, each of which processes a certain testing subset, i.e., images with special properties.
The efficiency of the integral prior classification has been demonstrated for HUGO, WOW and S-UNIWARD utilizing the BOSSbase 1.01 images. Depending on the payload and the embedding algorithm, the detection error decrease, compared to the state-of-the-art, amounted to 0.05–0.16.
References
Bas, P., Filler, T., Pevný, T.: “Break our steganographic system”: the ins and outs of organizing BOSS. In: Filler, T., Pevný, T., Craver, S., Ker, A. (eds.) IH 2011. LNCS, vol. 6958, pp. 59–70. Springer, Heidelberg (2011). doi:10.1007/978-3-642-24178-9_5
Boncelet, C., Marvel, L., Raglin, A.: Compression-based steganalysis of LSB embedded images. In: Proceedings of SPIE, Security, Steganography, and Watermarking of Multimedia Contents VIII, vol. 6072, pp. 75–84 (2006)
Fridrich, J.: Rich models for steganalysis of digital images. IEEE Trans. Inf. Forensics Secur. 7(3), 868–882 (2012)
Holub, V., Fridrich, J.: Designing steganographic distortion using directional filters. In: Proceedings of 4th IEEE International Workshop on Information Forensics and Security, pp. 234–239 (2012)
Holub, V., Fridrich, J.: Digital image steganography using universal distortion. In: Proceedings of 1st ACM Workshop, pp. 59–68 (2013)
Holub, V., Fridrich, J.: Random projections of residuals for digital image steganalysis. IEEE Trans. Inf. Forensics Secur. 8(12), 1996–2006 (2013)
Kodovsky, J., Fridrich, J., Holub, V.: Ensemble classifiers for steganalysis of digital media. IEEE Trans. Inf. Forensics Secur. 7(2), 434–444 (2011)
Monarev, V., Pestunov, A.: A new compression-based method for estimating LSB replacement rate in color and grayscale images. In: Proceedings of IEEE 7th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP, pp. 57–60 (2011)
Monarev, V., Pestunov, A.: A known-key scenario for steganalysis and a highly accurate detector within it. In: Proceedings of IEEE 10th International Conference on Intelligent Information Hiding and Multimedia Signal Processing, IIH-MSP, pp. 175–178 (2014)
Monarev, V., Pestunov, A.: Prior classification of stego containers as a new approach for enhancing steganalyzers accuracy. In: Qing, S., Okamoto, E., Kim, K., Liu, D. (eds.) ICICS 2015. LNCS, vol. 9543, pp. 445–457. Springer, Heidelberg (2016). doi:10.1007/978-3-319-29814-6_38
Pevny, T., Bas, P., Fridrich, J.: Steganalysis by subtractive pixel adjacency matrix. IEEE Trans. Inf. Forensics Secur. 5(2), 215–224 (2010)
Pevný, T., Filler, T., Bas, P.: Using high-dimensional image models to perform highly undetectable steganography. In: Böhme, R., Fong, P.W.L., Safavi-Naini, R. (eds.) IH 2010. LNCS, vol. 6387, pp. 161–177. Springer, Heidelberg (2010). doi:10.1007/978-3-642-16435-4_13
LZMA SDK (Software Development Kit). http://www.7-zip.org/sdk.html/
Large Text Compression Benchmark. http://mattmahoney.net/dc/text.html
Break Our Watermarking System, 2nd edn. http://bows2.ec-lille.fr/
scikit-learn: Machine Learning in Python. http://scikit-learn.org/
© 2016 Springer International Publishing AG
Monarev, V., Duplischev, I., Pestunov, A. (2016). Compression-Based Integral Prior Classification for Improving Steganalysis. In: Lam, KY., Chi, CH., Qing, S. (eds) Information and Communications Security. ICICS 2016. Lecture Notes in Computer Science(), vol 9977. Springer, Cham. https://doi.org/10.1007/978-3-319-50011-9_11
Print ISBN: 978-3-319-50010-2
Online ISBN: 978-3-319-50011-9