Abstract
With the rapid development of natural language processing (NLP) in recent years, methods for automatically generating steganographic text have advanced considerably. Benefiting from the powerful feature extraction and expression capabilities of neural networks, these methods can generate steganographic texts that combine relatively high concealment with high hidden capacity. Previous steganalysis models detect such texts poorly; this remains an unsolved problem and poses a serious threat to the security of cyberspace. In this paper, we first collect a large text steganalysis dataset (T-Steg), which contains 396,000 texts with various embedding rates in various formats. Our analysis identifies three kinds of word correlation patterns in texts. We then propose a new text steganalysis model based on convolutional sliding windows (TS-CSW), which uses convolutional sliding windows (CSW) of multiple sizes to extract these correlation features. We observe that these word correlation features are distorted in generated steganographic texts once secret information has been embedded, and that the resulting subtle changes in the distribution of correlation features can be used for text steganalysis. We train and test the proposed method on samples from the T-Steg dataset. Experimental results show that the proposed model not only achieves high steganalysis performance but can also estimate the amount of secret information embedded in the generated steganographic texts, reaching state-of-the-art performance.
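To make the idea of convolutional sliding windows of multiple sizes concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the window sizes, the number of kernels, the pooling scheme, or the classifier, so everything here is an assumption made for illustration: window sizes 1, 2 and 3, four kernels per size, 8-dimensional word embeddings, max-over-time pooling, a logistic output layer, and random untrained weights.

```python
import numpy as np


def sliding_window_conv(embedded, kernel):
    """Slide one kernel of shape (h, d) over an (n, d) sentence matrix.

    Returns n - h + 1 ReLU activations, one per window position.
    """
    n, _ = embedded.shape
    h = kernel.shape[0]
    if n < h:
        raise ValueError("sentence is shorter than the window")
    outputs = np.array(
        [np.sum(embedded[i:i + h] * kernel) for i in range(n - h + 1)]
    )
    return np.maximum(outputs, 0.0)


def csw_features(embedded, kernels_by_size):
    """Max-pool each kernel's activations over positions and concatenate.

    kernels_by_size maps a window size h to an array of shape (k, h, d).
    Max-over-time pooling is an assumption of this sketch.
    """
    feats = []
    for h in sorted(kernels_by_size):
        for kernel in kernels_by_size[h]:
            feats.append(sliding_window_conv(embedded, kernel).max())
    return np.array(feats)


def predict_stego_probability(features, weights, bias):
    """Logistic output layer (assumed): probability that the text is stego."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))


# Hypothetical sizes and random, untrained weights, for illustration only.
rng = np.random.default_rng(0)
sentence = rng.normal(size=(10, 8))  # 10 words, 8-dim embeddings
kernels = {h: rng.normal(size=(4, h, 8)) for h in (1, 2, 3)}
features = csw_features(sentence, kernels)
prob = predict_stego_probability(
    features, rng.normal(size=features.shape[0]), 0.0
)
print(features.shape, float(prob))
```

Each window size covers a different span of neighbouring words, so the concatenated vector mixes correlation features at several ranges. In a trained model the kernels and output weights would be learned from labelled cover and steganographic texts rather than sampled at random.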
Acknowledgements
The authors thank Dr. Zhiwen Hu for serving as scientific advisor to this research and Qi Li for participating in the writing of the manuscript. This research is supported by the National Key R&D Program (2018YFB0804103) and the National Natural Science Foundation of China (No. U1536207, No. U1636113 and No. 61862002).
Cite this article
Yang, Z., Huang, Y. & Zhang, YJ. TS-CSW: text steganalysis and hidden capacity estimation based on convolutional sliding windows. Multimed Tools Appl 79, 18293–18316 (2020). https://doi.org/10.1007/s11042-020-08716-w
DOI: https://doi.org/10.1007/s11042-020-08716-w