Abstract
With digitalization, text documents are increasingly transferred over the Internet rather than delivered by hand, and this has prompted the idea of using text documents as carriers that can safely conceal information. The realization that methods such as word and line shifting, use of spaces, and replacing a word with its synonym are fragile against steganalysis led to a search for new approaches, and deep learning models were found to be more resistant to detection of the presence of hidden words. In this study, text is generated directly from the information to be hidden, without a carrier text, at both the word level and the character level. For word-level text generation, arithmetic coding, perfect tree and Huffman coding were used as the secret information embedding methods. In this part of the study, a bidirectional LSTM architecture with an attention mechanism was built as the language model. For character-level text generation, a new secret information embedding algorithm, LZW-Char Index Encoding (LZW-CIE), was created by combining the LZW compression algorithm with the Char Index method. The character-level model uses an encoder–decoder architecture together with bidirectional LSTM and Bahdanau attention. The proposed method was evaluated in terms of information embedding efficiency, information imperceptibility and hidden information capacity. The experiments showed that the method exceeded state-of-the-art performance and was more resistant to steganalysis.
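To make the compression step mentioned above more concrete, the following is a minimal sketch of textbook LZW compression and decompression in Python. It is not the LZW-Char Index Encoding algorithm of the study, whose details are not given in the abstract; it only illustrates the generic LZW stage that such a scheme could build on. The function names `lzw_compress` and `lzw_decompress`, the choice of UTF-8 bytes as the input alphabet, and the unbounded dictionary are all assumptions made for illustration.

```python
def lzw_compress(text: str) -> list[int]:
    """Textbook LZW: emit one dictionary index per longest known byte sequence."""
    data = text.encode("utf-8")  # assumption: operate on UTF-8 bytes
    # The dictionary starts with every single byte value (codes 0..255).
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while the sequence is known.
            current = candidate
        else:
            # Emit the longest known sequence and learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> str:
    """Inverse of lzw_compress: rebuild the dictionary while decoding."""
    if not codes:
        return ""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out).decode("utf-8")


if __name__ == "__main__":
    secret = "TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(secret)
    print(len(secret), "bytes ->", len(codes), "codes")
    print(lzw_decompress(codes) == secret)
```

Because repeated substrings collapse into single codes, the secret message is represented by fewer symbols, which is the property a high-capacity embedding scheme would rely on. How the resulting codes are mapped to generated characters is specific to the study and is not reproduced here.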
Acknowledgements
I would like to thank the "Sky Translation Office" for the language editing of the article.
Funding
No funding was received to assist with the preparation of this manuscript.
Ethics declarations
Conflict of interest
Author Merve Varol Arısoy received assistance from Professor Dr. Ecir Uğur Küçüksille only in the form of shared information on solving problems encountered in the project and guidance during the study. Apart from this, the author has no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Varol Arısoy, M. LZW-CIE: a high-capacity linguistic steganography based on LZW char index encoding. Neural Comput & Applic 34, 19117–19145 (2022). https://doi.org/10.1007/s00521-022-07499-5
DOI: https://doi.org/10.1007/s00521-022-07499-5