Abstract
Linguistic steganography hides secret messages within a normal-looking cover text and plays an important role in data protection. Compared with data encryption, steganography makes secure data transmission less perceptible, because the output stego (steganographic) texts read like normal text rather than garbled code. The central goal of linguistic steganography is therefore to improve the imperceptibility of the output stego texts. Although prior work can already generate fluent stego texts, ensuring that the semantics of a stego text are natural and reasonable to a human reader remains a challenging problem. To alleviate this issue, this work proposes a novel Semantic-Preserved Linguistic Steganography Auto-Encoder (SPLS-AutoEncoder), which improves imperceptibility by enhancing semantic consistency. SPLS-AutoEncoder first minimizes the distortion introduced when embedding the secret message into the cover text by using the denoising auto-encoder BART as the backbone model. We then propose a novel Dynamic Semantic-Constrained Huffman Coding, which uses a dynamic context information embedding and a global topic embedding to ensure semantic consistency between the cover text and the stego text. Experimental results on two Chinese datasets show that our method performs well compared with previous methods. The datasets and code are released at https://github.com/Y-NLP/LinguisticSteganography/tree/main/NLPCC2023_SPLS-AutoEncoder.
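To make the idea of Huffman-coding-based embedding concrete, the following is a minimal sketch of the generic technique, not the SPLS-AutoEncoder implementation. It assumes a toy candidate distribution in place of BART's next-token probabilities and a hypothetical per-token similarity score in place of the context and topic embeddings; the function names, the threshold, and all numbers are illustrative assumptions. At each step the candidates that fail the semantic filter are dropped, a Huffman code is built over the rest, and the token whose code matches the next secret bits is emitted. The receiver rebuilds the same codes to recover the bits.

```python
import heapq
from itertools import count


def build_huffman_codes(probs):
    """Map each token to a prefix-free bit string (a lone token gets '')."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {tok: ""}) for tok, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in codes0.items()}
        merged.update({t: "1" + c for t, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def filter_candidates(probs, sims, threshold):
    """Hypothetical semantic constraint: keep tokens whose similarity score
    reaches the threshold, then renormalize the probabilities."""
    kept = {t: p for t, p in probs.items() if sims[t] >= threshold}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def embed(bits, steps):
    """Emit one token per step; each token encodes the next secret bits."""
    tokens, pos = [], 0
    for probs in steps:
        if pos >= len(bits):
            break
        codes = build_huffman_codes(probs)
        # Pad with zeros so a code can always be matched near the end.
        remaining = bits[pos:] + "0" * max(len(c) for c in codes.values())
        token = next(t for t, c in codes.items() if remaining.startswith(c))
        tokens.append(token)
        pos += len(codes[token])
    return tokens


def extract(tokens, steps, n_bits):
    """Rebuild the same codes and concatenate the bits of the seen tokens."""
    bits = "".join(build_huffman_codes(p)[t] for t, p in zip(tokens, steps))
    return bits[:n_bits]


# Toy data (assumed): the same candidate distribution at every step.
candidates = {"sunny": 0.4, "rainy": 0.3, "cloudy": 0.2, "purple": 0.1}
similarity = {"sunny": 0.9, "rainy": 0.8, "cloudy": 0.7, "purple": 0.1}
step = filter_candidates(candidates, similarity, threshold=0.5)
steps = [step] * 4

secret = "0110"
stego_tokens = embed(secret, steps)
recovered = extract(stego_tokens, steps, len(secret))
print(stego_tokens, recovered)
```

In this sketch the sender and receiver must share the same distributions and the same filter, otherwise the rebuilt codes differ and extraction fails. Filtering before coding trades embedding capacity for semantic plausibility, since fewer candidates means shorter codes per token.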
Acknowledgements
This work is supported in part by the Yunnan Province Education Department Foundation under Grant No. 2022j0008, in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480, Research and Application of Object Detection Based on Artificial Intelligence, and in part by the Yunnan Province expert workstations under Grant 202205AF150145.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, S., Li, F., Yu, J., Lai, H., Wu, S., Zhou, W. (2023). Enhancing Semantic Consistency in Linguistic Steganography via Denosing Auto-Encoder and Semantic-Constrained Huffman Coding. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_62
DOI: https://doi.org/10.1007/978-3-031-44696-2_62
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44695-5
Online ISBN: 978-3-031-44696-2
eBook Packages: Computer Science (R0)