Enhancing Semantic Consistency in Linguistic Steganography via Denosing Auto-Encoder and Semantic-Constrained Huffman Coding

Wang, Shuoxin; Li, Fanxiao; Yu, Jiong; Lai, Haosen; Wu, Sixing; Zhou, Wei

doi:10.1007/978-3-031-44696-2_62

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14303)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Linguistic steganography hides secret messages within a normal-looking cover text and plays a crucial role in data protection. Compared with data encryption, steganography makes secure data transmission less perceptible, because the output stego (steganographic) texts are not garbled ciphertext but read like normal text. Consequently, the central goal of linguistic steganography is to improve the imperceptibility of the stego texts. Although prior work can already generate fluent stego texts, ensuring that the semantics of a stego text are natural and reasonable to human readers remains a challenging problem. To alleviate this issue, this work proposes a novel Semantic-Preserved Linguistic Steganography Auto-Encoder (SPLS-AutoEncoder) that improves imperceptibility by enhancing semantic consistency. SPLS-AutoEncoder first minimizes the distortion introduced when embedding the secret message into the cover text by using the denoising auto-encoder BART as the backbone model. We then propose a novel Dynamic Semantic-Constrained Huffman Coding, which uses a dynamic context information embedding and a global topic embedding to ensure semantic consistency between the cover text and the stego text. Experimental results on two Chinese datasets show that our method performs well compared with previous methods. The datasets and code are released at https://github.com/Y-NLP/LinguisticSteganography/tree/main/NLPCC2023_SPLS-AutoEncoder.
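
The abstract does not spell out the coding procedure, so the following is only a generic sketch of Huffman-coding-based bit embedding with a simple semantic filter, not the authors' Dynamic Semantic-Constrained Huffman Coding. It assumes that, at each generation step, a language model supplies candidate tokens with probabilities and that some similarity score to the cover text is available for each candidate. The probabilities, similarity scores, threshold, and all function names below are hypothetical.

```python
import heapq
from itertools import count


def constrain(probs, similarity, threshold):
    """Keep candidates whose (hypothetical) similarity score passes the threshold."""
    kept = {t: p for t, p in probs.items() if similarity[t] >= threshold}
    if not kept:  # fall back to the single most probable token
        best = max(probs, key=probs.get)
        kept = {best: probs[best]}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}


def build_huffman_codes(probs):
    """Return {token: bitstring}, a prefix-free Huffman code over the candidates."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {tok: ""}) for tok, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in codes0.items()}
        merged.update({t: "1" + c for t, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def embed_step(bits, codes):
    """Select the token whose code word is a prefix of the remaining secret bits."""
    longest = max(len(c) for c in codes.values())
    padded = bits + "0" * longest  # zero-pad so the last step always matches
    for tok, code in codes.items():
        if padded.startswith(code):
            return tok, bits[len(code):]
    raise ValueError("no code word matches; the code is not complete")


def hide(bits, steps, threshold=0.5):
    """Embed a bitstring; `steps` is a list of (probs, similarity) pairs."""
    tokens = []
    for probs, similarity in steps:
        if not bits:
            break
        codes = build_huffman_codes(constrain(probs, similarity, threshold))
        tok, bits = embed_step(bits, codes)
        tokens.append(tok)
    return tokens


def reveal(tokens, steps, threshold=0.5):
    """Recover the bits by rebuilding the same code at every step."""
    out = []
    for tok, (probs, similarity) in zip(tokens, steps):
        codes = build_huffman_codes(constrain(probs, similarity, threshold))
        out.append(codes[tok])
    return "".join(out)


# Toy data: the same made-up candidate distribution at every step.
toy_probs = {"sunny": 0.5, "rainy": 0.3, "windy": 0.15, "purple": 0.05}
toy_sim = {"sunny": 0.9, "rainy": 0.8, "windy": 0.6, "purple": 0.1}
toy_steps = [(toy_probs, toy_sim)] * 8

secret = "0110"
stego_tokens = hide(secret, toy_steps)
recovered = reveal(stego_tokens, toy_steps)
print(stego_tokens, recovered[:len(secret)])
```

Because the receiver rebuilds the same constrained code at every step, the secret bits can be read back from the chosen tokens. The recovered string may carry trailing padding bits, so a practical scheme would also need to communicate the message length. Filtering candidates before building the tree trades embedding capacity for semantic closeness to the cover text.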

Acknowledgements

This work is supported in part by the Yunnan Province Education Department Foundation under Grant No. 2022j0008, in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480, Research and Application of Object Detection based on Artificial Intelligence, and in part by the Yunnan Province expert workstations under Grant 202205AF150145.

Author information

Authors and Affiliations

Engineering Research Center of Cyberspace, Yunnan University, Kunming, China
Shuoxin Wang, Fanxiao Li, Jiong Yu, Haosen Lai, Sixing Wu & Wei Zhou
National Pilot School of Software, Yunnan University, Kunming, China
Shuoxin Wang, Fanxiao Li, Jiong Yu, Haosen Lai, Sixing Wu & Wei Zhou

Corresponding author

Correspondence to Sixing Wu.

Editor information

Editors and Affiliations

Emory University, Atlanta, GA, USA
Fei Liu
Microsoft Research Asia, Beijing, China
Nan Duan
Soochow University, Suzhou, China
Qingting Xu
Soochow University, Suzhou, China
Yu Hong

About this paper

Cite this paper

Wang, S., Li, F., Yu, J., Lai, H., Wu, S., Zhou, W. (2023). Enhancing Semantic Consistency in Linguistic Steganography via Denosing Auto-Encoder and Semantic-Constrained Huffman Coding. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_62

DOI: https://doi.org/10.1007/978-3-031-44696-2_62
Published: 08 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44695-5
Online ISBN: 978-3-031-44696-2
eBook Packages: Computer Science, Computer Science (R0)
