Enhancing Semantic Consistency in Linguistic Steganography via Denoising Auto-Encoder and Semantic-Constrained Huffman Coding

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14303)

Abstract

Linguistic steganography is a useful technique for hiding secret messages within normal cover text, and it plays a crucial role in data protection. Compared with data encryption techniques, steganography makes secure data transmission more imperceptible, because the output stego (steganographic) texts are not garbled ciphertext but look like normal text. Consequently, the essential goal of linguistic steganography is to improve the imperceptibility of the output stego texts. Although prior works can already generate fluent stego texts, ensuring that the semantics of stego texts are natural and reasonable to human readers remains a challenging problem. To alleviate this issue, this work proposes a novel Semantic-Preserved Linguistic Steganography Auto-Encoder (SPLS-AutoEncoder) that improves imperceptibility by enhancing semantic consistency. SPLS-AutoEncoder first minimizes the possible distortion introduced when embedding the secret message into the cover text by using the denoising auto-encoder BART as the backbone model. Then, we propose a novel Dynamic Semantic-Constrained Huffman Coding, which uses a dynamic context information embedding and a global topic embedding to ensure semantic consistency between the cover text and the stego text. Experimental results on two Chinese datasets show that our method performs excellently compared with previous methods. The datasets and code are released at https://github.com/Y-NLP/LinguisticSteganography/tree/main/NLPCC2023_SPLS-AutoEncoder.
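The abstract does not spell out how the Dynamic Semantic-Constrained Huffman Coding works, so the following Python block is only a minimal sketch of the general idea under stated assumptions, not the authors' method. At each step, a set of candidate tokens (hand-written toy data here, standing in for BART's output distribution) is filtered by cosine similarity to a guide vector that stands in for the context and topic embeddings; a Huffman tree is built over the surviving candidates' probabilities; and the secret bits select the token whose codeword they begin with. The functions `semantic_filter`, `step_codes`, `embed`, and `extract`, the `keep` cutoff, the zero-padding of the final bits, and the toy `CANDS`/`GUIDE` data are all hypothetical and are not taken from the paper or its repository.

```python
import heapq
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_filter(cands, guide_vec, keep):
    """HYPOTHETICAL semantic constraint: keep the `keep` candidates whose
    embeddings are closest (by cosine) to a guide vector, e.g. a blend of a
    context embedding and a topic embedding."""
    ranked = sorted(cands, key=lambda c: cosine(c[2], guide_vec), reverse=True)
    return ranked[:keep]


def huffman_codes(weighted):
    """Build prefix-free Huffman codes {token: bitstring} from (token, prob) pairs."""
    tie = itertools.count()  # tie-breaker so heapq never compares the dicts
    heap = [(p, next(tie), {tok: ""}) for tok, p in weighted]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def step_codes(cands, guide_vec, keep):
    """Sender and receiver must derive the identical code table at each step."""
    if keep < 2:
        raise ValueError("need at least two candidates to carry a bit")
    kept = semantic_filter(cands, guide_vec, keep)
    return huffman_codes([(tok, p) for tok, p, _ in kept])


def embed(secret, cands, guide_vec, keep=3):
    """Turn a bitstring into a token sequence, one Huffman codeword per token."""
    tokens, bits = [], secret
    while bits:
        codes = step_codes(cands, guide_vec, keep)
        # Pad the tail with zeros so some codeword always matches the last bits.
        padded = bits + "0" * max(len(c) for c in codes.values())
        tok, code = next((t, c) for t, c in codes.items() if padded.startswith(c))
        tokens.append(tok)
        bits = bits[len(code):]
    return tokens


def extract(tokens, cands, guide_vec, n_bits, keep=3):
    """Recover the bits by looking up each token's codeword, then drop the padding."""
    out = "".join(step_codes(cands, guide_vec, keep)[t] for t in tokens)
    return out[:n_bits]


# Toy candidates: (token, LM probability, 2-d embedding). In a real system these
# would come from the language model at every decoding step, not a fixed list.
CANDS = [
    ("river", 0.30, [0.9, 0.1]),
    ("bank", 0.25, [0.8, 0.3]),
    ("money", 0.20, [0.1, 0.9]),
    ("water", 0.15, [0.95, 0.05]),
    ("loan", 0.10, [0.2, 0.8]),
]
GUIDE = [1.0, 0.0]  # stands in for the context/topic embedding

secret = "1011001"
stego = embed(secret, CANDS, GUIDE)
assert extract(stego, CANDS, GUIDE, len(secret)) == secret
```

Because the receiver rebuilds the same filtered Huffman table at every step, extraction reduces to a table lookup; this only works if both sides share the model, the guide embedding, and the message length. Filtering before coding trades embedding capacity (fewer candidates, so fewer bits per token) for tokens that stay closer to the intended semantics.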

Acknowledgements

This work is supported in part by the Yunnan Province Education Department Foundation under Grant No. 2022j0008, in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480, in part by the project "Research and Application of Object Detection Based on Artificial Intelligence", and in part by the Yunnan Province Expert Workstations under Grant 202205AF150145.

Author information

Corresponding author: Sixing Wu.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, S., Li, F., Yu, J., Lai, H., Wu, S., Zhou, W. (2023). Enhancing Semantic Consistency in Linguistic Steganography via Denosing Auto-Encoder and Semantic-Constrained Huffman Coding. In: Liu, F., Duan, N., Xu, Q., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2023. Lecture Notes in Computer Science, vol 14303. Springer, Cham. https://doi.org/10.1007/978-3-031-44696-2_62

  • DOI: https://doi.org/10.1007/978-3-031-44696-2_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-44695-5

  • Online ISBN: 978-3-031-44696-2

  • eBook Packages: Computer Science, Computer Science (R0)