A CNN Based Visual Audio Steganography Model

  • Conference paper
Artificial Intelligence and Security (ICAIS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13338)

Abstract

Deep-learning-based steganography is an important means of protecting secret messages, especially secret images. When the cover and the secret message are of different types, such as an audio cover and a secret image, the imperceptibility of steganography can be improved; however, the difference in representation between the audio cover and the secret image becomes a major challenge. In this paper, we propose a visual audio steganography model based on a convolutional neural network (CNN). In our model, we design an audio visualization method using the short-time Fourier transform (STFT) and the discrete wavelet transform (DWT). We then use ISGAN to build an autoencoder that embeds a grayscale image into a segment of audio in the embedding stage. Experimental results show that the generated stego audio is indistinguishable from the cover audio to a listener, and that high-quality grayscale images can be extracted from the stego audio in the extraction stage.
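
The preview does not describe how the audio visualization is actually computed, so the following is only a generic sketch of the two transforms the abstract names, not the authors' method. It assumes NumPy, a Hann-windowed STFT, and a single-level Haar wavelet; the function names, frame length, hop size, and the synthetic test tone are all hypothetical choices made for illustration.

```python
import numpy as np

def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins per frame
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:
        x = x[:-1]  # drop the last sample so the length is even
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the even-length signal."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(audio)       # 2-D array that can be treated as an image
approx, detail = haar_dwt(audio)   # low-pass and high-pass halves of the signal
restored = haar_idwt(approx, detail)
```

Both transforms turn a one-dimensional waveform into a representation a CNN can process alongside an image, and the Haar step is invertible, which matters if a modified representation must be converted back into audio.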

Acknowledgement

The authors are indebted to the anonymous reviewers for their helpful suggestions and valuable comments. This work is supported by the National Key Research and Development Program of China (No. 2019YFB1406504), the National Natural Science Foundation of China (Nos. U1836108, U1936216, 62002197, 62001038) and the Fundamental Research Funds for the Central Universities (No. 2021RC30).

Author information

Corresponding author

Correspondence to Zhen Yang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, R., Dong, H., Yang, Z., Ying, W., Liu, J. (2022). A CNN Based Visual Audio Steganography Model. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13338. Springer, Cham. https://doi.org/10.1007/978-3-031-06794-5_35

  • DOI: https://doi.org/10.1007/978-3-031-06794-5_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06793-8

  • Online ISBN: 978-3-031-06794-5

  • eBook Packages: Computer Science (R0)
