Abstract
Deep-learning-based steganography is an important means of protecting secret messages, especially secret images. When the cover and the secret message are of different types, such as an audio cover and a secret image, the imperceptibility of steganography can be improved; however, the difference in representation between the audio cover and the secret image becomes a major challenge. In this paper, we propose a visual audio steganography model based on a convolutional neural network (CNN). In our model, we design an audio visualization method using the short-time Fourier transform (STFT) and the discrete wavelet transform (DWT). We then use ISGAN to build an autoencoder that embeds a grayscale image into a segment of audio in the embedding stage. Experimental results show that the generated stego audio is indistinguishable from the cover audio to the listener, and that high-quality grayscale images can be extracted from the stego audio in the extraction stage.
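The abstract does not say how the STFT and DWT are combined, so the following is only a minimal sketch of the general idea of turning a 1-D audio segment into a 2-D, image-like array that a CNN could process. The Haar wavelet, the Hann window, the frame length of 16, the hop of 8, the toy 1 kHz tone, and the function names haar_dwt, haar_idwt and stft_magnitude are all assumptions made for illustration; this is not the authors' model and it does not involve ISGAN.

```python
import cmath
import math


def haar_dwt(signal):
    """One-level Haar DWT; returns (approximation, detail) coefficient lists."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt; rebuilds the original samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def stft_magnitude(signal, frame_len=16, hop=8):
    """Magnitude spectrogram from a naive DFT over Hann-windowed frames.

    Returns one row per frame, each with frame_len // 2 + 1 frequency bins.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows


# Toy "audio" (assumed): 256 samples of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
audio = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]

approx, detail = haar_dwt(audio)       # 1-D wavelet sub-bands
spectrogram = stft_magnitude(approx)   # 2-D time-frequency "image"
restored = haar_idwt(approx, detail)   # the DWT step alone is invertible
```

Here spectrogram is a small 2-D grid (frames by frequency bins) that could be treated like a grayscale image, while restored shows that the wavelet step by itself loses no information. A real pipeline would also have to keep the STFT phase, or use an invertible complex representation, in order to get back to audio after embedding.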
Acknowledgement
The authors are indebted to the anonymous reviewers for their helpful suggestions and valuable comments. This work is supported by the National Key Research and Development Program of China (No. 2019YFB1406504), the National Natural Science Foundation of China (No. U1836108, No. U1936216, No. 62002197, No. 62001038), and the Fundamental Research Funds for the Central Universities (No. 2021RC30).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, R., Dong, H., Yang, Z., Ying, W., Liu, J. (2022). A CNN Based Visual Audio Steganography Model. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Artificial Intelligence and Security. ICAIS 2022. Lecture Notes in Computer Science, vol 13338. Springer, Cham. https://doi.org/10.1007/978-3-031-06794-5_35
DOI: https://doi.org/10.1007/978-3-031-06794-5_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06793-8
Online ISBN: 978-3-031-06794-5
eBook Packages: Computer Science, Computer Science (R0)