Abstract
This paper proposes, for the first time, a frame-level steganalysis method for Quantization Index Modulation (QIM) steganography in compressed speech streams. The method builds a neural network classification framework on a multi-dimensional perspective of codeword correlations, which is inspired by cognitive biology. Four dimensions are employed: global-to-local, local-to-global, forward, and backward. First, a codeword embedding method maps each codeword into a compact representation. Next, a Bi-LSTM captures steganographic features in both forward and reverse time order. Subsequently, a dual-thread attention mechanism extracts local and global features simultaneously. Finally, a channel attention mechanism increases the weights of the channels that contribute most to the current task, and convolutional and fully connected layers generate the frame-level steganographic label. Experimental results show that the proposed method is effective and practical in frame-level detection tasks.
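To make the detection target concrete, the following minimal sketch shows how codebook-partition QIM can hide one bit per frame and why a label is needed for each frame. It is not the authors' implementation: the scalar `CODEBOOK`, the parity-based partition, and the names `qim_embed`, `qim_extract`, and `frame_labels` are all assumptions made for illustration (real speech codecs quantize vectors against much larger codebooks).

```python
from typing import List, Optional, Sequence

# Hypothetical scalar codebook; real codecs use vector codebooks.
CODEBOOK: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]


def nearest(value: float, candidates: Sequence[int]) -> int:
    """Return the candidate index whose codeword is closest to value."""
    return min(candidates, key=lambda i: abs(CODEBOOK[i] - value))


def sub_codebook(bit: int) -> List[int]:
    """Indices allowed to carry `bit` (parity partition, an assumption)."""
    return [i for i in range(len(CODEBOOK)) if i % 2 == bit]


def quantize(values: Sequence[float]) -> List[int]:
    """Ordinary (cover) quantization over the full codebook."""
    all_indices = range(len(CODEBOOK))
    return [nearest(v, all_indices) for v in values]


def qim_embed(values: Sequence[float], bits: Sequence[Optional[int]]) -> List[int]:
    """Quantize each frame; a frame whose bit is None is left as cover."""
    all_indices = range(len(CODEBOOK))
    return [
        nearest(v, all_indices if b is None else sub_codebook(b))
        for v, b in zip(values, bits)
    ]


def qim_extract(indices: Sequence[int]) -> List[int]:
    """Recover the bit implied by each codeword index (its parity)."""
    return [i % 2 for i in indices]


def frame_labels(bits: Sequence[Optional[int]]) -> List[int]:
    """Ground-truth frame-level labels: 1 = stego frame, 0 = cover frame."""
    return [0 if b is None else 1 for b in bits]


values = [2.2, 4.9, 0.4, 6.6]
bits = [1, None, 0, 0]          # the second frame carries no hidden bit
stego = qim_embed(values, bits)
print(quantize(values), stego, frame_labels(bits))
```

In this toy setting only the first and last frames change their codeword index, yet three of the four frames carry hidden bits. A frame-level detector therefore cannot simply compare indices and has to rely on statistical cues such as the codeword correlations described above.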
Notes
Our code and dataset are available at https://zenodo.org/record/5457267.
Acknowledgements
This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ2020010, in part by the Hainan Provincial Natural Science Foundation of China under Grant 618QN309, and in part by the IACAS Free Exploration Project.
About this article
Cite this article
Wei, M., Li, S., Liu, P. et al. Frame-level steganalysis of QIM steganography in compressed speech based on multi-dimensional perspective of codeword correlations. J Ambient Intell Human Comput 14, 8421–8431 (2023). https://doi.org/10.1007/s12652-021-03608-9
DOI: https://doi.org/10.1007/s12652-021-03608-9