Frame-level steganalysis of QIM steganography in compressed speech based on multi-dimensional perspective of codeword correlations

  • Original Research
  • Journal of Ambient Intelligence and Humanized Computing

Abstract

This paper proposes, for the first time, a frame-level steganalysis method for Quantization Index Modulation (QIM) steganography in compressed speech streams. The method builds a neural network classification framework on a multi-dimensional perspective of codeword correlations, inspired by cognitive biology. Four dimensions are employed: global-to-local, local-to-global, forward, and backward. First, codeword embedding maps each codeword into a compact representation. Next, a Bi-LSTM captures steganographic features in both forward and reverse time order. A dual-thread attention mechanism then extracts local and global features simultaneously. Finally, a channel attention mechanism increases the weight of the channels that contribute most to the task, and convolutional and fully connected layers generate the frame-level steganographic label. Experimental results show that the proposed method is effective and practical in frame-level detection tasks.
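
To make the frame-level detection task concrete, the following is a minimal toy sketch of QIM embedding and of the per-frame labels a detector would have to predict. It is an illustration under stated assumptions, not the authors' implementation: the scalar codebook, the parity-based codebook partition, and the helper names (`nearest`, `qim_embed`, `qim_extract`, `encode_stream`) are all hypothetical. Real speech codecs quantize vectors, and practical QIM schemes choose the codebook partition more carefully.

```python
import random


def nearest(codebook, indices, x):
    """Return the index (restricted to `indices`) of the codeword closest to x."""
    return min(indices, key=lambda i: abs(codebook[i] - x))


def qim_embed(codebook, x, bit):
    """Quantize x using only the sub-codebook selected by `bit`.

    Assumption: the codebook is split by index parity (even -> 0, odd -> 1).
    """
    sub = [i for i in range(len(codebook)) if i % 2 == bit]
    return nearest(codebook, sub, x)


def qim_extract(index):
    """Recover the hidden bit from a codeword index (parity partition)."""
    return index % 2


def encode_stream(samples, codebook, bits, stego_frames):
    """Encode one value per frame, embedding a bit only in `stego_frames`.

    Returns the codeword sequence and the frame-level labels
    (1 = frame carries hidden data, 0 = cover frame).
    """
    all_indices = range(len(codebook))
    bit_iter = iter(bits)
    codewords, labels = [], []
    for t, x in enumerate(samples):
        if t in stego_frames:
            codewords.append(qim_embed(codebook, x, next(bit_iter)))
            labels.append(1)
        else:
            codewords.append(nearest(codebook, all_indices, x))
            labels.append(0)
    return codewords, labels


# Toy data: 16 scalar codewords and 8 frames, 3 of which carry one bit each.
codebook = [i * 0.25 for i in range(16)]
rng = random.Random(0)
samples = [rng.uniform(0.0, 3.75) for _ in range(8)]
bits = [1, 0, 1]
stego_frames = {1, 4, 6}

codewords, labels = encode_stream(samples, codebook, bits, stego_frames)
recovered = [qim_extract(codewords[t]) for t in sorted(stego_frames)]

print(labels)     # [0, 1, 0, 0, 1, 0, 1, 0]
print(recovered)  # [1, 0, 1]
```

A frame-level steganalyzer sees only `codewords` and must output a sequence like `labels`, whereas a segment-level detector would output a single decision for the whole stream.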

Notes

  1. Our code and dataset are available at https://zenodo.org/record/5457267.

Acknowledgements

This work was supported in part by the Important Science and Technology Project of Hainan Province under Grant ZDKJ2020010, in part by the Hainan Provincial Natural Science Foundation of China under Grant 618QN309, and in part by the IACAS Free Exploration Project.

Corresponding author

Correspondence to Songbin Li.

Cite this article

Wei, M., Li, S., Liu, P. et al. Frame-level steganalysis of QIM steganography in compressed speech based on multi-dimensional perspective of codeword correlations. J Ambient Intell Human Comput 14, 8421–8431 (2023). https://doi.org/10.1007/s12652-021-03608-9

  • DOI: https://doi.org/10.1007/s12652-021-03608-9
