A new scheme for covert communication via 3G encoded speech

doi:10.1016/j.compeleceng.2012.05.003

Computers & Electrical Engineering

Volume 38, Issue 6, November 2012, Pages 1490-1501

https://doi.org/10.1016/j.compeleceng.2012.05.003

Abstract

Mobile communication through 3G networks has grown rapidly in recent years, so it may be of interest to transmit secret messages over 3G voice channels. In this paper, we introduce a new covert communication scheme via Adaptive Multi-Rate Wideband (AMR-WB) encoded speech. An adaptive suboptimal pulse combination constrained (ASOPCC) method is presented to embed data into the compressed speech signal of the AMR-WB codec. The method exploits the "redundancy" created by the non-exhaustive search of the algebraic codebook to encode secret information. An embedding factor η controls the number of embedded bits. By properly setting η, ASOPCC offers a better trade-off between speech quality and embedding capacity as the codec switches between coding modes. Experimental results show that the proposed method is promising in terms of both high capacity and good imperceptibility. Although ASOPCC is applied only to the AMR-WB codec in this article, it can also be used with any other speech codec based on Algebraic Code Excited Linear Prediction (ACELP).

Graphical abstract

Secret messages can be hidden in the fixed codebook parameters by using an ASOPCC-based steganographic search procedure. The procedure searches for a suboptimal codevector that satisfies a constraint condition. The selected suboptimal codevector carries the secret message and is therefore called the stego-codevector.
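The selection step can be sketched as follows. The actual ASOPCC constraint is not stated here, so this Python sketch assumes a hypothetical one: the sum of the chosen pulse positions, taken modulo 2^n, must equal the next n secret bits. Candidates are also assumed to arrive already ranked best-first by the codec's search criterion. The function names `select_stego_codevector` and `extract_bits` are illustrative, not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# A candidate is (search_score, pulse_positions); a higher score is a better match.
Candidate = Tuple[float, Tuple[int, ...]]


def constraint_value(positions: Sequence[int], n_bits: int) -> int:
    """Hypothetical constraint: sum of pulse positions modulo 2**n_bits."""
    return sum(positions) % (1 << n_bits)


def select_stego_codevector(ranked: List[Candidate], secret: str) -> Optional[Candidate]:
    """Return the best-ranked candidate whose constraint value encodes `secret`.

    `ranked` must be sorted best-first; `secret` is a bit string such as "10".
    Returns None if no candidate satisfies the constraint.
    """
    n_bits = len(secret)
    target = int(secret, 2)
    for candidate in ranked:
        _, positions = candidate
        if constraint_value(positions, n_bits) == target:
            return candidate
    return None


def extract_bits(positions: Sequence[int], n_bits: int) -> str:
    """Receiver side: recompute the constraint from the decoded pulse positions."""
    return format(constraint_value(positions, n_bits), f"0{n_bits}b")
```

With `ranked = [(0.95, (0, 17, 42)), (0.91, (0, 21, 42)), (0.88, (4, 17, 43))]`, hiding `"11"` keeps the optimal candidate (59 mod 4 = 3), while hiding `"00"` falls back to the third candidate (64 mod 4 = 0). When the optimum already satisfies the constraint no extra distortion is introduced; otherwise a little search score is traded for the hidden bits.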

Highlights

► A novel steganography method for 3G encoded speech is proposed.
► Secret messages are encoded by searching for a suboptimal pulse combination.
► A good balance between speech quality and embedding capacity is achieved by adjusting the embedding factor η.
► The embedding rate (1600–3200 bps) is superior to that of related schemes.
► The method generates fewer abnormal statistics and is difficult to detect.

Introduction

Steganography is a technique of covert communication. It conveys secret messages hidden in digital media in such a way that the existence of the messages is concealed [1], [2]. Many steganographic methods have been proposed that use text, images, audio or video as the cover medium. In recent years, with the significant development of 3G mobile technology (e.g., CDMA2000, WCDMA and TD-SCDMA), wireless communication through 3G networks has become part of daily life. Given the advantages of mobility and immediacy, covert communication through the 3G voice channel is an attractive prospect. 3G voice channels usually adopt one of three speech coding standards: AMR-WB, EVRC-B or VMR-WB. In this paper, we focus on AMR-WB, which is widely used in WCDMA and TD-SCDMA systems.

Steganography and watermarking are two branches of data hiding. Both embed data transparently into a carrier signal. Watermarking is mainly used for copyright protection, copy protection and content authentication, so robustness against attacks is a crucial issue. Steganography, by contrast, aims to establish a covert information channel in end-to-end connections and concentrates on hiding from third parties the fact that a secret message is being transmitted. In this case, embedding capacity, transparency (in terms of perceptual quality) and statistical undetectability become the important factors.

Many steganographic methods for speech signals have been developed so far. Conventional schemes mostly operate in the signal domain or a transform domain. They can be classified into six approaches: least significant bit (LSB) [3], phase coding (PE) [4], spread spectrum (SS) [5], cepstrum domain (CD) [6], echo data hiding (EH) [7] and tone insertion (TI) [8], [9]. However, these approaches no longer work for 3G encoded speech, because the speech encoding and decoding process disturbs the messages embedded in the stego signal. To convey information through a 3G voice channel, steganographic data should instead be embedded into the compressed (encoded) signal. The embedding is then performed either during the coding process or after it. The former is usually combined with the coding algorithm and changes the codec, so that some coding parameters are modified to carry secret information. The latter, in contrast, works directly on the content of the compressed bitstream, for example by overwriting least significant bits. Both approaches are therefore suitable for transmission systems that use audio compression.
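For contrast with the approach adopted in this paper, the second option (modifying the compressed bitstream after coding) can be sketched as plain LSB replacement on already-quantized parameter indices. The integer indices below are hypothetical stand-ins, not real AMR-WB bitstream fields; the point is only that the encoder is left untouched and bits are overwritten afterwards.

```python
from typing import List


def embed_lsb(indices: List[int], bits: str) -> List[int]:
    """Overwrite the least significant bit of each non-negative index with one message bit."""
    if len(bits) > len(indices):
        raise ValueError("message longer than available carrier indices")
    stego = list(indices)
    for i, bit in enumerate(bits):
        # Clear the LSB, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | int(bit)
    return stego


def extract_lsb(indices: List[int], n_bits: int) -> str:
    """Read the message back from the LSBs of the first n_bits indices."""
    return "".join(str(index & 1) for index in indices[:n_bits])
```

For example, `embed_lsb([12, 7, 30, 5], "101")` yields `[13, 6, 31, 5]`. Because the encoder never sees the change, the overwritten values are not chosen by the analysis-by-synthesis search, which is why such methods tend to leave more abnormal statistics than embedding inside the search.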

In this paper, based on the former idea, a new steganographic scheme for 3G encoded speech is proposed for covert communication. The scheme relies on an adaptive suboptimal-pulse-combination constrained (ASOPCC) method, which searches for a suboptimal codevector whose pulse combination meets a certain constraint. ASOPCC is integrated into the AMR-WB codec: it alters the algebraic codebook parameters of the encoded signal, creating a steganographic effect and achieving secret communication. Because AMR-WB has nine encoding modes, we introduce an embedding factor η to control the embedding rate in the different modes. The prime criteria for steganography are perfect transparency and high embedding capacity. Transparency is a measure of the distortion caused by message embedding, here in terms of speech quality. As is well known, there is a trade-off between the two criteria, since pursuing high capacity inevitably decreases transparency. In our research, we attempt to strike a worthwhile balance between speech quality and embedding rate. Several experiments were conducted to investigate how sensitive speech quality is to embedding strength, with different values assigned to η in different modes. The results show that, by properly setting η, ASOPCC not only achieves a high embedding rate but also preserves good speech quality and lowers the probability of detection. We give recommended values of η for the different coding modes in this paper. Our method also allows users to choose η themselves, customizing the embedding rate and speech quality to their own needs. That is, the proposed method is both adaptively self-adjusting and user-customizable. The proposed covert communication system is illustrated in Fig. 1.
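The link between per-subframe payload and embedding rate is simple arithmetic. AMR-WB codes 20 ms frames, each split into four 5 ms subframes, giving 200 subframes per second. The sketch below assumes, as a hypothetical simplification, that a fixed number of secret bits is hidden in every subframe; how η maps to bits per subframe in ASOPCC is not reproduced here.

```python
FRAME_MS = 20              # AMR-WB frame length in milliseconds
SUBFRAMES_PER_FRAME = 4    # four 5 ms subframes per frame


def embedding_rate_bps(bits_per_subframe: int) -> int:
    """Embedding rate in bit/s, assuming the same payload in every subframe."""
    frames_per_second = 1000 // FRAME_MS                              # 50 frames/s
    subframes_per_second = frames_per_second * SUBFRAMES_PER_FRAME   # 200 subframes/s
    return bits_per_subframe * subframes_per_second
```

Under this assumption, 8 bits per subframe corresponds to 1600 bit/s and 16 bits per subframe to 3200 bit/s, so every extra hidden bit per subframe costs another 200 bit/s of distortion budget.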

The remainder of this paper is organized as follows. Section 2 gives an overview of the background, including related work on steganography for mobile voice channels and the AMR-WB coding standard. Section 3 describes the ASOPCC method, and Section 4 presents the experiments, test results and analysis. Finally, conclusions are presented in Section 5.

Section snippets

Related works in steganography for mobile voice channel

Few steganography methods are available for mobile voice channels. This can be attributed to the compression encoder present in modern mobile communication systems, e.g., GSM and 3G, since most traditional steganography methods are not robust to speech coding. Moreover, owing to the fact that 3G is an emerging technology in many countries, steganography for the 3G voice channel has not been developed so far. Related work on GSM systems is briefly reviewed as follows [10],

A steganographic scheme on AMR-WB encoded speech

In our proposal, the steganographic scheme is combined with the AMR-WB coding process. More specifically, an ASOPCC-based method is used to search the algebraic codebook, so that the algebraic codebook parameters are modified to embed data.

To explain the scheme more clearly, this section first introduces the algebraic codebook structure (Section 3.1) and the standard codebook search (Section 3.2), and then describes the ASOPCC-based codebook search procedure (Section 3.3).
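To make the codebook structure and the standard search concrete, the sketch below follows the interleaved track layout of the AMR-WB algebraic codebook: a 64-sample subframe split into four tracks, with track t holding positions t, t+4, ..., t+60, and each pulse carrying a sign of ±1. It scores codevectors with the usual ACELP criterion Q = (dᵀc)² / (cᵀΦc), where d is the backward-filtered target and Φ = HᵀH is the correlation matrix of the weighted synthesis filter's impulse response. The toy d and Φ, the one-pulse-per-track layout, the sign preselection from d, and the exhaustive loop are all simplifying assumptions; the real codec uses a non-exhaustive search, which is exactly what leaves the redundancy ASOPCC exploits.

```python
import itertools
import math
from typing import List, Sequence, Tuple

L_SUBFRAME = 64   # samples per subframe
N_TRACKS = 4      # interleaved tracks; this toy places one pulse per track


def track_positions(track: int) -> List[int]:
    """Positions belonging to a track: track, track + 4, ..., track + 60."""
    return list(range(track, L_SUBFRAME, N_TRACKS))


def acelp_score(positions: Sequence[int], signs: Sequence[int],
                d: Sequence[float], phi: Sequence[Sequence[float]]) -> float:
    """ACELP criterion Q = (d^T c)^2 / (c^T Phi c) for a sparse +-1 codevector c."""
    corr = sum(s * d[p] for p, s in zip(positions, signs))
    energy = sum(si * sj * phi[pi][pj]
                 for pi, si in zip(positions, signs)
                 for pj, sj in zip(positions, signs))
    return corr * corr / energy


def exhaustive_search(d: Sequence[float],
                      phi: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Try every one-pulse-per-track combination (16**4 = 65536 of them)."""
    best: Tuple[float, Tuple[int, ...]] = (-1.0, ())
    for positions in itertools.product(*(track_positions(t) for t in range(N_TRACKS))):
        # Simplification: each pulse takes the sign of d at its position.
        signs = [1 if d[p] >= 0 else -1 for p in positions]
        score = acelp_score(positions, signs, d, phi)
        if score > best[0]:
            best = (score, positions)
    return best


# Toy inputs (assumptions): a deterministic target and a positive-definite AR(1)-style Phi.
d = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(L_SUBFRAME)]
phi = [[0.5 ** abs(i - j) for j in range(L_SUBFRAME)] for i in range(L_SUBFRAME)]
```

Calling `exhaustive_search(d, phi)` returns the best score and one position per track. Sorting all scored combinations instead of keeping only the maximum yields the best-first candidate list that a constrained, suboptimal search would walk through.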

Experimental results

To demonstrate the performance of the proposed method, several experimental results are given in this section. An operation complexity evaluation, an imperceptibility test and an anti-detection test are performed. The experiments were implemented using the 3GPP/ITU AMR-WB Speech Coder Fixed-point C simulation [22]. The audio databases we use include the Digital Test Sequences [23] supplied by 3GPP and the CMU Audio Databases [24]. Before embedding, we first check test speech data to make sure all of

Conclusions

In this paper, we introduced a novel ASOPCC-based method for covert communication via 3G encoded speech. The method searches for an alternative suboptimal codevector under certain constraints, so hidden data are embedded into the algebraic codebook parameters. Unlike methods that directly modify the encoded bitstream (e.g., by overwriting the bits of less important parameters), ASOPCC generates fewer abnormal statistics. In addition, by using an embedding factor η, the method allows adjusting

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 60903217), the Natural Science Foundation of Jiangsu Province of China (No. BK2010255) and the Scientific and Technical Plan of Suzhou (No. SYG201010).

The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.

Haibo Miao is currently a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). His research interests mainly include information security, information hiding and covert channels.

References (35)

F.A.P. Petitcolas et al. Information hiding – a survey. Proc IEEE Spec Issue Prot Multimedia Content (1999).

H. Wang et al. Cyber warfare, steganography vs. steganalysis. Commun ACM (2004).

Moskowitz Scott A, Cooperman M. Steganographic method and device. USA Patent;...

W. Bender et al. Techniques for data hiding. IBM Syst J (1996).

Lee C, Moallemi K, Warren R. Method and apparatus for transporting auxiliary data in audio signals. USA Patent;...

Sang-Kwang Lee et al. Digital audio watermarking in the cepstrum domain. IEEE Trans Consum Electron (2000).

D. Gruhl et al. Echo hiding. Information hiding: 1st international workshop. Proc Lect Notes Comput Sci (1996).

Gopalan K, Wenndt S, Noga A, Haddad D, Adams S. Covert Speech communication via cover speech by tone insertion. In:...

Gopalan K, Wenndt S. Audio Steganography for covert data transmission by imperceptible tone insertion. In: Proceedings...

Licai Hu, Shuozhong Wang. Information hiding based on GSM full rate speech coding. In: Proceedings of WiCOM, Wuhan;...

Christabel Koh Jun-Li, Emmanuel Sabu, Kankanhalli Mohan S. Quality-aware GSM speech watermarking. In: Proceedings of...

Shahbazi A, Soltanmohammadi E, Rezaei AH, Sayadiyan A, Mosayyebpour S. Content dependent data hiding on GSM full rate...

ZM. Lu et al. Watermarking combined with CELP speech coding for authentication. IEICE Trans Inform Syst (2005).

Geiser B, Vary P. Backwards compatible wideband telephony in mobile networks: CELP watermarking and bandwidth...

Geiser B, Vary P. High rate data hiding in ACELP speech codec. In: Proceedings of ICASSP, Caesars Palace, Las Vegas,...

Nishimura A. Data hiding in pitch delay data of the adaptive multi-rate narrow-band speech codec. In: Proceedings of...

3GPP TS 26.190 V6.1.1. Speech codec speech processing functions. Adaptive multi-rate-wideband (AMR-WB) speech codec....


Liusheng Huang received his M.S. degree in computer science from the University of Science and Technology of China (USTC) in 1988. He is currently a professor and Ph.D. supervisor in the School of Computer Science and Technology at USTC. His research interests are in the areas of wireless sensor networks, information security, distributed computing and high performance algorithms.

Zhili Chen received his Ph.D. degree in computer science from the University of Science and Technology of China (USTC) in 2009. He is currently a postdoctoral research fellow in the School of Computer Science and Technology at USTC. His research interests include information hiding, linguistic steganography and authorship analysis.

Wei Yang received his Ph.D. degree in computer science from the University of Science and Technology of China (USTC) in 2007. He is currently a postdoctoral research fellow in the School of Computer Science and Technology at USTC. His research interests include information theory, quantum information and cryptology.

Ammar Al-hawbani is currently a Ph.D. student in the School of Computer Science and Technology at the University of Science and Technology of China (USTC). His research interests mainly include information security and the coverage and connectivity of wireless sensor networks.

☆ Reviews processed and approved for publication by Editor-in-Chief Dr. Manu Malek.
