Discrete cosine transform-based data hiding for speech bandwidth extension

Koduri, Sunil Kumar; T, Kishore Kumar

doi:10.1007/s10772-022-09980-x

Published: 24 June 2022

International Journal of Speech Technology, Volume 25, pages 697–706 (2022)

Abstract

Public switched telephone networks restrict speech to the narrow band of 300–3400 Hz, which significantly reduces speech quality. To address this drawback, this paper proposes a robust transform-domain speech bandwidth extension method. The method uses discrete cosine transform-based data hiding (DCTBDH) to deliver a better-quality wideband speech signal. Spectral envelope parameters are extracted from the high-frequency components of the speech signal that lie above the narrowband, spread using spreading sequences, and embedded within the DCT coefficients of the narrowband signal. At the receiver, the embedded information is extracted and used to reconstruct a better-quality wideband signal. In simulations, high-quality wideband speech was obtained from speech transmitted over a public switched telephone network. The spectral envelope parameters of the high-frequency components are embedded transparently, with a mean square error of 5.78 × 10^-4. In a mean opinion score (MOS) listening test, the proposed method improved perceptual transparency by about 0.21 points on the MOS scale compared to conventional methods. The log spectral distortion obtained was 2.2248, indicating improved speech quality compared to conventional methods.
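The embedding step described in the abstract (spread the side information, add it to the DCT coefficients of the narrowband signal, recover it by correlation at the receiver) can be illustrated with a small sketch. This is not the authors' implementation. The frame length, the coefficient band, the embedding strength `alpha`, the use of Walsh-Hadamard rows as spreading sequences, and the one-bit-per-sequence payload are all assumptions chosen for illustration, and the host frame is a synthetic signal rather than speech.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def walsh_code(row, length):
    """Row `row` of a Sylvester Hadamard matrix as +/-1 chips.
    Assumption: orthogonal Walsh codes stand in for the paper's spreading sequences.
    `length` must be a power of two and `row` must be below `length`."""
    return [-1 if bin(row & i).count("1") % 2 else 1 for i in range(length)]

def embed(frame, bits, start, length, alpha):
    """Additively embed spread bits into DCT coefficients start..start+length-1."""
    coeffs = dct(frame)
    for k, bit in enumerate(bits):
        sign = 1 if bit else -1
        code = walsh_code(k + 1, length)  # skip row 0 (all ones)
        for i in range(length):
            coeffs[start + i] += alpha * sign * code[i]
    return idct(coeffs)

def extract(frame, n_bits, start, length):
    """Recover bits by correlating the coefficient band with each code."""
    coeffs = dct(frame)
    bits = []
    for k in range(n_bits):
        code = walsh_code(k + 1, length)
        corr = sum(coeffs[start + i] * code[i] for i in range(length))
        bits.append(1 if corr > 0 else 0)
    return bits

# Hypothetical demo: a synthetic host whose energy sits in low DCT indices,
# so the embedding band (upper half of the frame) carries almost no host energy.
N = 64
host = [0.5 * math.cos(math.pi * (n + 0.5) * 3 / N)
        + 0.25 * math.cos(math.pi * (n + 0.5) * 7 / N) for n in range(N)]
bits = [1, 0, 1, 1, 0, 0, 1, 0]  # stand-in for quantized envelope parameters
marked = embed(host, bits, start=32, length=32, alpha=0.01)
recovered = extract(marked, len(bits), start=32, length=32)
mse = sum((a - b) ** 2 for a, b in zip(host, marked)) / N
print(recovered == bits, mse)
```

Because the DCT here is orthonormal and the codes are mutually orthogonal, the time-domain mean square error of the embedding equals `alpha**2 * len(bits) * length / N`, which makes the strength-versus-transparency trade-off explicit. With real speech, host energy inside the embedding band acts as interference in the correlation, so `alpha` and the band would need to be tuned; the values here are illustrative only.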


Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, National Institute of Technology Warangal, Warangal, India
Sunil Kumar Koduri & Kishore Kumar T

Corresponding author

Correspondence to Sunil Kumar Koduri.

About this article

Cite this article

Koduri, S., T, K. Discrete cosine transform-based data hiding for speech bandwidth extension. Int J Speech Technol 25, 697–706 (2022). https://doi.org/10.1007/s10772-022-09980-x

Received: 11 March 2021
Accepted: 28 May 2022
Published: 24 June 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s10772-022-09980-x
