RNN-based signal classification for hybrid audio data compression

Tu, Weiping; Yang, Yuhong; Du, Bo; Yang, Wanzhao; Zhang, Xiong; Zheng, Jiaxi

doi:10.1007/s00607-019-00713-8

Published: 26 March 2019

Computing, Volume 102, pages 813–827 (2020)

Weiping Tu ORCID: orcid.org/0000-0002-6933-3298¹,
Yuhong Yang¹,
Bo Du¹,
Wanzhao Yang¹,
Xiong Zhang¹ &
Jiaxi Zheng¹

Abstract

Audio data are a fundamental component of multimedia big data. Switched audio codecs have proved efficient for compressing a wide range of audio signals at low bit rates. However, coding quality relies strongly on accurate classification of the input signals. AMR-WB+, the state-of-the-art switched audio coder, adopts two coding mode selection methods. The closed-loop method achieves good quality but has high computational complexity. Conversely, the open-loop method reduces complexity but yields unsatisfactory coding quality. Therefore, in this study, a speech/music discriminator based on a recurrent neural network (RNN) model is investigated to improve the coding performance of AMR-WB+. An RNN model is chosen for its outstanding performance in processing time series. The recurrent structure of the RNN allows it to learn and make full use of the temporal information in the input sequences, compensating for the deficiencies of short-term features. We quantitatively analyze the quality loss caused by the two types of misclassification and tune the parameters of the classifier to improve the signal-to-noise ratio (SNR) of the synthesized signals. The experimental results show that, in comparison with the open-loop method, the proposed method increases the accuracy of mode selection by 18% and improves the coding quality by 0.21 dB in segmental SNR. Moreover, it reduces the computational complexity by about 43% in comparison with the closed-loop method in AMR-WB+.
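The abstract reports coding quality as segmental SNR. As a rough illustration of that metric, the sketch below computes the mean per-frame SNR between a reference signal and its decoded version. The function name `segmental_snr`, the frame length of 256 samples, and the small `eps` guard are assumptions made for this example; the article preview does not state the framing or any clamping used in the authors' evaluation.

```python
import numpy as np


def segmental_snr(reference, decoded, frame_len=256, eps=1e-12):
    """Mean per-frame SNR in dB (frame_len and eps are illustrative choices)."""
    reference = np.asarray(reference, dtype=np.float64)
    decoded = np.asarray(decoded, dtype=np.float64)

    n_frames = min(len(reference), len(decoded)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")

    # Drop the trailing partial frame and reshape to (n_frames, frame_len).
    ref = reference[: n_frames * frame_len].reshape(n_frames, frame_len)
    dec = decoded[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame signal energy and coding-error energy.
    signal_energy = np.sum(ref ** 2, axis=1)
    noise_energy = np.sum((ref - dec) ** 2, axis=1)

    # eps keeps the ratio finite for silent or perfectly reconstructed frames.
    snr_db = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
    return float(np.mean(snr_db))


if __name__ == "__main__":
    # Hypothetical test signal: a 440 Hz tone at 16 kHz, "decoded" with a 10% amplitude error.
    t = np.arange(16000) / 16000.0
    clean = np.sin(2.0 * np.pi * 440.0 * t)
    print(round(segmental_snr(clean, 0.9 * clean), 2))
```

Because the score is averaged frame by frame in the log domain, a few badly coded frames (for example, frames sent to the wrong coding mode) lower the result more visibly than they would in a single global SNR.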

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 61671335) and the Technological Innovation Major Project of Hubei Province (No. 2017AAA123).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, Hubei Province, China
Weiping Tu, Yuhong Yang, Bo Du, Wanzhao Yang, Xiong Zhang & Jiaxi Zheng

Corresponding author

Correspondence to Weiping Tu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tu, W., Yang, Y., Du, B. et al. RNN-based signal classification for hybrid audio data compression. Computing 102, 813–827 (2020). https://doi.org/10.1007/s00607-019-00713-8

Received: 13 November 2018
Accepted: 16 March 2019
Published: 26 March 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s00607-019-00713-8
