Abstract
Audio data are a fundamental component of multimedia big data. Switched audio codecs have proved efficient for compressing a wide range of audio signals at low bit rates. However, their coding quality relies strongly on accurate classification of the input signals. AMR-WB+, the state-of-the-art switched audio coder, adopts two coding mode selection methods. The closed-loop method achieves good quality but has high computational complexity. Conversely, the open-loop method reduces complexity but yields unsatisfactory coding quality. Therefore, in this study, a speech/music discriminator based on a recurrent neural network (RNN) model is investigated to improve the coding performance of AMR-WB+. An RNN model is chosen for its outstanding performance in processing time series. The recurrent structure of an RNN enables it to learn and make full use of the temporal information in the input sequences, compensating for the deficiencies of short-term features. We quantitatively analyze the quality loss caused by the two types of misclassification and tune the parameter of the classifier to improve the signal-to-noise ratio (SNR) of the synthesized signals. The experimental results show that, in comparison with the open-loop method, the proposed method increases mode selection accuracy by 18% and coding quality by 0.21 dB in segmental SNR. Moreover, it reduces computational complexity by about 43% in comparison with the closed-loop method in AMR-WB+.
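The coding quality figure above is reported in segmental SNR. As a rough illustration of that metric, the following minimal sketch averages per-frame SNR values in dB between a reference signal and its synthesized version. The function name `segmental_snr`, the frame length of 256 samples, and the `eps` guard are assumptions made for this example; the abstract does not state the frame length or any clamping rule used in the study.

```python
import numpy as np


def segmental_snr(reference, synthesized, frame_len=256, eps=1e-12):
    """Mean of per-frame SNRs (dB) between a reference and a synthesized signal.

    frame_len and eps are illustrative assumptions, not values from the paper.
    """
    reference = np.asarray(reference, dtype=np.float64)
    synthesized = np.asarray(synthesized, dtype=np.float64)

    # Use only whole frames that are present in both signals.
    n_frames = min(len(reference), len(synthesized)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    n = n_frames * frame_len
    ref = reference[:n].reshape(n_frames, frame_len)
    syn = synthesized[:n].reshape(n_frames, frame_len)

    # Per-frame signal energy and coding-error energy.
    signal_energy = np.sum(ref ** 2, axis=1)
    noise_energy = np.sum((ref - syn) ** 2, axis=1)

    # eps avoids division by zero and log(0) on silent or perfectly coded frames.
    snr_db = 10.0 * np.log10((signal_energy + eps) / (noise_energy + eps))
    return float(np.mean(snr_db))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    coded = clean + 0.01 * rng.standard_normal(clean.shape)
    print(segmental_snr(clean, coded))
```

Because every frame contributes equally to the average, quiet frames weigh as much as loud ones, which is why segmental SNR tends to track local coding errors more closely than a single global SNR. Practical implementations often skip silent frames or clamp the per-frame values to a fixed range; this sketch does neither.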
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61671335) and the Technological Innovation Major Project of Hubei Province (No. 2017AAA123).
Cite this article
Tu, W., Yang, Y., Du, B. et al. RNN-based signal classification for hybrid audio data compression. Computing 102, 813–827 (2020). https://doi.org/10.1007/s00607-019-00713-8