Abstract
In this paper, enhancements of online speech activity detection (SAD) are presented. Our proposed approach combines standard signal processing methods with modern deep-learning methods, which allows simultaneous training of detector parts that are usually trained or designed separately. In our SAD, an NN-based early score computation system, an NN-based score smoothing system and the proposed online decoding system were incorporated into a single training process. Besides CNNs and DNNs, spectral flux and spectral variance features are also investigated. The proposed approach was tested on a Czech Radio broadcasting corpus, which was used to investigate both supervised and semi-supervised machine learning.
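The abstract names spectral flux and spectral variance features and an online decoding stage, but it does not define them. The sketch below is a minimal illustration under stated assumptions, not the paper's method. It uses common textbook definitions: spectral flux is taken as the squared distance between consecutive L2-normalized magnitude spectra, and spectral variance as the variance of magnitudes across frequency bins. The input is assumed to be precomputed magnitude spectra, for example from an STFT. The decoder is a generic causal stand-in (exponential smoothing followed by a two-threshold hysteresis decision), not the paper's trained NN-based smoothing and decoding. All function names and the `on`, `off` and `alpha` parameters are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(frame: Sequence[float]) -> List[float]:
    """Scale a magnitude spectrum to unit L2 norm (an all-zero frame stays zero)."""
    norm = math.sqrt(sum(x * x for x in frame))
    return [x / norm for x in frame] if norm > 0 else [0.0] * len(frame)


def spectral_flux(frames: Sequence[Sequence[float]]) -> List[float]:
    """Assumed definition: squared L2 distance between consecutive normalized
    magnitude spectra. The first frame has no predecessor, so its flux is 0.0."""
    if not frames:
        return []
    flux = [0.0]
    prev = _normalize(frames[0])
    for frame in frames[1:]:
        cur = _normalize(frame)
        flux.append(sum((c - p) ** 2 for c, p in zip(cur, prev)))
        prev = cur
    return flux


def spectral_variance(frames: Sequence[Sequence[float]]) -> List[float]:
    """Assumed definition: per-frame variance of magnitudes across frequency bins."""
    out = []
    for frame in frames:
        mean = sum(frame) / len(frame)
        out.append(sum((x - mean) ** 2 for x in frame) / len(frame))
    return out


def online_decode(scores: Sequence[float], on: float = 0.6,
                  off: float = 0.4, alpha: float = 0.5) -> List[bool]:
    """Hypothetical causal decoder: exponentially smooth per-frame speech scores,
    then apply hysteresis (switch to speech above `on`, back to non-speech below
    `off`). Each decision depends only on the current and past frames."""
    smoothed = None
    speech = False
    labels = []
    for s in scores:
        smoothed = s if smoothed is None else alpha * s + (1 - alpha) * smoothed
        if speech and smoothed < off:
            speech = False
        elif not speech and smoothed > on:
            speech = True
        labels.append(speech)
    return labels
```

For example, `online_decode([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])` yields `[False, False, True, True, True, False]`. Smoothing delays the speech onset by one frame, and hysteresis holds the speech label for one frame after the raw score drops. Using two thresholds instead of one is a simple way to suppress rapid label flipping in an online setting.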
J. Zelinka: This work was supported by the European Regional Development Fund under the project AI&Reasoning (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000466).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zelinka, J. (2018). Deep Learning and Online Speech Activity Detection for Czech Radio Broadcasting. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_46
DOI: https://doi.org/10.1007/978-3-030-00794-2_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)