
Automated Learning of In-vehicle Noise Representation with Triplet-Loss Embedded Convolutional Beamforming Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12490)

Abstract

Despite the variety of deep learning models devised, classifying in-vehicle noise remains challenging because of the reverberation and the low-frequency variance generated in the narrow interior space. To account for both the impulsive characteristics of vehicle noise and the multi-channel sampling environment, it is essential to learn a disentangled noise representation automatically and to parameterize the conventional beamforming operation. We propose a method that addresses these two hurdles by parameterizing the beamforming operation with a convolutional neural network. Moreover, we improve the structure of the beamforming network by explicitly learning the distances between vehicle noises within a triplet network framework. Experiments on a dataset of 241,958,848 time-series samples collected by a global motor company show that the proposed model improves classification accuracy by 5% over the latest deep acoustic models. A detailed analysis shows that the proposed method can potentially compensate for the disjoint sets of vehicle types between the training and validation data.
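For illustration only, below is a minimal PyTorch sketch of the idea the abstract describes: a beamforming operation parameterized as a learnable convolution over the microphone channels, followed by a small CNN encoder whose embedding space is shaped by a triplet margin loss. The channel count, filter sizes, and names (ConvBeamformingEmbedder, n_mics, emb_dim) are assumptions made for this sketch, not the authors' exact architecture.

# Minimal sketch (assumed shapes and layer sizes, not the paper's exact model):
# a learnable filter-and-sum beamforming front-end realized as a 1-D convolution
# over microphone channels, a small CNN encoder, and a triplet margin loss that
# pulls embeddings of same-class noises together and pushes different ones apart.
import torch
import torch.nn as nn

class ConvBeamformingEmbedder(nn.Module):
    def __init__(self, n_mics: int = 4, emb_dim: int = 128):
        super().__init__()
        # Each microphone channel gets a learnable FIR filter; the filtered
        # channels are summed into one enhanced signal (filter-and-sum beamforming).
        self.beamformer = nn.Conv1d(n_mics, 1, kernel_size=65, padding=32)
        # CNN encoder mapping the beamformed signal to an embedding vector.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mics, time) multi-channel noise recording
        return self.encoder(self.beamformer(x))

model = ConvBeamformingEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy anchor/positive/negative clips: batch of 8, 4 microphones, 16000 samples each.
anchor, positive, negative = (torch.randn(8, 4, 16000) for _ in range(3))
loss = triplet_loss(model(anchor), model(positive), model(negative))
loss.backward()

In the actual model, the triplet sampling strategy and the classifier on top of the embeddings would follow the paper's design; the loss call above only shows how anchor, positive, and negative embeddings are compared.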

Acknowledgments

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) and by Hyundai Motors, Inc.

Author information

Corresponding author

Correspondence to Sung-Bae Cho.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bu, S.-J., Cho, S.-B. (2020). Automated Learning of In-vehicle Noise Representation with Triplet-Loss Embedded Convolutional Beamforming Network. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. Lecture Notes in Computer Science, vol 12490. Springer, Cham. https://doi.org/10.1007/978-3-030-62365-4_48

  • DOI: https://doi.org/10.1007/978-3-030-62365-4_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62364-7

  • Online ISBN: 978-3-030-62365-4
