
Automated Learning of In-vehicle Noise Representation with Triplet-Loss Embedded Convolutional Beamforming Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12490)

Abstract

Despite the variety of deep learning models devised, classifying in-vehicle noise remains challenging because of the reverberation and the low-frequency variance generated in the narrow interior space. To account for both the impulsive characteristics of vehicle noise and the multi-channel sampling environment, it is essential to learn a disentangled noise representation automatically and to parameterize the conventional beamforming operation. We propose a method that addresses these two hurdles by parameterizing the beamforming operation with a convolutional neural network. Moreover, we improve the structure of the beamforming network by explicitly learning the distances between vehicle noises within a triplet network framework. Experiments on a dataset of 241,958,848 time-series samples collected by a global motor company show that the proposed model improves classification accuracy by 5% over the latest deep acoustic models. A detailed analysis shows that the proposed method can potentially compensate for the disjoint sets of vehicle types between the training and validation data.
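For illustration only, below is a minimal PyTorch sketch of the idea the abstract describes: a beamforming operation parameterized as a learnable convolution over the microphone channels, followed by a small CNN encoder whose embedding space is shaped by a triplet margin loss. The channel count, filter sizes, and names (ConvBeamformingEmbedder, n_mics, emb_dim) are assumptions made for this sketch, not the authors' exact architecture.

# Minimal sketch (assumed shapes and layer sizes, not the paper's exact model):
# a learnable filter-and-sum beamforming front-end realized as a 1-D convolution
# over microphone channels, a small CNN encoder, and a triplet margin loss that
# pulls embeddings of same-class noises together and pushes different ones apart.
import torch
import torch.nn as nn

class ConvBeamformingEmbedder(nn.Module):
    def __init__(self, n_mics: int = 4, emb_dim: int = 128):
        super().__init__()
        # Each microphone channel gets a learnable FIR filter; the filtered
        # channels are summed into one enhanced signal (filter-and-sum beamforming).
        self.beamformer = nn.Conv1d(n_mics, 1, kernel_size=65, padding=32)
        # CNN encoder mapping the beamformed signal to an embedding vector.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_mics, time) multi-channel noise recording
        return self.encoder(self.beamformer(x))

model = ConvBeamformingEmbedder()
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Dummy anchor/positive/negative clips: batch of 8, 4 microphones, 16000 samples each.
anchor, positive, negative = (torch.randn(8, 4, 16000) for _ in range(3))
loss = triplet_loss(model(anchor), model(positive), model(negative))
loss.backward()

In the actual model, the triplet sampling strategy and the classifier on top of the embeddings would follow the paper's design; the loss call above only shows how anchor, positive, and negative embeddings are compared.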

Acknowledgments

This work was partly supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. 2020-0-01361, Artificial Intelligence Graduate School Program (Yonsei University)) and by Hyundai Motors, Inc.

Author information

Corresponding author

Correspondence to Sung-Bae Cho.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bu, S.-J., Cho, S.-B. (2020). Automated Learning of In-vehicle Noise Representation with Triplet-Loss Embedded Convolutional Beamforming Network. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. Lecture Notes in Computer Science, vol 12490. Springer, Cham. https://doi.org/10.1007/978-3-030-62365-4_48

  • DOI: https://doi.org/10.1007/978-3-030-62365-4_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62364-7

  • Online ISBN: 978-3-030-62365-4
