Phase Analysis and Labeling Strategies in a CNN-Based Speaker Change Detection System

  • Conference paper
Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

In this paper we analyze different labeling strategies and their impact on speaker change detection rates. We explore binary, linear fuzzy, quadratic, and Gaussian labeling functions. We conclude that the choice of labeling function is very important and that the linear variant outperforms the rest. We also add phase information from the spectrum to the input of our convolutional neural network. Experiments show that even though the phase is informative, its benefit is negligible and it may be omitted. In the experiments we use a coverage-purity measure, which is independent of tolerance parameters.
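To make the four labeling strategies concrete, the following is a minimal sketch of how per-frame training targets might be generated around annotated speaker changes. The abstract does not give the exact formulas, so the functional forms, the tolerance `tau`, the Gaussian width `sigma = tau / 3`, and all function names here are illustrative assumptions rather than the authors' definitions.

```python
import numpy as np


def nearest_change_distance(frame_times, change_times):
    """Distance in seconds from each frame to its nearest speaker change.

    Assumes at least one change point is given.
    """
    frame_times = np.asarray(frame_times, dtype=float)
    change_times = np.asarray(change_times, dtype=float)
    return np.abs(frame_times[:, None] - change_times[None, :]).min(axis=1)


def make_labels(frame_times, change_times, strategy="linear", tau=0.5):
    """Per-frame targets in [0, 1] for a speaker change detector.

    All four shapes are assumed forms (hypothetical, not from the paper):
      binary    : 1 within tau seconds of a change, else 0
      linear    : triangular ramp falling from 1 to 0 over tau seconds
      quadratic : inverted parabola falling from 1 to 0 over tau seconds
      gaussian  : bell curve with sigma = tau / 3
    """
    d = nearest_change_distance(frame_times, change_times)
    r = d / tau  # distance normalized by the tolerance
    if strategy == "binary":
        return (r <= 1.0).astype(float)
    if strategy == "linear":
        return np.clip(1.0 - r, 0.0, 1.0)
    if strategy == "quadratic":
        return np.clip(1.0 - r ** 2, 0.0, 1.0)
    if strategy == "gaussian":
        sigma = tau / 3.0  # assumed width so the curve is near 0 at tau
        return np.exp(-0.5 * (d / sigma) ** 2)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    frames = np.arange(0.0, 2.01, 0.25)  # frame centers in seconds
    changes = [1.0]                      # one annotated speaker change
    for name in ("binary", "linear", "quadratic", "gaussian"):
        print(name, np.round(make_labels(frames, changes, name), 2))
```

Under these assumptions, the binary variant treats every frame inside the tolerance window as equally positive, while the other three give the network a graded target that peaks at the annotated change.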

Acknowledgment

This research was supported by the Grant Agency of the Czech Republic, project No. P103/12/G084, and by the grant of the University of West Bohemia, project No. SGS-2016-039. Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum, provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.

Author information

Corresponding author

Correspondence to Marek Hrúz.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Hrúz, M., Salajka, P. (2017). Phase Analysis and Labeling Strategies in a CNN-Based Speaker Change Detection System. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_61

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_61

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
