
Speaker Verification Using Spectral and Durational Segmental Characteristics

  • Conference paper
Speech and Computer (SPECOM 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Abstract

In this paper, we report some of the results obtained by fusing human-assisted speaker verification methods based on formant features and on phone-duration statistics. Our experiments on a database of spontaneous speech show that using segmental durational characteristics improves performance, which demonstrates that these features are applicable to the speaker verification task.
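
The abstract does not specify how the durational scores are computed or fused with the formant-based scores. As a rough illustration only, the sketch below assumes that phone segments come from a forced aligner as (phone, start, end) tuples, models each speaker by the per-phone mean and standard deviation of durations, scores a test recording by a z-normalized distance to the enrollment model, and combines that score with a formant-based score by weighted linear fusion. The function names (duration_stats, duration_score, fuse), the distance measure, the standard-deviation floor, and the fusion weights are all hypothetical and are not the authors' method.

```python
from collections import defaultdict
from statistics import mean, stdev


def duration_stats(segments, min_count=2):
    """Return {phone: (mean, std)} of phone durations in seconds.

    Assumes `segments` is an iterable of (phone, start, end) tuples, e.g.
    from a forced aligner. Phones with fewer than `min_count` tokens
    (and never fewer than 2, since std needs two values) are skipped.
    """
    durations = defaultdict(list)
    for phone, start, end in segments:
        durations[phone].append(end - start)
    needed = max(min_count, 2)
    return {
        phone: (mean(ds), stdev(ds))
        for phone, ds in durations.items()
        if len(ds) >= needed
    }


def duration_score(enroll, test, sd_floor=0.005):
    """Hypothetical similarity score (higher = more similar).

    Negative mean absolute z-distance between the test recording's mean
    phone durations and the enrollment model, over the phones both share.
    The enrollment std is floored to avoid dividing by near-zero values.
    """
    shared = sorted(enroll.keys() & test.keys())
    if not shared:
        raise ValueError("profiles share no phones")
    total = 0.0
    for phone in shared:
        mu_e, sd_e = enroll[phone]
        mu_t, _ = test[phone]
        total += abs(mu_t - mu_e) / max(sd_e, sd_floor)
    return -total / len(shared)


def fuse(formant_score, dur_score, w_formant=1.0, w_dur=1.0, bias=0.0):
    """Weighted linear score fusion. The default weights are placeholders;
    in practice they would be trained on development trials."""
    return w_formant * formant_score + w_dur * dur_score + bias


if __name__ == "__main__":
    enroll = duration_stats([("a", 0.00, 0.10), ("a", 0.50, 0.62),
                             ("t", 0.20, 0.26), ("t", 0.70, 0.78)])
    test = duration_stats([("a", 0.00, 0.11), ("a", 0.40, 0.51),
                           ("t", 0.30, 0.36), ("t", 0.90, 0.97)])
    # 0.8 is a made-up stand-in for the output of a formant subsystem.
    print(fuse(0.8, duration_score(enroll, test)))
```

Score-level linear fusion keeps the two subsystems independent: the durational score can be added to or removed from the formant score without retraining either subsystem, and only the fusion weights need calibrating.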


Notes

  1. The superscript t is omitted for clarity of presentation.

Acknowledgments

This work was financially supported by the Government of the Russian Federation, Grant 074-U01.

Author information

Corresponding author

Correspondence to Natalia Tomashenko.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Bulgakova, E., Sholohov, A., Tomashenko, N., Matveev, Y. (2015). Speaker Verification Using Spectral and Durational Segmental Characteristics. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_49

  • DOI: https://doi.org/10.1007/978-3-319-23132-7_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23131-0

  • Online ISBN: 978-3-319-23132-7

  • eBook Packages: Computer Science, Computer Science (R0)
