Speaker Verification Using Spectral and Durational Segmental Characteristics

Bulgakova, Elena; Sholohov, Aleksei; Tomashenko, Natalia; Matveev, Yuri

doi:10.1007/978-3-319-23132-7_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

In this paper, we report some of the results obtained by fusing human-assisted speaker verification methods based on formant features and on phone duration statistics. Our experiments on a database of spontaneous speech demonstrate that using segmental durational characteristics improves performance, which shows that these features are applicable to the speaker verification task.
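To make the two ingredients named in the abstract more concrete, here is a minimal sketch that computes per-phone duration statistics from a phone alignment, turns them into a simple duration-based comparison score, and fuses that score with a formant-based score by a weighted sum. It is an illustration under stated assumptions, not the authors' method: the input format (a list of `(phone, duration_in_seconds)` pairs, such as a forced aligner might produce), the z-distance similarity, the 10 ms standard-deviation floor, the fusion weight `w`, and all function names (`duration_stats`, `duration_score`, `fuse_scores`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def duration_stats(alignment):
    """Map each phone to (mean, population std) of its durations.

    `alignment` is assumed (hypothetically) to be a list of
    (phone_label, duration_in_seconds) pairs from a phone aligner.
    """
    by_phone = defaultdict(list)
    for phone, dur in alignment:
        by_phone[phone].append(dur)
    return {p: (mean(d), pstdev(d)) for p, d in by_phone.items()}


def duration_score(enroll, test, min_std=0.01):
    """Hypothetical similarity: negative mean squared z-distance.

    For every phone seen in both recordings, measure how far the test
    mean duration lies from the enrollment mean, in units of the
    enrollment std (floored at `min_std` seconds so phones seen only
    once do not cause division by zero). Higher means more similar.
    """
    shared = enroll.keys() & test.keys()
    if not shared:
        raise ValueError("recordings share no phones")
    dists = []
    for p in shared:
        mu_enroll, sd_enroll = enroll[p]
        mu_test, _ = test[p]
        z = (mu_test - mu_enroll) / max(sd_enroll, min_std)
        dists.append(z * z)
    return -mean(dists)


def fuse_scores(formant_score, dur_score, w=0.5):
    """Linear score-level fusion with a hypothetical weight w in [0, 1]."""
    return w * formant_score + (1.0 - w) * dur_score


if __name__ == "__main__":
    enroll = duration_stats([("a", 0.08), ("a", 0.10), ("t", 0.05), ("t", 0.07)])
    test = duration_stats([("a", 0.11), ("t", 0.06), ("s", 0.11)])
    d = duration_score(enroll, test)
    # The formant-based score here is a made-up placeholder value.
    print(d, fuse_scores(1.0, d, w=0.5))
```

A plain weighted sum only makes sense once both subsystem scores are on comparable scales; in practice the scores are usually normalized or calibrated first, and the fusion weights are typically trained on development data rather than fixed by hand.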

Notes

1.
Superscript t is omitted for the sake of presentation clarity.

Acknowledgments

This work was financially supported by the Government of the Russian Federation, Grant 074-U01.

Author information

Authors and Affiliations

ITMO University, St. Petersburg, Russia
Elena Bulgakova, Aleksei Sholohov, Natalia Tomashenko & Yuri Matveev
Speech Technology Center, St. Petersburg, Russia
Natalia Tomashenko & Yuri Matveev

Corresponding author

Correspondence to Natalia Tomashenko.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Bulgakova, E., Sholohov, A., Tomashenko, N., Matveev, Y. (2015). Speaker Verification Using Spectral and Durational Segmental Characteristics. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol. 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_49

DOI: https://doi.org/10.1007/978-3-319-23132-7_49
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)