Speaker identification using harmonic structure of LP-residual spectrum

  • Text-independent Speaker Authentication
  • Conference paper
Audio- and Video-based Biometric Person Authentication (AVBPA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1206)

Abstract

The harmonic structure of the LP-residual spectrum differs among speakers and may therefore be useful for speaker recognition. To test this hypothesis, Power Difference of Spectra in Subband (PDSS) is proposed as a new feature parameter that extracts information about the harmonic structure of the linear prediction (LP) residual spectrum. VQ-based text-independent speaker identification experiments with 25 male and 25 female speakers are conducted to evaluate the speaker identification ability of PDSS. Experimental results show that PDSS alone achieves a maximum identification rate of 66.9%. In addition, combining the LPC cepstrum with PDSS reduces identification errors by 41.2% compared with using the LPC cepstrum alone. Moreover, combining the LPC cepstrum with both the delta cepstrum and PDSS reduces identification errors by 52.4% relative to the LPC cepstrum alone. These results show that PDSS complements the LPC cepstrum and delta cepstrum and improves speaker identification performance.
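The abstract does not give the PDSS formula, so the following is only an illustrative sketch of the general idea: inverse-filter a frame with LP coefficients, then measure how peaky (harmonic) the residual power spectrum is in each subband. It assumes a per-subband measure of one minus the ratio of the geometric mean to the arithmetic mean of the power spectrum, which is a spectral-flatness style quantity and may differ from the authors' definition. The function names `lp_residual` and `subband_harmonicity`, the LP order, the number of subbands, and the FFT size are all hypothetical choices made for this example.

```python
import numpy as np


def lp_residual(frame, order=12):
    """Return the LP residual of one frame (autocorrelation method).

    Assumes a non-silent frame; the order is an illustrative default.
    """
    x = frame * np.hamming(len(frame))
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with light regularization.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R = R + 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Inverse filter: e[n] = x[n] - sum_k a[k] * x[n - k].
    inverse_filter = np.concatenate(([1.0], -a))
    return np.convolve(x, inverse_filter)[:len(x)]


def subband_harmonicity(residual, n_bands=6, n_fft=512):
    """Per-subband value of 1 - (geometric mean / arithmetic mean) of power.

    This is an assumed stand-in for PDSS, not the paper's exact formula.
    Values lie in [0, 1): near 0 for a flat spectrum, larger when the
    subband contains pronounced harmonic peaks and valleys.
    """
    power = np.abs(np.fft.rfft(residual, n_fft)) ** 2 + 1e-12
    bands = np.array_split(power[1:], n_bands)  # drop the DC bin
    feats = []
    for band in bands:
        geometric = np.exp(np.mean(np.log(band)))
        arithmetic = np.mean(band)
        feats.append(1.0 - geometric / arithmetic)
    return np.array(feats)


def synthetic_voiced_frame(length=400, period=80, pole=0.9):
    """Pulse train through a one-pole filter: a crude voiced-speech stand-in."""
    excitation = np.zeros(length)
    excitation[::period] = 1.0
    y = np.zeros(length)
    for n in range(length):
        y[n] = excitation[n] + (pole * y[n - 1] if n > 0 else 0.0)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    voiced = subband_harmonicity(lp_residual(synthetic_voiced_frame()))
    noise = subband_harmonicity(lp_residual(rng.standard_normal(400)))
    print("voiced-like frame:", np.round(voiced, 3))
    print("noise frame:      ", np.round(noise, 3))
```

On the synthetic signals, the pulse-train frame should give larger values than the white-noise frame, because its residual spectrum keeps the harmonic peaks that LP analysis of low order does not model. In a system like the one the abstract describes, such per-frame vectors would be the input to a per-speaker VQ codebook, alone or alongside cepstral features.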

Editor information

Josef Bigün, Gérard Chollet, Gunilla Borgefors

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hayakawa, S., Takeda, K., Itakura, F. (1997). Speaker identification using harmonic structure of LP-residual spectrum. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0016002

  • DOI: https://doi.org/10.1007/BFb0016002

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62660-2

  • Online ISBN: 978-3-540-68425-1

  • eBook Packages: Springer Book Archive
