Dictionary design in subspace model for speaker identification

International Journal of Speech Technology

Abstract

Sparse representation, or compressive sensing, has proven effective in state-of-the-art speech processing. This paper investigates Mel filterbank log energy features within the subspace model for the speaker identification task. Identification accuracy has been shown to decrease partly because the subspace models of different speakers overlap, so the subspace model dictionary should be compressed. Two methods are proposed in this paper to reduce the dictionary size of the subspace model. In the first approach, we propose a new model feature, the Dictionary Atom feature, which treats reconstruction as dispensable for speaker identification, so the learned dictionary can be under-complete. The second approach reduces the dictionary size using probability statistics. In the recognition process for both methods, the vectors of Mel filterbank log energy coefficients of the unknown speaker are projected into each subspace to decide the matching speaker. Experiments were conducted on a corpus collected in our anechoic chamber, and a comparison with the existing subspace model-based speaker identification system shows better performance for both proposed dictionary size reduction algorithms.
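The recognition step described in the abstract (projecting the unknown speaker's feature vectors into each speaker's subspace and picking the best match) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an orthonormal SVD basis as a stand-in for a learned under-complete dictionary, uses synthetic vectors in place of real Mel filterbank log energies, and scores speakers by residual energy after projection. The function names `learn_subspace`, `residual_energy`, `identify`, and `make_demo` are hypothetical.

```python
import numpy as np


def learn_subspace(features, n_atoms):
    """Return an orthonormal basis (dim x n_atoms) for one speaker.

    features: array of shape (n_frames, dim).
    Assumption: the leading left singular vectors stand in for a learned
    under-complete dictionary (n_atoms < dim).
    """
    u, _, _ = np.linalg.svd(features.T, full_matrices=False)
    return u[:, :n_atoms]


def residual_energy(frames, basis):
    """Energy left over after projecting every frame onto the subspace."""
    projection = frames @ basis @ basis.T
    return float(np.sum((frames - projection) ** 2))


def identify(frames, subspaces):
    """Pick the speaker whose subspace leaves the smallest residual."""
    scores = {name: residual_energy(frames, basis)
              for name, basis in subspaces.items()}
    return min(scores, key=scores.get)


def make_demo(seed=0, dim=20, rank=3, n_frames=200):
    """Synthetic stand-in data: each speaker lives near a random
    low-dimensional subspace. Returns {name: (train, test)}."""
    rng = np.random.default_rng(seed)
    data = {}
    for name in ("speaker_a", "speaker_b"):
        mixing = rng.normal(size=(rank, dim))
        coeffs = rng.normal(size=(2 * n_frames, rank))
        noise = 0.01 * rng.normal(size=(2 * n_frames, dim))
        frames = coeffs @ mixing + noise
        data[name] = (frames[:n_frames], frames[n_frames:])
    return data


if __name__ == "__main__":
    data = make_demo()
    subspaces = {name: learn_subspace(train, n_atoms=3)
                 for name, (train, _) in data.items()}
    for name, (_, test) in data.items():
        print(name, "->", identify(test, subspaces))
```

With only 3 basis vectors for 20-dimensional features, each dictionary in this sketch is under-complete, so frames cannot be reconstructed exactly; the decision relies only on which subspace fits the test frames best.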



Acknowledgments

This work is supported by the National Basic Research Program of China (973 Program) (No. 2011CB302903), the National Natural Science Foundation of China (No. 60971129, 61271335), the Scientific Innovation Research Programs of College Graduate in Jiangsu Province (Grant No. CXZZ13_0488), and the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant No. 13KJB510020).

Author information

Corresponding author

Correspondence to Longting Xu.

About this article

Cite this article

Xu, L., Yang, Z. & Shao, X. Dictionary design in subspace model for speaker identification. Int J Speech Technol 18, 177–186 (2015). https://doi.org/10.1007/s10772-014-9258-0

  • DOI: https://doi.org/10.1007/s10772-014-9258-0
