
The Optimized Dictionary based Robust Speaker Recognition

Journal of Signal Processing Systems

Abstract

A mismatch between training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, this mismatch remains a challenge for speaker recognition systems. To address it, we propose an optimized-dictionary-based sparse representation for robust speaker recognition. We first train a speech dictionary and a noise dictionary and concatenate them for sparse representation. We then design an optimization algorithm to reduce the mutual coherence between the two learned dictionaries. Next, we use mixture k-means to model each speaker from the resulting sparse features. Finally, we present a distance divergence to measure similarity. Compared with speaker recognition based on Mel-frequency cepstral coefficients, our preliminary experiments show that the proposed framework consistently improves robustness under mismatched conditions.
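To illustrate why the mutual coherence between the speech and noise dictionaries matters, the following is a minimal sketch and not the algorithm of the paper, whose details are not given in this preview. It assumes toy three-dimensional atoms, measures coherence as the largest absolute inner product between unit-normalised speech and noise atoms, and uses plain matching pursuit as a stand-in sparse coder over the concatenated dictionary. All function and variable names (`cross_coherence`, `matching_pursuit`, `noise_coherent`, and so on) are hypothetical.

```python
import math


def normalize(atom):
    """Scale an atom (list of floats) to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in atom))
    return [x / norm for x in atom]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross_coherence(speech_atoms, noise_atoms):
    """Largest |<d_s, d_n>| over all unit-normalised speech/noise atom pairs."""
    s = [normalize(a) for a in speech_atoms]
    n = [normalize(a) for a in noise_atoms]
    return max(abs(dot(a, b)) for a in s for b in n)


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse code: repeatedly pick the atom most correlated with the residual."""
    atoms = [normalize(a) for a in atoms]
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        scores = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        coeffs[k] += scores[k]
        residual = [r - scores[k] * x for r, x in zip(residual, atoms[k])]
    return coeffs


# Toy dictionaries (assumed values, purely illustrative).
speech = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
noise_coherent = [[1.0, 0.0, 1.0]]    # overlaps with the first speech atom
noise_incoherent = [[0.0, 0.0, 1.0]]  # orthogonal to every speech atom

mu_coherent = cross_coherence(speech, noise_coherent)
mu_incoherent = cross_coherence(speech, noise_incoherent)

# Sparse-code a noisy observation over the concatenated dictionary.
combined = speech + noise_incoherent
code = matching_pursuit([2.0, 0.0, 3.0], combined, n_iter=2)
speech_code = code[:len(speech)]  # coefficients on speech atoms
noise_code = code[len(speech):]   # coefficients on noise atoms
```

With the incoherent noise dictionary, the speech and noise parts of the observation land on separate blocks of coefficients, so `speech_code` can serve as a noise-reduced feature. When the coherence is high, a noise atom can absorb speech energy (and vice versa), which is the situation a coherence-reducing optimization is meant to avoid.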



Author information

Corresponding author

Correspondence to Baojun Qiao.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272544 and 61170243.


About this article


Cite this article

You, D., Qiao, B. & Li, J. The Optimized Dictionary based Robust Speaker Recognition. J Sign Process Syst 86, 289–297 (2017). https://doi.org/10.1007/s11265-016-1121-x


  • DOI: https://doi.org/10.1007/s11265-016-1121-x
