The Optimized Dictionary based Robust Speaker Recognition

You, Datao; Qiao, Baojun; Li, Jie

doi:10.1007/s11265-016-1121-x

Published: 11 March 2016

Volume 86, pages 289–297, (2017)

Journal of Signal Processing Systems

Datao You¹,
Baojun Qiao¹ &
Jie Li¹

Abstract

The mismatch between training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, this mismatch remains a challenge for speaker recognition systems. To address it, we propose an optimized-dictionary-based sparse representation for robust speaker recognition. We first train a speech dictionary and a noise dictionary and concatenate them for sparse representation. We then design an optimization algorithm to reduce the mutual coherence between the two learned dictionaries. Next, we use mixture k-means to model each speaker from the resulting sparse features. Finally, we present a distance divergence to measure similarity. Compared with speaker recognition based on Mel-frequency cepstral coefficients, our preliminary experiments show that the proposed recognition framework consistently improves robustness under mismatched conditions.
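
As a rough illustration of the quantity the optimization step targets, the sketch below computes the mutual coherence between a speech dictionary and a noise dictionary using the standard definition (the largest absolute inner product between unit-norm atoms from the two dictionaries) and then concatenates them. It is not the authors' algorithm: the function names `normalize_columns` and `cross_coherence`, the dictionary sizes, and the random dictionaries are all assumptions made for the example.

```python
import numpy as np


def normalize_columns(D):
    """Scale every atom (column) of D to unit L2 norm."""
    norms = np.linalg.norm(D, axis=0, keepdims=True)
    return D / np.maximum(norms, 1e-12)  # guard against all-zero atoms


def cross_coherence(D_speech, D_noise):
    """Largest absolute inner product between a speech atom and a noise atom.

    Values near 0 mean the two dictionaries are close to orthogonal;
    values near 1 mean some speech atom is almost parallel to a noise atom.
    """
    A = normalize_columns(D_speech)
    B = normalize_columns(D_noise)
    return float(np.max(np.abs(A.T @ B)))


# Hypothetical dictionaries: random stand-ins for learned ones.
rng = np.random.default_rng(0)
D_speech = rng.standard_normal((64, 128))  # 64-dim features, 128 speech atoms
D_noise = rng.standard_normal((64, 32))    # 64-dim features, 32 noise atoms

# Concatenated dictionary used for sparse representation.
D = np.hstack([normalize_columns(D_speech), normalize_columns(D_noise)])

mu = cross_coherence(D_speech, D_noise)
print(f"concatenated dictionary shape: {D.shape}, cross-coherence: {mu:.3f}")
```

A lower cross-coherence makes it less likely that noise energy is explained by speech atoms (and vice versa) when a noisy frame is sparsely coded over the concatenated dictionary, which is the motivation the abstract gives for reducing it.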

Author information

Authors and Affiliations

Software School, Henan University, Kaifeng, Henan, 475000, China
Datao You, Baojun Qiao & Jie Li

Corresponding author

Correspondence to Baojun Qiao.

Additional information

This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272544 and 61170243.

About this article

Cite this article

You, D., Qiao, B. & Li, J. The Optimized Dictionary based Robust Speaker Recognition. J Sign Process Syst 86, 289–297 (2017). https://doi.org/10.1007/s11265-016-1121-x

Received: 24 October 2015
Revised: 16 December 2015
Accepted: 21 February 2016
Published: 11 March 2016
Issue Date: March 2017
DOI: https://doi.org/10.1007/s11265-016-1121-x
