Abstract
The mismatch between training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, the mismatch problem remains a challenge for speaker recognition systems. To address this problem, we propose a sparse representation based on an optimized dictionary for robust speaker recognition. To this end, we first train a speech dictionary and a noise dictionary and concatenate them for sparse representation; we then design an optimization algorithm to reduce the mutual coherence between the two learned dictionaries; after that, we use mixture k-means to model each speaker on the resulting sparse features; and finally, we present a divergence-based distance to measure similarity. Compared with speaker recognition based on Mel-frequency cepstral coefficients (MFCCs), our preliminary experiments show that the proposed recognition framework consistently improves robustness under mismatched conditions.
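The two quantities the abstract relies on are the sparse code of a frame over the concatenated dictionary [speech atoms | noise atoms] and the mutual coherence between the two sub-dictionaries. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes the dictionaries are already learned and given as plain lists of atoms. It swaps in ISTA (iterative soft-thresholding for the lasso) as the l1 sparse-coding solver. It only *measures* cross-coherence; it does not reproduce the paper's coherence-reducing optimization, mixture k-means, or divergence. All function names are hypothetical.

```python
import math


def normalize(atom):
    """Scale an atom to unit l2 norm."""
    norm = math.sqrt(sum(v * v for v in atom))
    if norm == 0.0:
        raise ValueError("zero atom cannot be normalized")
    return [v / norm for v in atom]


def cross_coherence(speech_atoms, noise_atoms):
    """Largest |cosine similarity| between any speech atom and any noise atom.

    Low values mean speech and noise atoms are hard to confuse, which is what
    a coherence-reducing dictionary optimization would aim for.
    """
    noise_hat = [normalize(d) for d in noise_atoms]
    best = 0.0
    for s in speech_atoms:
        s_hat = normalize(s)
        for n_hat in noise_hat:
            best = max(best, abs(sum(a * b for a, b in zip(s_hat, n_hat))))
    return best


def soft_threshold(v, t):
    """Proximal operator of t*|v| (shrink toward zero by t)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code(atoms, x, lam=0.05, n_iter=500):
    """ISTA for min_a 0.5*||x - sum_k a_k*d_k||^2 + lam*||a||_1."""
    # 1/||D||_F^2 never exceeds 1/L (L = largest eigenvalue of D^T D),
    # so this conservative step size guarantees convergence.
    step = 1.0 / sum(v * v for d in atoms for v in d)
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Residual r = D a - x.
        r = [-xi for xi in x]
        for ak, d in zip(a, atoms):
            if ak != 0.0:
                for i, v in enumerate(d):
                    r[i] += ak * v
        # Gradient step on the quadratic term, then soft-thresholding.
        a = [soft_threshold(ak - step * sum(ri * v for ri, v in zip(r, d)), step * lam)
             for ak, d in zip(a, atoms)]
    return a


def split_codes(speech_atoms, noise_atoms, x, lam=0.05, n_iter=500):
    """Code x over [speech | noise] and return (speech_coefs, noise_coefs)."""
    atoms = [normalize(d) for d in speech_atoms + noise_atoms]
    a = sparse_code(atoms, x, lam, n_iter)
    k = len(speech_atoms)
    return a[:k], a[k:]


speech = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
noise = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
speech_coefs, noise_coefs = split_codes(speech, noise, [1.0, 0.0, 0.5, 0.0])
```

In this toy case the atoms are orthonormal, so ISTA converges to the soft-thresholded projections (about 0.95 on the first speech atom and 0.45 on the first noise atom). The speech coefficients would then serve as the noise-robust feature vector. With real learned dictionaries, a high cross-coherence lets noise energy leak into speech coefficients. That leakage is the motivation the abstract gives for optimizing the dictionaries.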
Additional information
This work was supported by the National Natural Science Foundation of China under Grant Nos. 61272544 and 61170243.
About this article
Cite this article
You, D., Qiao, B. & Li, J. The Optimized Dictionary based Robust Speaker Recognition. J Sign Process Syst 86, 289–297 (2017). https://doi.org/10.1007/s11265-016-1121-x