Statistical voice activity detection based on sparse representation over learned dictionary
Cited by (15)
Audiovisual speaker indexing for Web-TV automations
2021, Expert Systems with Applications
Citation excerpt: Varela, San-Segundo, and Hernández (2011) propose the utilization of pulse-based measurements in a Decision Tree which makes use of the predictions of a baseline VAD, an HMM segmentation module, and a pulse detection module. In the article of (Deng, & Han, 2013), a decision-based statistical approach is evaluated, by creating a sparse representation of audio signals over a learned dictionary, which is proved to outperform Gaussian, Laplacian, and Gamma representations. Mak and Yu, (2014) investigate the effect of pre-processing and speech enhancement in the improvement of the robustness of a statistic-based VAD, directed to NIST Speaker Recognition Evaluation tasks.
Optimization of learned dictionary for sparse coding in speech processing
2016, Neurocomputing
Citation excerpt: Up to now, there is no report about CD-DNN-HMM speech recognition combined with sparse coding. Moreover, sparse coding has been used in voice activity detection [13,14], which can be treated as outlier signal detection [15,16] because it is a binary classification. Typical work for outlier detection can be found in [16], where an integrated incremental self-organizing map and hierarchical neural network approach is proposed.
A speech enhancement method based on sparse reconstruction of power spectral density
2014, Computers and Electrical Engineering
Citation excerpt: Other applications have been found in cognitive radios [7], direction-of-arrival estimation [8] and so forth. The applications of sparse representation in audio signal processing mainly focus on voice activity detection (VAD) [9], pitch estimation [10], speaker identification [11] and speech recognition [5]. Currently, speech enhancement methods based on sparse representation have been discussed in some references.
A Cross Dataset Approach for Noisy Speech Identification
2023, Lecture Notes in Electrical Engineering
AUC optimization for deep learning-based voice activity detection
2022, Eurasip Journal on Audio, Speech, and Music Processing
Real time implementation of voice activity detection based on false acceptance regulation
2020, International Journal on Electrical Engineering and Informatics
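The first citation excerpt summarizes Deng and Han (2013) as building a sparse representation of audio signals over a learned dictionary. As a rough illustration of that sparse-coding building block only, the Python sketch below codes a frame with orthogonal matching pursuit (OMP) and labels it by comparing reconstruction residuals over a speech dictionary and a noise dictionary. Several assumptions apply: the residual comparison is the generic sparse-representation classification rule, swapped in here for illustration and not the paper's statistical decision rule; the random unit-norm dictionaries stand in for learned ones; and the function names `omp` and `residual_vad` are hypothetical.

```python
import numpy as np


def omp(D, x, k, tol=1e-10):
    """Orthogonal matching pursuit: approximate x with at most k columns (atoms) of D.

    Assumes the columns of D have unit L2 norm. Returns a coefficient vector
    with at most k non-zero entries.
    """
    coef = np.zeros(D.shape[1])
    support = []
    residual = x.astype(float)
    sol = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break  # x is already reconstructed (numerically) exactly
        # Greedy step: pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new direction left to add
        support.append(idx)
        # "Orthogonal" step: re-fit all selected atoms jointly by least squares.
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ sol
    coef[support] = sol
    return coef


def residual_vad(frame, D_speech, D_noise, k=5):
    """Hypothetical toy decision, not the method of Deng and Han (2013):
    label the frame as speech if the speech dictionary reconstructs it
    with a smaller residual than the noise dictionary does."""
    r_speech = np.linalg.norm(frame - D_speech @ omp(D_speech, frame, k))
    r_noise = np.linalg.norm(frame - D_noise @ omp(D_noise, frame, k))
    return bool(r_speech < r_noise)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random unit-norm atoms stand in for dictionaries learned from speech and noise.
    D_s = rng.standard_normal((64, 16))
    D_s /= np.linalg.norm(D_s, axis=0)
    D_n = rng.standard_normal((64, 16))
    D_n /= np.linalg.norm(D_n, axis=0)
    # Synthetic "frames" that are exactly sparse over one dictionary each.
    speech_frame = D_s[:, [0, 3, 7]] @ np.array([1.0, -0.5, 0.8])
    noise_frame = D_n[:, [2, 5]] @ np.array([0.7, 1.2])
    print(residual_vad(speech_frame, D_s, D_n), residual_vad(noise_frame, D_s, D_n))
```

Greedy OMP was chosen only because it needs nothing beyond NumPy; any sparse solver, such as an L1-regularized one, could fill the same role in this sketch.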
Shi-Wen Deng received the B.E. degree from the Institute of Technology, Jiamusi University, Jiamusi, China, in 1997, the M.E. degree from the School of Computer Science, Harbin Normal University, Harbin, China, in 2005, and the Ph.D. degree from the School of Computer Science, Harbin Institute of Technology, Harbin, China, in 2012. He is currently with the School of Mathematical Sciences, Harbin Normal University, Harbin, China. His research interests are in speech and audio signal processing, including content-based audio analysis, noise suppression, and speech/audio classification and detection.
Ji-Qing Han received the B.S. and M.S. degrees in electrical engineering and the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1987, 1990, and 1998, respectively. He is currently the associate dean of the School of Computer Science and Technology, Harbin Institute of Technology. He is a member of the IEEE and serves on the editorial boards of the Journal of Chinese Information Processing and the Journal of Data Acquisition & Processing. Prof. Han is undertaking several projects funded by the National Natural Science Foundation, the 863 Hi-Tech Program, and the National Basic Research Program. He has received three second prizes and two third prizes in ministerial and provincial science and technology awards. He has published more than 100 papers and two books. His research fields include speech signal processing and audio information processing.