We study the effect of using class-specific dictionaries, rather than a class-independent dictionary, for enhancement in phoneme recognition of noisy speech. We hypothesize that class-specific dictionaries remove more noise than a class-independent dictionary, thereby resulting in better phoneme recognition. Experiments are performed with speech data from the TIMIT corpus and noise samples from the NOISEX-92 database. Using KSVD, four types of dictionaries are learned: class-independent, manner-of-articulation-class, place-of-articulation-class and 39 phoneme-class. First, a set of labels is obtained by recognizing speech that has been enhanced using the class-independent dictionary. Using these approximate labels, the corresponding class-specific dictionaries are then used to enhance each frame of the original noisy speech, and this enhanced speech is recognized in turn. Compared to the results obtained using the class-independent dictionary, the 39 phoneme-class dictionaries provide relative improvements in phoneme recognition accuracy of 5.5%, 3.7%, 2.4% and 2.2% for factory2, m109, leopard and babble noises, respectively, when averaged over SNRs of 0, 5 and 10 dB.
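The two-pass procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the sparse coder, the feature domain or the recognizer, so the sketch assumes orthogonal matching pursuit for sparse coding, treats each frame as a plain feature vector, and takes the recognizer as a caller-supplied function. The names `omp`, `enhance_frame`, `two_pass_enhance`, `recognize` and the sparsity parameter `k` are all hypothetical, and the dictionaries are assumed to have been learned beforehand (for example with KSVD).

```python
import numpy as np


def omp(D, y, k):
    """Greedy orthogonal matching pursuit (assumed sparse coder).

    Approximates y using at most k columns (atoms) of dictionary D
    and returns the coefficient vector.
    """
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    sol = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break  # y is already represented exactly
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # no new atom improves the fit
        support.append(idx)
        # Least-squares refit of y on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def enhance_frame(D, y, k):
    """Enhance one frame by reconstructing it from its sparse code."""
    return D @ omp(D, y, k)


def two_pass_enhance(noisy_frames, generic_dict, class_dicts, recognize, k):
    """Two-pass, class-specific enhancement (hypothetical sketch).

    noisy_frames: array of shape (num_frames, dim)
    generic_dict: class-independent dictionary, shape (dim, num_atoms)
    class_dicts:  mapping from class label to a dictionary of that class
    recognize:    caller-supplied function returning one label per frame
    """
    # Pass 1: enhance with the class-independent dictionary, then
    # recognize to obtain approximate per-frame labels.
    first_pass = np.stack(
        [enhance_frame(generic_dict, f, k) for f in noisy_frames]
    )
    labels = list(recognize(first_pass))

    # Pass 2: re-enhance the ORIGINAL noisy frames, each with the
    # dictionary of its approximate class label.
    second_pass = np.stack(
        [enhance_frame(class_dicts[lab], f, k)
         for f, lab in zip(noisy_frames, labels)]
    )
    return second_pass, labels
```

In this sketch the second-pass output would then be passed to the recognizer again to obtain the final phoneme sequence. The keys of `class_dicts` could be manner-of-articulation classes, place-of-articulation classes or the 39 phoneme classes, depending on which dictionary type is being evaluated.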
Cite as: P.M., N., Ramakrishnan, A.G., Ghosh, P.K. (2016) A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach. Proc. Interspeech 2016, 3728-3732, doi: 10.21437/Interspeech.2016-236
@inproceedings{pm16_interspeech,
  author={Nazreen P.M. and A.G. Ramakrishnan and Prasanta Kumar Ghosh},
  title={{A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3728--3732},
  doi={10.21437/Interspeech.2016-236}
}