A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach

P.M., Nazreen; Ramakrishnan, A.G.; Ghosh, Prasanta Kumar

doi:10.21437/Interspeech.2016-236



We study how enhancing noisy speech with class-specific dictionaries, rather than a single class-independent dictionary, affects phoneme recognition. We hypothesize that class-specific dictionaries remove more noise than a class-independent dictionary, thereby improving phoneme recognition. Experiments are performed with speech data from the TIMIT corpus and noise samples from the NOISEX-92 database. Using K-SVD, four types of dictionaries are learned: class-independent, manner-of-articulation-class, place-of-articulation-class and 39-phoneme-class. First, an approximate set of phoneme labels is obtained by recognizing the speech enhanced with the class-independent dictionary. Using these approximate labels, each frame of the original noisy speech is enhanced with the corresponding class-specific dictionary, and the enhanced speech is then recognized. Averaged over 0, 5 and 10 dB SNRs, the 39-phoneme-class dictionaries yield relative phoneme recognition accuracy improvements of 5.5%, 3.7%, 2.4% and 2.2% over the class-independent dictionary for factory2, m109, leopard and babble noise, respectively.
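The dictionary learning and per-frame class-specific enhancement described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each frame is a feature vector (for example a magnitude spectrum), that a noise dictionary is available, and that a frame is enhanced by sparse-coding it with orthogonal matching pursuit (OMP) over a composite dictionary of speech atoms and noise atoms and keeping only the speech part. The function names `omp`, `ksvd`, `enhance_frame` and `class_specific_enhance` are hypothetical. The phoneme recognizer is not shown, so the first-pass labels are passed in as an array.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x using at most k columns of D.

    Assumes the columns of D have unit L2 norm.
    """
    residual = x.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Least-squares refit of x on all atoms selected so far.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef


def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """Learn a dictionary (dim x n_atoms) in which each column of Y is k-sparse.

    Requires n_atoms <= number of training vectors (columns of Y).
    """
    rng = np.random.default_rng(seed)
    # Initialise atoms with randomly chosen, normalised training vectors.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iter):
        # Sparse coding stage: code every training vector with OMP.
        X = np.column_stack([omp(D, y, k) for y in Y.T])
        # Dictionary update stage: refit one atom at a time via a rank-1 SVD.
        for j in range(n_atoms):
            users = np.nonzero(X[j])[0]
            if users.size == 0:
                continue
            # Residual without atom j, restricted to the vectors that use it.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0]
    return D


def enhance_frame(x, D_speech, D_noise, k):
    """Assumed enhancement rule: code x over [speech | noise] atoms, keep speech part."""
    D = np.hstack([D_speech, D_noise])
    a = omp(D, x, k)
    return D_speech @ a[: D_speech.shape[1]]


def class_specific_enhance(frames, labels, speech_dicts, D_noise, k):
    """Enhance each noisy frame (column) with the speech dictionary of its label."""
    return np.column_stack([
        enhance_frame(frames[:, t], speech_dicts[labels[t]], D_noise, k)
        for t in range(frames.shape[1])
    ])
```

In the two-pass scheme, the class-independent first pass corresponds to calling `class_specific_enhance` with every label mapped to one shared dictionary; the recognizer's output on that enhanced speech then supplies the `labels` used for the class-specific second pass on the original noisy frames.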


Cite as: P.M., N., Ramakrishnan, A.G., Ghosh, P.K. (2016) A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach. Proc. Interspeech 2016, 3728-3732, doi: 10.21437/Interspeech.2016-236

@inproceedings{pm16_interspeech,
  author={Nazreen P.M. and A.G. Ramakrishnan and Prasanta Kumar Ghosh},
  title={{A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3728--3732},
  doi={10.21437/Interspeech.2016-236}
}