ISCA Archive Interspeech 2016

Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification

Kantapon Kaewtip, Charles Taylor, Abeer Alwan

Hidden Markov Models (HMMs) have been studied and used extensively in speech and birdsong recognition, but they are not robust to limited training data and noise. This paper presents two novel approaches to training continuous and discrete HMMs with extremely limited data. First, the algorithm learns global Gaussian Mixture Models (GMMs) from all available training phrases. The GMM parameters are then used to initialize the state parameters of each individual phrase model. For the GMM-HMM framework, the number of states and the number of mixture components per state are determined by the acoustic variation of each phrase type. The (high-energy) time-frequency prominent regions are used to compute the state emission probability, which increases noise robustness. For the discrete-HMM framework, the probability distribution of each state is initialized from the global GMMs during training; in testing, the probability of each codebook entry is estimated using the prominent regions of each state, again to increase noise robustness. In Cassin's Vireo phrase classification with 75 phrase types, the new GMM-HMM approach achieves 79.5% and 87% classification accuracy using only one and two training phrases, respectively, while HTK's GMM-HMM framework produces chance-level predictions (1.33% accuracy). The performance of the discrete-HMM algorithm is also presented in the paper.
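The global-GMM initialization idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes MFCC-like frame features, diagonal covariances, and the scikit-learn/hmmlearn APIs, and it omits the paper's selection of state counts from acoustic variation and its prominent-region emission scoring.

# Sketch of global-GMM initialization for per-phrase HMMs (illustrative only;
# feature choice, state count, and libraries are assumptions, not the paper's).
import numpy as np
from sklearn.mixture import GaussianMixture
from hmmlearn.hmm import GaussianHMM

def fit_phrase_hmm(all_frames, phrase_sequences, n_states=3, n_iter=10):
    """all_frames: (N, D) features pooled over ALL training phrases.
    phrase_sequences: list of (T_i, D) arrays for ONE phrase type (1-2 exemplars).
    The global GMM supplies the initial state means/covariances, so the HMM
    does not have to estimate them from the few exemplars alone."""
    # 1) Global GMM over all training data (here one component per state;
    #    the paper instead chooses counts from each phrase type's acoustic variation).
    gmm = GaussianMixture(n_components=n_states, covariance_type='diag',
                          random_state=0).fit(all_frames)

    # 2) Per-phrase HMM whose emission parameters start at the global GMM values.
    hmm = GaussianHMM(n_components=n_states, covariance_type='diag',
                      init_params='st',   # let fit() initialize start/transition only
                      params='stmc',      # but re-estimate all parameters during training
                      n_iter=n_iter)
    hmm.means_ = gmm.means_.copy()
    hmm.covars_ = gmm.covariances_.copy()

    X = np.concatenate(phrase_sequences)
    lengths = [len(s) for s in phrase_sequences]
    hmm.fit(X, lengths)
    return hmm

# Toy usage with random "features" standing in for spectrogram/MFCC frames.
rng = np.random.default_rng(0)
pool = rng.normal(size=(500, 12))                           # frames from all phrase types
exemplars = [rng.normal(size=(40, 12)) for _ in range(2)]   # two training phrases
model = fit_phrase_hmm(pool, exemplars)
print(model.score(exemplars[0]))                            # log-likelihood of one exemplar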


doi: 10.21437/Interspeech.2016-1360

Cite as: Kaewtip, K., Taylor, C., Alwan, A. (2016) Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification. Proc. Interspeech 2016, 2587-2591, doi: 10.21437/Interspeech.2016-1360

@inproceedings{kaewtip16_interspeech,
  author={Kantapon Kaewtip and Charles Taylor and Abeer Alwan},
  title={{Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={2587--2591},
  doi={10.21437/Interspeech.2016-1360}
}