Abstract:
It is widely known that database quality has a large impact on speech recognition system performance, especially when the expected domain is well represented. In this paper, we leverage this idea in a data-driven solution to the problem of code-switching in Filipino. Practical Filipino conversations often contain English and other loan words in varying frequencies, which demands better training of the parameters and models of a Filipino speech recognition system. We alleviate the underrepresentation of loan words by developing a new speech database for training, and we apply appropriate data analysis to obtain reliable evaluation results. The best system was selected via lattice rescoring on a cross-validation set containing almost three hours of unseen speech data. The description and results of our experiments serve as a new and competent baseline for future development.
Date of Conference: 01-05 June 2014
Date Added to IEEE Xplore: 26 July 2014