Abstract:
Data sparseness has been an inherent issue of statistical language models, and smoothing methods have been used to resolve the zero-count problem. Because the zero-count issue is more severe on small corpora, 20 Chinese language models, ranging from 1M to 20M Chinese words of CGW, have been generated. Five smoothing methods, including Good-Turing and Advanced Good-Turing smoothing as well as our two proposed methods, are evaluated and analyzed on inside testing and outside testing. The results show how these methods alleviate data sparseness on language models of various sizes. The best among them is our proposed YH-B, which performs best across all the models.
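As a minimal sketch of the classic Good-Turing estimate mentioned above (not the paper's Advanced Good-Turing or the proposed YH-B variants, whose details are not given here), the adjusted count for a word seen c times is c* = (c+1) · N_{c+1} / N_c, where N_c is the number of word types seen exactly c times, and the total probability mass reserved for unseen words is N_1 / N:

```python
from collections import Counter

def good_turing_counts(counts):
    """Classic Good-Turing adjusted counts: c* = (c+1) * N_{c+1} / N_c.

    `counts` maps word -> raw count. Falls back to the raw count when
    N_{c+1} is zero (a known weakness for high counts; real systems
    smooth the N_c curve instead).
    """
    freq_of_freq = Counter(counts.values())  # N_c: number of types with count c
    adjusted = {}
    for word, c in counts.items():
        n_c = freq_of_freq[c]
        n_c_plus_1 = freq_of_freq.get(c + 1, 0)
        adjusted[word] = (c + 1) * n_c_plus_1 / n_c if n_c_plus_1 else c
    return adjusted

# Toy corpus counts (hypothetical, for illustration only)
counts = {"the": 3, "sat": 2, "cat": 1, "dog": 1}
adj = good_turing_counts(counts)
total = sum(counts.values())
# Probability mass reserved for unseen words: N_1 / N
p_unseen = Counter(counts.values())[1] / total
```

Here the two singletons ("cat", "dog") give N_1 = 2, so 2/7 of the probability mass is shifted from seen words to the unseen events that cause the zero-count problem.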
Date of Conference: 13-16 July 2014
Date Added to IEEE Xplore: 15 January 2015