ISCA Archive SLTU 2018

Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load

Bin Wu, Sakriani Sakti, Jinsong Zhang, Satoshi Nakamura

Inspired by infant language acquisition, unsupervised subword discovery for zero-resource languages has recently gained attention. The Dirichlet Process Gaussian Mixture Model (DPGMM) achieves top results as evaluated by the ABX discrimination test. However, the DPGMM is overly sensitive to acoustic variation and often produces too many types of subword units and a relatively high-dimensional posteriorgram, which implies a high computational cost for learning and inference, as well as a greater tendency to overfit. This paper proposes applying functional load to reduce the number of subword units produced by the DPGMM. We greedily merge the pairs of units with the lowest functional load, i.e., the merges that incur the least information loss for the language. Results on the Xitsonga corpus under the official Zerospeech 2015 setting show that we can reduce the number of subword units by more than two thirds without hurting the ABX error rate. The resulting number of units is close to the number of phonemes in human languages.
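The greedy merging idea can be illustrated with a minimal sketch. The abstract does not spell out how functional load is computed, so the sketch below assumes the common entropy-based formulation: the functional load of a unit pair is the relative drop in corpus entropy (here, under a bigram count model) when the contrast between the two units is neutralized. All names (bigram_entropy, functional_load, greedy_reduce) and the toy data are illustrative, not the authors' implementation.

from collections import Counter
from math import log2

def bigram_entropy(seqs):
    # Entropy (bits) of the unit sequences under a bigram count model.
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def merge(seqs, a, b):
    # Neutralize the contrast between units a and b (relabel b as a).
    return [[a if u == b else u for u in seq] for seq in seqs]

def functional_load(seqs, a, b, h_full):
    # Relative entropy loss when a and b are no longer distinguished.
    return (h_full - bigram_entropy(merge(seqs, a, b))) / h_full

def greedy_reduce(seqs, target_units):
    # Greedily merge the unit pair with the lowest functional load
    # until the inventory shrinks to target_units.
    units = sorted({u for seq in seqs for u in seq})
    while len(units) > target_units:
        h_full = bigram_entropy(seqs)
        best = min(((a, b) for i, a in enumerate(units) for b in units[i + 1:]),
                   key=lambda p: functional_load(seqs, p[0], p[1], h_full))
        seqs = merge(seqs, *best)
        units.remove(best[1])
    return seqs

# Toy usage: hypothetical decoded DPGMM label sequences.
corpus = [[0, 1, 2, 1, 3], [2, 3, 0, 1], [1, 2, 3, 3, 0]]
reduced = greedy_reduce(corpus, target_units=3)

In this sketch the full-inventory entropy is recomputed after every merge, so each greedy step is evaluated against the current (already reduced) inventory rather than the original one.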


doi: 10.21437/SLTU.2018-1

Cite as: Wu, B., Sakti, S., Zhang, J., Nakamura, S. (2018) Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load. Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018), 1-5, doi: 10.21437/SLTU.2018-1

@inproceedings{wu18_sltu,
  author={Bin Wu and Sakriani Sakti and Jinsong Zhang and Satoshi Nakamura},
  title={{Optimizing DPGMM Clustering in Zero Resource Setting Based on Functional Load}},
  year={2018},
  booktitle={Proc. 6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)},
  pages={1--5},
  doi={10.21437/SLTU.2018-1}
}