Abstract
The handling of ties between equiprobable derivations during Viterbi training is often glossed over in research papers: whether ties are broken randomly as they occur, broken on an ad hoc basis decided by the algorithm or implementation, or resolved by enumerating all equiprobable derivations and distributing the counts uniformly among them is left to the reader's imagination. The first option hurts rarely occurring rules, which risk being randomly eliminated; the second suffers from algorithmic biases; and the last is correct but potentially very inefficient. We show that it is possible to Viterbi train correctly without enumerating all equiprobable best derivations. The method is analogous to expectation maximization, provided that the automatic differentiation view is taken rather than the reverse value/outside probability view, as the latter calculates the wrong quantity for reestimation under the Viterbi semiring. To make automatic differentiation work, we devise an unbiased subderivative for the \(\max\) function.
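The key idea of an unbiased subderivative for \(\max\) can be sketched as follows: when several inputs tie for the maximum, the backward pass splits the unit of count mass uniformly among all tied entries instead of routing it to one arbitrary winner. This is a minimal illustrative sketch, not the paper's implementation; the function name is hypothetical.

```python
def viterbi_max_with_subgradient(probs):
    """Return the max of `probs` together with a subgradient vector
    that distributes mass 1 uniformly over all tied maximal entries,
    rather than assigning it to a single arbitrarily chosen winner."""
    best = max(probs)
    # Indices of all derivations that achieve the maximum probability.
    tied = [i for i, p in enumerate(probs) if p == best]
    share = 1.0 / len(tied)
    grad = [share if i in tied else 0.0 for i in range(len(probs))]
    return best, grad

# Two derivations tie at 0.3: each receives half of the count mass.
value, grad = viterbi_max_with_subgradient([0.3, 0.1, 0.3])
```

Under this subderivative, rules appearing only in some of the tied best derivations still receive their fair share of fractional counts during reestimation, instead of being eliminated by an unlucky random tie-break.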
Acknowledgements
This material is based upon work supported in part by the Defense Advanced Research Projects Agency (DARPA) under LORELEI contract HR0011-15-C-0114, BOLT contracts HR0011-12-C-0014 and HR0011-12-C-0016, and GALE contracts HR0011-06-C-0022 and HR0011-06-C-0023; by the European Union under the Horizon 2020 grant agreement 645452 (QT21) and FP7 grant agreement 287658; and by the Hong Kong Research Grants Council (RGC) research grants GRF16210714, GRF16214315, GRF620811 and GRF621008. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of DARPA, the EU, or RGC. The authors would also like to thank the anonymous reviewers for valuable feedback.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this paper
Saers, M., Wu, D. (2018). Handling Ties Correctly and Efficiently in Viterbi Training Using the Viterbi Semiring. In: Klein, S., Martín-Vide, C., Shapira, D. (eds) Language and Automata Theory and Applications. LATA 2018. Lecture Notes in Computer Science, vol 10792. Springer, Cham. https://doi.org/10.1007/978-3-319-77313-1_22
Print ISBN: 978-3-319-77312-4
Online ISBN: 978-3-319-77313-1