
Policy Gradient for Arabic to English Neural Machine Translation

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 489)

Abstract

In this work, we present a strategy for training a Seq2Seq model with the policy gradient method from deep reinforcement learning. The strategy combines the classical cross-entropy loss with the policy gradient objective during the training phase. To evaluate its effectiveness, we compare two Seq2Seq models trained with two different methods to translate from Arabic to English: the first uses cross-entropy alone, and the second combines cross-entropy and policy gradient in varying proportions. Experimental results show that the second training method improves the performance of the sequence-to-sequence model by 0.71 BLEU points.
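
A minimal sketch of how such a combined objective might look in PyTorch, purely for illustration: the mixing weight `lam`, the `model.sample` interface, and the `reward_fn` (e.g. sentence-level BLEU) are assumptions, not the authors' exact implementation.

```python
# Sketch only: mixing a cross-entropy term and a policy-gradient (REINFORCE)
# term in one training loss. Interfaces and the mixing weight are assumptions.
import torch
import torch.nn.functional as F

def combined_loss(model, src, tgt, reward_fn, lam=0.5):
    # Cross-entropy on the reference translation (teacher forcing).
    logits = model(src, tgt[:, :-1])                      # assumed shape (B, T, V)
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                         tgt[:, 1:].reshape(-1))

    # Policy gradient: sample a translation, weight its log-probability by a reward.
    sample, log_probs = model.sample(src)                 # hypothetical: sampled tokens, per-step log pi(a_t|s_t)
    reward = reward_fn(sample, tgt)                       # e.g. sentence-level BLEU, shape (B,)
    pg = -(reward.detach() * log_probs.sum(dim=1)).mean() # REINFORCE estimator

    # Weighted combination of the two objectives.
    return lam * ce + (1.0 - lam) * pg
```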


Notes

  1. https://conferences.unite.un.org/UNCorpus.



A Proof of Equation 11

Here we prove Eq. (11).

We start from Eq. (10):

$$ L_{PG} = - E_{\tau \sim \pi _{\theta }}\left[ R(\tau ) \right] $$
$$ \nabla _{\theta } L_{PG} = -\nabla _{\theta } E_{\tau \sim \pi _{\theta }}\left[ R(\tau ) \right] $$

Given a function \(f(x)\) that does not depend on \(\theta \), a parametrized probability distribution \(p(x|\theta )\), and the expectation \(E_{x \sim p(x|\theta )}\left[ f(x) \right] \), the log-derivative trick gives:

$$\begin{aligned} \nabla _{\theta }E_{x \sim p(x|\theta )}\left[ f(x) \right]= & {} \nabla _{\theta } \int dx\; f(x) p(x|\theta )\\= & {} \int dx\; \nabla _{\theta } \left( f(x)p(x|\theta ) \right) \\= & {} \int dx\; \left( \nabla _{\theta } f(x)\, p(x|\theta ) + f(x) \nabla _{\theta } p(x|\theta ) \right) \\= & {} \int dx\; f(x)\nabla _{\theta }p(x|\theta ) \qquad \text {(since } f \text { does not depend on } \theta \text {)} \\= & {} \int dx\; f(x)p(x|\theta ) \frac{\nabla _{\theta }p(x|\theta )}{p(x|\theta )} \\= & {} \int dx\; f(x)p(x|\theta )\nabla _{\theta }\log p(x|\theta ) \\= & {} E_{x \sim p(x|\theta )}\left[ f(x)\nabla _{\theta }\log p(x|\theta ) \right] \end{aligned}$$
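
As a sanity check on this identity (an illustration, not part of the paper), the score-function estimator can be verified numerically for a simple categorical distribution \(p(x|\theta ) = \mathrm {softmax}(\theta )\), for which the analytic gradient \(p_k(f_k - E[f])\) should match the Monte Carlo average of \(f(x)\nabla _{\theta }\log p(x|\theta )\):

```python
# Numerical check of the log-derivative identity for a toy categorical
# distribution p(x|theta) = softmax(theta). Sketch only; values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=4)
f = np.array([1.0, 3.0, -2.0, 0.5])            # arbitrary f(x) over 4 outcomes

p = np.exp(theta) / np.exp(theta).sum()        # softmax probabilities

# Analytic gradient: d/dtheta_k E[f] = p_k * (f_k - E[f]) for a softmax categorical.
grad_exact = p * (f - (f * p).sum())

# Score-function estimate: average of f(x) * grad_theta log p(x|theta),
# where grad_theta log p(x) = onehot(x) - p for a softmax categorical.
xs = rng.choice(4, size=200_000, p=p)
onehot = np.eye(4)[xs]
grad_mc = (f[xs][:, None] * (onehot - p)).mean(axis=0)

print(grad_exact)   # the two should agree up to Monte Carlo noise
print(grad_mc)
```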

Now, setting \(x = \tau \), \(f(x)=R(\tau )\), and \(p(x|\theta )=p(\tau |\theta )\), so that sampling \(x \sim p(x|\theta )\) corresponds to \(\tau \sim \pi _{\theta }\), we obtain:

$$\begin{aligned} \nabla _{\theta } L_{PG} = - E_{\tau \sim \pi _{\theta }}\left[ R(\tau )\nabla _{\theta }\log p(\tau |\theta ) \right]&(*) \end{aligned}$$

The trajectory \(\tau \) is a sequence of actions \(a_t\) and states \(s_{t+1}\), sampled respectively from the agent’s policy \(\pi _{\theta }(a_{t}|s_{t})\) and the transition probability \(p(s_{t+1}|s_{t}, \; a_{t})\). The probability of the complete trajectory is the product of these individual probabilities:

$$p(\tau |\theta ) = \prod _{t=0}^{T}p(s_{t+1}|s_{t}, \; a_{t}) \pi _{\theta } (a_{t}|s_{t})$$
$$\log p(\tau |\theta ) = \log \prod _{t=0}^{T}p(s_{t+1}|s_{t}, \; a_{t}) \pi _{\theta } (a_{t}|s_{t})$$
$$\log p(\tau |\theta ) = \sum _{t=0}^{T} \left( \log p(s_{t+1}|s_{t}, \; a_{t})+\log \pi _{\theta } (a_{t}|s_{t}) \right) $$
Since the transition probabilities \(p(s_{t+1}|s_{t}, \; a_{t})\) do not depend on \(\theta \), their gradient vanishes and

$$\begin{aligned} \nabla _{\theta }\log p(\tau |\theta ) = \sum _{t=0}^{T} \nabla _{\theta } \log \pi _{\theta } (a_{t}|s_{t})&(**) \end{aligned}$$

Substituting (**) into (*) yields Eq. (11):

$$\nabla _{\theta }L_{PG} = - E_{\tau \sim \pi _{\theta }}\left[ R(\tau ) \sum _{t=0}^{T}\nabla _{\theta } \log \pi _{\theta }(a_{t} | s_{t}) \right] $$
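
In practice this gradient is usually obtained through automatic differentiation. The following sketch (an illustration under assumed shapes, not the authors' code) builds the single-sample surrogate loss \(-R(\tau ) \sum _{t} \log \pi _{\theta }(a_{t}|s_{t})\), so that backpropagation produces exactly the estimator above.

```python
# Sketch: realize Eq. (11) with autograd. The logits stand in for a decoder's
# per-step output distributions pi_theta(.|s_t); shapes and the reward value
# are placeholders.
import torch

torch.manual_seed(0)
T, vocab = 5, 10
logits = torch.randn(T, vocab, requires_grad=True)   # stand-in for pi_theta(.|s_t), t = 0..T

dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                              # one sampled trajectory a_0..a_T
log_probs = dist.log_prob(actions)                   # log pi_theta(a_t|s_t), shape (T,)

reward = torch.tensor(0.42)                          # R(tau), e.g. sentence-level BLEU (placeholder)

surrogate = -(reward * log_probs.sum())              # single-sample estimate of L_PG
surrogate.backward()                                 # logits.grad is -R(tau) * sum_t grad log pi_theta(a_t|s_t)
```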


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Zouidine, M., Khalil, M., Farouk, A.I.E. (2022). Policy Gradient for Arabic to English Neural Machine Translation. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-07969-6_35
