Abstract
In this work, we present a strategy for training a sequence-to-sequence (Seq2Seq) model with the policy gradient method from deep reinforcement learning. The strategy combines the classical cross-entropy loss with the policy gradient objective during the training phase. To evaluate its effectiveness, we compare two Seq2Seq models trained to translate from Arabic to English with two different methods: the first uses cross-entropy alone, and the second combines cross-entropy and policy gradient in varying proportions. Experimental results show that the second training method improves the performance of the Seq2Seq model by 0.71 BLEU points.
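The combined objective described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper name `mixed_loss`, the mixing coefficient `lam`, and the toy inputs are all assumptions.

```python
import numpy as np

def mixed_loss(log_probs_ref, log_probs_sampled, reward, lam):
    """Blend cross-entropy with a policy-gradient (REINFORCE) term.

    log_probs_ref: model log-probabilities of the reference tokens
    log_probs_sampled: log-probabilities of tokens sampled from the model
    reward: sentence-level reward for the sampled output (e.g. BLEU)
    lam: mixing weight; lam = 1.0 recovers pure cross-entropy training
    """
    ce = -np.sum(log_probs_ref)                # classical cross-entropy loss
    pg = -reward * np.sum(log_probs_sampled)   # REINFORCE surrogate loss
    return lam * ce + (1.0 - lam) * pg

# Toy example: two reference tokens, two sampled tokens
loss = mixed_loss(np.log([0.5, 0.4]), np.log([0.3, 0.2]), reward=0.7, lam=0.8)
```

Minimizing the `pg` term raises the log-probability of sampled translations in proportion to their reward, which is the gradient direction derived in the appendix.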
A Proof of Equation 11
Here we demonstrate Eq. (11).

We have Eq. (10), the objective to maximize, namely the expected reward of a trajectory:

\[ J(\theta ) = E_{\tau \sim p(\tau |\theta )}\left[ R(\tau ) \right] \]

Given a function \(f(x)\), \(p(x|\theta )\) a parametrized probability distribution, and its expectation \(E_{x \sim p(x|\theta )}\left[ f(x) \right] \), the log-derivative trick gives:

\[ \nabla _{\theta } E_{x \sim p(x|\theta )}\left[ f(x) \right] = E_{x \sim p(x|\theta )}\left[ f(x) \, \nabla _{\theta } \log p(x|\theta ) \right] \qquad (*) \]

Now, we suppose \(x = \tau \), \(f(x)=R(\tau )\), and \(p(x|\theta )=p(\tau |\theta )\):

\[ \nabla _{\theta } J(\theta ) = E_{\tau \sim p(\tau |\theta )}\left[ R(\tau ) \, \nabla _{\theta } \log p(\tau |\theta ) \right] \]

The trajectory \(\tau \) is a sequence of events \(a_t\) and \(s_{t+1}\), respectively sampled from the agent's policy \(\pi _{\theta }(a_{t}|s_{t})\) and the probability of transition \(p(s_{t+1}|s_{t}, \; a_{t})\). The probability of the complete trajectory is the product of the individual probabilities:

\[ p(\tau |\theta ) = p(s_0) \prod _{t=0}^{T-1} \pi _{\theta }(a_{t}|s_{t}) \; p(s_{t+1}|s_{t}, \; a_{t}) \]

Taking the logarithm turns this product into a sum; the initial-state and transition terms do not depend on \(\theta \), so their gradients vanish:

\[ \nabla _{\theta } \log p(\tau |\theta ) = \sum _{t=0}^{T-1} \nabla _{\theta } \log \pi _{\theta }(a_{t}|s_{t}) \qquad (**) \]

By putting (**) in (*), we find Eq. (11):

\[ \nabla _{\theta } J(\theta ) = E_{\tau \sim p(\tau |\theta )}\left[ R(\tau ) \sum _{t=0}^{T-1} \nabla _{\theta } \log \pi _{\theta }(a_{t}|s_{t}) \right] \]
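The log-derivative identity at the heart of this proof can be checked numerically. The sketch below uses assumed toy choices (a Bernoulli "policy" with parameter \(\sigma (\theta )\) and a linear reward \(f\), neither taken from the paper) and compares the Monte Carlo REINFORCE estimate of the gradient with the analytic value:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 0.3
p = 1.0 / (1.0 + np.exp(-theta))     # Bernoulli parameter: sigmoid of theta

def f(x):
    return 3.0 * x + 1.0             # arbitrary "reward" of the outcome

# Analytic gradient of E[f(x)] = p*f(1) + (1-p)*f(0) w.r.t. theta:
analytic = (f(1.0) - f(0.0)) * p * (1.0 - p)

# REINFORCE estimate: average of f(x) * d/dtheta log p(x|theta)
xs = rng.random(200_000) < p                     # samples x ~ Bernoulli(p)
score = np.where(xs, 1.0 - p, -p)                # score function for x in {1, 0}
estimate = np.mean(np.where(xs, f(1.0), f(0.0)) * score)
```

With 200,000 samples the estimate agrees with the analytic gradient to within Monte Carlo noise, which is exactly the substitution of (**) into (*) carried out in one dimension.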
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Zouidine, M., Khalil, M., Farouk, A.I.E. (2022). Policy Gradient for Arabic to English Neural Machine Translation. In: Lazaar, M., Duvallet, C., Touhafi, A., Al Achhab, M. (eds) Proceedings of the 5th International Conference on Big Data and Internet of Things. BDIoT 2021. Lecture Notes in Networks and Systems, vol 489. Springer, Cham. https://doi.org/10.1007/978-3-031-07969-6_35