Abstract
In many Natural Language Processing problems, combining machine learning and optimization techniques is essential. One such problem is estimating the human effort needed to improve (post-edit) a text that has been produced by a machine translation method. Recent advances in this area have shown that Gaussian Processes can be effective in post-editing effort prediction. However, Gaussian Processes require a kernel function to be defined, and the choice of kernel strongly influences the quality of the prediction. In addition, extracting features from the text can be very labor-intensive, although recent advances in sentence embedding have shown that this process can be automated. In this paper, we use a Genetic Programming algorithm to evolve kernels for Gaussian Processes that predict post-editing effort from sentence embeddings. We show that combining evolutionary optimization with Gaussian Processes removes the need to specify the kernel a priori, and that, by using a multi-objective variant of the Genetic Programming approach, kernels suitable for predicting several metrics can be learned. We also investigate how the choice of sentence embedding method affects the kernel learning process.
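The core loop described above (represent a kernel as an expression, score it as a Gaussian Process covariance over sentence embeddings, and let Genetic Programming search the space of expressions) can be illustrated with a short, standard-library-only sketch. It is not the authors' implementation. The primitive set (`RBF`, `LIN`, `RQ` combined with `+` and `*`), the fixed hyperparameters and noise level, the use of the log marginal likelihood as fitness, the mutation-only single-objective search, and all function names (`evaluate`, `mutate`, `evolve`, and so on) are hypothetical choices made for illustration; the toy vectors only stand in for real sentence embeddings and post-editing effort scores.

```python
import math
import random

# Assumed primitive set of base kernels; hyperparameters are fixed for brevity.
def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rbf(x, y, ell=1.0):
    return math.exp(-sq_dist(x, y) / (2.0 * ell ** 2))

def linear(x, y):
    return sum(a * b for a, b in zip(x, y))

def rational_quadratic(x, y, ell=1.0, alpha=1.0):
    return (1.0 + sq_dist(x, y) / (2.0 * alpha * ell ** 2)) ** (-alpha)

BASE = {"RBF": rbf, "LIN": linear, "RQ": rational_quadratic}

def evaluate(tree, x, y):
    """A tree is a base-kernel name or a tuple (op, left, right), op in {'+', '*'}.
    Sums and products of valid kernels are valid kernels, so every tree is a covariance."""
    if isinstance(tree, str):
        return BASE[tree](x, y)
    op, left, right = tree
    a, b = evaluate(left, x, y), evaluate(right, x, y)
    return a + b if op == "+" else a * b

def random_tree(depth, rng):
    # Grow a random expression, stopping early with probability 0.3 at each level.
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(sorted(BASE))
    return (rng.choice("+*"), random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def mutate(tree, rng, depth=2):
    # Subtree mutation: replace this node, or recurse into one of its children.
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(depth, rng)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth), right)
    return (op, left, mutate(right, rng, depth))

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if not s > 0.0:  # also catches NaN
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def log_marginal_likelihood(tree, X, y, noise=0.1):
    # log p(y | X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi), with K = L L^T.
    n = len(X)
    K = [[evaluate(tree, X[i], X[j]) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    z = []  # forward substitution: z = L^-1 y, so y^T K^-1 y = z^T z
    for i in range(n):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return (-0.5 * sum(v * v for v in z)
            - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2.0 * math.pi))

def fitness(tree, X, y):
    try:
        value = log_marginal_likelihood(tree, X, y)
    except (ValueError, OverflowError):
        return float("-inf")
    return value if math.isfinite(value) else float("-inf")

def evolve(X, y, pop_size=20, generations=10, seed=0):
    # Truncation selection with elitism: the better half survives and breeds mutants.
    rng = random.Random(seed)
    pop = [random_tree(3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda t: fitness(t, X, y), reverse=True)
        parents = ranked[: pop_size // 2]
        pop = parents + [mutate(rng.choice(parents), rng) for _ in parents]
    return max(pop, key=lambda t: fitness(t, X, y))

if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(15)]  # toy "embeddings"
    y = [math.sin(x[0]) + 0.1 * x[1] for x in X]  # toy effort scores
    best = evolve(X, y)
    print(best, fitness(best, X, y))
```

Restricting internal nodes to `+` and `*` keeps every evolved expression a valid kernel, so with noise added to the diagonal the Cholesky factorization can fail only for numerical reasons, which `fitness` maps to `-inf`. A multi-objective variant, as mentioned above, would instead rank kernels on several prediction metrics at once (for example with non-dominated sorting), which this sketch does not attempt.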
Acknowledgements
This work has been supported by the Spanish Ministry of Science and Innovation (project PID2019-104966GB-I00), and the Basque Government (projects KK-2020/00049 and IT1244-19, and ELKARTEK program).
Cite this article
Roman, I., Santana, R., Mendiburu, A. et al. Evolution of Gaussian Process kernels for machine translation post-editing effort estimation. Ann Math Artif Intell 89, 835–856 (2021). https://doi.org/10.1007/s10472-021-09751-5