Abstract
Many languages still lack the annotated training data needed for supervised learning. This issue is often addressed through auxiliary supervision and so-called transfer learning. In this work, we focus on combining two types of auxiliary supervision: cross-lingual and cross-task. Previous work has shown promising results for this combination; here, we explore several advanced parameter sharing techniques to improve on them. We propose three distinct techniques with different properties and evaluate them on four Indo-European languages and four NLP tasks (dependency parsing, language modeling, named entity recognition, and part-of-speech tagging). We conclude that the proposed techniques significantly improve performance in zero-shot learning.
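The abstract does not spell out the three proposed techniques, so the sketch below is not any of them. It is a minimal illustration, under our own assumptions, of why parameter sharing lets cross-lingual and cross-task supervision combine for zero-shot learning. It assumes a generic hard parameter sharing layout with one shared encoder, one module per language, and one output head per task. Under that layout, a (language, task) pair with no training data can still be served if every module it needs was trained on some other pair. The class `SharedModel`, its component names, and the language codes `cs` and `de` are hypothetical. The sketch only records which modules a batch would update; it does not train anything.

```python
from dataclasses import dataclass, field

# Hypothetical task labels mirroring the four tasks named in the abstract.
TASKS = ["dep", "lm", "ner", "pos"]


@dataclass
class SharedModel:
    """Bookkeeping sketch of hard parameter sharing (hypothetical design):
    a shared encoder, one module per language, one output head per task."""
    languages: list
    tasks: list
    # component name -> set of (language, task) pairs whose batches updated it
    updates: dict = field(default_factory=dict)

    def components(self, language, task):
        # Modules a forward pass on this (language, task) pair would use.
        return ["encoder", f"emb:{language}", f"head:{task}"]

    def train_step(self, language, task):
        if language not in self.languages or task not in self.tasks:
            raise ValueError(f"unknown pair: {(language, task)}")
        # Record which modules would receive gradients from this batch.
        for name in self.components(language, task):
            self.updates.setdefault(name, set()).add((language, task))

    def zero_shot_ready(self, language, task):
        # A pair is usable if every module it needs was trained on some pair;
        # the pair itself need not have appeared in training.
        return all(self.updates.get(name) for name in self.components(language, task))


model = SharedModel(languages=["cs", "de"], tasks=TASKS)
# "cs" has only language-modeling data; "de" also has NER annotations.
for lang, task in [("cs", "lm"), ("de", "lm"), ("de", "ner")]:
    model.train_step(lang, task)

print(model.zero_shot_ready("cs", "ner"))  # True
print(model.zero_shot_ready("cs", "pos"))  # False: no POS head was trained
```

In this toy setup, the `cs` module learns only from language modeling (cross-task supervision), and the NER head learns only from `de` (cross-lingual supervision). The shared encoder is the only component that sees both signals, which is the point that more advanced sharing schemes would aim to refine.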
Acknowledgments
This work was partially supported by the Scientific Grant Agency of the Slovak Republic, grants No. VG 1/0725/19 and VG 1/0667/18 and by the Slovak Research and Development Agency under the contracts No. APVV-15-0508, APVV-17-0267 and APVV SK-IL-RD-18-0004.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Pikuliak, M., Šimko, M. (2020). Exploring Parameter Sharing Techniques for Cross-Lingual and Cross-Task Supervision. In: Espinosa-Anke, L., Martín-Vide, C., Spasić, I. (eds.) Statistical Language and Speech Processing. SLSP 2020. Lecture Notes in Computer Science, vol. 12379. Springer, Cham. https://doi.org/10.1007/978-3-030-59430-5_8
DOI: https://doi.org/10.1007/978-3-030-59430-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59429-9
Online ISBN: 978-3-030-59430-5
eBook Packages: Computer Science, Computer Science (R0)