Abstract
Conversational assistants with ever-improving NLP capabilities are becoming commodity functionality in most new devices. However, the underlying language models that provide this intelligence typically have large parameter counts and high memory and compute demands, making them impractical for edge and low-resource devices and forcing them to be cloud-hosted, which introduces latency. To address this, we design a systematic, language-agnostic methodology for developing powerful yet lightweight NLP models using knowledge distillation techniques, thereby producing models suitable for such low-resource devices. We follow the steps of the proposed approach for the Greek language and build the first, to the best of our knowledge, lightweight Greek language model, which we make publicly available. We train and evaluate Greek GloVe word embeddings and efficiently distill Greek-BERT into various BiLSTM models without considerable loss in performance. Experiments indicate that knowledge distillation and data augmentation can improve the performance of simple BiLSTM models on two NLP tasks in Modern Greek, i.e., Topic Classification and Natural Language Inference, making them suitable candidates for low-resource devices.
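Although the paper's training code is not reproduced here, the distillation step the abstract describes can be illustrated with the classic soft-target recipe of Hinton et al. (2015), cited in the references below. The following is a minimal PyTorch sketch of such an objective for a BiLSTM student learning from a BERT teacher; the temperature and mixing weight `alpha` are illustrative assumptions, not the authors' settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target knowledge distillation loss (Hinton et al., 2015).

    Mixes KL divergence between temperature-softened teacher and student
    distributions with ordinary cross-entropy on the gold labels.
    `temperature` and `alpha` are illustrative, not the paper's values.
    """
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradient magnitude stays comparable
    # to the hard-label term, as recommended by Hinton et al.
    kd_term = F.kl_div(soft_student, soft_teacher, log_target=True,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term
```

At training time, the teacher (e.g., Greek-BERT) runs in inference mode to produce `teacher_logits` for each batch, while only the student's parameters receive gradients.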
References
Finkelstein, L., et al.: Placing search in context: the concept revisited. ACM Trans. Inf. Syst. 20(1), 116–131 (2002). https://doi.org/10.1145/503104.503110
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. ACL 5, 135–146 (2017)
Conneau, A., et al.: Unsupervised cross-lingual representation learning at scale. In: Proceedings of the 58th Annual Meeting of the ACL, pp. 8440–8451. ACL, July 2020. https://doi.org/10.18653/v1/2020.acl-main.747
Conneau, A., et al.: XNLI: evaluating cross-lingual sentence representations. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October–November 2018, pp. 2475–2485. ACL (2018). https://doi.org/10.18653/v1/D18-1269
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2019)
He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: decoding-enhanced BERT with disentangled attention. In: International Conference on Learning Representations (2021)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In: NIPS Deep Learning and Representation Learning Workshop (2015). http://arxiv.org/abs/1503.02531
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017)
Jiao, X., et al.: TinyBERT: distilling BERT for natural language understanding. In: Findings of the ACL: EMNLP 2020, pp. 4163–4174. ACL, November 2020. https://doi.org/10.18653/v1/2020.findings-emnlp.372
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
Koehn, P.: Europarl: a parallel corpus for statistical machine translation. In: Proceedings of Machine Translation Summit X, Phuket, Thailand, pp. 79–86 (2005)
Koutsikakis, J., Chalkidis, I., Malakasiotis, P., Androutsopoulos, I.: GREEK-BERT: the Greeks visiting sesame street. In: 11th Hellenic Conference on Artificial Intelligence, SETN 2020, pp. 110–117. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3411408.3411440
Kovaleva, O., Romanov, A., Rogers, A., Rumshisky, A.: Revealing the dark secrets of BERT. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, pp. 4365–4374. ACL, November 2019. https://doi.org/10.18653/v1/D19-1445
Levy, O., Goldberg, Y., Dagan, I.: Improving distributional similarity with lessons learned from word embeddings. Trans. ACL 3, 211–225 (2015). https://doi.org/10.1162/tacl_a_00134
Lioudakis, M., Outsios, S., Vazirgiannis, M.: An ensemble method for producing word representations focusing on the Greek language. arXiv preprint arXiv:1912.04965 (2020)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019)
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2019). https://openreview.net/forum?id=Bkg6RiCqY7
Malamas, N., Symeonidis, A.: Embedding rasa in edge devices: capabilities and limitations. Procedia Comput. Sci. 192, 109–118 (2021). https://doi.org/10.1016/j.procs.2021.08.012
McCarley, J.S., Chakravarti, R., Sil, A.: Structured pruning of a BERT-based question answering model. arXiv:1910.06360 (2019)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: Proceedings of Workshop at ICLR 2013 (2013)
Ortiz Suárez, P.J., Sagot, B., Romary, L.: Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures. In: Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7) 2019, Cardiff, Leibniz-Institut für Deutsche Sprache, Mannheim, 22nd July 2019, pp. 9–16 (2019). https://doi.org/10.14618/ids-pub-9021
Outsios, S., Karatsalos, C., Skianis, K., Vazirgiannis, M.: Evaluation of Greek word embeddings. arXiv preprint arXiv:1904.04032 (2019)
Papantoniou, K., Tzitzikas, Y.: NLP for the Greek language: a brief survey. In: 11th Hellenic Conference on Artificial Intelligence, SETN 2020, pp. 101–109. Association for Computing Machinery, New York (2020). https://doi.org/10.1145/3411408.3411410
Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, pp. 1532–1543. ACL, October 2014. https://doi.org/10.3115/v1/D14-1162
Peters, M.E., et al.: Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the ACL: Human Language Technologies, New Orleans, Louisiana, vol. 1, pp. 2227–2237. ACL, June 2018. https://doi.org/10.18653/v1/N18-1202
Radford, A., Narasimhan, K.: Improving language understanding by generative pre-training (2018)
Rogers, A., Kovaleva, O., Rumshisky, A.: A primer in BERTology: what we know about how BERT works. Trans. ACL 8, 842–866 (2020). https://doi.org/10.1162/tacl_a_00349
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv:1910.01108 (2019)
Shen, S., et al.: Q-BERT: Hessian based ultra low precision quantization of BERT (2019)
Sun, S., Cheng, Y., Gan, Z., Liu, J.: Patient knowledge distillation for BERT model compression. In: Proceedings of the 2019 EMNLP-IJCNLP, Hong Kong, China, pp. 4323–4332. ACL, November 2019. https://doi.org/10.18653/v1/D19-1441
Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., Lin, J.J.: Distilling task-specific knowledge from BERT into simple neural networks. arXiv:1903.12136 (2019)
Turc, I., Chang, M., Lee, K., Toutanova, K.: Well-read students learn better: the impact of student initialization on knowledge distillation. CoRR arXiv:1908.08962 (2019)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30. Curran Associates, Inc. (2017). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
Wu, B., et al.: Towards non-task-specific distillation of BERT via sentence representation approximation. In: Proceedings of the 1st Conference of the Asia-Pacific Chapter of the ACL and the 10th International Joint Conference on Natural Language Processing, Suzhou, China, pp. 70–79. ACL, December 2020. https://aclanthology.org/2020.aacl-main.9
Acknowledgements
This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project code: T1EDK-02347).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Goulas, A., Malamas, N., Symeonidis, A.L. (2022). A Methodology for Enabling NLP Capabilities on Edge and Low-Resource Devices. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_18
DOI: https://doi.org/10.1007/978-3-031-08473-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08472-0
Online ISBN: 978-3-031-08473-7