
A Methodology for Enabling NLP Capabilities on Edge and Low-Resource Devices

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13286)


Abstract

Conversational assistants with increasing NLP capabilities are becoming commodity functionality for most new devices. However, the underlying language models responsible for language-related intelligence typically have a large number of parameters and high memory and resource demands. This makes them unsuitable for edge and low-resource devices, forcing them to be cloud-hosted and thus introducing delays. To this end, we design a systematic, language-agnostic methodology for developing powerful lightweight NLP models using knowledge distillation techniques, thereby producing models suitable for such low-resource devices. We follow the steps of the proposed approach for the Greek language and build the first lightweight Greek language model (to the best of our knowledge), which we make publicly available. We train and evaluate GloVe word embeddings in Greek and efficiently distill Greek-BERT into various BiLSTM models without considerable loss in performance. Experiments indicate that knowledge distillation and data augmentation can improve the performance of simple BiLSTM models for two NLP tasks in Modern Greek, i.e., Topic Classification and Natural Language Inference, making them suitable candidates for low-resource devices.
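As a rough illustration of the distillation step summarized in the abstract, the sketch below shows soft-target knowledge distillation into a BiLSTM student in PyTorch. It is a minimal sketch under assumed settings: the class name BiLSTMStudent, the hyperparameters (embedding and hidden sizes, temperature, alpha), and the toy tensors are illustrative placeholders rather than the authors' reported configuration. In the paper the teacher is Greek-BERT; initialising the student's embedding layer from the Greek GloVe vectors is an assumption made here for illustration.

# Minimal, self-contained sketch of soft-target knowledge distillation into a
# BiLSTM student (PyTorch). All sizes and names below are illustrative
# assumptions, not the authors' reported configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMStudent(nn.Module):
    """Lightweight student: embedding layer + single BiLSTM + linear classifier."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=3):
        super().__init__()
        # The embedding matrix could be initialised from pre-trained Greek GloVe
        # vectors (assumption); here it is randomly initialised.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.bilstm(embedded)      # final states of both directions
        features = torch.cat([hidden[0], hidden[1]], dim=-1)
        return self.classifier(features)            # logits over task classes

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence against the teacher's
    temperature-softened output distribution (the classic soft-target objective)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with random data. In practice, teacher_logits would be produced by a
# fine-tuned teacher (e.g. Greek-BERT) over the original and augmented training
# sentences, and token_ids would come from the student's own vocabulary.
student = BiLSTMStudent(vocab_size=50_000)
token_ids = torch.randint(0, 50_000, (8, 32))   # batch of 8 sequences, length 32
labels = torch.randint(0, 3, (8,))              # gold labels for the task
teacher_logits = torch.randn(8, 3)              # stand-in for cached teacher outputs
loss = distillation_loss(student(token_ids), teacher_logits, labels)
loss.backward()

Keeping the student to an embedding layer plus a single BiLSTM is what makes it a candidate for edge deployment: only the teacher's cached logits are needed while training the student, and nothing from the teacher is needed at inference time.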


Notes

  1. https://github.com/AuthEceSoftEng/Greek-NLP-Distillation-Paper.

  2. https://inventory.clarin.gr/corpus/909.


Acknowledgements

This research has been co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH - CREATE - INNOVATE (project code: T1EDK-02347).

Author information


Corresponding author

Correspondence to Nikolaos Malamas.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Goulas, A., Malamas, N., Symeonidis, A.L. (2022). A Methodology for Enabling NLP Capabilities on Edge and Low-Resource Devices. In: Rosso, P., Basile, V., Martínez, R., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2022. Lecture Notes in Computer Science, vol 13286. Springer, Cham. https://doi.org/10.1007/978-3-031-08473-7_18


  • DOI: https://doi.org/10.1007/978-3-031-08473-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08472-0

  • Online ISBN: 978-3-031-08473-7

  • eBook Packages: Computer Science, Computer Science (R0)
