
Efficiently Transferring Pre-trained Language Model RoBERTa Base English to Hindi Using WECHSEL



Abstract:

A crucial element of Natural Language Processing (NLP) is enabling computers to comprehend and process human language, and language models (LMs) have taken over the discipline in recent years. LMs are pre-trained models that can be customized and have significantly improved performance on a variety of challenging natural language tasks. Bidirectional Encoder Representations from Transformers (BERT), used in both English and other languages, is one of the most well-known LMs. Large pre-trained LMs require enormous computational resources to train on English text, which makes training these models in other languages difficult. This paper addresses this problem on a Hindi dataset using an approach termed WECHSEL. We apply the WECHSEL method to the RoBERTa model and assess its effectiveness: the source model's English tokenizer is replaced with a tokenizer in the target language, Hindi. The results for Hindi demonstrate that WECHSEL outperforms models of comparable size trained from scratch while requiring up to 64 times less training effort. WECHSEL RoBERTa-based Hindi was fine-tuned on a named-entity recognition (NER) task using SemEval-2022 datasets and achieved an accuracy of 73.45%, higher than the rest of the Hindi-language BERT-based models.
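The tokenizer replacement described above hinges on how the new Hindi subword embeddings are initialized: in WECHSEL, each target-language token's embedding is built as a similarity-weighted combination of source-token embeddings, with similarities taken from aligned cross-lingual static word embeddings. The snippet below is a minimal, self-contained sketch of that initialization idea using toy NumPy arrays; the function name, the precomputed `similarity` matrix, and the top-k/softmax weighting are illustrative assumptions, not the paper's exact procedure or the `wechsel` package API.

```python
import numpy as np

def init_target_embeddings(source_emb, similarity, k=2, temperature=0.1):
    """Sketch of WECHSEL-style embedding initialization.

    Each target-token embedding is a softmax-weighted average of the
    embeddings of its k most similar source tokens.

    source_emb : (n_source, dim) input embeddings of the source model
    similarity : (n_target, n_source) similarities between target and
                 source tokens in an aligned static embedding space
                 (assumed to be precomputed)
    """
    n_target = similarity.shape[0]
    target_emb = np.zeros((n_target, source_emb.shape[1]))
    for t in range(n_target):
        # indices of the k most similar source tokens for target token t
        top = np.argsort(similarity[t])[-k:]
        # softmax over their similarity scores
        w = np.exp(similarity[t, top] / temperature)
        w /= w.sum()
        # weighted average of the corresponding source embeddings
        target_emb[t] = w @ source_emb[top]
    return target_emb
```

With embeddings initialized this way, the transformer body is kept as-is and training continues in the target language, which is what allows convergence with far less compute than training from scratch.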
Date of Conference: 04-06 December 2023
Date Added to IEEE Xplore: 02 April 2024

Conference Location: Delhi, India

