Authors: Diego Bernardes de Lima Santos 1 ; Frederico Giffoni de Carvalho Dutra 2 ; Fernando Silva Parreiras 3 and Wladmir Cardoso Brandão 1

Affiliations: 1 Department of Computer Science, Pontifical Catholic University of Minas Gerais (PUC Minas), Belo Horizonte, Brazil ; 2 Companhia Energética de Minas Gerais (CEMIG), Belo Horizonte, Brazil ; 3 Laboratory for Advanced Information Systems, FUMEC University, Belo Horizonte, Brazil

Keyword(s): Named Entity Recognition, Text Embedding, Neural Network, Transformer, Multilingual, Portuguese.

Abstract: Recent state-of-the-art named entity recognition approaches are based on deep neural networks that use an attention mechanism to learn how to extract named entities from relevant fragments of text. Usually, training models in a specific language leads to effective recognition, but it requires a lot of time and computational resources. Fine-tuning a pre-trained multilingual model can be simpler and faster, but it is unclear how effective the resulting recognition model can be. This article exploits multilingual models for named entity recognition by adapting and training transformer-based architectures for Portuguese, a challenging and complex language. Experimental results show that multilingual transformer-based text embedding approaches fine-tuned with a large dataset outperform state-of-the-art transformer-based models trained specifically for Portuguese. In particular, we build a comprehensive dataset from different versions of HAREM to train our multilingual transformer-based text embedding approach, which achieves 88.0% precision and 87.8% F1 in named entity recognition for Portuguese, with gains of up to 9.89% in precision and 11.60% in F1 compared to the state-of-the-art monolingual approach trained specifically for Portuguese.

CC BY-NC-ND 4.0

Paper citation in several formats:
Santos, D.; Dutra, F.; Parreiras, F. and Brandão, W. (2021). Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese. In Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS; ISBN 978-989-758-509-8; ISSN 2184-4992, SciTePress, pages 473-483. DOI: 10.5220/0010443204730483

@conference{iceis21,
author={Diego Bernardes de Lima Santos and Frederico Giffoni de Carvalho Dutra and Fernando Silva Parreiras and Wladmir Cardoso Brandão},
title={Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese},
booktitle={Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS},
year={2021},
pages={473-483},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0010443204730483},
isbn={978-989-758-509-8},
issn={2184-4992},
}

TY - CONF

JO - Proceedings of the 23rd International Conference on Enterprise Information Systems - Volume 1: ICEIS
TI - Assessing the Effectiveness of Multilingual Transformer-based Text Embeddings for Named Entity Recognition in Portuguese
SN - 978-989-758-509-8
IS - 2184-4992
AU - Santos, D.
AU - Dutra, F.
AU - Parreiras, F.
AU - Brandão, W.
PY - 2021
SP - 473
EP - 483
DO - 10.5220/0010443204730483
PB - SciTePress