Abstract
There is currently a vast number of archival finding aids in Portuguese archives; however, these documents are unstructured (not annotated), which makes them hard to process and work with. Our goal is therefore to extract and classify entities of interest, such as geographical locations, people's names, and dates. To this end, we use the Transformer, an architecture that has been revolutionizing several NLP tasks, and present several models aimed at achieving high scores. We also intend to measure how much this new mechanism improves on previous architectures. Can Transformer-based models replace LSTMs in NER? We intend to answer this question throughout this paper.
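To make the approach concrete, the sketch below fine-tunes a pretrained Portuguese BERT model (a public BERTimbau checkpoint) for token classification with the Hugging Face transformers library, the kind of setup the abstract alludes to. It is a minimal sketch under stated assumptions: the tag set, the example sentence, and the hyperparameters are illustrative placeholders, not the authors' actual archival corpus or configuration.

```python
# Minimal sketch: fine-tuning a pretrained Portuguese BERT (BERTimbau) for NER
# with Hugging Face transformers. Tag set, example, and hyperparameters are
# hypothetical placeholders, not the paper's actual corpus or settings.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-DATE", "I-DATE"]  # assumed IOB tag set
model_name = "neuralmind/bert-base-portuguese-cased"  # public BERTimbau checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# One hypothetical training example: word-level IOB tags for a finding-aid line
# ("Letter from D. João sent from Lisbon in 1755").
words = ["Carta", "de", "D.", "João", "enviada", "de", "Lisboa", "em", "1755"]
tags  = ["O", "O", "B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-DATE"]

# Tokenize pre-split words and align the word-level tags with BERT's subwords;
# special tokens get -100, which the loss function ignores.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned = [
    -100 if wid is None else labels.index(tags[wid])
    for wid in enc.word_ids(batch_index=0)
]
enc["labels"] = torch.tensor([aligned])

# One gradient step; a real run would iterate over the whole annotated corpus.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**enc).loss
loss.backward()
optimizer.step()
```

In a real run, the single example would be replaced by the annotated finding-aid corpus, and the update step wrapped in a proper training and evaluation loop (e.g. with transformers' Trainer).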
About this paper
Cite this paper
da Costa Cunha, L.F., Ramalho, J.C. (2022). NER in Archival Finding Aids: Next Level. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-04819-7_33
Print ISBN: 978-3-031-04818-0
Online ISBN: 978-3-031-04819-7