The Systematic Construction of Multiple Types of Corpora Through the Lapelinc Framework

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2022)

Abstract

Corpus Linguistics lacks a widely accepted pattern for constructing text corpora. As a result, compilation follows many different paths and yields research products that are difficult to interface with one another and have low continuity. The variety of existing corpora aggravates this problem, since individual studies use different forms of expression to describe and solve similar problems. This points to a gap in corpus-creation initiatives, namely the “exploratory” way in which they are conducted. Based on the Lapelinc Workflow for the construction of historical corpora, the Lapelinc Framework presents a working pattern for construction and research activities involving Portuguese language corpora, establishing a set of steps and by-products common to corpus-based linguistic research. By providing a set of tools in the Import Stage with a reduced learning curve and allowing efforts to converge during development, the Lapelinc Framework adds functionality for multiple purposes and simplifies the construction of multi-type corpora.

Acknowledgment

We thank FAPESB and CNPq, as this work is linked to thematic projects funded by FAPESB (APP0007/2016 and APP0014/2016) and CNPq (436209/2018-7). We also thank the Graduate Program in Linguistics (PPGLIN), the Corpus Linguistics Research Laboratory (LAPELINC), the State University of Southwest Bahia (UESB), and the advisors Prof. Dr. Jorge Viana Santos and Prof. Dr. Cristiane Namiuti.

Author information

Corresponding author

Correspondence to Bruno Silvério Costa.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Costa, B.S., Santos, J.V., Namiuti, C., Costa, A.S. (2022). The Systematic Construction of Multiple Types of Corpora Through the Lapelinc Framework. In: Pinheiro, V., et al. Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_37

  • DOI: https://doi.org/10.1007/978-3-030-98305-5_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98304-8

  • Online ISBN: 978-3-030-98305-5

  • eBook Packages: Computer Science, Computer Science (R0)
