Comparison Between Performance of Various Database Systems for Implementing a Language Corpus

  • Conference paper
Beyond Databases, Architectures and Structures (BDAS 2015)

Abstract

Data storage and information retrieval are among the most important aspects of developing a language corpus. Currently, most corpora use either relational databases or indexed file systems. When selecting a data storage system, the most important factors to consider are the speed of data insertion and the speed of information retrieval. Beyond these two approaches, various other database systems are now available whose different strengths may make them better suited to the task. This paper compares the performance of data storage and retrieval mechanisms based on relational databases, graph databases, column store databases and indexed file systems for steps such as inserting data into the corpus and retrieving information from it, and attempts to suggest an optimal storage architecture for a language corpus.
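The comparison rests on two operations: inserting data into the corpus and retrieving information from it. The following is a minimal sketch, not the paper's benchmark, of how such a comparison could be set up in Python. It assumes SQLite (through the standard-library `sqlite3` module) as the relational store and an in-memory inverted index as a stand-in for an indexed file system; the toy corpus, the `occurrence` table layout and all function names are hypothetical. Graph and column-store systems would need their own client libraries and are not shown.

```python
import sqlite3
import time
from collections import defaultdict

# Hypothetical toy corpus; a real corpus would be loaded from files.
SENTENCES = [
    "the cat sat on the mat",
    "a dog sat on the log",
    "the cat chased the dog",
]


def tokenize(sentence):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    return sentence.lower().split()


def build_relational(sentences):
    """Insert one row per token occurrence into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrence (word TEXT, sentence_id INTEGER, position INTEGER)"
    )
    rows = [
        (word, sid, pos)
        for sid, sentence in enumerate(sentences)
        for pos, word in enumerate(tokenize(sentence))
    ]
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", rows)
    # Index the word column so lookups avoid a full table scan.
    conn.execute("CREATE INDEX idx_word ON occurrence(word)")
    conn.commit()
    return conn


def query_relational(conn, word):
    """Return the sorted ids of sentences containing `word`."""
    cur = conn.execute(
        "SELECT DISTINCT sentence_id FROM occurrence "
        "WHERE word = ? ORDER BY sentence_id",
        (word,),
    )
    return [row[0] for row in cur]


def build_inverted_index(sentences):
    """Map each word to the set of sentence ids it occurs in."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for word in tokenize(sentence):
            index[word].add(sid)
    return index


def query_inverted_index(index, word):
    # dict.get does not trigger the defaultdict factory, so misses stay empty.
    return sorted(index.get(word, ()))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn, t_ins_db = timed(build_relational, SENTENCES)
    index, t_ins_idx = timed(build_inverted_index, SENTENCES)
    hits_db, t_q_db = timed(query_relational, conn, "cat")
    hits_idx, t_q_idx = timed(query_inverted_index, index, "cat")
    assert hits_db == hits_idx  # both stores must return the same answer
    print(f"relational: insert {t_ins_db:.6f}s, query {t_q_db:.6f}s -> {hits_db}")
    print(f"inverted index: insert {t_ins_idx:.6f}s, query {t_q_idx:.6f}s -> {hits_idx}")
```

On a three-sentence corpus the timings are dominated by fixed overhead and say nothing about the relative merits of the systems; a meaningful comparison would use a realistically sized corpus and repeated runs, and would also check that every store returns identical results, as the final assertion does here.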

Author information

Correspondence to Dimuthu Upeksha.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Upeksha, D. et al. (2015). Comparison Between Performance of Various Database Systems for Implementing a Language Corpus. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. BDAS 2015. Communications in Computer and Information Science, vol 521. Springer, Cham. https://doi.org/10.1007/978-3-319-18422-7_7

  • DOI: https://doi.org/10.1007/978-3-319-18422-7_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18421-0

  • Online ISBN: 978-3-319-18422-7

  • eBook Packages: Computer Science, Computer Science (R0)