Comparison Between Performance of Various Database Systems for Implementing a Language Corpus

Upeksha, Dimuthu; Wijayarathna, Chamila; Siriwardena, Maduranga; Lasandun, Lahiru; Wimalasuriya, Chinthana; de Silva, N. H. N. D.; Dias, Gihan

doi:10.1007/978-3-319-18422-7_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 521)

Included in the following conference series:

International Conference: Beyond Databases, Architectures and Structures

Abstract

Data storage and information retrieval are among the most important aspects of developing a language corpus. Most corpora currently use either relational databases or indexed file systems. When selecting a data storage system, the most important factors to consider are the speeds of data insertion and information retrieval. Beyond these two approaches, various other database systems now exist, each with different strengths that may make it better suited to the task. This paper compares the performance of data storage and retrieval mechanisms based on relational databases, graph databases, column store databases, and indexed file systems across steps such as inserting data into the corpus and retrieving information from it, and attempts to suggest an optimal storage architecture for a language corpus.
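To make the two measured quantities concrete, the following is a minimal sketch of how insertion time and retrieval time might be recorded for one storage backend. It is not the authors' benchmark: the choice of SQLite, the `token` table, the `benchmark` function, and the sample words are all assumptions made for illustration only.

```python
import sqlite3
import time


def benchmark(words, query_word):
    """Time a bulk insert and one word-frequency lookup (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # Assumed layout: one row per token, indexed by word for fast lookup.
    conn.execute("CREATE TABLE token (position INTEGER PRIMARY KEY, word TEXT)")
    conn.execute("CREATE INDEX idx_token_word ON token (word)")

    # Measure insertion of the whole word list in a single transaction.
    start = time.perf_counter()
    with conn:
        conn.executemany(
            "INSERT INTO token (position, word) VALUES (?, ?)",
            enumerate(words),
        )
    insert_seconds = time.perf_counter() - start

    # Measure one retrieval: how often does query_word occur?
    start = time.perf_counter()
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM token WHERE word = ?", (query_word,)
    ).fetchone()
    query_seconds = time.perf_counter() - start

    conn.close()
    return insert_seconds, query_seconds, count


if __name__ == "__main__":
    sample = ["language", "corpus", "storage", "corpus"] * 1000
    insert_s, query_s, hits = benchmark(sample, "corpus")
    print(f"insert: {insert_s:.4f}s  query: {query_s:.6f}s  hits: {hits}")
```

Running the same two timed steps against each candidate backend, with the same input data, would give comparable figures. A realistic comparison would also need a much larger corpus and repeated runs.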


Author information

Authors and Affiliations

Department of Computer Science and Engineering, University of Moratuwa, Moratuwa, Sri Lanka
Dimuthu Upeksha, Chamila Wijayarathna, Maduranga Siriwardena, Lahiru Lasandun, Chinthana Wimalasuriya, N. H. N. D. de Silva & Gihan Dias

Corresponding author

Correspondence to Dimuthu Upeksha.

Editor information

Editors and Affiliations

Silesian University of Technology, Gliwice, Poland
Stanisław Kozielski
Silesian University of Technology, Gliwice, Poland
Dariusz Mrozek
Silesian University of Technology, Gliwice, Poland
Paweł Kasprowski
Silesian University of Technology, Gliwice, Poland
Bożena Małysiak-Mrozek
Silesian University of Technology, Gliwice, Poland
Daniel Kostrzewa

About this paper

Cite this paper

Upeksha, D. et al. (2015). Comparison Between Performance of Various Database Systems for Implementing a Language Corpus. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. BDAS 2015. Communications in Computer and Information Science, vol 521. Springer, Cham. https://doi.org/10.1007/978-3-319-18422-7_7

DOI: https://doi.org/10.1007/978-3-319-18422-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18421-0
Online ISBN: 978-3-319-18422-7
eBook Packages: Computer Science (R0)
