Abstract
DSpace, Fedora, and Greenstone are three widely used open-source digital library systems. In this paper we report on scalability tests performed on these tools by ourselves and others. The tests range from repositories populated with synthetically produced data to real-world deployments with content measured in millions of items. A case study details how one of the systems performed when used to produce fully searchable newspaper collections containing in excess of 20 GB of raw text (2 billion words, with 60 million unique terms), 50 GB of metadata, and 570 GB of images.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bainbridge, D., Witten, I.H., Boddie, S., Thompson, J. (2009). Stress-Testing General Purpose Digital Library Software. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_21
DOI: https://doi.org/10.1007/978-3-642-04346-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04345-1
Online ISBN: 978-3-642-04346-8
eBook Packages: Computer Science (R0)