
Stress-Testing General Purpose Digital Library Software

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5714)

DSpace, Fedora, and Greenstone are three widely used open source digital library systems. In this paper we report on scalability tests performed on these tools by ourselves and others. These range from repositories populated with synthetically produced data to real world deployment with content measured in millions of items. A case study is presented that details how one of the systems performed when used to produce fully-searchable newspaper collections containing in excess of 20 GB of raw text (2 billion words, with 60 million unique terms), 50 GB of metadata, and 570 GB of images.
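
The abstract mentions repositories populated with synthetically produced data, but it does not describe how that data was generated. The following is therefore only a minimal Python sketch under our own assumptions, not the method used in the paper. The function names `synthetic_documents` and `corpus_stats`, the `termN` vocabulary, and the Zipf-like 1/rank word weighting are all hypothetical. The sketch produces test documents and reports the same kinds of figures quoted for the case study: total words, unique terms, and raw text size.

```python
import itertools
import random


def synthetic_documents(num_docs, words_per_doc, vocab_size, seed=0):
    """Yield (doc_id, text) pairs of synthetic text for loading into a repository.

    Words come from a hypothetical vocabulary "term0", "term1", ...
    with Zipf-like weights (the word of rank r has weight 1/r), so a few
    terms are very common and most are rare.
    """
    rng = random.Random(seed)  # fixed seed: repeated runs give identical corpora
    vocab = [f"term{i}" for i in range(vocab_size)]
    # Precompute cumulative weights once rather than on every draw.
    cum_weights = list(
        itertools.accumulate(1.0 / rank for rank in range(1, vocab_size + 1))
    )
    for doc_id in range(num_docs):
        words = rng.choices(vocab, cum_weights=cum_weights, k=words_per_doc)
        yield doc_id, " ".join(words)


def corpus_stats(docs):
    """Return (total words, unique terms, raw text bytes) for a stream of documents."""
    total_words = 0
    vocabulary = set()
    raw_bytes = 0
    for _doc_id, text in docs:
        tokens = text.split()
        total_words += len(tokens)
        vocabulary.update(tokens)
        raw_bytes += len(text.encode("utf-8"))
    return total_words, len(vocabulary), raw_bytes


if __name__ == "__main__":
    words, unique, size = corpus_stats(synthetic_documents(100, 500, 10_000))
    print(f"{words} words, {unique} unique terms, {size} bytes of raw text")
```

Because documents are produced by a generator, they can be streamed into an ingest pipeline without holding the whole corpus in memory. Raising `num_docs` and `vocab_size` toward the scale of the case study (about 2 billion words and 60 million unique terms) is one way to load-test ingest and indexing. The uniform `termN` spelling makes word lengths unrealistic, however, so byte counts from this sketch should not be compared directly with real newspaper text.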


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bainbridge, D., Witten, I.H., Boddie, S., Thompson, J. (2009). Stress-Testing General Purpose Digital Library Software. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04345-1

  • Online ISBN: 978-3-642-04346-8

  • eBook Packages: Computer Science (R0)
