A workflow management system to feed digital libraries: proposal and case study

Multimedia Tools and Applications

Abstract

Building a digital library of antique documents involves not only technical implementation issues but also the digitization of large document collections. Antique documents are usually delicate and must be handled with care. In addition, their poor state of preservation and their hard-to-recognize typefaces make automatic text recognition less reliable, so the recognized text requires human revision and correction. This makes the participation of experts in the digitization process mandatory and, therefore, costly. In this paper, we present a framework for managing the workflow of digitizing large collections of antique documents. We describe the digitization process and a tool that supports all of its phases and tasks. We also present a case study describing how the workflow management system was applied to the digitization of more than 10,000 documents from 19th-century journals. Finally, we describe the resulting digital library, focusing on the most important technological issues.


Acknowledgments

This work has been partially funded by “Xunta de Galicia (Cofinanciado con Fondos FEDER)”, ref. GRC2013/053, “Ministerio de Ciencia e Innovación (PGE e Fondos FEDER)” ref. TIN2009-14560-C03-02 and ref. TIN2010-21246-C02-01, and CDTI EXP 00064563 / ITC-20133062 (“Subvencionado polo CDTI, Ministerio de Economía e Competitividade e pola Axencia Galega de Innovación”).

Author information


Correspondence to Óscar Pedreira.

Cite this article

Places, Á.S., Fariña, A., Luaces, M.R. et al. A workflow management system to feed digital libraries: proposal and case study. Multimed Tools Appl 75, 3843–3877 (2016). https://doi.org/10.1007/s11042-014-2155-3
