Abstract
Building a digital library of antique documents raises not only technical implementation issues but also the practical problems of digitizing large collections of documents. Antique documents are usually delicate and must be handled with care. In addition, a poor state of preservation and the use of font types that are hard to recognize make automatic text recognition more difficult, so the recognized text requires further human revision and correction. This makes the participation of experts in the digitization process mandatory and, therefore, costly. In this paper, we present a framework for managing the workflow of digitizing large collections of antique documents. We describe the digitization process and a tool that supports all of its phases and tasks. We also present a case study in which the workflow management system was applied to the digitization of more than 10,000 documents from 19th-century journals. Finally, we describe the resulting digital library, focusing on the most important technological issues.
Acknowledgments
This work has been partially funded by “Xunta de Galicia (Cofinanciado con Fondos FEDER)”, ref. GRC2013/053, “Ministerio de Ciencia e Innovación (PGE e Fondos FEDER)” ref. TIN2009-14560-C03-02 and ref. TIN2010-21246-C02-01, and CDTI EXP 00064563 / ITC-20133062 (“Subvencionado polo CDTI, Ministerio de Economía e Competitividade e pola Axencia Galega de Innovación”).
About this article
Cite this article
Places, Á.S., Fariña, A., Luaces, M.R. et al. A workflow management system to feed digital libraries: proposal and case study. Multimed Tools Appl 75, 3843–3877 (2016). https://doi.org/10.1007/s11042-014-2155-3
DOI: https://doi.org/10.1007/s11042-014-2155-3