An analysis of file format control in institutional repositories
Abstract
Purpose
The purpose of this paper is to analyze the file formats of the digital objects stored in two of the largest open-access repositories in Spain, DDUB and TDX, and to determine the implications of these formats for long-term preservation, focussing in particular on the different versions of PDF.
Design/methodology/approach
To study the two repositories, the authors harvested all the files of every digital object, together with some of the associated metadata, using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) protocols. The file formats were analyzed with DROID software and some additional tools.
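As a rough illustration of the kind of check involved in telling PDF versions apart, the sketch below reads the version declared in a file's header. It is not the authors' tooling, which relied on DROID and additional scripts; the function names are hypothetical, and the 1024-byte search window is an assumption that mirrors the tolerance of common PDF readers. The header value is only a first indication: from PDF 1.4 onwards the document catalog may declare a later version, and PDF/A conformance is recorded in the metadata rather than in the header.

```python
import re
from typing import Optional

# A PDF file starts with a marker of the form "%PDF-M.N".
# Some files carry a few leading bytes before it, so the search
# covers the first 1024 bytes (an assumed, illustrative window).
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in the PDF header, or None if absent."""
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    major = match.group(1).decode("ascii")
    minor = match.group(2).decode("ascii")
    return f"{major}.{minor}"


def pdf_version_of_file(path: str) -> Optional[str]:
    """Read only the start of the file at `path` and report its header version."""
    with open(path, "rb") as handle:
        return pdf_header_version(handle.read(1024))
```

Applied to every harvested file, a check of this kind would give a per-version count that can be compared with the formats a repository's preservation policy claims to support.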
Findings
The results show that there is no alignment between the preservation policies declared by institutions, the technical tools available, and the actual stored files.
Originality/value
The results show that the file controls currently applied in institutional repositories do not suffice to guarantee their stated mission of long-term preservation of scientific literature.
Acknowledgements
This study was funded by the project "El acceso abierto (open access) a la ciencia en España" (2012-2014), Plan Nacional I+D+i, code CSO2011-29503-C02-01. The authors thank Yvonne Friese of the Deutsche Zentralbibliothek für Wirtschaftswissenschaften for the use of her PDF scripts. The authors also thank the CBUC and the UB's CRAI for their help with the data interpretation.
Citation
Termens, M., Ribera, M. and Locher, A. (2015), "An analysis of file format control in institutional repositories", Library Hi Tech, Vol. 33 No. 2, pp. 162-174. https://doi.org/10.1108/LHT-10-2014-0098
Publisher
Emerald Group Publishing Limited
Copyright © 2015, Emerald Group Publishing Limited