Abstract
Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in the field of web archiving. Failing to capture a website adequately may leave an incomplete historical record or, worse, no evidence that the site ever existed. This paper presents a grounded theory of quality for web archives, developed from data provided by web archivists. To build this theory, I analyzed support tickets submitted by clients of the Internet Archive’s Archive-It (AIT), a subscription-based web archiving service that helps organizations build and manage their own web archives. In total, 305 tickets comprising 2544 interactions were analyzed. The resulting theory consists of three dimensions of quality in a web archive: correspondence, relevance, and archivability. Correspondence, defined as the degree of similarity or resemblance between the original website and the archived website, is the most important facet of quality in web archives and is the focus of this work. This paper presents the first theory created specifically for web archives and lays the groundwork for future theoretical developments in the field. The theory is also human-centered: it is grounded in how users and creators of web archives perceive their quality. By clarifying the notion of quality in a web archive, this research will benefit web archivists and cultural heritage institutions.
Acknowledgements
An earlier version of this paper was presented at the conference Theory and Practice in Digital Libraries (TPDL 2020) in Lyon, France. The author would like to thank the reviewers for their constructive feedback, as well as Jefferson Bailey and Lori Donovan of the Internet Archive, who made this research possible.
About this article
Cite this article
Reyes Ayala, B. Correspondence as the primary measure of information quality for web archives: a human-centered grounded theory study. Int J Digit Libr 23, 19–31 (2022). https://doi.org/10.1007/s00799-021-00314-x