Correspondence as the primary measure of information quality for web archives: a human-centered grounded theory study

International Journal on Digital Libraries

Abstract

Creating an archived website that is as close as possible to the original, live website remains one of the most difficult challenges in web archiving. Failing to adequately capture a website can mean an incomplete historical record or, worse, no evidence that the site ever existed. This paper presents a grounded theory of quality for web archives, built from data supplied by web archivists. To develop it, I analyzed support tickets submitted by clients of the Internet Archive’s Archive-It (AIT), a subscription-based web archiving service that helps organizations build and manage their own web archives. In total, 305 tickets comprising 2544 interactions were analyzed. The resulting theory comprises three dimensions of quality in a web archive: correspondence, relevance, and archivability. Correspondence, defined as the degree of similarity or resemblance between the original website and the archived website, is the most important facet of quality in web archives and is the focus of this work. This paper presents the first theory created specifically for web archives and lays the groundwork for future theoretical developments in the field. Furthermore, the theory is human-centered and grounded in how users and creators of web archives perceive their quality. By clarifying the notion of quality in a web archive, this research will benefit web archivists and cultural heritage institutions.

Acknowledgements

An earlier version of this paper was presented at the conference Theory and Practice in Digital Libraries (TPDL 2020) in Lyon, France. The author would like to thank the reviewers for their constructive feedback, as well as Jefferson Bailey and Lori Donovan of the Internet Archive, who made this research possible.

Author information

Corresponding author

Correspondence to Brenda Reyes Ayala.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Reyes Ayala, B. Correspondence as the primary measure of information quality for web archives: a human-centered grounded theory study. Int J Digit Libr 23, 19–31 (2022). https://doi.org/10.1007/s00799-021-00314-x

  • DOI: https://doi.org/10.1007/s00799-021-00314-x
