Abstract
To perform a longitudinal investigation of web archives and to detect variations and changes when replaying individual archived pages, or mementos, we created a sample of 16,627 mementos from 17 public web archives. Over the course of our 14-month study (November 2017–January 2019), we found that four web archives changed their base URIs and did not leave a machine-readable method of locating their new base URIs, necessitating manual rediscovery. Of the 1,981 mementos in our sample from these four web archives, 537 were impacted: 517 mementos were rediscovered but with changes in their time of archiving (Memento-Datetime), HTTP status code, or the string comprising their original URI (URI-R), and 20 mementos could not be found at all.
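The comparison described above can be illustrated with a minimal sketch. It is not the study's actual tooling: the `Observation` record, its field names, and the `changed_fields` helper are hypothetical, and the example values are invented for illustration. The sketch only shows how two replays of the same memento might be compared on the three properties named in the abstract, with a memento that cannot be rediscovered reported as missing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """One replay of a memento (hypothetical record, not from the paper)."""
    memento_datetime: str  # value of the Memento-Datetime response header
    status_code: int       # HTTP status code returned on replay
    uri_r: str             # string comprising the original URI (URI-R)


def changed_fields(before: Observation, after: Optional[Observation]) -> List[str]:
    """Return the names of the properties that differ between two replays.

    `after` is None when the memento could not be rediscovered at all.
    """
    if after is None:
        return ["missing"]
    names = ("memento_datetime", "status_code", "uri_r")
    return [n for n in names if getattr(before, n) != getattr(after, n)]


# Invented example values: the archiving time and the URI-R string differ.
old = Observation("Wed, 15 Nov 2017 10:00:00 GMT", 200, "http://example.com/")
new = Observation("Wed, 15 Nov 2017 10:00:05 GMT", 200, "http://example.com")
differences = changed_fields(old, new)
```

Here `differences` lists the archiving time and the URI-R string as changed, while `changed_fields(old, None)` reports the memento as missing. An identical replay yields an empty list.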
Notes
1. Although this is outside of our 14-month study, this effectively means that all 351 LAC mementos are currently missing.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Aturban, M., Nelson, M.L., Weigle, M.C. (2021). Where Did the Web Archive Go? In: Berget, G., Hall, M.M., Brenn, D., Kumpulainen, S. (eds) Linking Theory and Practice of Digital Libraries. TPDL 2021. Lecture Notes in Computer Science, vol 12866. Springer, Cham. https://doi.org/10.1007/978-3-030-86324-1_9
DOI: https://doi.org/10.1007/978-3-030-86324-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86323-4
Online ISBN: 978-3-030-86324-1
eBook Packages: Computer Science (R0)