DOI: 10.1145/3133956.3134042
research-article

Rewriting History: Changing the Archived Web from the Present

Published: 30 October 2017

ABSTRACT

The Internet Archive's Wayback Machine is the largest modern web archive, preserving web content since 1996. We discover and analyze several vulnerabilities in how the Wayback Machine archives data, and then leverage these vulnerabilities to create what are to our knowledge the first attacks against a user's view of the archived web. Our vulnerabilities are enabled by the unique interaction between the Wayback Machine's archives, other websites, and a user's browser, and attackers do not need to compromise the archives in order to compromise users' views of a stored page. We demonstrate the effectiveness of our attacks through proof-of-concept implementations. Then, we conduct a measurement study to quantify the prevalence of vulnerabilities in the archive. Finally, we explore defenses which might be deployed by archives, website publishers, and the users of archives, and present the prototype of a defense for clients of the Wayback Machine, ArchiveWatcher.

      • Published in

        CCS '17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
        October 2017
        2682 pages
        ISBN: 9781450349468
        DOI: 10.1145/3133956

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        CCS '17 Paper Acceptance Rate: 151 of 836 submissions, 18%
        Overall Acceptance Rate: 1,261 of 6,999 submissions, 18%
