ABSTRACT
The Internet Archive's Wayback Machine is the largest modern web archive, preserving web content since 1996. We discover and analyze several vulnerabilities in how the Wayback Machine archives data, and then leverage these vulnerabilities to create what are, to our knowledge, the first attacks against a user's view of the archived web. These vulnerabilities are enabled by the unique interaction between the Wayback Machine's archives, other websites, and a user's browser, and attackers do not need to compromise the archives themselves in order to compromise users' views of a stored page. We demonstrate the effectiveness of our attacks through proof-of-concept implementations. We then conduct a measurement study to quantify the prevalence of these vulnerabilities in the archive. Finally, we explore defenses that might be deployed by archives, website publishers, and the users of archives, and present ArchiveWatcher, a prototype defense for clients of the Wayback Machine.
Index Terms
- Rewriting History: Changing the Archived Web from the Present