DOI: 10.1145/3543873.3587351
Demonstration

Katti: An Extensive and Scalable Tool for Website Analyses

Published: 30 April 2023

Abstract

Research on web security and privacy frequently relies on tools that analyze a set of websites. A major obstacle to sound analysis is the need for a robust and feature-rich web crawler. For example, automatically analyzing ad-malware campaigns on websites requires crawling a vast set of domains with multiple real web browsers, while simultaneously evading bot detection and performing user interactions on the websites. Furthermore, current web crawling and analysis tools lack the ability to attach various threat analysis frameworks.
In this paper, we introduce Katti, which overcomes several of today’s technical hurdles in web crawling. Our tool employs a distributed task queue that efficiently and reliably handles large volumes of both crawling and threat analysis requests. Katti extensively collects all available web data through an integrated person-in-the-middle proxy. Moreover, Katti is not limited to a specific use case, allowing users to easily customize the tool to their individual research needs.
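The abstract does not describe how Katti's task queue is built. As a rough illustration of the general idea, the Python sketch below dispatches two kinds of jobs, crawls and threat analyses, to a pool of workers and retries failed jobs a bounded number of times. It is a single-process stand-in, not Katti's implementation: `TaskQueue`, `Task`, the `crawl` and `analyze` handlers, and the retry policy are all hypothetical, and a genuinely distributed deployment would place a message broker between the code that submits tasks and workers running on other machines.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    kind: str          # hypothetical task type, e.g. "crawl" or "analyze"
    target: str        # URL to crawl or artifact to analyze
    attempts: int = 0  # failed attempts so far


class TaskQueue:
    """Single-process stand-in for a distributed task queue (illustrative only)."""

    def __init__(self, handlers: Dict[str, Callable[[str], str]],
                 workers: int = 4, max_retries: int = 2) -> None:
        self._q: "queue.Queue[Optional[Task]]" = queue.Queue()
        self._handlers = handlers
        self._workers = workers
        self._max_retries = max_retries
        self._lock = threading.Lock()
        self.results: Dict[str, str] = {}
        self.failed: List[str] = []

    def submit(self, kind: str, target: str) -> None:
        self._q.put(Task(kind, target))

    def _worker(self) -> None:
        while True:
            task = self._q.get()
            if task is None:  # sentinel: shut this worker down
                self._q.task_done()
                return
            key = f"{task.kind}:{task.target}"
            try:
                output = self._handlers[task.kind](task.target)
                with self._lock:
                    self.results[key] = output
            except Exception:
                task.attempts += 1
                if task.attempts <= self._max_retries:
                    # Re-enqueue before task_done() so join() cannot return early.
                    self._q.put(task)
                else:
                    with self._lock:
                        self.failed.append(key)
            finally:
                self._q.task_done()

    def run(self) -> None:
        threads = [threading.Thread(target=self._worker) for _ in range(self._workers)]
        for t in threads:
            t.start()
        self._q.join()  # blocks until every task, including retries, is finished
        for _ in threads:
            self._q.put(None)
        for t in threads:
            t.join()


def crawl(url: str) -> str:
    # Hypothetical handler; a real crawler would drive a browser here.
    if "bad" in url:
        raise RuntimeError("page failed to load")
    return f"<html for {url}>"


def analyze(artifact: str) -> str:
    # Hypothetical handler standing in for an attached threat analysis framework.
    return f"verdict for {artifact}: benign"


tq = TaskQueue({"crawl": crawl, "analyze": analyze}, workers=2, max_retries=1)
tq.submit("crawl", "https://example.org")
tq.submit("crawl", "https://bad.example")
tq.submit("analyze", "sample.js")
tq.run()
print(sorted(tq.results))
print(tq.failed)
```

Re-enqueuing a failed task before calling `task_done()` keeps `Queue.join()` from returning while a retry is still pending, and a permanently failing target ends up in `failed` instead of stalling the whole run.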

    Published In

    WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023
    April 2023, 1567 pages
    ISBN: 9781450394192
    DOI: 10.1145/3543873
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. analyses
    2. analysis
    3. crawling
    4. web
    5. website

    Qualifiers

    • Demonstration
    • Research
    • Refereed limited

    Conference

    WWW '23: The ACM Web Conference 2023
    April 30 - May 4, 2023
    Austin, TX, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Total citations: 0
    • Total downloads: 188
    • Downloads (last 12 months): 44
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 20 Feb 2025