DOI: 10.1145/3341105.3373885
research-article

XIEv: dynamic analysis for crawling and modeling of web applications

Published: 30 March 2020

Abstract

Researchers and practitioners in testing, security assessment and web development who need to evaluate a web application often rely on a model of the system, which serves as input to task-specific tools. Such models may include information on HTTP endpoints and their parameters, available user actions and event listeners, and required assets. Unfortunately, this data is often unavailable in practice, as only rigorous development practices or manual analysis guarantee its existence and correctness. Crawlers based on static analysis have traditionally been used to extract the required information from existing sites. However, these tools cannot accurately account for the dynamic behavior introduced by JavaScript and other technologies that are prevalent on modern sites. While methods based on dynamic analysis exist, they are not fully capable of identifying event listeners and their effects. This work presents XIEv, an approach for dynamic analysis of web applications that produces an execution trace usable for the extraction of navigation graphs, identification of bugs at runtime and enumeration of resources requested by each page. It offers improved recognition and selection of event listeners as well as a greater range of observed effects compared to existing approaches.
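
As a rough illustration of how an execution trace could feed the extraction of a navigation graph and the enumeration of resources per page, the following is a minimal sketch in Python. The trace format (`TraceEntry` with `page`, `event`, `navigated_to` and `requests` fields), the helper names and the example data are all assumptions made for this illustration; the abstract does not specify XIEv's actual trace format or implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TraceEntry:
    """One hypothetical trace record (not XIEv's real format)."""
    page: str                           # page on which the event was fired
    event: str                          # description of the triggered event
    navigated_to: Optional[str] = None  # target page, if the event caused navigation
    requests: Tuple[str, ...] = ()      # resources requested as an effect of the event


def navigation_graph(trace):
    """Map each page to the set of (event, target page) edges seen in the trace."""
    graph = defaultdict(set)
    for entry in trace:
        if entry.navigated_to is not None:
            graph[entry.page].add((entry.event, entry.navigated_to))
    return dict(graph)


def resources_by_page(trace):
    """Map each page to the set of resources requested while it was active."""
    resources = defaultdict(set)
    for entry in trace:
        resources[entry.page].update(entry.requests)
    return dict(resources)


# Invented example data, for illustration only.
EXAMPLE_TRACE = [
    TraceEntry("/", "load", requests=("/static/app.js",)),
    TraceEntry("/", "click button#menu", requests=("/api/menu",)),
    TraceEntry("/", "click a#login", navigated_to="/login"),
    TraceEntry("/login", "submit form#auth", navigated_to="/home",
               requests=("/api/session",)),
]

if __name__ == "__main__":
    print(navigation_graph(EXAMPLE_TRACE))
    print(resources_by_page(EXAMPLE_TRACE))
```

In this sketch, events that only trigger requests (such as the hypothetical menu click) contribute to the resource list but add no edge to the navigation graph, which mirrors the distinction the abstract draws between navigation and other observed effects.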


Published In

SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing
March 2020
2348 pages
ISBN:9781450368667
DOI:10.1145/3341105

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic analysis
  2. modeling
  3. web applications
  4. web crawling

Qualifiers

  • Research-article

Funding Sources

  • Österreichische Forschungsförderungsgesellschaft

Conference

SAC '20
SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing
March 30 - April 3, 2020
Brno, Czech Republic

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2024) QualState: Finding Website States for Accessibility Evaluation. Proceedings of the 21st International Web for All Conference, 96-105. DOI: 10.1145/3677846.3677851. Online publication date: 13-May-2024.
  • (2024) Dead or alive. Journal of Information Security and Applications 82:C. DOI: 10.1016/j.jisa.2024.103746. Online publication date: 17-Jul-2024.
  • (2024) Finding Server-Side Endpoints with Static Analysis of Client-Side JavaScript. Computer Security. ESORICS 2023 International Workshops, 442-458. DOI: 10.1007/978-3-031-54129-2_26. Online publication date: 12-Mar-2024.
  • (2022) Gelato: Feedback-driven and Guided Security Analysis of Client-side Web Applications. 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 618-629. DOI: 10.1109/SANER53432.2022.00079. Online publication date: Mar-2022.
  • (2021) CHIEv. ACM SIGAPP Applied Computing Review 21:1, 5-23. DOI: 10.1145/3477133.3477134. Online publication date: 20-Jul-2021.
  • (2021) Web Application Testing. Proceedings of the 15th ACM / IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 1-6. DOI: 10.1145/3475716.3484187. Online publication date: 11-Oct-2021.
