
iCrawl: A Visual High Interaction Web Crawler

  • Conference paper
Computer Network Security (MMM-ACNS 2017)

Abstract

This paper presents “iCrawl”, a visual, high-interaction client honeypot system. Web-based cyber-attacks have increased exponentially along with the growth of cloud-based web application technologies, and web browsers provide users with the entry point to these applications. iCrawl is designed to be a high-interaction honey client that is virtually indistinguishable from a real, human-driven client: it drives an actual web browser in a manner closely resembling a genuine user’s actions. Unlike most crawlers, iCrawl attempts to operate on the visual elements of a web page rather than on its code elements. The honeypot system consists of pre-configured decoy virtual machines. Each virtual machine includes a spider program which, upon execution, automates driving the web browser and crawling the targeted website. It browses by observing the page and simulating human user input through mouse and keyboard activity. The data collected during crawling is stored in a graph database as nodes and relations. This data captures the context of each interaction and the changes in system behavior caused by interacting with the crawled website. The graph data can be queried and monitored online for structural patterns and anomalies.
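To make the idea of storing crawl activity as nodes and relations, and then querying it for anomalies, more concrete, here is a minimal sketch. It is not iCrawl's implementation: the node labels (`Page`, `Process`, `File`), the relation names (`LOADED_IN`, `SPAWNED`, `WROTE`), the `CrawlGraph` class, and the baseline-comparison rule in `unexpected_processes` are all hypothetical, and an in-memory adjacency list stands in for a real graph database so the example runs on its own.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical node type, e.g. "Page", "Process", "File"
    key: str    # identifier within that type: URL, process name, file path


@dataclass
class CrawlGraph:
    # Adjacency list: source node -> list of (relation type, target node).
    edges: Dict[Node, List[Tuple[str, Node]]] = field(
        default_factory=lambda: defaultdict(list))

    def relate(self, src: Node, rel: str, dst: Node) -> None:
        """Record a directed, typed relation such as browser -SPAWNED-> process."""
        self.edges[src].append((rel, dst))

    def reachable(self, start: Node) -> Set[Node]:
        """Every node reachable from start along relations (iterative DFS)."""
        seen: Set[Node] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for _rel, nxt in self.edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def unexpected_processes(graph: CrawlGraph, page: Node,
                         baseline: Iterable[str]) -> List[str]:
    """Names of processes linked (directly or transitively) to a page visit
    that do not appear in the decoy VM's known-good baseline (hypothetical rule)."""
    allowed = set(baseline)
    return sorted(n.key for n in graph.reachable(page)
                  if n.label == "Process" and n.key not in allowed)


if __name__ == "__main__":
    g = CrawlGraph()
    page = Node("Page", "http://example.test/")
    browser = Node("Process", "firefox.exe")
    g.relate(page, "LOADED_IN", browser)
    g.relate(browser, "SPAWNED", Node("Process", "powershell.exe"))
    g.relate(browser, "WROTE", Node("File", "payload.exe"))
    print(unexpected_processes(g, page, baseline={"firefox.exe"}))
    # prints ['powershell.exe']
```

In a graph database such as Neo4j, the same check could be written as a variable-length pattern query, for example `MATCH (p:Page {url: $url})-[*]->(x:Process) WHERE NOT x.name IN $baseline RETURN DISTINCT x.name`, again assuming this hypothetical schema.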

The iCrawl system is an enabling technology for studying sophisticated malicious websites that can avoid detection by the simpler crawlers typically used by well-known security companies.
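Part of what lets a visual crawler pass as a human-driven client is the shape of its mouse input. The sketch below assumes a common technique rather than the one the paper uses: it generates a cursor trajectory along a cubic Bézier curve with randomly bowed control points, cosine easing (slow start and stop), and small Gaussian jitter, while keeping both endpoints exact so the final click lands on its target. The function names and parameters (`human_like_path`, `steps`, `jitter`, the ±0.3 bow factor) are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def bezier_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(t: float) -> float:
    """Cosine easing: the cursor accelerates, then decelerates near the target."""
    return 0.5 - 0.5 * math.cos(math.pi * t)


def human_like_path(start: Point, end: Point, steps: int = 50,
                    jitter: float = 1.5,
                    rng: Optional[random.Random] = None) -> List[Tuple[int, int]]:
    """Return steps + 1 integer screen points from start to end (steps >= 1)."""
    if rng is None:
        rng = random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = math.hypot(dx, dy)
    # Unit normal to the straight line; control points are pushed along it
    # so the path bows sideways instead of moving in a perfect line.
    nx, ny = (-dy / dist, dx / dist) if dist else (0.0, 0.0)

    def control(frac: float) -> Point:
        bow = rng.uniform(-0.3, 0.3) * dist  # hypothetical bow factor
        return (x0 + dx * frac + nx * bow, y0 + dy * frac + ny * bow)

    p1, p2 = control(1 / 3), control(2 / 3)
    path = []
    for i in range(steps + 1):
        x, y = bezier_point(start, p1, p2, end, ease_in_out(i / steps))
        if 0 < i < steps:  # jitter interior points only; endpoints stay exact
            x += rng.gauss(0.0, jitter)
            y += rng.gauss(0.0, jitter)
        path.append((round(x), round(y)))
    return path


if __name__ == "__main__":
    pts = human_like_path((10, 20), (640, 400), steps=30, rng=random.Random(42))
    print(pts[0], pts[-1], len(pts))  # (10, 20) (640, 400) 31
```

A driver would then replay the points one at a time with short, variable pauses, for example through an OS-level input library or Selenium's `ActionChains.move_by_offset`; how iCrawl itself injects input is not specified in the abstract.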



Author information


Correspondence to Deeraj Nagothu.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Nagothu, D., Dolgikh, A. (2017). iCrawl: A Visual High Interaction Web Crawler. In: Rak, J., Bay, J., Kotenko, I., Popyack, L., Skormin, V., Szczypiorski, K. (eds) Computer Network Security. MMM-ACNS 2017. Lecture Notes in Computer Science, vol 10446. Springer, Cham. https://doi.org/10.1007/978-3-319-65127-9_8


  • DOI: https://doi.org/10.1007/978-3-319-65127-9_8


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65126-2

  • Online ISBN: 978-3-319-65127-9

  • eBook Packages: Computer Science, Computer Science (R0)
