Fingerprint Surface-Based Detection of Web Bot Detectors

  • Conference paper
Computer Security – ESORICS 2019 (ESORICS 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11736)

Abstract

Web bots are used to automate client interactions with websites, which facilitates large-scale web measurements. However, websites may employ web bot detection. When they do, their response to a bot may differ from responses to regular browsers. The discrimination can result in deviating content, restriction of resources or even the exclusion of a bot from a website. This places strict restrictions upon studies: the more bot detection takes place, the more results must be manually verified to confirm the bot’s findings.

To investigate the extent to which bot detection occurs, we reverse-analysed commercial bot detection. We found that in part, bot detection relies on the values of browser properties and the presence of certain objects in the browser’s DOM model. This part strongly resembles browser fingerprinting. We leveraged this for a generic approach to detect web bot detection: we identify what part of the browser fingerprint of a web bot uniquely identifies it as a web bot by contrasting its fingerprint with those of regular browsers. This leads to the fingerprint surface of a web bot. Any website accessing the fingerprint surface is then accessing a part unique to bots, and thus engaging in bot detection.
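The contrast described above can be read as a set difference. The following is a minimal sketch, assuming each fingerprint is a flat mapping from property name to a hashable observed value. The function name `fingerprint_surface` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Any, Dict, Iterable, Set, Tuple

Fingerprint = Dict[str, Any]


def fingerprint_surface(
    bot: Fingerprint, regular: Iterable[Fingerprint]
) -> Set[Tuple[str, Any]]:
    """Return the (property, value) pairs seen in the bot but in no regular browser."""
    seen_in_regular: Set[Tuple[str, Any]] = set()
    for fp in regular:
        seen_in_regular.update(fp.items())
    return set(bot.items()) - seen_in_regular


# Illustrative values only (assumed, not measured).
regular_browsers = [
    {"navigator.webdriver": False, "plugins.length": 3},
    {"navigator.webdriver": False, "plugins.length": 5},
]
bot_browser = {"navigator.webdriver": True, "plugins.length": 3}

surface = fingerprint_surface(bot_browser, regular_browsers)
print(surface)
```

In this sketch a property that a bot lacks entirely would have to be recorded explicitly (for example with the value `None`) to show up in the surface.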

We provide a characterisation of the fingerprint surface of 14 web bots. We show that the vast majority of these frameworks are uniquely identifiable through well-known fingerprinting techniques. We design a scanner to detect web bot detection based on the reverse analysis, augmented with the found fingerprint surfaces. In a scan of the Alexa Top 1 Million, we find that 12.8% of websites show indications of web bot detection.
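The scanner itself is not described in this excerpt. As a rough illustration of the idea only, the sketch below statically searches script source for identifiers that are specific to web bots, reusing the property names listed in note 7 as example patterns. Plain text matching is an assumption made here for brevity; it would miss obfuscated or dynamically constructed property names, and `find_bot_probes` is a hypothetical helper.

```python
import re
from typing import List, Set

# Identifiers taken from note 7; treating them as scanner patterns is an assumption.
BOT_PROPERTIES: List[str] = [
    "webdriver_evaluate",
    "webdriver-evaluate",
    "fxdriver_unwrapped",
    "$wdc",
    "domAutomation",
    "domAutomationController",
]


def find_bot_probes(script_source: str) -> Set[str]:
    """Return the listed identifiers that occur as whole tokens in the script."""
    hits: Set[str] = set()
    for name in BOT_PROPERTIES:
        # Lookarounds keep "domAutomation" from matching inside "domAutomationController".
        pattern = r"(?<![\w$-])" + re.escape(name) + r"(?![\w$-])"
        if re.search(pattern, script_source):
            hits.add(name)
    return hits


# Hypothetical script fragment used only to exercise the function.
sample = 'if (window.domAutomationController || document["$wdc"]) { flag(); }'
probes = find_bot_probes(sample)
print(sorted(probes))
```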

Authors in alphabetic order.

Notes

  1. https://stackoverflow.com/questions/33225947/can-a-website-detect-when-you-are-using-selenium-withchromedriver

  2. https://github.com/Valve/fingerprintjs2

  3. https://github.com/ariya/phantomjs/issues/15344

  4. Electron is a framework for making stand-alone apps using web technologies. It relies on Chromium and Node.js.

  5. https://github.com/bkrumnow/BrowserBasedBotFP/blob/master/public/js/fingerprint.js

  6. http://www.gm.fh-koeln.de/~krumnow/fp_bot/fp_deviations.html

  7. webdriver_evaluate, webdriver-evaluate, fxdriver_unwrapped, $wdc, domAutomation and domAutomationController.

  8. https://blog.shapesecurity.com/2015/01/22/detecting-phantomjs-based-visitors/

  9. https://www.slideshare.net/SergeyShekyan/shekyan-zhang-owasp

  10. https://antoinevastel.com/bot%20detection/2017/08/05/detect-chrome-headless.html

Author information

Corresponding author

Correspondence to Hugo Jonker.

Appendices

A Screenshots from Effects of Web Bot Detection

Comparing websites as served to web bots with the same websites as served to human-controlled browsers reveals various deviations. Among sites that perform bot detection, we found websites that do not display login elements to visitors using PhantomJS or that do not load videos (cf. Figs. 4 and 5). Other sites block the visitor and load a CAPTCHA, or omit ads (cf. Figs. 6 and 7).

Fig. 4. Missing login fields on kiyu.tw.

Fig. 5. Missing video on hummingbirddrones.ca.

Fig. 6. Blockage and loading of a CAPTCHA on frankmotorsinc.com.

Fig. 7. Missing ads on cordcuttersnews.com.

B Advanced Notes to Determining the Fingerprint Surface

The following subsections provide further insight into our process for deriving the fingerprint surface of web bots. We begin by describing our modification to fingerprintjs2, which covers more web bot-specific characteristics. Then, we give an overview of the setup we used, to make this process repeatable.

B.1 Extra Elements Used in Determining the Fingerprint

Several discussions on best practices for identifying web bots are available online. From these, we added the following extra elements to the browser fingerprint:

  • Lack of “bind” JavaScript engine feature (see footnote 8).

    Certain older web bots make use of outdated JavaScript engines that do not support this feature, which allows them to be distinguished from full JavaScript engines.

  • StackTrace (see footnote 9).

    When throwing an error in PhantomJS, the resulting StackTrace includes the string ‘phantomjs’.

  • Properties of missing images (see footnote 10).

    The width and height of a missing image are zero in headless Chrome, while being non-zero in full Chrome.

  • Sandboxed XMLHttpRequest (See footnote 8).

    PhantomJS allows turning off “web-security”, which permits a website to execute a cross-domain XMLHttpRequest().

  • Autoclosing dialog windows (See footnote 8).

    PhantomJS auto-closes dialog windows.
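As a rough illustration of how the extra elements listed above might be evaluated once they have been collected from a client, the sketch below flags each of them. The record layout (`ExtraElements` and its field names), the indicator labels and the sample values are hypothetical; the authors' actual implementation is their modification to fingerprintjs2 and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtraElements:
    """Hypothetical record of the extra fingerprint elements reported by a client."""
    has_bind: bool              # the JavaScript engine supports "bind"
    stack_trace: str            # stack trace text of a deliberately thrown error
    missing_image_width: int    # width reported for an image that failed to load
    missing_image_height: int   # height reported for an image that failed to load
    cross_domain_xhr_ok: bool   # a cross-domain XMLHttpRequest succeeded
    dialog_auto_closed: bool    # a dialog window closed without user input


def bot_indicators(e: ExtraElements) -> List[str]:
    """List which of the described elements point towards a web bot."""
    found: List[str] = []
    if not e.has_bind:
        found.append("no-bind")
    if "phantomjs" in e.stack_trace.lower():
        found.append("phantomjs-stack-trace")
    if e.missing_image_width == 0 and e.missing_image_height == 0:
        found.append("zero-size-missing-image")
    if e.cross_domain_xhr_ok:
        found.append("unsandboxed-xhr")
    if e.dialog_auto_closed:
        found.append("auto-closed-dialog")
    return found


# Illustrative values only (assumed, not measured).
example = ExtraElements(
    has_bind=True,
    stack_trace="at phantomjs://code/page.js:3",
    missing_image_width=0,
    missing_image_height=0,
    cross_domain_xhr_ok=False,
    dialog_auto_closed=True,
)
indicators = bot_indicators(example)
print(indicators)
```

A single indicator is weak evidence on its own; the paper's approach contrasts the whole fingerprint against those of regular browsers.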

B.2 Setup for Determining the Fingerprint Surface

The resulting fingerprint surface of a web bot framework depends on the versions of the framework and of the corresponding browser that are used. The versions and setup used during our experiment are listed in Table 6. Human-controlled browsers are marked in bold.

Table 6. Configurations used to determine fingerprint surfaces.

Copyright information

© 2019 Springer Nature Switzerland AG

Cite this paper

Jonker, H., Krumnow, B., Vlot, G. (2019). Fingerprint Surface-Based Detection of Web Bot Detectors. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science, vol 11736. Springer, Cham. https://doi.org/10.1007/978-3-030-29962-0_28

  • DOI: https://doi.org/10.1007/978-3-030-29962-0_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29961-3

  • Online ISBN: 978-3-030-29962-0

  • eBook Packages: Computer Science, Computer Science (R0)
