Abstract
Web bots are used to automate client interactions with websites, which facilitates large-scale web measurements. However, websites may employ web bot detection. When they do, their response to a bot may differ from responses to regular browsers. The discrimination can result in deviating content, restriction of resources or even the exclusion of a bot from a website. This places strict restrictions upon studies: the more bot detection takes place, the more results must be manually verified to confirm the bot’s findings.
To investigate the extent to which bot detection occurs, we reverse-analysed commercial bot detection. We found that, in part, bot detection relies on the values of browser properties and the presence of certain objects in the browser’s DOM. This part strongly resembles browser fingerprinting. We leveraged this for a generic approach to detect web bot detection: we identify what part of the browser fingerprint of a web bot uniquely identifies it as a web bot by contrasting its fingerprint with those of regular browsers. This leads to the fingerprint surface of a web bot. Any website accessing the fingerprint surface is then accessing a part unique to bots, and thus engaging in bot detection.
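The contrast described above can be illustrated with a minimal sketch. It assumes that a fingerprint is a flat mapping from property names to values. The function name, property names and values are hypothetical and are not measurements from the study.

```python
from typing import Any, Dict, List

Fingerprint = Dict[str, Any]


def fingerprint_surface(bot: Fingerprint, browsers: List[Fingerprint]) -> Fingerprint:
    """Return the bot's property/value pairs that no regular browser shares."""
    surface: Fingerprint = {}
    for prop, value in bot.items():
        # A pair is part of the surface only if every regular browser either
        # lacks the property or reports a different value for it.
        if all(prop not in fp or fp[prop] != value for fp in browsers):
            surface[prop] = value
    return surface


# Hypothetical fingerprints, written by hand for illustration only.
regular_browsers = [
    {"userAgent": "ExampleBrowser/1.0", "plugins.length": 3},
    {"userAgent": "OtherBrowser/2.0", "plugins.length": 0},
]
bot = {"userAgent": "ExampleBrowser/1.0", "plugins.length": 0, "domAutomation": True}

surface = fingerprint_surface(bot, regular_browsers)
print(surface)  # {'domAutomation': True}
```

In this toy input, the user agent and the plugin count each coincide with some regular browser, so only the `domAutomation` entry remains as the part unique to the bot.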
We provide a characterisation of the fingerprint surface of 14 web bots. We show that the vast majority of these frameworks are uniquely identifiable through well-known fingerprinting techniques. We design a scanner to detect web bot detection based on the reverse analysis, augmented with the found fingerprint surfaces. In a scan of the Alexa Top 1 Million, we find that 12.8% of websites show indications of web bot detection.
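One simple way to look for accesses to a fingerprint surface is to search a page's script source for bot-specific property names. The sketch below assumes plain substring matching over the names listed in the chapter's notes. It is an illustration only and not the scanner used in the study; the function name and sample scripts are hypothetical.

```python
from typing import List, Sequence

# Property names taken from the chapter's notes. A full scan would use the
# complete fingerprint surfaces rather than this short list.
BOT_PROPERTIES = [
    "webdriver_evaluate",
    "webdriver-evaluate",
    "fxdriver_unwrapped",
    "$wdc",
    "domAutomation",
    "domAutomationController",
]


def find_indicators(script_source: str, names: Sequence[str] = BOT_PROPERTIES) -> List[str]:
    """Return the bot-specific names that occur anywhere in the script text."""
    # Substring matching is deliberately crude: it also reports a short name
    # when it only occurs inside a longer one.
    return [name for name in names if name in script_source]


# Hypothetical script snippets for illustration.
suspicious = "if (window.domAutomationController || document.$wdc) { reportBot(); }"
benign = "document.title = 'Hello';"

print(find_indicators(suspicious))  # ['$wdc', 'domAutomation', 'domAutomationController']
print(find_indicators(benign))      # []
```

A match only shows that a script mentions such a property, so it is an indication of bot detection rather than proof.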
Authors in alphabetic order.
Notes
- 4. Electron is a framework for making stand-alone apps using web technologies. It relies on Chromium and Node.js.
- 7. webdriver_evaluate, webdriver-evaluate, fxdriver_unwrapped, $wdc, domAutomation and domAutomationController.
Appendices
A Screenshots from Effects of Web Bot Detection
Websites served to web bots can deviate in various ways from the same websites served to human-controlled browsers. On sites that perform bot detection, we found websites that do not display login elements for visitors using PhantomJS or that do not load videos (cf. Figs. 6 and 7) (Figs. 4 and 5).
B Advanced Notes to Determining the Fingerprint Surface
The following subsections provide further insights into our process for deriving a fingerprint surface for web bots. We begin with a description of our modifications to fingerprintjs2, which cover more web-bot-specific characteristics. Then, we give an overview of the setup we used, to make this process repeatable.
B.1 Extra Elements Used in Determining the Fingerprint
Several discussions on best practices for identifying web bots are available online. Drawing on these, we included the following extra elements in the browser fingerprint:
- Lack of “bind” JavaScript engine feature (footnote 8). Certain older web bots make use of outdated JavaScript engines that do not support this feature, which allows them to be distinguished from full JavaScript engines.
- StackTrace (footnote 9). When throwing an error in PhantomJS, the resulting StackTrace includes the string ‘phantomjs’.
- Properties of missing images (footnote 10). The width and height of a missing image are zero in headless Chrome, while being non-zero in full Chrome.
- Sandboxed XMLHttpRequest (see footnote 8). PhantomJS allows turning off “web-security”, which permits a website to execute a cross-domain XMLHttpRequest().
- Autoclosing dialog windows (see footnote 8). PhantomJS auto-closes dialog windows.
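The extra elements above can be read as a set of checks over values collected in the visiting browser. The following minimal sketch assumes such values have already been gathered by a client-side script. The `ProbeResults` structure, its field names and the indicator labels are hypothetical and are not part of fingerprintjs2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResults:
    """Values assumed to have been collected in the visiting browser."""
    has_function_bind: bool            # does the JavaScript engine support bind?
    error_stack_trace: str             # stack trace text of a deliberately thrown error
    missing_image_width: int           # rendered width of an image that failed to load
    missing_image_height: int          # rendered height of that image
    cross_domain_xhr_succeeded: bool   # did a cross-domain XMLHttpRequest go through?
    dialog_auto_closed: bool           # did a dialog window close without user input?


def bot_indicators(p: ProbeResults) -> List[str]:
    """List which of the extra fingerprint elements point towards a web bot."""
    indicators: List[str] = []
    if not p.has_function_bind:
        indicators.append("missing bind")
    # Case-insensitive match is an assumption; the text only names 'phantomjs'.
    if "phantomjs" in p.error_stack_trace.lower():
        indicators.append("phantomjs in stack trace")
    if p.missing_image_width == 0 and p.missing_image_height == 0:
        indicators.append("zero-sized missing image")
    if p.cross_domain_xhr_succeeded:
        indicators.append("cross-domain XMLHttpRequest allowed")
    if p.dialog_auto_closed:
        indicators.append("auto-closed dialog")
    return indicators


# Hypothetical probe values for illustration only.
example = ProbeResults(
    has_function_bind=True,
    error_stack_trace="at phantomjs://code/page.js:1",
    missing_image_width=16,
    missing_image_height=16,
    cross_domain_xhr_succeeded=False,
    dialog_auto_closed=True,
)
print(bot_indicators(example))  # ['phantomjs in stack trace', 'auto-closed dialog']
```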
B.2 Setup for Determining the Fingerprint Surface
The resulting fingerprint surface of a web bot framework depends on the versions of the framework and of the corresponding browser that are used. The versions and setup used during our experiment are listed in Table 6. Human-controlled browsers are marked in bold.
© 2019 Springer Nature Switzerland AG
Cite this paper
Jonker, H., Krumnow, B., Vlot, G. (2019). Fingerprint Surface-Based Detection of Web Bot Detectors. In: Sako, K., Schneider, S., Ryan, P. (eds) Computer Security – ESORICS 2019. ESORICS 2019. Lecture Notes in Computer Science, vol 11736. Springer, Cham. https://doi.org/10.1007/978-3-030-29962-0_28
DOI: https://doi.org/10.1007/978-3-030-29962-0_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29961-3
Online ISBN: 978-3-030-29962-0
eBook Packages: Computer Science (R0)