Abstract
The abuse of online services by automated programs, known as bots, poses a serious threat to Internet users. Bots target popular online services, such as web blogs and online social networks, to distribute spam and malware. In this work, we first characterize human and bot behaviors in online services. Based on this characterization, we propose a detection system that distinguishes bots from humans. The system consists of two main components: (1) a client-side logger and (2) a server-side classifier. The client-side logger records user behavioral events, such as mouse movements and keystrokes, and sends this data in batches to the server-side classifier, which identifies the user as human or bot. Our experimental results demonstrate that the proposed detection system achieves very high accuracy with negligible overhead.
Notes
- 1.
The form is usually well-structured, and the ID/name of each input field remains constant. For example, <input type="text" name="email" /> is the text field for entering an email address. Thus, the bot author programs the bot to recognize the fields and fill in appropriate content.
- 2.
- 3.
The page layout is different from page to page, and may affect how the Human Mimic Bot works. For example, by moving down the same amount of pixels, the mouse enters the comment form on one page, but falls out of the form on another page.
- 4.
For example, the position of the submit button may vary in the webpage layout. The bot must be customized to move to the button and generate a click event on it.
- 5.
Form Inject Bot generates no UI data. As Replay Bot replays traces generated by human, it is inappropriate to include human traces to characterize bot behavior.
- 6.
The Kolmogorov-Smirnov test gives a p-value of 0.882 for the distribution fit at a 99% confidence level.
- 7.
Take the following Mouse Move record as an example: {"time":1278555037098, "type":"Mouse Move", "X":590, "Y":10, "tagName":"DIV", "tagID":"footnote"}. The "time" field contains the time stamp of the event in milliseconds. The two coordinates, X and Y, denote the mouse cursor position. The last two fields give the name and ID of the DOM element where the event happens, such as <div ID="footnote">. In a Mouse Press record, {"time":1278555074750, "type":"Mouse Press", "virtualKey":0x01, "tagName":"HTML"}, the "virtualKey" field holds the virtual-key code in hexadecimal; 0x01 corresponds to the left mouse button here.
- 8.
Average speed is distance over duration, and move efficiency is displacement over distance.
- 9.
Input is converted to the ARFF format required by Weka [1].
- 10.
As our classification only involves two categories, human and bot, a majority means more than half of the votes.
- 11.
The idle time is not included in the traces. The bot trace consists of 30 h of Human Mimic Bot data and 2 h of Replay Bot data.
- 12.
The true positive rate is the ratio of the number of bots which are correctly classified to the number of all the bots.
- 13.
The true negative rate is the ratio of the number of humans which are correctly classified to the number of all the humans.
- 14.
A series of consecutive actions represents continuous behavior well.
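The quantities defined in Notes 8, 10, 12, and 13 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (time, x, y) tuple layout, the "bot"/"human" label strings, and the choice of pixels per millisecond as the speed unit are all assumptions made for illustration.

```python
import math
from collections import Counter


def mouse_features(points):
    """Average speed and move efficiency for one mouse movement.

    points: list of (time_ms, x, y) tuples in time order (assumed layout).
    """
    if len(points) < 2:
        return 0.0, 0.0
    # Distance: total path length travelled by the cursor.
    distance = sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )
    # Displacement: straight-line length from the first to the last point.
    (t0, x0, y0), (tn, xn, yn) = points[0], points[-1]
    displacement = math.hypot(xn - x0, yn - y0)
    duration = tn - t0
    avg_speed = distance / duration if duration > 0 else 0.0  # pixels per ms
    efficiency = displacement / distance if distance > 0 else 0.0
    return avg_speed, efficiency


def majority_vote(labels):
    """Return the label holding more than half of the votes, else None."""
    for label, count in Counter(labels).items():
        if count > len(labels) / 2:
            return label
    return None


def detection_rates(actual, predicted):
    """True positive rate (bots) and true negative rate (humans)."""
    pairs = list(zip(actual, predicted))
    bots = [p for a, p in pairs if a == "bot"]
    humans = [p for a, p in pairs if a == "human"]
    tpr = bots.count("bot") / len(bots) if bots else None
    tnr = humans.count("human") / len(humans) if humans else None
    return tpr, tnr
```

For a movement that travels 9 pixels in 20 ms and ends 3 pixels from its start, mouse_features returns an average speed of 0.45 and a move efficiency of about 0.33. A tie in majority_vote yields None because neither label holds more than half of the votes.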
References
Attribute-relation file format (ARFF). http://www.cs.waikato.ac.nz/ml/weka/arff.html
AutoHotkey - free mouse and keyboard macro program with hotkeys. http://www.autohotkey.com/
AutoIt, automation and scripting language. http://www.autoitscript.com/site/autoit/
Autome - automate mouse and keyboard actions. http://www.asoftech.com/autome/
Blogbot by incansoft. http://blogbot.auto-submitters.com/
Global mouse and keyboard library. http://www.codeproject.com/KB/system/globalmousekeyboardlib.aspx
JSON, JavaScript Object Notation. http://www.json.org/
Ultimate wordpress comment submitter. http://www.wordpresscommentspammer.com/
Virtual-key codes. http://msdn.microsoft.com/en-us/library/ms927178.aspx
Ahmed, A.A.E., Traore, I.: A new biometric technology based on mouse dynamics. IEEE Trans. Dependable Secure Comput. 4(3), 165–179 (2007)
von Ahn, L., Blum, M., Hopper, N.J., Langford, J.: CAPTCHA: using hard AI problems for security. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 294–311. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-39200-9_18
Van Balen, N., Ball, C.T., Wang, H.: A behavioral biometrics based approach to online gender classification. In: Deng, R., Weng, J., Ren, K., Yegneswaran, V. (eds.) SecureComm 2016. LNICST, vol. 198, pp. 475–495. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59608-2_27
Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Who is tweeting on Twitter: human, bot or cyborg? In: Proceedings of the 2010 Annual Computer Security Applications Conference, Austin, TX, USA (2010)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, New York (2006)
Funk, C., Liu, Y.: Symmetry reCAPTCHA. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, June 2016
Gianvecchio, S., Wang, H.: Detecting covert timing channels: an entropy-based approach. In: Proceedings of the 2007 ACM Conference on Computer and Communications Security, Alexandria, VA, USA, October–November 2007
Gianvecchio, S., Wu, Z., Xie, M., Wang, H.: Battle of botcraft: fighting bots in online games with human observational proofs. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, Chicago, IL, USA (2009)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)
Jackson, C., Bortz, A., Boneh, D., Mitchell, J.C.: Protecting browser state from web privacy attacks. In: Proceedings of the 15th International Conference on World Wide Web, pp. 737–744 (2006)
Kohavi, R., Quinlan, R.: Decision tree discovery. In: Handbook of Data Mining and Knowledge Discovery, pp. 267–276. University Press (1999)
McLachlan, G., Do, K., Ambroise, C.: Analyzing Microarray Gene Expression Data. Wiley, Hoboken (2004)
Mohta, A.: Bots are back in Yahoo! chat rooms. http://www.technospot.net/blogs/bots-are-back-in-yahoo-chat-room/
Mohta, A.: Yahoo! chat adds CAPTCHA check to remove bots. http://www.technospot.net/blogs/yahoo-chat-captcha-check-to-remove-bots/
Porta, A., et al.: Measuring regularity by means of a corrected conditional entropy in sympathetic outflow. Biol. Cybern. 78(1), 71–78 (1998)
Quinlan, J.R.: Discovering Rules from Large Collections of Examples: A Case Study. Edinburgh University Press, Edinburgh (1979)
Zheng, N., Bai, K., Huang, H., Wang, H.: You are how you touch: user verification on smartphones via tapping behaviors. In: Proceedings of IEEE Conference on Network Protocol (ICNP 2014), Research Triangle Park, NC, USA, October 2014
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Chu, Z., Gianvecchio, S., Wang, H. (2018). Bot or Human? A Behavior-Based Online Bot Detection System. In: Samarati, P., Ray, I., Ray, I. (eds) From Database to Cyber Security. Lecture Notes in Computer Science, vol 11170. Springer, Cham. https://doi.org/10.1007/978-3-030-04834-1_21
DOI: https://doi.org/10.1007/978-3-030-04834-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04833-4
Online ISBN: 978-3-030-04834-1
eBook Packages: Computer Science (R0)