DOI: 10.1145/1920261.1920265
research-article

Who is tweeting on Twitter: human, bot, or cyborg?

Published: 06 December 2010

Abstract

Twitter is a new web application that serves as both an online social network and a micro-blogging service. Users communicate with each other by publishing text-based posts. The popularity and open structure of Twitter have attracted a large number of automated programs, known as bots, which are a double-edged sword for Twitter. Legitimate bots generate a large volume of benign tweets that deliver news and update feeds, while malicious bots spread spam or malicious content. More interestingly, between human and bot there has emerged the cyborg, which refers to either a bot-assisted human or a human-assisted bot. To help human users identify who they are interacting with, this paper focuses on classifying Twitter accounts as human, bot, or cyborg. We first conduct a set of large-scale measurements on a collection of over 500,000 accounts. We observe differences among humans, bots, and cyborgs in tweeting behavior, tweet content, and account properties. Based on the measurement results, we propose a classification system with four parts: (1) an entropy-based component, (2) a machine-learning-based component, (3) an account properties component, and (4) a decision maker. The system combines features extracted from an unknown user to determine the likelihood that the user is a human, bot, or cyborg. Our experimental evaluation demonstrates the efficacy of the proposed classification system.
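To make the idea behind an entropy-based component concrete, here is a minimal sketch, not the authors' implementation. It assumes that timing regularity can be approximated by the plain Shannon entropy of binned gaps between consecutive tweets; the function name `interval_entropy` and the 60-second bin width are hypothetical choices made for illustration, and the paper itself may use a different entropy measure. Under this assumption, a strictly periodic posting schedule yields an entropy near zero, while irregular posting yields a higher value.

```python
import math
from collections import Counter


def interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (in bits) of binned gaps between consecutive posts.

    Illustrative assumption: timestamps are seconds since some epoch, and
    gaps are grouped into fixed-width bins of `bin_seconds` seconds.
    """
    ordered = sorted(timestamps)
    # Gaps between each pair of consecutive posts.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        return 0.0
    # Count how many gaps fall into each bin.
    counts = Counter(int(gap // bin_seconds) for gap in gaps)
    total = len(gaps)
    # Sum of -p * log2(p) over the observed bins.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# A perfectly periodic schedule (one post per hour) versus an irregular one.
periodic = [0, 3600, 7200, 10800]
irregular = [0, 60, 660, 2460, 9660]
print(interval_entropy(periodic))
print(interval_entropy(irregular))
```

A score like this would be only one input among several; in the system described above, a decision maker combines it with content and account-property features before assigning a class.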

    Published In

    ACSAC '10: Proceedings of the 26th Annual Computer Security Applications Conference
    December 2010
    419 pages
    ISBN:9781450301336
    DOI:10.1145/1920261
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • ACSA: Applied Computing Security Assoc

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Twitter
    2. automatic identification
    3. bot
    4. cyborg

    Qualifiers

    • Research-article

    Conference

    ACSAC '10
    Sponsor:
    • ACSA

    Acceptance Rates

    Overall Acceptance Rate 104 of 497 submissions, 21%
