Bots and Gender Detection on Twitter Using Stylistic Features

  • Conference paper
  • First Online:
Advances in Computational Collective Intelligence (ICCCI 2022)

Abstract

This paper describes our proposed method for the author profiling task at PAN 2019. The task is to identify the type of a Twitter user (i.e. bot or human) and then, if the user is human, to determine their gender (i.e. male or female). Our approach combines a set of language-independent features with machine learning algorithms. An in-depth experimental study on English and Spanish datasets shows that a simple set of stylistic features can surpass existing methods that depend mainly on the content of the tweets. On the English dataset, we obtain accuracies of 93.06% for bot detection and 90.04% for gender classification. On the Spanish dataset, we achieve accuracies of 90.53% and 89.11% for bot and gender detection, respectively.
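The abstract describes a two-stage pipeline: first classify a user as bot or human, then predict gender only for users labelled human. The sketch below illustrates that cascade under stated assumptions. The seven stylistic features (URL, mention and hashtag rates, uppercase and punctuation ratios, scaled mean length, share of distinct tweets), the `NearestCentroid` classifier and the toy users are all hypothetical. The classifier is a deliberately simple stand-in and not one of the learning algorithms the paper evaluates.

```python
import math
import re
import statistics
from typing import Dict, List, Sequence

# Hypothetical, language-independent stylistic features (illustrative only;
# the abstract does not list the paper's actual feature set).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCTUATION = set("!?.,;:")
MAX_TWEET_LEN = 280  # only used to scale the mean length into roughly [0, 1]


def stylistic_features(tweets: Sequence[str]) -> List[float]:
    """Aggregate simple stylistic statistics over all tweets of one user."""
    if not tweets:
        raise ValueError("a user needs at least one tweet")
    n = len(tweets)
    total_len = sum(len(t) for t in tweets)
    chars = total_len or 1  # avoid dividing by zero if every tweet is empty
    return [
        sum(len(URL_RE.findall(t)) for t in tweets) / n,          # URLs per tweet
        sum(len(MENTION_RE.findall(t)) for t in tweets) / n,      # mentions per tweet
        sum(len(HASHTAG_RE.findall(t)) for t in tweets) / n,      # hashtags per tweet
        sum(c.isupper() for t in tweets for c in t) / chars,      # uppercase ratio
        sum(c in PUNCTUATION for t in tweets for c in t) / chars, # punctuation ratio
        total_len / n / MAX_TWEET_LEN,                             # scaled mean length
        len(set(tweets)) / n,                                      # share of distinct tweets
    ]


class NearestCentroid:
    """Stand-in classifier (assumption): predicts the label of the closest class mean."""

    def fit(self, X: Sequence[List[float]], y: Sequence[str]) -> "NearestCentroid":
        groups: Dict[str, List[List[float]]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Component-wise mean of the feature vectors of each class.
        self.centroids = {
            label: [statistics.mean(col) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, x: List[float]) -> str:
        # Euclidean distance to every centroid; return the nearest label.
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


def profile(tweets: Sequence[str], bot_clf: NearestCentroid,
            gender_clf: NearestCentroid) -> str:
    """Cascade from the abstract: bot vs. human first, gender only for humans."""
    x = stylistic_features(tweets)
    if bot_clf.predict(x) == "bot":
        return "bot"
    return gender_clf.predict(x)


# Synthetic users and labels, invented purely for illustration.
bot_user = ["Buy now https://t.co/x #deal #sale"] * 2
human_a = ["@sam see you at lunch!", "Great talk today, thanks @lee."]
human_b = ["Reading a new book tonight.", "@kim Happy birthday!!"]

users = [bot_user, human_a, human_b]
X = [stylistic_features(u) for u in users]
bot_clf = NearestCentroid().fit(X, ["bot", "human", "human"])
gender_clf = NearestCentroid().fit(X[1:], ["female", "male"])

for u in users:
    print(profile(u, bot_clf, gender_clf))
```

On this toy data the loop prints `bot`, `female` and `male`. The cascade means the gender model is trained and queried on human accounts only, matching the task definition. A real system would swap the nearest-centroid stand-in for the classifiers actually evaluated and train on the PAN 2019 corpora.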

Author information

Correspondence to Sarra Ouni.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ouni, S., Fkih, F., Omri, M.N. (2022). Bots and Gender Detection on Twitter Using Stylistic Features. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_53

  • DOI: https://doi.org/10.1007/978-3-031-16210-7_53

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science, Computer Science (R0)
