Person Versus Non-person Classification of Twitter Handle

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 734)

Abstract

As marketing based on social media profiles grows rapidly, verifying the authenticity of social media accounts is vital for brands. This paper addresses user classification on the microblogging platform Twitter: we aim to identify whether a Twitter handle belongs to a real person or not. The classification is done in two steps. First, we separate human Twitter handles from bot handles and discard the latter. Second, we classify each human Twitter handle as a real person or a non-person (e.g., an organization). For the first step, we use a Twitterati identification system [16], which computes various statistical measures from a handle's tweets and uses them to separate human from bot handles. For the second step, we use two widely used, well-performing classifiers, linear regression (LR) and support vector machine (SVM), to classify human Twitter handles as real person or non-person. We find that SVM outperforms LR. Moreover, the performance of SVM (F1-score = 0.9310) indicates that the proposed method may be used in real-life applications.
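
The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature vectors, the toy data, the placeholder bot rule standing in for the Twitterati identification system [16], and every function name are hypothetical. The SVM here is a linear SVM trained with the Pegasos stochastic sub-gradient method, and LR is an ordinary least-squares fit to +1/-1 labels thresholded at zero; the abstract does not say how the authors trained either model or how LR was turned into a classifier. F1-score is computed as the harmonic mean of precision and recall for the person class.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]  # hypothetical per-handle features computed from tweets


def with_bias(x: Sequence[float]) -> Vector:
    """Append a constant 1.0 so the last weight acts as the intercept."""
    return list(x) + [1.0]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_least_squares(X: Sequence[Vector], y: Sequence[int],
                        lr: float = 0.01, epochs: int = 2000) -> Vector:
    """LR baseline: linear regression on +1/-1 targets, fitted by batch gradient descent on MSE."""
    Xb = [with_bias(x) for x in X]
    w = [0.0] * len(Xb[0])
    n = len(Xb)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(Xb, y):
            err = dot(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def train_linear_svm(X: Sequence[Vector], y: Sequence[int], lam: float = 0.01,
                     epochs: int = 200, seed: int = 0) -> Vector:
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularised hinge loss (the bias weight is regularised too, a simplification)."""
    rng = random.Random(seed)
    Xb = [with_bias(x) for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # one shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0  # margin test uses the old w
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regulariser step
            if violated:  # hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Assumed label convention: +1 = person, -1 = non-person."""
    return 1 if dot(w, with_bias(x)) >= 0.0 else -1


def two_step_classify(handles: Sequence[Tuple[str, Vector]],
                      is_bot: Callable[[Vector], bool],
                      w: Sequence[float]) -> Dict[str, str]:
    """Step 1 discards bot handles; step 2 labels the remaining human handles."""
    labels: Dict[str, str] = {}
    for name, features in handles:
        if is_bot(features):
            labels[name] = "bot (discarded)"
        else:
            labels[name] = "person" if predict(w, features) == 1 else "non-person"
    return labels


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def placeholder_is_bot(features: Vector) -> bool:
    """Hypothetical stand-in for the Twitterati identification system [16]."""
    return abs(features[0]) > 50


if __name__ == "__main__":
    # Made-up toy data: +1 = person, -1 = non-person.
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w_svm, w_lr = train_linear_svm(X, y), train_least_squares(X, y)
    for name, w in (("SVM", w_svm), ("LR", w_lr)):
        print(name, "F1 =", f1_score(y, [predict(w, x) for x in X]))
    handles = [("@a", [2.5, 2.5]), ("@b", [-2.5, -2.5]), ("@c", [99.0, 0.0])]
    print(two_step_classify(handles, placeholder_is_bot, w_svm))
```

Running the file prints each model's F1-score on the toy data and the labels assigned to three hypothetical handles. Pegasos is used only so the sketch needs no external library; for real data, a library SVM implementation (the reference list includes Weka [14]) would be the more practical choice.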

References

  1. Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M., Maynard, D., Aswani, N.: TwitIE: an open-source information extraction pipeline for microblog text. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing. Association for Computational Linguistics (2013)

  2. Bruce, R.F., Wiebe, J.M.: Recognizing subjectivity: a case study in manual tagging. Nat. Lang. Eng. 5(2), 187–205 (1999)

  3. Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Trans. Dependable Secur. Comput. 9(6), 811–824 (2012)

  4. Eberhardt, J.J.: Bayesian spam detection. Sch. Horiz. Univ. Minn. Morris Undergraduate J. 2(1) (2015)

  5. Gunn, S.R.: Support vector machines for classification and regression. Isis technical report, School of Electronics and Computer Science, University of Southampton (1998)

  6. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann (2011)

  7. Kamal, A.: Subjectivity classification using machine learning techniques for mining feature-opinion pairs from web opinion sources (2013). http://arxiv.org/abs/1312.6962

  8. Klout: Be known for what you love (2014). https://klout.com/home/

  9. Lee, A.J., Seber, G.A.F.: Linear Regression Analysis, 2nd edn. Wiley, Hoboken (2003)

  10. McCord, M., Chuah, M.: Spam detection on twitter using traditional classifiers. In: Proceedings of the International Conference on Autonomic and Trusted Computing (ATC), pp. 175–186. Springer, Berlin, Heidelberg (2011)

  11. Quinlan, R.: C4.5, March 2014. https://ww.mgt.ncu.edu.tw/wabble/School/C45.ppt

  12. Riloff, E.M., Phillips, W.: Introduction to the sundance and autoslog systems. Technical Report UUCS-04-015, 1-47.7, School of Computing: The University of Utah (2004)

  13. Tao, K., Abel, F., Hauff, C., Houben, G.J., Gadiraju, U.: Groundhog day: near-duplicate detection on twitter. In: Proceedings of the 22nd International Conference on World Wide Web (WWW), pp. 1273–1284. ACM, New York (2013)

  14. Weka: Data mining with open source machine learning (2014). https://www.cs.waikato.ac.nz/ml/weka

  15. Wiebe, J., Riloff, E.: Creating subjective and objective sentence classifiers from unannotated texts. In: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), pp. 486–497 (2005)

  16. Winnie Main, N.S.: Twitterati identification system. In: Proceedings of the International Conference on Advanced Computing Technologies and Applications (ICACTA), vol. 45, pp. 32–41 (2015)

  17. Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving twitter spammers. Trans. Info. For. Sec. 8(8), 1280–1293 (2013)

Author information

Correspondence to Himanshu Budania.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Budania, H., Singh, P.K. (2018). Person Versus Non-person Classification of Twitter Handle. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Hybrid Intelligent Systems. HIS 2017. Advances in Intelligent Systems and Computing, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-319-76351-4_11

  • DOI: https://doi.org/10.1007/978-3-319-76351-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76350-7

  • Online ISBN: 978-3-319-76351-4

  • eBook Packages: Engineering, Engineering (R0)