Abstract
As marketing based on social media profiles grows rapidly, verifying the authenticity of social media accounts has become vital for brands. This paper addresses user classification on the microblogging platform Twitter. We aim to identify whether a Twitter handle belongs to a real person. The task is done in two steps. First, we separate human handles from bot handles and discard the latter. Second, we classify each human handle as a real person or a non-person, e.g., an organization. For the first step, we use a Twitterati identification system [16], which computes various statistical measures from the tweets and uses them to separate human and bot handles. For the second step, we use two widely used and well-performing classifiers, linear regression (LR) and support vector machine (SVM), to classify human Twitter handles as real person or non-person. We find that SVM outperforms LR. Moreover, the performance of SVM (F1-score = 0.9310) indicates that the proposed method may be practical for real-life applications.
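The two-step scheme and the F1 measure mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical: `classify_handle`, `is_bot`, and `is_person` are illustrative stand-ins, not the Twitterati identification system or the trained LR/SVM models from the paper, and the toy labels are invented only to exercise the F1 formula.

```python
from typing import Callable, Sequence


def classify_handle(
    handle: str,
    is_bot: Callable[[str], bool],      # hypothetical stage-1 predicate
    is_person: Callable[[str], bool],   # hypothetical stage-2 predicate
) -> str:
    """Two-step cascade: discard bots first, then split humans."""
    if is_bot(handle):
        return "bot"
    return "person" if is_person(handle) else "non-person"


def f1_score(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "person") -> float:
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy usage with invented rules (assumptions for illustration only).
label = classify_handle(
    "some_handle",
    is_bot=lambda h: h.endswith("_bot"),
    is_person=lambda h: not h.startswith("org_"),
)
toy_f1 = f1_score(
    ["person", "person", "non-person", "person"],
    ["person", "non-person", "person", "person"],
)
```

In a real setting the two predicates would wrap trained models, and the F1 score would be computed on a held-out labelled set rather than on toy data.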
References
Bontcheva, K., Derczynski, L., Funk, A., Greenwood, M., Maynard, D., Aswani, N.: TwitIE: an open-source information extraction pipeline for microblog text. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing. Association for Computational Linguistics (2013)
Bruce, R.F., Wiebe, J.M.: Recognizing subjectivity: a case study in manual tagging. Nat. Lang. Eng. 5(2), 187–205 (1999)
Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Detecting automation of twitter accounts: Are you a human, bot, or cyborg? IEEE Trans. Dependable Secur. Comput. 9(6), 811–824 (2012)
Eberhardt, J.J.: Bayesian spam detection. Sch. Horiz. Univ. Minn. Morris Undergraduate J. 2(1) (2015)
Gunn, S.R.: Support vector machines for classification and regression. Isis technical report, School of Electronics and Computer Science, University of Southampton (1998)
Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann (2011)
Kamal, A.: Subjectivity classification using machine learning techniques for mining feature-opinion pairs from web opinion sources (2013). http://arxiv.org/abs/1312.6962
Klout: Be known for what you love (2014). https://klout.com/home/
Lee, A.J., Seber, G.A.F.: Linear Regression Analysis, 2nd edn. Wiley, Hoboken (2003)
McCord, M., Chuah, M.: Spam detection on twitter using traditional classifiers. In: Proceedings of the International Conference on Autonomic and Trusted Computing (ATC), pp. 175–186. Springer, Berlin, Heidelberg (2011)
Quinlan, R.: C4.5, March 2014. https://ww.mgt.ncu.edu.tw/wabble/School/C45.ppt
Riloff, E.M., Phillips, W.: Introduction to the sundance and autoslog systems. Technical Report UUCS-04-015, 1-47.7, School of Computing: The University of Utah (2004)
Tao, K., Abel, F., Hauff, C., Houben, G.J., Gadiraju, U.: Groundhog day: near-duplicate detection on twitter. In: Proceedings of the 22nd International Conference on World Wide Web (WWW), pp. 1273–1284. ACM, New York (2013)
Weka: Data mining with open source machine learning (2014). https://www.cs.waikato.ac.nz/ml/weka
Wiebe, J., Riloff, E.: Creating subjective and objective sentence classifiers from unannotated texts. In: Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), pp. 486–497 (2005)
Winnie Main, N.S.: Twitterati identification system. In: Proceedings of the International Conference on Advanced Computing Technologies and Applications (ICACTA), vol. 45, pp. 32–41 (2015)
Yang, C., Harkreader, R., Gu, G.: Empirical evaluation and new design for fighting evolving twitter spammers. IEEE Trans. Inf. Forensics Secur. 8(8), 1280–1293 (2013)
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Budania, H., Singh, P.K. (2018). Person Versus Non-person Classification of Twitter Handle. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds) Hybrid Intelligent Systems. HIS 2017. Advances in Intelligent Systems and Computing, vol 734. Springer, Cham. https://doi.org/10.1007/978-3-319-76351-4_11
DOI: https://doi.org/10.1007/978-3-319-76351-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76350-7
Online ISBN: 978-3-319-76351-4
eBook Packages: Engineering, Engineering (R0)