Abstract
Social networks today host a large number of false profiles that can take malicious actions against other users, such as radicalization or defamation. Taking action against such a profile requires identifying it, and its behaviour, across different social networks. To this end, this article presents a new approach, based on behaviour analysis, to identifying text authorship in social networks.
The work presented in this paper was supported by the European Commission under contract H2020-700367 DANTE.
Notes
1. The DANTE (Detecting and analysing terrorist-related online contents and financing activities) project aims to deliver more effective, efficient, automated data mining and analytics solutions and an integrated system to detect, retrieve, collect and analyse huge amounts of heterogeneous and complex multimedia and multi-language terrorist-related content, from both the Surface and the Deep Web, including Dark nets. More information at http://www.h2020-dante.eu/.
2. Stop words are the most common words in a language, which are normally filtered out during the pre-processing step of a natural language processing experiment.
3. More information at https://spark.apache.org/mllib/.
4. The cosine similarity is a measure that calculates the cosine of the angle between two vectors, that is, how closely their orientations agree. It is commonly used for measuring the similarity between two documents represented in a normalized vector space model.
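The definitions in notes 2 and 4 can be illustrated with a minimal sketch: filter stop words out of two short texts, represent each text as a vector of term counts, and compare the vectors with cos(θ) = (a · b) / (‖a‖ ‖b‖). This is not the paper's implementation. The tiny stop-word list, the regular-expression tokenizer, and all function names below are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only;
# real pipelines use a much larger, language-specific list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def term_counts(text):
    """Represent a text as a sparse vector: term -> number of occurrences."""
    return Counter(remove_stop_words(tokenize(text)))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention used here: an empty text matches nothing
    return dot / (norm_a * norm_b)


doc_a = term_counts("The cat sat on the mat")
doc_b = term_counts("A cat sat on a mat")
doc_c = term_counts("Stock prices fell sharply")

print(cosine_similarity(doc_a, doc_b))  # same content words: close to 1
print(cosine_similarity(doc_a, doc_c))  # no shared terms: 0
```

Because the cosine depends only on the angle between the vectors, a long text and a short text with the same relative term frequencies score as highly similar, which is one reason the measure is popular for comparing documents of different lengths.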
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
de la Puerta, J.G. et al. (2021). Content-Based Authorship Identification for Short Texts in Social Media Networks. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_3
DOI: https://doi.org/10.1007/978-3-030-86271-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86270-1
Online ISBN: 978-3-030-86271-8
eBook Packages: Computer Science, Computer Science (R0)