Content-Based Authorship Identification for Short Texts in Social Media Networks

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2021)

Abstract

Today, social networks contain a large number of false profiles that can carry out malicious actions against other users, such as radicalization or defamation. It is therefore necessary to identify the same false profile, and its behavior, across different social networks in order to take action against it. To this end, this article presents a new approach, based on behavior analysis, for identifying the authorship of texts in social networks.

The work presented in this paper was supported by the European Commission under contract H2020-700367 DANTE.

Notes

  1. The DANTE (Detecting and analysing terrorist-related online contents and financing activities) project aims to deliver more effective, efficient, automated data mining and analytics solutions and an integrated system to detect, retrieve, collect and analyse huge amount of heterogeneous and complex multimedia and multi-language terrorist-related contents, from both the Surface and the Deep Web, including Dark nets. More information at http://www.h2020-dante.eu/.

  2. Stop words are the most common words in a language, which are normally filtered out during the pre-processing step of a natural language processing experiment.

  3. More information at https://spark.apache.org/mllib/.

  4. The cosine similarity is a measure that calculates the cosine of the angle between two vectors (orientation). It is commonly used for measuring the similarity between two documents represented in a normalized vector space model.
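
As an illustration of notes 2 and 4, the following minimal Python sketch filters stop words from two short texts and then computes the cosine similarity of the resulting word-count vectors. The stop-word list, the whitespace tokenizer, and the function names `bag_of_words` and `cosine_similarity` are assumptions made for this example; they are not taken from the paper and do not reproduce its pipeline.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and drop stop words.

    Returns a Counter mapping each remaining token to its frequency.
    """
    return Counter(tok for tok in text.lower().split() if tok not in STOP_WORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors.

    Returns 0.0 when either vector is empty, since the angle is undefined.
    """
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    first = bag_of_words("The cat sat on the mat")
    second = bag_of_words("A cat sat in the garden")
    print(cosine_similarity(first, second))
```

Because the measure depends only on the angle between the vectors, two texts with the same word proportions but different lengths score as identical, which is one reason it is a common choice for comparing short documents.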

Author information

Corresponding author

Correspondence to José Gaviria de la Puerta.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

de la Puerta, J.G. et al. (2021). Content-Based Authorship Identification for Short Texts in Social Media Networks. In: Sanjurjo González, H., Pastor López, I., García Bringas, P., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2021. Lecture Notes in Computer Science, vol 12886. Springer, Cham. https://doi.org/10.1007/978-3-030-86271-8_3

  • DOI: https://doi.org/10.1007/978-3-030-86271-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86270-1

  • Online ISBN: 978-3-030-86271-8

  • eBook Packages: Computer Science, Computer Science (R0)
