Emoji Helps! A Multi-modal Siamese Architecture for Tweet User Verification

Cognitive Computation

Abstract

This paper proposes a new multi-modal authorship verification approach for social media texts. Authorship verification is the task of deciding whether an unknown text was written by a suspected author. Use of social media such as Facebook and Twitter is growing steadily with digitization, and people have grown accustomed to regularly posting or tweeting about their everyday lives, memorable incidents, random thoughts, opinions, and much more. Emojis are widely used in these tweets and posts. A user's writing style can differ from that of others in word choice, sentence structure, punctuation, and emoji use. We apply a multi-modal Siamese framework to automatically extract features from the given texts and emojis. The extracted features are then passed to a neural network-based architecture for binary classification. We created a multi-modal Twitter-based dataset to evaluate the proposed framework, and obtained an average accuracy of 61.56%, with precision, recall, and F-measure of 78.08%, 61.50%, and 58.32%, respectively.
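The pipeline described in the abstract (a shared encoder applied to both tweets, one branch per modality, followed by a binary classifier) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract does not specify the encoders, so a mean of token embeddings stands in for them, the weights are random and untrained, and every name here (`DIM`, `make_table`, `encode`, `embed_tweet`, `same_author_probability`) is hypothetical.

```python
import math
import random

DIM = 8  # hypothetical embedding size per modality


def make_table(vocab, rng):
    """Random embedding table; a trained model would learn or load these."""
    return {tok: [rng.uniform(-1, 1) for _ in range(DIM)] for tok in vocab}


def encode(tokens, table):
    """Shared encoder stand-in: mean of the embeddings of known tokens."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def embed_tweet(words, emojis, word_table, emoji_table):
    """Concatenate the text-branch and emoji-branch representations."""
    return encode(words, word_table) + encode(emojis, emoji_table)


def same_author_probability(tweet_a, tweet_b, word_table, emoji_table, weights, bias):
    """Siamese comparison: both tweets pass through the SAME tables, then
    the element-wise absolute difference feeds a logistic output unit."""
    va = embed_tweet(*tweet_a, word_table, emoji_table)
    vb = embed_tweet(*tweet_b, word_table, emoji_table)
    diff = [abs(x - y) for x, y in zip(va, vb)]
    score = sum(w * d for w, d in zip(weights, diff)) + bias
    return 1.0 / (1.0 + math.exp(-score))


rng = random.Random(0)
word_table = make_table(["good", "morning", "coffee", "late", "again"], rng)
emoji_table = make_table(["😀", "☕", "😴"], rng)
weights = [rng.uniform(-1, 1) for _ in range(2 * DIM)]  # untrained classifier
bias = 0.0

# Each tweet is a (words, emojis) pair: one known sample, one questioned sample.
known = (["good", "morning", "coffee"], ["☕", "😀"])
unknown = (["late", "again", "coffee"], ["😴", "☕"])

p = same_author_probability(known, unknown, word_table, emoji_table, weights, bias)
print(f"P(same author) = {p:.3f}")
```

Because both inputs share the same embedding tables and the comparison uses an absolute difference, the score is symmetric in its two arguments, which is the property that makes the architecture Siamese. In a trained system the tables, encoders, and classifier weights would be learned from labeled same-author and different-author pairs.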

Notes

  1. https://www.seventeen.com/celebrity/a25486/celebs-favorite-emojis-decoded/

  2. https://www.lightworkers.com/celebs-emojis/

  3. http://www.mtv.com.au/style/pictures/celebrities-most-used-emojis

  4. https://www.reddit.com/r/datasets/comments/6fniik/overonemilliontweetscollectedfromus/

  5. https://archive.ics.uci.edu/ml/datasets/Reuter_50_50

  6. https://pan.webis.de/clef19/pan19-web/

  7. The dataset is available for research purposes on request by emailing the authors.

Author information

Corresponding author

Correspondence to Chanchal Suman.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Suman, C., Saha, S., Bhattacharyya, P. et al. Emoji Helps! A Multi-modal Siamese Architecture for Tweet User Verification. Cogn Comput 13, 261–276 (2021). https://doi.org/10.1007/s12559-020-09715-7

  • DOI: https://doi.org/10.1007/s12559-020-09715-7
