DeepStyle: User Style Embedding for Authorship Attribution of Short Texts

  • Conference paper
Web and Big Data (APWeb-WAIM 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12318)

Abstract

Authorship attribution (AA), the task of identifying the author of a given text, is an important and widely studied research topic with many applications. Recent work has shown that deep learning methods can achieve significant accuracy improvements on the AA task. Nevertheless, most of these methods represent user posts using a single type of feature (e.g., word bi-grams) and adopt a text classification approach to address the task. Furthermore, these methods offer very limited explainability of the AA results. In this paper, we address these limitations by proposing DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles. We conduct extensive experiments on two real-world datasets from Twitter and Weibo. Our experimental results show that DeepStyle outperforms the state-of-the-art baselines on the AA task.

Notes

  1. Code implementation: https://gitlab.com/bottle_shop/style/deepstyle

  2. https://hub.hku.hk/cris/dataset/dataset107483

Author information

Correspondence to Roy Ka-Wei Lee.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Hu, Z., Lee, R.KW., Wang, L., Lim, Ep., Dai, B. (2020). DeepStyle: User Style Embedding for Authorship Attribution of Short Texts. In: Wang, X., Zhang, R., Lee, YK., Sun, L., Moon, YS. (eds) Web and Big Data. APWeb-WAIM 2020. Lecture Notes in Computer Science, vol 12318. Springer, Cham. https://doi.org/10.1007/978-3-030-60290-1_17

  • DOI: https://doi.org/10.1007/978-3-030-60290-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60289-5

  • Online ISBN: 978-3-030-60290-1

  • eBook Packages: Computer Science, Computer Science (R0)
