ABSTRACT
In this work, we address the question of whether the author of a single tweet can be successfully identified, including when the tweet appears in a mixed set with tweets from other authors. We present a new authorship identification scheme for short texts such as tweets, designed for cases where only single messages are available. The scheme relies on selecting features that work in this special setting and combining them to obtain better accuracy. The technique demonstrates significant results throughout our experiments. Our results can be used to detect the authors of illegitimate tweets, to identify fake tweets in a Twitter account, or to break the privacy of a multi-user account by revealing the authors who participate in it.
Index Terms
- On De-anonymization of Single Tweet Messages