ABSTRACT
What should a malicious user write next to fool a detection model? Identifying malicious users is critical to ensure the safety and integrity of internet platforms. Several deep learning-based detection models have been developed for this task. However, malicious users can evade these models by manipulating their behavior, rendering the models of little use. The vulnerability of such detection models to adversarial attacks remains unknown. Here we create a novel adversarial attack model against deep user sequence embedding-based classification models, which use the sequence of a user's posts to generate user embeddings and detect malicious users. In the attack, the adversary generates a new post to fool the classifier. We propose a novel end-to-end Personalized Text Generation Attack model, called PETGEN, that simultaneously reduces the efficacy of the detection model and generates posts with several key desirable properties. Specifically, PETGEN generates posts that are personalized to the user's writing style, are knowledgeable about a given target context, are aware of the user's historical posts on that context, and encapsulate the user's recent topical interests. We conduct extensive experiments on two real-world datasets (Yelp and Wikipedia, both with ground-truth labels of malicious users) to show that PETGEN significantly reduces the performance of popular deep user sequence embedding-based classification models. PETGEN outperforms five attack baselines in terms of text quality and attack efficacy in both white-box and black-box classifier settings. Overall, this work paves the way towards the next generation of adversary-aware sequence classification models.
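To make the threat model concrete, the following is a minimal, purely illustrative sketch. PETGEN itself is an end-to-end neural text generator; here we stand in a toy bag-of-words sequence classifier and a greedy black-box attack that appends whichever candidate post most reduces the malicious score. All vocabulary, weights, and example posts are hypothetical, not taken from the paper.

```python
# Toy stand-in for a user sequence embedding-based classifier:
# each post is embedded as a normalized bag-of-words vector, the user
# embedding is the mean over the post sequence, and a linear layer
# (hand-set weights here) produces a maliciousness score.

VOCAB = {"great": 0, "food": 1, "buy": 2, "cheap": 3, "followers": 4, "service": 5}
# Hand-set weights: spam-like words push the score up, benign words pull it down.
WEIGHTS = [-1.0, -0.5, 2.0, 1.5, 2.5, -0.8]

def embed(post):
    """Mean one-hot embedding of a post's in-vocabulary words."""
    vec = [0.0] * len(VOCAB)
    words = [w for w in post.lower().split() if w in VOCAB]
    for w in words:
        vec[VOCAB[w]] += 1.0
    n = max(len(words), 1)
    return [v / n for v in vec]

def malicious_score(posts):
    """Score for a user's post sequence (higher = more likely malicious)."""
    seq = [embed(p) for p in posts]
    user_embedding = [sum(col) / len(seq) for col in zip(*seq)]
    return sum(w * x for w, x in zip(WEIGHTS, user_embedding))

def attack(posts, candidates):
    """Black-box attack: append the candidate that most reduces the score.
    (PETGEN instead *generates* this post, conditioned on the user's
    history, style, and the target context.)"""
    return min(candidates, key=lambda c: malicious_score(posts + [c]))

history = ["buy cheap followers", "buy followers"]
candidates = ["buy cheap followers", "great food great service"]
best = attack(history, candidates)
```

Running this, the attack picks the benign-sounding post, and the user's maliciousness score drops below its pre-attack value, which is exactly the evasion the abstract describes. The gap between this sketch and PETGEN is that the real attack searches the full space of fluent, personalized text rather than a fixed candidate set.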