research-article

Social Media Dark Side Content Detection using Transfer Learning Emphasis on Hate and Conflict

Author:

Zewdie Mossie

WWW '20: Companion Proceedings of the Web Conference 2020

Pages 259 - 263

https://doi.org/10.1145/3366424.3382084

Published: 20 April 2020

Abstract

As online content continues to grow, so does the prevalence of dark side content such as hate speech, misinformation, disinformation, conflict-inciting content, and fake news, which has become a problem for society both online and offline. Consequently, research into automated analysis and detection methods has gained much attention. The scarcity of labeled datasets, however, is one of the major challenges in developing effective supervised models with both machine learning and deep learning. As a result, most state-of-the-art (SOTA) approaches to detecting such content focus on English. Identifying such content is made harder by the diversity of languages used on social media platforms. We propose transfer learning because it requires access only to the large amount of unlabeled text available on social media platforms. Because we use data in Amharic, a low-resource language for machine learning, transfer learning proves effective. First, we prepare a topic embedding model using Facebook data as a task-specific corpus and a word embedding model using a general corpus from different web domains. Second, we combine the topic embeddings and word embeddings and feed the resulting features to a fully connected recurrent neural network (RNN). In our preliminary experiments, the newly proposed attention-based topic model combined with word embeddings outperforms the baselines.
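The second step of the abstract (combining topic and word embeddings, then feeding them to an RNN) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the two embeddings are combined, so concatenation is assumed here; the vector sizes, the random untrained weights, the plain tanh recurrent cell, and the helper names `combine` and `rnn_score` are all hypothetical; and the attention-based topic model itself is not reproduced.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sizes; the abstract does not report the real dimensions.
WORD_DIM, TOPIC_DIM, HIDDEN_DIM = 4, 3, 5


def random_matrix(rows, cols):
    """Small random matrix standing in for trained embeddings or weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def combine(word_vectors, topic_vector):
    """Assumed combination: append the post-level topic vector to each word vector."""
    return [list(w) + list(topic_vector) for w in word_vectors]


def rnn_score(features, w_in, w_rec, w_out):
    """Run a plain tanh recurrent cell over the sequence, then a sigmoid output."""
    hidden = [0.0] * len(w_rec)
    for x in features:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-ins for a six-token post: random "word embeddings" and a made-up
# topic distribution (in the paper these come from the two pretrained models).
word_vectors = random_matrix(6, WORD_DIM)
topic_vector = [0.7, 0.2, 0.1]

features = combine(word_vectors, topic_vector)
w_in = random_matrix(HIDDEN_DIM, WORD_DIM + TOPIC_DIM)
w_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM)
w_out = [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN_DIM)]

score = rnn_score(features, w_in, w_rec, w_out)
print(f"{len(features)} tokens x {len(features[0])} features, score = {score:.3f}")
```

Because the weights are random, the printed score has no meaning; the sketch only shows how per-token word features and a post-level topic vector could share one input sequence. In a real system the embeddings would come from the pretrained models and the recurrent weights would be learned from labeled posts.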

Cited By

Ho J, Ju G, Hong S, An J, Lee C (2023). Factors influencing customer satisfaction with AR shopping assistant applications in e-commerce: an empirical analysis utilizing text-mining techniques. Aslib Journal of Information Management 77:2 (239-259). Online publication date: 1-Nov-2023.
https://doi.org/10.1108/AJIM-03-2023-0089
Raja E, Soni B, Borgohain S (2023). Fake news detection in Dravidian languages using transfer learning with adaptive finetuning. Engineering Applications of Artificial Intelligence 126:PA. Online publication date: 1-Nov-2023.
https://dl.acm.org/doi/10.1016/j.engappai.2023.106877
Elsafoury F, Katsigiannis S, Pervez Z, Ramzan N (2021). When the Timeline Meets the Pipeline: A Survey on Automated Cyberbullying Detection. IEEE Access 9 (103541-103563). Online publication date: 2021.
https://doi.org/10.1109/ACCESS.2021.3098979

Published In

WWW '20: Companion Proceedings of the Web Conference 2020

April 2020

854 pages

ISBN: 9781450370240

DOI: 10.1145/3366424

Editors:
Amal El Fallah Seghrouchni (Sorbonne University, France),
Gita Sukthankar (University of Central Florida, United States),
Tie-Yan Liu (Microsoft Research Asia, China),
Maarten van Steen (University of Twente, Netherlands)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 3
Total Downloads: 469
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 3

Reflects downloads up to 02 Mar 2025
