An online and highly-scalable streaming platform for filtering trolls with transfer learning

Lai, Chun-Ming; Chang, Ting-Wei; Yang, Chao-Tung

doi:10.1007/s11227-023-05312-1

Published: 29 April 2023

Volume 79, pages 16664–16687 (2023)

The Journal of Supercomputing

Chun-Ming Lai¹, Ting-Wei Chang¹ & Chao-Tung Yang¹,²

Abstract

The internet has reached a mature stage of development, and Online Social Media (OSM) platforms such as Twitter and Facebook have become vital channels for public communication and discussion on matters of public interest. However, these platforms are often plagued by improper statements or content, propagated by anonymous users and trolls, which harm both the platforms and their users. Existing methods for dealing with inappropriate information rely on manual or semi-manual offline assessments, which do not fully account for the streaming nature of OSM feeds. In this paper, we implement a robust and decoupled system that treats social media data as streaming data. Built on a publisher and consumer model with Apache Kafka, our system can process more than 179 MB of data per second with a latency of only 166.3 ms. On top of this pipeline, we deploy a well-trained transfer learning model that classifies the incoming data streams with an accuracy of 0.836. Our proposed architecture has the potential to help online communities build more constructive OSM platforms. We believe that this contribution will help address the challenges associated with improper content on OSM platforms and pave the way for more effective and efficient solutions.
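
To make the decoupled publisher and consumer idea in the abstract concrete, the following is a minimal in-process sketch, not the system described in the paper. It assumes a standard-library `queue.Queue` as a stand-in for a Kafka topic and a keyword rule as a stand-in for the transfer learning classifier; the names `classify`, `publisher`, `consumer`, `run_pipeline`, and the flagged terms are all hypothetical.

```python
from __future__ import annotations

import queue
import threading

# Assumption: None marks the end of the stream in this sketch only.
SENTINEL = None


def classify(text: str) -> str:
    """Placeholder classifier: a keyword rule, NOT the paper's trained model."""
    flagged_terms = {"idiot", "stupid"}  # illustrative terms, not from the paper
    words = {w.strip(".,!?").lower() for w in text.split()}
    return "troll" if words & flagged_terms else "normal"


def publisher(topic: queue.Queue, posts: list[str]) -> None:
    """Push posts onto the queue, which stands in for a Kafka topic."""
    for post in posts:
        topic.put(post)
    topic.put(SENTINEL)


def consumer(topic: queue.Queue, results: list[tuple[str, str]]) -> None:
    """Pull posts off the queue and label each one as it arrives."""
    while True:
        post = topic.get()
        if post is SENTINEL:
            break
        results.append((post, classify(post)))


def run_pipeline(posts: list[str]) -> list[tuple[str, str]]:
    """Run one publisher and one consumer concurrently over a bounded queue."""
    topic: queue.Queue = queue.Queue(maxsize=100)
    results: list[tuple[str, str]] = []
    worker = threading.Thread(target=consumer, args=(topic, results))
    worker.start()
    publisher(topic, posts)
    worker.join()
    return results


if __name__ == "__main__":
    for post, label in run_pipeline(["You are an idiot!", "Nice photo."]):
        print(f"{label}: {post}")
```

The publisher never calls the classifier directly; it only writes to the queue, so either side could be scaled or replaced independently. In a real deployment the queue would be a message broker and the classifier a fine-tuned language model, neither of which this sketch attempts to reproduce.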

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Notes

https://www.taipeitimes.com/News/taiwan/archives/2015/04/23/2003616602.

Funding

This work was partially supported by the National Science and Technology Council (NSTC), Taiwan (R.O.C.), under grant numbers 111-2622-E-029-003-, 111-2811-E-029-001-, 111-2621-M-029-004-, and 110-2222-E-029-001-.

Author information

Authors and Affiliations

Department of Computer Science, Tunghai University, No. 1727, Sec.4, Taiwan Blvd., Taichung, 407224, Taiwan R.O.C.
Chun-Ming Lai, Ting-Wei Chang & Chao-Tung Yang
Research Center for Smart Sustainable Circular Economy, Tunghai University, No. 1727, Sec.4, Taiwan Blvd., Taichung, 407224, Taiwan R.O.C.
Chao-Tung Yang

Contributions

CM Lai conceived the research design, structured the paper, and interpreted the findings. TW Chang collected the data, conducted the experimental evaluation, and wrote part of the manuscript. CT Yang supervised the research and reviewed the manuscript.

Corresponding author

Correspondence to Chao-Tung Yang.

Ethics declarations

Ethical approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lai, CM., Chang, TW. & Yang, CT. An online and highly-scalable streaming platform for filtering trolls with transfer learning. J Supercomput 79, 16664–16687 (2023). https://doi.org/10.1007/s11227-023-05312-1

Accepted: 14 April 2023
Published: 29 April 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s11227-023-05312-1
