DOI: 10.1145/3428658.3430965
research-article

FakeNewsSetGen: a Process to Build Datasets that Support Comparison Among Fake News Detection Methods

Published: 30 November 2020

Abstract

Owing to its easy access and low cost, online news consumption through social media has increased significantly over the last decade. Despite its benefits, some social media platforms allow anyone to post news that can spread widely and quickly, which amplifies an old problem: the dissemination of Fake News. In response, several machine-learning-based methods for automatically detecting Fake News (MLFN) have been proposed. All of them require datasets to train and evaluate their detection models. Although recent MLFN were designed to use data about how news propagates on social media, most of the few available datasets lack this kind of data. Hence, comparisons between the performance of these recent MLFN and that of the other methods are restricted to a very small number of datasets. Moreover, none of the existing datasets with propagation data contains news in Portuguese, which hinders the evaluation of MLFN in this language. This work therefore proposes FakeNewsSetGen, a process that builds Fake News datasets that contain news propagation data and support comparison among state-of-the-art MLFN. The software engineering process behind FakeNewsSetGen was guided so as to include every kind of data required by existing MLFN. To illustrate FakeNewsSetGen's viability and adequacy, a case study was carried out. It comprised implementing a FakeNewsSetGen prototype and applying this prototype to create FakeNewsSet, a dataset of news in Portuguese. Five MLFN with different data requirements (two of them requiring news propagation data) were applied to FakeNewsSet and compared, demonstrating the potential usefulness of both the proposed process and the resulting dataset.
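
The abstract's central requirement, that a dataset can support a comparison among MLFN only if it contains every kind of data those methods consume, can be sketched as a simple subset check. This is a minimal illustration under stated assumptions and not FakeNewsSetGen's actual design: the `DataKind` categories, the method and dataset names, and the helpers `supports` and `comparable_methods` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class DataKind(Enum):
    """Hypothetical data categories; FakeNewsSetGen's real schema may differ."""
    NEWS_TEXT = auto()        # textual content of the news item
    SOURCE_METADATA = auto()  # e.g. publisher or URL information
    PROPAGATION = auto()      # who shared the item on social media, and when
    USER_PROFILE = auto()     # attributes of the users who shared it


@dataclass(frozen=True)
class Method:
    name: str
    requires: frozenset[DataKind]


@dataclass
class Dataset:
    name: str
    provides: set[DataKind] = field(default_factory=set)

    def supports(self, method: Method) -> bool:
        # A method can be trained and evaluated on this dataset only if
        # every kind of data it requires is present (subset test).
        return method.requires <= self.provides


def comparable_methods(dataset: Dataset, methods: list[Method]) -> list[str]:
    """Names of the methods that can all be run, and so compared, on `dataset`."""
    return [m.name for m in methods if dataset.supports(m)]


# Hypothetical methods: one text-only, one that also needs propagation data.
text_only = Method("text_classifier", frozenset({DataKind.NEWS_TEXT}))
propagation_based = Method(
    "propagation_classifier",
    frozenset({DataKind.NEWS_TEXT, DataKind.PROPAGATION}),
)
methods = [text_only, propagation_based]

# Hypothetical datasets: content only vs. content plus propagation data.
content_only = Dataset("content_only_corpus", {DataKind.NEWS_TEXT})
with_propagation = Dataset(
    "corpus_with_propagation", {DataKind.NEWS_TEXT, DataKind.PROPAGATION}
)

print(comparable_methods(content_only, methods))      # ['text_classifier']
print(comparable_methods(with_propagation, methods))  # ['text_classifier', 'propagation_classifier']
```

Under this model, a content-only dataset admits only the text-based method, whereas a dataset that also carries propagation data admits both, which reflects the abstract's motivation for including propagation data in FakeNewsSet.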

Cited By

  • (2024) Emotion detection for misinformation: A review. Information Fusion 107, 102300. DOI: 10.1016/j.inffus.2024.102300. Online publication date: July 2024.
  • (2024) JEDi - a digital educational game to support student training in identifying Portuguese-written fake news: Case studies in high school, undergraduate and graduate scenarios. Education and Information Technologies 29, 10, 11815-11845. DOI: 10.1007/s10639-023-12309-z. Online publication date: 1 July 2024.
  • (2022) Fake news detection based on explicit and implicit signals of a hybrid crowd. Expert Systems with Applications: An International Journal 183, C. DOI: 10.1016/j.eswa.2021.115414. Online publication date: 3 January 2022.
  • (2021) ReVera Framework. Proceedings of the Brazilian Symposium on Multimedia and the Web, 137-140. DOI: 10.1145/3470482.3479626. Online publication date: 5 November 2021.

Published In

ACM Conferences
WebMedia '20: Proceedings of the Brazilian Symposium on Multimedia and the Web
November 2020
364 pages
ISBN:9781450381963
DOI:10.1145/3428658
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • SBC: Brazilian Computer Society
  • CNPq: Conselho Nacional de Desenvolvimento Científico e Tecnológico
  • CGIBR: Comitê Gestor da Internet no Brasil
  • CAPES: Brazilian Higher Education Funding Council

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dataset building process
  2. Fake News detection
  3. social media

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WebMedia '20
WebMedia '20: Brazilian Symposium on Multimedia and the Web
November 30 - December 4, 2020
São Luís, Brazil

Acceptance Rates

  • WebMedia '20 paper acceptance rate: 34 of 87 submissions (39%)
  • Overall acceptance rate: 270 of 873 submissions (31%)
