DOI: 10.1145/3201064.3201083
research-article

DistrustRank: Spotting False News Domains

Published: 15 May 2018

Abstract

In this paper we propose DistrustRank, a semi-supervised learning strategy for automatically separating fake news sources from reliable ones. We first select a small seed set of unreliable news, manually evaluated and classified by experts on fact-checking portals. Once this set is created, DistrustRank constructs a weighted graph in which nodes represent websites and an edge connects a pair of websites whose similarity reaches a minimum threshold. Next, it computes centrality using a biased PageRank, with the bias applied to the selected set of seeds. The output of the proposed model is a trust (or distrust) rank that can be used in two ways: a) as a counter-bias applied when news about a specific subject is ranked, in order to discount possible boosts achieved by false claims; and b) to help humans identify sources that are likely to publish fake news (or that are likely to be reputable), suggesting websites that should be examined more closely or avoided. In our experiments, DistrustRank outperforms the supervised approaches in both the ranking and the classification tasks.
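
The pipeline the abstract describes (threshold the pairwise similarities into a weighted graph, then run a PageRank whose teleport step is biased toward the seed set) can be illustrated with a minimal sketch. This is not the authors' implementation: the similarity values, the threshold of 0.5, the damping factor of 0.85, and the names `build_graph` and `biased_pagerank` are all assumptions chosen for illustration, and the abstract does not say which similarity measure is used.

```python
from typing import Dict, Set, Tuple

Graph = Dict[str, Dict[str, float]]


def build_graph(similarity: Dict[Tuple[str, str], float], threshold: float) -> Graph:
    """Build an undirected weighted graph, keeping an edge only when the
    similarity between two websites reaches the (assumed) threshold."""
    graph: Graph = {}
    for (a, b), weight in similarity.items():
        graph.setdefault(a, {})
        graph.setdefault(b, {})
        if a != b and weight >= threshold:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def biased_pagerank(graph: Graph, seeds: Set[str],
                    damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank where the random jump lands only on seeds."""
    nodes = sorted(graph)
    seed_nodes = [n for n in nodes if n in seeds]
    if not seed_nodes:
        raise ValueError("at least one seed must be a node of the graph")
    # Teleport distribution: uniform over the seeds, zero everywhere else.
    teleport = {n: (1.0 / len(seed_nodes) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            if total == 0.0:
                # Isolated node: send its mass back to the seeds.
                for m in nodes:
                    new_rank[m] += damping * rank[n] * teleport[m]
            else:
                # Spread mass to neighbours in proportion to edge weight.
                for m, weight in graph[n].items():
                    new_rank[m] += damping * rank[n] * weight / total
        rank = new_rank
    return rank


# Hypothetical similarities between four made-up domains.
similarity = {
    ("a.example", "b.example"): 0.9,
    ("b.example", "c.example"): 0.2,  # below threshold, so no edge
    ("c.example", "d.example"): 0.8,
}
graph = build_graph(similarity, threshold=0.5)
scores = biased_pagerank(graph, seeds={"a.example"})
for site, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {score:.3f}")
```

In this toy input only `a.example` is a seed, so distrust flows to `b.example` through their edge, while `c.example` and `d.example`, which are not connected to any seed, receive no score. Sorting by score gives the kind of ranking the abstract suggests handing to human reviewers.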

Published In

WebSci '18: Proceedings of the 10th ACM Conference on Web Science
May 2018
399 pages
ISBN:9781450355636
DOI:10.1145/3201064
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. credibility analysis
  2. rumor detection
  3. text mining

Qualifiers

  • Research-article

Conference

WebSci '18: 10th ACM Conference on Web Science
May 27 - 30, 2018
Amsterdam, Netherlands

Acceptance Rates

WebSci '18 Paper Acceptance Rate: 30 of 113 submissions, 27%
Overall Acceptance Rate: 245 of 933 submissions, 26%

Cited By

  • (2022) Fake news detection based on explicit and implicit signals of a hybrid crowd. Expert Systems with Applications: An International Journal 183:C. DOI: 10.1016/j.eswa.2021.115414. Online publication date: 3-Jan-2022
  • (2021) Towards a Novel Benchmark for Automatic Generation of ClaimReview Markup. Proceedings of the 13th ACM Web Science Conference 2021 (29-35). DOI: 10.1145/3447535.3462640. Online publication date: 21-Jun-2021
  • (2021) Big Data Science Over the Past Web. The Past Web (271-282). DOI: 10.1007/978-3-030-63291-5_21. Online publication date: 1-Jul-2021
  • (2020) A systematic literature review on disinformation: Toward a unified taxonomical framework. New Media & Society 23:5 (1301-1326). DOI: 10.1177/1461444820959296. Online publication date: 20-Sep-2020
  • (2020) FakeNewsSetGen. Proceedings of the Brazilian Symposium on Multimedia and the Web (241-248). DOI: 10.1145/3428658.3430965. Online publication date: 30-Nov-2020
  • (2020) Intelligent Fake News Detection: A Systematic Mapping. Journal of Applied Security Research 16:2 (168-189). DOI: 10.1080/19361610.2020.1761224. Online publication date: 14-May-2020
  • (2018) When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems. Computational Processing of the Portuguese Language (136-146). DOI: 10.1007/978-3-319-99722-3_14. Online publication date: 26-Aug-2018
