research-article

DistrustRank: Spotting False News Domains

Authors:

Vinicius Woloszyn,

Wolfgang Nejdl

WebSci '18: Proceedings of the 10th ACM Conference on Web Science

Pages 221 - 228

https://doi.org/10.1145/3201064.3201083

Published: 15 May 2018

Abstract

In this paper we propose DistrustRank, a semi-supervised learning strategy that automatically separates fake news sources from reliable ones. We first select a small set of unreliable news websites that have been manually evaluated and classified by experts on fact-checking portals. Once this seed set is created, DistrustRank constructs a weighted graph in which nodes represent websites and an edge connects two websites only when their similarity reaches a minimum threshold. Next, it computes node centrality using a biased PageRank, where the bias is applied to the selected set of seeds. The output of the model is a trust (or distrust) rank that can be used in two ways: a) as a counter-bias applied when news about a specific subject is ranked, in order to discount possible boosts achieved by false claims; and b) to help humans identify sources that are likely to publish fake news (or that are likely to be reputable), suggesting websites that should be examined more closely or avoided. In our experiments, DistrustRank outperforms the supervised approaches in both the ranking and the classification tasks.
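The pipeline summarized in the abstract (expert-labelled seeds, a thresholded similarity graph, seed-biased PageRank) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how websites are represented or compared, so here each site is a hypothetical set of tokens, similarity is Jaccard overlap, and `min_sim`, `damping` and the `discount` counter-bias helper are illustrative choices. The teleport (bias) vector is uniform over the seed set only, so score flows outward from the known-unreliable seeds and a higher score reads as higher distrust.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two feature sets (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(features: dict, min_sim: float) -> dict:
    """Undirected weighted graph: join two sites only if similarity >= min_sim."""
    graph = {site: {} for site in features}
    for u, v in combinations(features, 2):
        w = jaccard(features[u], features[v])
        if w >= min_sim:
            graph[u][v] = w
            graph[v][u] = w
    return graph


def biased_pagerank(graph: dict, seeds: set, damping: float = 0.85,
                    iters: int = 100, tol: float = 1e-10) -> dict:
    """Power iteration for PageRank whose teleport vector covers only the seeds."""
    if not seeds:
        raise ValueError("at least one seed site is required")
    nodes = list(graph)
    bias = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(bias)  # start from the seed distribution
    for _ in range(iters):
        # Teleport step: only seed sites receive the (1 - damping) share.
        new = {n: (1 - damping) * bias[n] for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            if total == 0:
                # Dangling node (no edges): return its mass to the seeds.
                for n in nodes:
                    new[n] += damping * rank[u] * bias[n]
                continue
            for v, w in graph[u].items():
                # Spread rank along edges in proportion to edge weight.
                new[v] += damping * rank[u] * w / total
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def discount(relevance: dict, distrust: dict, alpha: float = 0.5) -> dict:
    """Hypothetical counter-bias: shrink relevance by normalized distrust."""
    top = max(distrust.values(), default=0.0) or 1.0
    return {d: r * (1 - alpha * distrust.get(d, 0.0) / top)
            for d, r in relevance.items()}


# Toy data: hypothetical domains described by hypothetical token sets.
sites = {
    "fake-a.example": {"shocking", "truth", "exposed", "secret"},
    "fake-b.example": {"shocking", "truth", "secret", "cure"},
    "news-c.example": {"parliament", "budget", "vote", "report"},
    "news-d.example": {"parliament", "budget", "economy", "report"},
}
graph = build_graph(sites, min_sim=0.3)
scores = biased_pagerank(graph, seeds={"fake-a.example"})
for site, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{site}: {score:.3f}")
```

In this toy run the seed and the site similar to it share all of the score, while the two sites with no path to the seed end at exactly 0. Putting the bias on the teleport step rather than on the edges is the same idea as TrustRank-style propagation: a component with no seeds can only gain score through edges from seeded sites.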


Cited By

Souza Freire P, Matias da Silva F, Goldschmidt R (2022). Fake news detection based on explicit and implicit signals of a hybrid crowd. Expert Systems with Applications: An International Journal 183:C. Online publication date: 3-Jan-2022.
https://dl.acm.org/doi/10.1016/j.eswa.2021.115414
Woloszyn V, Cortes E, Amantea R, Schmitt V, Barone D, Möller S (2021). Towards a Novel Benchmark for Automatic Generation of ClaimReview Markup. Proceedings of the 13th ACM Web Science Conference 2021, 29-35. Online publication date: 21-Jun-2021.
https://dl.acm.org/doi/10.1145/3447535.3462640
Costa M, Masanès J (2021). Big Data Science Over the Past Web. The Past Web, 271-282. Online publication date: 1-Jul-2021.
https://doi.org/10.1007/978-3-030-63291-5_21

Index Terms

Information systems → World Wide Web → Web searching and information discovery → Content ranking

Recommendations

VRoC: Variational Autoencoder-aided Multi-task Rumor Classifier Based on Text
WWW '20: Proceedings of The Web Conference 2020

Social media became popular and percolated almost all aspects of our daily lives. While online posting proves very convenient for individual users, it also fosters fast-spreading of various rumors. The rapid and wide percolation of rumors can cause ...
Automatic detection of rumor on Sina Weibo
MDS '12: Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics

The problem of gauging information credibility on social networks has received considerable attention in recent years. Most previous work has chosen Twitter, the world's largest micro-blogging platform, as the premise of research. In this work, we shift ...
A debiased self-training framework with graph self-supervised pre-training aided for semi-supervised rumor detection
Abstract
Existing rumor detection models have achieved remarkable performance in fully-supervised settings. However, it is time-consuming and labor-intensive to obtain extensive labeled rumor data. To mitigate the reliance on labeled data, semi-supervised ...
Highlights
- A self-training framework for semi-supervised rumor detection is proposed.
- Graph self-supervised pre-training is employed to alleviate confirmation bias.
- Self-adaptive thresholds are designed to generate reliable pseudo-labels.

Information

Published In


WebSci '18: Proceedings of the 10th ACM Conference on Web Science

May 2018

399 pages

ISBN:9781450355636

DOI:10.1145/3201064

General Chairs:
Hans Akkermans, Vrije Universiteit Amsterdam, The Netherlands
Kathy Fontaine, Rensselaer Polytechnic Institute, USA
Ivar Vermeulen, Vrije Universiteit Amsterdam, The Netherlands

Program Chairs:
Geert-Jan Houben, TU Delft, The Netherlands
Matthew S. Weber, Rutgers University, New Jersey, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Conference

WebSci '18

Sponsor:

SIGWEB

WebSci '18: 10th ACM Conference on Web Science

May 27 - 30, 2018

Amsterdam, Netherlands

Acceptance Rates

WebSci '18 Paper Acceptance Rate: 30 of 113 submissions, 27%

Overall Acceptance Rate: 245 of 933 submissions, 26%

Bibliometrics & Citations

Article Metrics

Total Citations: 7

Total Downloads: 285

Downloads (last 12 months): 5

Downloads (last 6 weeks): 1

Reflects downloads up to 16 Feb 2025

Citations

Kapantai E, Christopoulou A, Berberidis C, Peristeras V (2020). A systematic literature review on disinformation: Toward a unified taxonomical framework. New Media & Society 23:5, 1301-1326. Online publication date: 20-Sep-2020.
https://doi.org/10.1177/1461444820959296
da Silva F, Freire P, de Souza M, de A. B. Plenamente G, Goldschmidt R, de Salles Soares Neto C (2020). FakeNewsSetGen. Proceedings of the Brazilian Symposium on Multimedia and the Web, 241-248. Online publication date: 30-Nov-2020.
https://dl.acm.org/doi/10.1145/3428658.3430965
Meneses Silva C, Silva Fontes R, Colaço Júnior M (2020). Intelligent Fake News Detection: A Systematic Mapping. Journal of Applied Security Research 16:2, 168-189. Online publication date: 14-May-2020.
https://doi.org/10.1080/19361610.2020.1761224
Cortes E, Woloszyn V, Barone D (2018). When, Where, Who, What or Why? A Hybrid Model to Question Answering Systems. Computational Processing of the Portuguese Language, 136-146. Online publication date: 26-Aug-2018.
https://doi.org/10.1007/978-3-319-99722-3_14
