Snooping Wikipedia vandals with MapReduce | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we present and validate an algorithm that accurately identifies anomalous behavior in online collaborative social networks, based on users' interactions with their peers. We focus on Wikipedia, where accurate ground truth for the classification of vandals can be reliably gathered by manual inspection of the page edit history. We develop distributed crawler and classifier tasks, both implemented in MapReduce, with which we explore a very large dataset consisting of over 5 million articles collaboratively edited by 14 million authors, resulting in over 8 billion pairwise interactions. We represent Wikipedia as a signed network, where positive arcs imply constructive interaction between editors. We then isolate a set of high-reputation editors (i.e., nodes having many positive incoming links) and classify the remaining editors based on their interactions with the high-reputation ones. We demonstrate that our approach is not only practically relevant (due to the size of our dataset), but also feasible (as it requires few MapReduce iterations) and accurate (over 95% true positive rate). However, we are able to classify only about half of the editors in the dataset (recall of 50%), a limitation for which we outline some solutions under study.
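The two-stage idea in the abstract (select high-reputation editors, then label the rest by how those editors interact with them) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the `map_reduce` helper, the toy `ARCS` data, the `min_positive_in` threshold, and the sum-of-signs decision rule are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical signed arcs: (source_editor, target_editor, sign).
# +1 stands for a constructive interaction, -1 for a destructive one.
ARCS = [
    ("alice", "bob", +1), ("carol", "bob", +1),
    ("bob", "alice", +1), ("carol", "alice", +1),
    ("alice", "carol", +1), ("bob", "carol", +1),
    ("alice", "dave", -1), ("bob", "dave", -1),
    ("carol", "erin", +1),
    ("dave", "frank", -1),
]


def map_reduce(records, mapper, reducer):
    """Tiny in-memory stand-in for one MapReduce round (illustrative only)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def high_reputation(arcs, min_positive_in):
    """Round 1: count positive incoming arcs per editor, keep those at or above the threshold."""
    counts = map_reduce(
        arcs,
        mapper=lambda arc: [(arc[1], 1)] if arc[2] > 0 else [],
        reducer=lambda _key, values: sum(values),
    )
    return {editor for editor, n in counts.items() if n >= min_positive_in}


def classify(arcs, trusted):
    """Round 2: sum the signs of arcs from trusted editors to each remaining editor.

    Assumed rule: negative total -> "vandal", positive total -> "benign".
    Editors with no arcs from trusted editors (or a zero total) stay unlabeled.
    """
    scores = map_reduce(
        arcs,
        mapper=lambda arc: [(arc[1], arc[2])]
        if arc[0] in trusted and arc[1] not in trusted
        else [],
        reducer=lambda _key, values: sum(values),
    )
    labels = {}
    for editor, score in scores.items():
        if score < 0:
            labels[editor] = "vandal"
        elif score > 0:
            labels[editor] = "benign"
    return labels


trusted = high_reputation(ARCS, min_positive_in=2)
labels = classify(ARCS, trusted)
print(sorted(trusted))
print(labels)
```

In this toy graph, `frank` interacts only with a non-trusted editor and therefore receives no label, which mirrors the limited recall noted in the abstract: editors who never interact with a high-reputation editor cannot be classified by this scheme.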
Date of Conference: 08-12 June 2015
Date Added to IEEE Xplore: 10 September 2015
Conference Location: London, UK
