Mining online political opinion surveys for suspect entries: An interdisciplinary comparison

https://doi.org/10.1016/j.jides.2016.11.003
Under a Creative Commons license
open access

Abstract

Filtering data generated by so-called Voting Advice Applications (VAAs) to remove entries that exhibit unrealistic behavior (i.e., cannot correspond to a real political view) is of primary importance. If such entries are present in significant numbers in VAA-generated datasets, they can invalidate conclusions drawn from VAA data analysis. In this work we investigate approaches for automating the identification of entries that appear suspicious in terms of a user’s answer patterns. We utilize two unsupervised data mining techniques and compare their performance against a well-established psychometric approach. Our results suggest that the performance of data mining approaches is comparable to that of approaches drawing on psychometric theory, with a fraction of the complexity. More specifically, our simulations show that data mining techniques as well as psychometric approaches can be used to identify truly ‘rogue’ data (i.e., completely random data injected into the dataset under investigation). However, when analysing real datasets the performance of all approaches dropped considerably. This suggests that ‘suspect’ entries are neither random nor clustered. This finding limits the use of unsupervised techniques, suggesting that they can only complement rather than substitute for existing methods of identifying suspicious entries.
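The simulation idea described above (inject completely random entries into a dataset, then check whether an unsupervised method recovers them) can be illustrated with a minimal sketch. The abstract does not name the two data mining techniques used, so the k-nearest-neighbour distance score below is a stand-in chosen for illustration, not the authors' method. The respondent model, the dataset sizes, and every function name are likewise hypothetical assumptions.

```python
import random

def simulate_genuine(n, n_items, rng):
    """Hypothetical 'genuine' respondents: one latent position on a
    5-point Likert scale drives all answers, plus small noise."""
    rows = []
    for _ in range(n):
        position = rng.choice([1, 2, 3, 4, 5])
        row = []
        for _ in range(n_items):
            noise = rng.choice([-1, 0, 0, 0, 1])
            row.append(min(5, max(1, position + noise)))  # clip to 1..5
        rows.append(row)
    return rows

def simulate_rogue(n, n_items, rng):
    """'Rogue' entries: every answer drawn uniformly at random from 1..5."""
    return [[rng.randint(1, 5) for _ in range(n_items)] for _ in range(n)]

def knn_scores(rows, k=5):
    """Anomaly score per entry: mean Euclidean distance to its k nearest
    neighbours (an assumed stand-in for an unsupervised detector)."""
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(
            sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
            for j, b in enumerate(rows)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

rng = random.Random(0)            # fixed seed so the run is repeatable
n_genuine, n_rogue, n_items = 200, 20, 30

data = simulate_genuine(n_genuine, n_items, rng) + simulate_rogue(n_rogue, n_items, rng)
is_rogue = [False] * n_genuine + [True] * n_rogue

scores = knn_scores(data, k=5)

# Flag the n_rogue highest-scoring entries and measure how many are truly rogue.
flagged = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:n_rogue]
recall = sum(is_rogue[i] for i in flagged) / n_rogue

mean_genuine = sum(s for s, r in zip(scores, is_rogue) if not r) / n_genuine
mean_rogue = sum(s for s, r in zip(scores, is_rogue) if r) / n_rogue

print(f"mean score (genuine): {mean_genuine:.2f}")
print(f"mean score (rogue):   {mean_rogue:.2f}")
print(f"recall among top {n_rogue} flagged: {recall:.2f}")
```

This toy setting is deliberately favourable: the injected entries are uniformly random and the genuine ones are tightly structured. As the abstract notes, suspect entries in real datasets appear to be neither random nor clustered, so a score of this kind should not be expected to separate them as cleanly.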

Keywords

Voting advice applications
Data cleaning
Machine learning
Data mining
Anomaly detection
Psychometric Likert scale

Peer review under responsibility of Qassim University.