Debiased Learning of Self-Labeled Twitter Data for User Demographic Prediction | IEEE Conference Publication | IEEE Xplore

Debiased Learning of Self-Labeled Twitter Data for User Demographic Prediction


Abstract:

Labeling sufficient data for supervised learning remains an open challenge in social network analysis. An alternative is to collect self-labeled data, i.e., data labeled by their owners. Emmery et al. show that standard models trained on self-labeled data can perform well, suggesting that this approach is effective. In this paper, we argue that self-labeled data may not be representative of the population. Taking Twitter demographic prediction as an example, we show that the popular FastText model, trained in the standard way on self-labeled data, does not generalize well to randomly sampled test data. We then present a new learner, DeFastText, that aims to correct this data bias using the kernel mean matching technique. In experiments, we show that it achieves lower generalization error than FastText. This research draws attention to the data bias problem when learning from self-labeled data in social network analysis.
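
As a rough illustration of the kernel mean matching idea mentioned in the abstract, the sketch below reweights biased training points so that their kernel mean moves toward that of a sample from the target population. It is not the DeFastText algorithm, whose details the abstract does not give. The function names, the RBF kernel, the projected-gradient solver, and the omission of the usual constraint on the sum of the weights are all assumptions made to keep the example small and dependency-free.

```python
import math


def rbf(x, z, gamma=1.0):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def kmm_weights(train, test, gamma=1.0, bound=10.0, steps=2000):
    """Hypothetical helper: approximate kernel mean matching weights.

    Minimizes 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= bound
    by projected gradient descent. The constraint that the weights sum
    to roughly len(train) is dropped here for simplicity.
    """
    n, m = len(train), len(test)
    # Kernel matrix between training points.
    K = [[rbf(a, b, gamma) for b in train] for a in train]
    # kappa_i = (n / m) * sum_j k(train_i, test_j)
    kappa = [n / m * sum(rbf(a, t, gamma) for t in test) for a in train]
    # The diagonal of an RBF kernel matrix is all ones, so trace(K) = n
    # bounds the largest eigenvalue and 1 / n is a safe step size.
    lr = 1.0 / n
    beta = [1.0] * n
    for _ in range(steps):
        grad = [sum(K[i][j] * beta[j] for j in range(n)) - kappa[i]
                for i in range(n)]
        # Gradient step followed by projection onto the box [0, bound].
        beta = [min(bound, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta


# Toy data: the training sample over-represents the region near 0,
# while the target sample lies mostly near 3.
train = [[0.0], [0.1], [0.2], [3.0]]
test = [[3.0], [3.1], [2.9], [0.1]]
weights = kmm_weights(train, test)
print(weights)
```

In this toy setting the training point near 3 receives the largest weight, because it sits in the region that the target sample covers most densely. How such weights would enter a FastText-style training loss is left open here, since the abstract does not say.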
Date of Conference: 17-20 December 2022
Date Added to IEEE Xplore: 26 January 2023
Conference Location: Osaka, Japan
