Adversarial Active Learning in the Presence of Weak and Malicious Oracles

Zhou, Yan; Kantarcioglu, Murat; Xi, Bowei

doi:10.1007/978-3-030-26142-9_8

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

We present a robust active learning technique for situations where there are weak and adversarial oracles. Our work falls under the general umbrella of active learning, in which training data is insufficient and oracles are queried to supply labels for the most informative samples to expand the training set. In addition, we consider problems where a large percentage of oracles may be strategically lying, as in adversarial settings. We present an adversarial active learning technique that explores the duality between oracle modeling and data modeling. We demonstrate on real datasets that our adversarial active learning technique is superior not only to the heuristic majority-voting technique but also to one of the state-of-the-art adversarial crowdsourcing techniques, the Generative model of Labels, Abilities, and Difficulties (GLAD), when genuine oracles are outnumbered by weak and malicious oracles, and even in the extreme cases where all the oracles are either weak or malicious. To put our technique under more rigorous tests, we compare our adversarial active learner to the ideal active learner that always receives correct labels. We demonstrate that our technique is as effective as the ideal active learner when only one third of the oracles are genuine.
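
As a rough illustration of why the majority-voting baseline named in the abstract breaks down once genuine oracles are outnumbered, the following minimal sketch simulates a panel of oracles answering binary label queries. It is not the authors' technique or their experimental setup. The oracle behaviours (always correct, uniformly random, always flipping), the binary label space, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter


def genuine_oracle(true_label, rng):
    """Assumed behaviour: always returns the correct label."""
    return true_label


def weak_oracle(true_label, rng):
    """Assumed behaviour: guesses uniformly at random."""
    return rng.choice([0, 1])


def malicious_oracle(true_label, rng):
    """Assumed behaviour: always returns the wrong binary label."""
    return 1 - true_label


def majority_vote(labels):
    """Return the most common binary label; ties go to label 1."""
    counts = Counter(labels)
    return 1 if counts[1] >= counts[0] else 0


def vote_accuracy(oracles, true_labels, seed=0):
    """Fraction of queries for which the majority vote is correct."""
    rng = random.Random(seed)
    correct = 0
    for y in true_labels:
        votes = [oracle(y, rng) for oracle in oracles]
        correct += majority_vote(votes) == y
    return correct / len(true_labels)


if __name__ == "__main__":
    true_labels = [0, 1] * 50

    # Genuine oracles in the majority: the vote recovers every label.
    mostly_genuine = [genuine_oracle] * 3 + [malicious_oracle] * 2
    print(vote_accuracy(mostly_genuine, true_labels))

    # Genuine oracles outnumbered by malicious ones: the vote is always wrong.
    mostly_malicious = [genuine_oracle] * 1 + [malicious_oracle] * 2
    print(vote_accuracy(mostly_malicious, true_labels))

    # One genuine oracle among weak ones: accuracy depends on the random guesses.
    mostly_weak = [genuine_oracle] * 1 + [weak_oracle] * 4
    print(vote_accuracy(mostly_weak, true_labels))
```

Under these assumed behaviours, a simple vote has no way to recover once the malicious oracles form a majority, which is the regime the abstract singles out. How the paper's own technique models oracles is not described on this page, so nothing here should be read as a reconstruction of it.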

Notes

1. https://s3.amazonaws.com/mplabsites/Sites/OptimalLabelingRelease1.0.3.tar.gz

References

Balcan, M., Beygelzimer, A., Langford, J.: Agnostic active learning. In: ICML, pp. 65–72 (2006)
Beygelzimer, A., Langford, J., Tong, Z., Hsu, D.J.: Agnostic active learning without constraints. In: Advances in Neural Information Processing Systems, vol. 23, pp. 199–207. Curran Associates, Inc. (2010)
Davidson, T., Warmsley, D., Macy, M., Weber, I.: Automated hate speech detection and the problem of offensive language. In: Proceedings of the 11th International AAAI Conference on Web and Social Media, ICWSM 2017 (2017)
French, S.: Group consensus probability distributions: a critical survey. Bayesian Stat. 2, 183–202 (1985)
Jagabathula, S., Subramanian, L., Venkataraman, A.: Reputation-based worker filtering in crowdsourcing. In: NIPS, pp. 2492–2500 (2014)
LIBSVM: LIBSVM Data: Classification, Regression, and Multi-label (2014). http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: Advances in Neural Information Processing Systems, pp. 692–700 (2012)
Ma, F., et al.: Faitcrowd: fine grained truth discovery for crowdsourced data aggregation. In: Proceedings of the 21th ACM SIGKDD, pp. 745–754 (2015)
Miller, B., et al.: Adversarial active learning. In: Proceedings of the 2014 AISec Workshop, pp. 3–14 (2014)
Raykar, V.C., Yu, S.: Eliminating spammers and ranking annotators for crowdsourced labeling tasks. J. Mach. Learn. Res. 13, 491–518 (2012)
Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? Improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD, pp. 614–622 (2008)
Uebersax, J.S.: Statistical modeling of expert ratings on medical treatment appropriateness. J. Am. Stat. Assoc. 88, 421–427 (1993)
Vuurens, J.B., de Vries, A.P.: Obtaining high-quality relevance judgments using crowdsourcing. IEEE Internet Comput. 16, 20–27 (2012)
Welinder, P., Branson, S., Perona, P., Belongie, S.J.: The multidimensional wisdom of crowds. In: NIPS, pp. 2424–2432 (2010)
Whitehill, J., Ruvolo, P., Wu, T., Bergsma, J., Movellan, J.R.: Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: NIPS, pp. 2035–2043 (2009)

Acknowledgement

The research reported herein was supported in part by NIH award 1R01HG006844, NSF awards CICI-1547324, IIS-1633331, CNS-1837627, OAC-1828467 and ARO award W911NF-17-1-0356.

Author information

Authors and Affiliations

School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX, 75080, USA
Yan Zhou & Murat Kantarcioglu
Department of Statistics, Purdue University, West Lafayette, IN, 47907, USA
Bowei Xi

Corresponding author

Correspondence to Yan Zhou.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U.
Singapore Management University, Singapore, Singapore
Hady W. Lauw

Cite this paper

Zhou, Y., Kantarcioglu, M., Xi, B. (2019). Adversarial Active Learning in the Presence of Weak and Malicious Oracles. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_8

DOI: https://doi.org/10.1007/978-3-030-26142-9_8
Published: 12 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26141-2
Online ISBN: 978-3-030-26142-9
eBook Packages: Computer Science, Computer Science (R0)
