Adversarial Active Learning in the Presence of Weak and Malicious Oracles

  • Conference paper
Trends and Applications in Knowledge Discovery and Data Mining (PAKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11607)

Abstract

We present a robust active learning technique for settings with weak and adversarial oracles. Our work falls under the general umbrella of active learning, in which training data is insufficient and oracles are queried to supply labels for the most informative samples in order to expand the training set. In addition, we consider problems where a large percentage of oracles may be strategically lying, as in adversarial settings. We present an adversarial active learning technique that explores the duality between oracle modeling and data modeling. We demonstrate on real datasets that our technique is superior not only to the heuristic majority-voting technique but also to a state-of-the-art adversarial crowdsourcing technique, the Generative model of Labels, Abilities, and Difficulties (GLAD), both when genuine oracles are outnumbered by weak and malicious oracles and in the extreme cases where all the oracles are either weak or malicious. To test our technique more rigorously, we compare our adversarial active learner to the ideal active learner that always receives correct labels. We demonstrate that our technique is as effective as the ideal active learner when only one third of the oracles are genuine.
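To make the majority-voting baseline mentioned in the abstract concrete, the following is a minimal sketch, not the authors' method or code. It assumes binary labels and two hypothetical oracle types: a genuine oracle that returns the true label and a malicious oracle that always flips it. Weak (noisy) oracles are left out to keep the example deterministic. All function and variable names are illustrative.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the smallest label."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


def genuine_oracle(true_label):
    # Assumption: a genuine oracle always reports the true label.
    return true_label


def malicious_oracle(true_label):
    # Assumption: a malicious oracle always reports the opposite binary label.
    return 1 - true_label


true_label = 1

# Three genuine oracles against two malicious ones: the vote is correct.
mostly_genuine = [genuine_oracle(true_label)] * 3 + [malicious_oracle(true_label)] * 2
vote_when_genuine_majority = majority_vote(mostly_genuine)

# One genuine oracle against two malicious ones (one third genuine):
# the vote is wrong, so the label added to the training set is corrupted.
mostly_malicious = [genuine_oracle(true_label)] * 1 + [malicious_oracle(true_label)] * 2
vote_when_malicious_majority = majority_vote(mostly_malicious)
```

Under these assumptions, plain majority voting cannot recover the true label once genuine oracles are outnumbered, which is the regime the abstract targets.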

Notes

  1. https://s3.amazonaws.com/mplabsites/Sites/OptimalLabelingRelease1.0.3.tar.gz.

Acknowledgement

The research reported herein was supported in part by NIH award 1R01HG006844, NSF awards CICI-1547324, IIS-1633331, CNS-1837627, and OAC-1828467, and ARO award W911NF-17-1-0356.

Author information

Corresponding author

Correspondence to Yan Zhou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhou, Y., Kantarcioglu, M., Xi, B. (2019). Adversarial Active Learning in the Presence of Weak and Malicious Oracles. In: U., L., Lauw, H. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11607. Springer, Cham. https://doi.org/10.1007/978-3-030-26142-9_8

  • DOI: https://doi.org/10.1007/978-3-030-26142-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26141-2

  • Online ISBN: 978-3-030-26142-9

  • eBook Packages: Computer Science, Computer Science (R0)
