Abstract
Supervised Learning requires a huge amount of labeled data, making efficient labeling one of the most critical components for the success of Machine Learning (ML). A well-known method to obtain labeled data efficiently is Active Learning (AL), in which the learner interactively asks human experts to label the most informative data points. Nevertheless, even with AL, the amount of human labeling effort remains high and should be reduced further.
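To make the AL cycle concrete, the following minimal sketch shows pool-based AL with a least-confidence query strategy; the synthetic dataset, seed set size, query budget, and the choice of a random forest learner are illustrative assumptions, not the experimental setup of this paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a real dataset: 10 seed labels, the remaining
# points form the unlabeled pool.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(10))
pool = list(range(10, len(X)))

clf = RandomForestClassifier(random_state=0)

for _ in range(20):  # illustrative budget of 20 human annotations
    clf.fit(X[labeled], y[labeled])
    # Least-confidence strategy: query the pool point whose highest
    # predicted class probability is lowest, i.e. the point the current
    # model is most uncertain about.
    confidence = clf.predict_proba(X[pool]).max(axis=1)
    query = pool.pop(int(np.argmin(confidence)))
    # A human expert would label X[query] at this step; this toy oracle
    # simply reveals the ground-truth label y[query].
    labeled.append(query)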
In this paper, we therefore propose WeakAL, which incorporates Weak Supervision (WS) techniques directly into the AL cycle. This allows us to reduce the number of annotations required from human experts while maintaining the same level of ML performance. We investigate different WS strategies as well as different parameter combinations on a wide range of real-world datasets. Our evaluation shows, for example, that in the context of Web table classification, 55% of the labels that would otherwise have to be retrieved manually can be generated by WS techniques, at a negligible loss in test accuracy of only 0.31%. To further demonstrate the general applicability of our approach, we applied it to six datasets from the AL challenge of Guyon et al., where over 90% of the labels could be computed by the WS techniques while still achieving competitive challenge results.
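The exact WS strategies and parameter combinations of WeakAL are given in the full text; purely as an illustration of the underlying idea, the sketch below extends the AL loop from above with a simple self-training-style WS step: pool points on which the current classifier is sufficiently confident receive machine-generated labels, and only the most uncertain remaining point is sent to the human oracle. The threshold of 0.95 and all other parameters are hypothetical values, not those of the paper.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

WS_THRESHOLD = 0.95  # hypothetical confidence cut-off, not from the paper

X, y_true = make_classification(n_samples=1000, n_features=20, random_state=0)
labels = {i: y_true[i] for i in range(10)}     # human-provided seed labels
pool = set(range(10, len(X)))
human_labels = weak_labels = 0

clf = RandomForestClassifier(random_state=0)

for _ in range(30):  # illustrative number of AL iterations
    train_idx = sorted(labels)
    clf.fit(X[train_idx], [labels[i] for i in train_idx])
    pool_idx = sorted(pool)
    proba = clf.predict_proba(X[pool_idx])
    confidence = proba.max(axis=1)
    predictions = clf.classes_[np.argmax(proba, axis=1)]

    # WS step: adopt the classifier's own prediction as a "weak" label
    # wherever it is confident enough, saving one human annotation each.
    for i, c, p in zip(pool_idx, confidence, predictions):
        if c >= WS_THRESHOLD:
            labels[i] = p
            pool.discard(i)
            weak_labels += 1

    # AL step: only the single most uncertain remaining point goes to
    # the (here simulated) human expert.
    if not pool:
        break
    pool_idx = sorted(pool)
    confidence = clf.predict_proba(X[pool_idx]).max(axis=1)
    query = pool_idx[int(np.argmin(confidence))]
    labels[query] = y_true[query]              # oracle provides the label
    pool.discard(query)
    human_labels += 1

print(f"{human_labels} human labels, {weak_labels} weak labels")

The ratio of weak to human labels produced by such a loop corresponds to the label savings reported above; in WeakAL these savings depend on the chosen WS strategy and its parameters.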
References
Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994). https://doi.org/10.1007/BF00993277
Settles, B.: Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison (2010)
Lewis, D.D.: A sequential algorithm for training text classifiers: corrigendum and additional data. SIGIR Forum 29(2), 13–19 (1995)
Scheffer, T., Decomain, C., Wrobel, S.: Active hidden Markov models for information extraction. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds.) IDA 2001. LNCS, vol. 2189, pp. 309–318. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44816-0_31
Shannon, C.E.: A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
Baram, Y., El-Yaniv, R., Luz, K.: Online choice of active learning algorithms. JMLR 5, 255–291 (2004)
Scudder, H.J.: Probability of error of some adaptive pattern-recognition machines. IEEE Trans. Inf. Theory 11, 363–371 (1965)
Guyon, I., Cawley, G., Dror, G., Lemaire, V.: Results of the active learning challenge. JMLR 16, 19–45 (2011)
Eberius, J., Braunschweig, K., Hentsch, M., Thiele, M., Ahmadov, A., Lehner, W.: Building the dresden web table corpus: a classification approach. In: BDC, pp. 41–50. IEEE (2015)
Sculley, D.: Web-scale k-means clustering. In: WWW, pp. 1177–1178 (2010)
Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58(301), 236–244 (1963)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Folleco, A., Khoshgoftaar, T., Van Hulse, J., Napolitano, A.: Identifying learners robust to low quality data. Informatica (Slovenia) 33, 245–259 (2009)
Zhu, X.: Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison (2008)
McCallum, A., Nigam, K.: Employing EM and pool-based active learning for text classification. In: ICML, pp. 350–358 (1998)
Muslea, I., Minton, S.N., Knoblock, C.A.: Active + semi-supervised learning = robust multi-view learning. In: ICML (2002)
Zhou, Z.H.: A brief introduction to weakly supervised learning. Natl. Sci. Rev. 5(1), 44–53 (2017)
Dara, R., Kremer, S., Stacey, D.: Clustering unlabeled data with SOMs improves classification of labeled real-world data. In: Proceedings of the 2002 International Joint Conference on Neural Networks, vol. 3, pp. 2237–2242 (2002)
Bodó, Z., Minier, Z., Csató, L.: Active learning with clustering. In: Active Learning and Experimental Design workshop@AISTATS, vol. 16, pp. 127–139 (2011)
Zhu, X., Ghahramani, Z.: Learning from labeled and unlabeled data with label propagation. Technical report (2002)
Varma, P., Ré, C.: Snuba: automating weak supervision to label training data. Proc. VLDB Endow. 12(3), 223–236 (2018)
Acknowledgements
This research and development project is funded by the German Federal Ministry of Education and Research (BMBF) and the European Social Funds (ESF) within the “Innovations for Tomorrow’s Production, Services, and Work” Program (funding number 02L18B561) and implemented by the Project Management Agency Karlsruhe (PTKA). The authors are responsible for the content of this publication.
Cite this paper
Gonsior, J., Thiele, M., Lehner, W.: WeakAL: combining active learning and weak supervision. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds.) DS 2020. LNCS, vol. 12323. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61527-7_3