Abstract
Supervised learning is concerned with building accurate classifiers from a set of labelled examples. Gathering a large set of labelled examples, however, can be costly and time-consuming. Active learning algorithms aim to reduce this labelling cost by issuing a small number of label queries, drawn from a large pool of unlabelled examples, during the process of building a classifier. However, the performance of active learning algorithms does not always meet expectations, and no rigorous performance guarantee, in the form of a risk bound, exists for non-trivial active learning algorithms. In this paper, we propose a novel and easy-to-implement active learning algorithm that comes with a rigorous performance guarantee (i.e., a valid risk bound) and performs very well in comparison with several widely-used active learning algorithms.
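For readers unfamiliar with the pool-based setting the abstract describes, the sketch below shows a generic active-learning loop. It is not the paper's algorithm; it illustrates a common baseline (uncertainty sampling) against which such algorithms are typically compared. The function name, the seed size, the query budget, and the use of a `y_oracle` array in place of a human annotator are all illustrative assumptions.

```python
# Minimal sketch of a generic pool-based active-learning loop.
# NOT the paper's algorithm: this is the uncertainty-sampling baseline,
# with assumed parameters (seed size, budget) and a label array standing
# in for the human annotator.
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling(X_pool, y_oracle, n_seed=10, budget=50, seed=0):
    """Iteratively query the labels the current classifier is least sure of.

    X_pool   : (n, d) array of unlabelled examples.
    y_oracle : (n,) binary labels simulating the annotator's answers.
    """
    rng = np.random.default_rng(seed)
    # Start from a small random labelled seed set (assumed to contain
    # both classes; a real implementation would enforce this).
    labelled = list(rng.choice(len(X_pool), size=n_seed, replace=False))
    clf = LogisticRegression()
    for _ in range(budget):
        clf.fit(X_pool[labelled], y_oracle[labelled])
        probs = clf.predict_proba(X_pool)[:, 1]
        # Distance of the predicted probability from 0.5:
        # a small margin means the classifier is uncertain.
        margin = np.abs(probs - 0.5)
        margin[labelled] = np.inf  # never re-query an already-labelled point
        labelled.append(int(np.argmin(margin)))  # query the most uncertain
    return clf, labelled
```

A batch or stream variant follows the same pattern; only the rule for selecting the next query changes, which is precisely where active learning algorithms differ from one another.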
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Laviolette, F., Marchand, M., Shanian, S. (2008). Selective Sampling for Classification. In: Bergler, S. (ed.) Advances in Artificial Intelligence. Canadian AI 2008. Lecture Notes in Computer Science, vol. 5032. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68825-9_19
DOI: https://doi.org/10.1007/978-3-540-68825-9_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68821-1
Online ISBN: 978-3-540-68825-9
eBook Packages: Computer Science (R0)