Active Learning Using Difficult Instances

Chen, Bowen; Koh, Yun Sing; Halstead, Ben

doi:10.1007/978-3-031-22695-3_52

Conference paper
First Online: 03 December 2022

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13728)

Abstract

Active learning systems achieve high accuracy with a low labeling budget by incrementally annotating high-utility instances. In uncertainty sampling, the labels of instances with maximal uncertainty are queried; however, redundant instances with similar features are often selected during the sampling process. We propose a novel difficulty-based active learning framework that constructs decision boundaries by sampling instances with maximal classification difficulty. We propose three instance-level difficulty measures, namely base classifier count, fluctuation score, and individual error score, in a boosted ensemble setting to identify difficult-to-classify instances. In real-life settings, obtaining labeled data is often expensive and requires domain experts; unlike other difficulty measures that assume complete label knowledge, the proposed measures need only limited labeled data. Experiments with real-world and synthetic datasets show that difficulty-based sampling requires significantly fewer labeled instances to achieve high accuracy than uncertainty sampling.
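
As a rough illustration of the uncertainty-sampling baseline the abstract compares against, the sketch below ranks unlabeled instances by least-confidence uncertainty and queries the top few. It is not the authors' code and does not implement their three difficulty measures, which the abstract does not define. The function names, the probability values, and the budget are all hypothetical.

```python
def least_confidence(probs):
    """Uncertainty of one instance: 1 minus the highest predicted class probability."""
    return 1.0 - max(probs)


def select_queries(pool_probs, budget):
    """Return indices of the `budget` most uncertain instances in the pool."""
    scores = [least_confidence(p) for p in pool_probs]
    # Sort pool indices from most to least uncertain.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical predicted class probabilities for five unlabeled instances
# (assumed values for illustration only, not data from the paper).
pool_probs = [
    [0.95, 0.05],
    [0.52, 0.48],
    [0.51, 0.49],
    [0.80, 0.20],
    [0.60, 0.40],
]

queried = select_queries(pool_probs, budget=2)
print(queried)
```

With these assumed values, the two queried instances have nearly identical predictions (0.52/0.48 and 0.51/0.49). If their features are similar too, labeling both adds little information, which is the kind of redundancy the abstract attributes to uncertainty sampling.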

Author information

Authors and Affiliations

The University of Auckland, Auckland, New Zealand
Bowen Chen, Yun Sing Koh & Ben Halstead

Corresponding author

Correspondence to Bowen Chen.

Editor information

Editors and Affiliations

University of New South Wales, Sydney, NSW, Australia
Haris Aziz
University of Western Australia, Perth, WA, Australia
Débora Corrêa
University of Western Australia, Perth, WA, Australia
Tim French

About this paper

Cite this paper

Chen, B., Koh, Y.S., Halstead, B. (2022). Active Learning Using Difficult Instances. In: Aziz, H., Corrêa, D., French, T. (eds) AI 2022: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13728. Springer, Cham. https://doi.org/10.1007/978-3-031-22695-3_52

DOI: https://doi.org/10.1007/978-3-031-22695-3_52
Published: 03 December 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22694-6
Online ISBN: 978-3-031-22695-3
eBook Packages: Computer Science, Computer Science (R0)
