Active Learning Using Difficult Instances

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13728)

Abstract

Active learning systems achieve high accuracy with a low labeling budget by incrementally annotating high-utility instances. In uncertainty sampling, the labels of instances with maximal uncertainty are queried; however, redundant instances with similar features are often selected during sampling. We propose a novel difficulty-based active learning framework that constructs decision boundaries by sampling instances with maximal classification difficulty. Within a boosted ensemble setting, we introduce three instance-level difficulty measures, namely base classifier count, fluctuation score, and individual error score, to identify difficult-to-classify instances. In real-life settings, obtaining labeled data is often expensive and requires domain experts; unlike other difficulty measures that assume complete label knowledge, the proposed measures need only limited labeled data. Experiments with real-world and synthetic datasets show that difficulty-based sampling requires significantly fewer labeled instances than uncertainty sampling to achieve high accuracy.
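For orientation, the uncertainty sampling baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the common least-confidence variant, not the authors' difficulty-based method; the abstract does not define the three difficulty measures, so they are not reproduced here. The function name `least_confidence_query` and the example probabilities are hypothetical.

```python
def least_confidence_query(proba, batch_size=1):
    """Pick the unlabeled instances whose predicted class is least certain.

    proba: list of per-instance class-probability lists, as produced by
    any probabilistic classifier (hypothetical input for illustration).
    Returns the indices of the `batch_size` most uncertain instances.
    """
    # Least-confidence score: 1 minus the probability of the top class.
    scores = [1.0 - max(row) for row in proba]
    # Sort indices from most to least uncertain.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:batch_size]


# Hypothetical predictions for a pool of four unlabeled instances.
pool_proba = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.52, 0.48],
    [0.80, 0.20],
]
queried = least_confidence_query(pool_proba, batch_size=2)
print(queried)
```

In this toy pool the two queried instances receive almost identical predictions. If their features are also similar, labeling both adds little information, which is the kind of redundancy the abstract attributes to uncertainty sampling.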

Author information

Corresponding author

Correspondence to Bowen Chen.

Electronic supplementary material

Supplementary material 1 (PDF, 1293 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, B., Koh, Y.S., Halstead, B. (2022). Active Learning Using Difficult Instances. In: Aziz, H., Corrêa, D., French, T. (eds) AI 2022: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13728. Springer, Cham. https://doi.org/10.1007/978-3-031-22695-3_52

  • DOI: https://doi.org/10.1007/978-3-031-22695-3_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22694-6

  • Online ISBN: 978-3-031-22695-3

  • eBook Packages: Computer Science, Computer Science (R0)
