Class-Conditional Label Noise in Astroparticle Physics

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (ECML PKDD 2023)

Abstract

Class-conditional label noise characterizes classification tasks in which the training set labels are randomly flipped versions of the actual ground-truth. The analysis of telescope data in astroparticle physics poses this problem with a novel condition: one of the class-wise label flip probabilities is known while the other is not. We address this condition with an objective function for optimizing the decision thresholds of existing classifiers. Our experiments on several imbalanced data sets demonstrate that accounting for the known label flip probability substantially improves the learning outcome over existing methods for learning under class-conditional label noise. In astroparticle physics, our proposal achieves an improvement in predictive performance and a considerable reduction in computational requirements. These achievements are a direct result of our proposal’s ability to learn from real telescope data, instead of relying on simulated data as is common practice in the field.
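
The threshold-tuning idea described in the abstract can be made concrete with a small, hedged sketch. The code below is not the paper's objective function; it only illustrates the standard class-conditional-noise identity eta_tilde(x) = (1 - rho_plus - rho_minus) * eta(x) + rho_minus, where eta_tilde is the posterior of the noisy label and eta the clean one (cf. Natarajan et al. [27] and Menon et al. [24]), and uses it to translate the clean decision threshold of 1/2 into a threshold on the scores of a classifier trained against noisy labels. All names and numbers in the sketch (rho_minus as the known flip rate, rho_plus as the unknown one, inject_ccn, corrupted_threshold, the data set sizes) are hypothetical choices for illustration.

    # Illustrative sketch (not the paper's method): adjusting a decision threshold
    # under class-conditional label noise when only one flip rate is known.
    #
    # Assumptions (hypothetical, for illustration only):
    #   * binary labels in {-1, +1}
    #   * rho_minus = P(noisy y = +1 | true y = -1) is known
    #   * rho_plus  = P(noisy y = -1 | true y = +1) is unknown and scanned over
    #   * a probabilistic classifier is trained directly on the noisy labels
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def inject_ccn(y, rho_plus, rho_minus, rng):
        """Flip clean labels y in {-1, +1} with class-conditional probabilities."""
        flip = np.where(y == 1,
                        rng.random(y.size) < rho_plus,
                        rng.random(y.size) < rho_minus)
        return np.where(flip, -y, y)

    def corrupted_threshold(rho_plus, rho_minus):
        """Threshold on the corrupted posterior that is equivalent to eta(x) > 1/2."""
        return (1.0 - rho_plus - rho_minus) / 2.0 + rho_minus

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=20000, weights=[0.9], random_state=0)
    y = 2 * y - 1  # map {0, 1} to {-1, +1}
    X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=0)

    rho_plus_true, rho_minus = 0.3, 0.1        # rho_minus is the "known" rate
    y_noisy = inject_ccn(y_trn, rho_plus_true, rho_minus, rng)

    clf = RandomForestClassifier(random_state=0).fit(X_trn, y_noisy)
    scores = clf.predict_proba(X_tst)[:, clf.classes_ == 1].ravel()

    # scan candidate values of the unknown flip rate rho_plus
    for rho_plus in [0.0, 0.1, 0.2, 0.3, 0.4]:
        t = corrupted_threshold(rho_plus, rho_minus)
        y_hat = np.where(scores > t, 1, -1)
        acc_pos = np.mean(y_hat[y_tst == 1] == 1)    # clean labels, for evaluation only
        acc_neg = np.mean(y_hat[y_tst == -1] == -1)
        print(f"rho_plus={rho_plus:.1f}  t={t:.2f}  "
              f"balanced acc={(acc_pos + acc_neg) / 2:.3f}")

In this sketch, rho_minus plays the role of the flip probability that is assumed to be known, while scanning rho_plus stands in for handling the unknown rate; the authors' actual objective function and implementation are available in the pkccn repository linked in the notes below.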

Notes

  1. http://tevcat.uchicago.edu/, catalog version 3.400 by Wakely and Horan [32].

  2. https://github.com/mirkobunse/pkccn.

  3. https://imbalanced-learn.org/stable/datasets/.

  4. https://factdata.app.tu-dortmund.de/.

  5. https://github.com/fact-project/open_crab_sample_analysis.

References

  1. Acharya, B.S., et al.: Science with the Cherenkov Telescope Array. World Scientific (2018)

  2. Actis, M., Agnetta, G., Aharonian, F., Akhperjanian, A., Aleksić, J., et al.: Design concepts for the Cherenkov Telescope Array CTA: an advanced facility for ground-based high-energy gamma-ray astronomy. Exper. Astron. 32(3), 193–316 (2011). https://doi.org/10.1007/s10686-011-9247-0

  3. Aharonian, F., Akhperjanian, A., Bazer-Bachi, A., Beilicke, M., Benbow, W., et al.: Observations of the Crab nebula with HESS. Astron. Astrophys. 457(3), 899–915 (2006)

  4. Anderhub, H., Backes, M., Biland, A., Boccone, V., Braun, I., et al.: Design and operation of FACT, the first G-APD Cherenkov telescope. J. Instrum. 8(06) (2013). https://doi.org/10.1088/1748-0221/8/06/p06008

  5. Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Conference on Computational Learning Theory, pp. 92–100. ACM (1998). https://doi.org/10.1145/279943.279962

  6. Bock, R., et al.: Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instrum. Methods Phys. Res. A 516(2–3), 511–528 (2004). https://doi.org/10.1016/j.nima.2003.08.157

  7. Bockermann, C., et al.: Online analysis of high-volume data streams in astroparticle physics. In: Bifet, A., et al. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 100–115. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23461-8_7

  8. Buja, A., Stuetzle, W., Shen, Y.: Loss functions for binary class probability estimation and classification: structure and applications. Technical report, University of Pennsylvania (2005)

  9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  10. Engel, R., Heck, D., Pierog, T.: Extensive air showers and hadronic interactions at high energy. Ann. Rev. Nucl. Part. Sci. 61(1), 467–489 (2011). https://doi.org/10.1146/annurev.nucl.012809.104544

  11. Falkenburg, B., Rhode, W.: From Ultra Rays to Astroparticles: A Historical Introduction to Astroparticle Physics. Springer (2012). https://doi.org/10.1007/978-94-007-5422-5

  12. Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98074-4

  13. Funk, S.: Ground- and space-based gamma-ray astronomy. Ann. Rev. Nucl. Part. Sci. 65(1), 245–277 (2015). https://doi.org/10.1146/annurev-nucl-102014-022036

  14. Ghosh, A., Manwani, N., Sastry, P.S.: Making risk minimization tolerant to label noise. Neurocomputing 160, 93–107 (2015). https://doi.org/10.1016/j.neucom.2014.09.081

  15. Han, B., et al.: Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems, pp. 8536–8546 (2018)

  16. Koyejo, O., Natarajan, N., Ravikumar, P., Dhillon, I.S.: Consistent binary classification with generalized performance metrics. In: Advances in Neural Information Processing Systems, pp. 2744–2752 (2014)

  17. Lemaitre, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 17:1–17:5 (2017)

  18. Li, T.P., Ma, Y.Q.: Analysis methods for results in gamma-ray astronomy. Astrophys. J. 272, 317–324 (1983)

  19. Li, X., Liu, T., Han, B., Niu, G., Sugiyama, M.: Provably end-to-end label-noise learning without anchor points. In: International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 6403–6413. PMLR (2021)

  20. Ma, X., Huang, H., Wang, Y., Romano, S., Erfani, S.M., Bailey, J.: Normalized loss functions for deep learning with noisy labels. In: International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119, pp. 6543–6553. PMLR (2020)

  21. Martschei, D., Feindt, M., Honc, S., Wagner-Kuhr, J.: Advanced event reweighting using multivariate analysis. J. Phys.: Conf. Ser. 368 (2012). https://doi.org/10.1088/1742-6596/368/1/012028

  22. Menon, A.K., Narasimhan, H., Agarwal, S., Chawla, S.: On the statistical consistency of algorithms for binary classification under class imbalance. In: International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, vol. 28, pp. 603–611 (2013)

  23. Menon, A.K., van Rooyen, B., Natarajan, N.: Learning from binary labels with instance-dependent noise. Mach. Learn. 107(8–10), 1561–1595 (2018). https://doi.org/10.1007/s10994-018-5715-3

  24. Menon, A.K., van Rooyen, B., Ong, C.S., Williamson, R.C.: Learning from corrupted binary labels via class-probability estimation. In: International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, vol. 37, pp. 125–134 (2015)

  25. Mithal, V., Nayak, G., Khandelwal, A., Kumar, V., Oza, N.C., Nemani, R.R.: RAPT: rare class prediction in absence of true labels. IEEE Trans. Knowl. Data Eng. 29(11), 2484–2497 (2017). https://doi.org/10.1109/TKDE.2017.2739739

  26. Narasimhan, H., Vaish, R., Agarwal, S.: On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In: Advances in Neural Information Processing Systems, pp. 1493–1501 (2014)

  27. Natarajan, N., Dhillon, I.S., Ravikumar, P., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems, pp. 1196–1204 (2013)

  28. Northcutt, C.G., Jiang, L., Chuang, I.L.: Confident learning: estimating uncertainty in dataset labels. J. Artif. Intell. Res. 70, 1373–1411 (2021). https://doi.org/10.1613/jair.1.12125

  29. Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Conference on Computer Vision and Pattern Recognition, pp. 2233–2241. IEEE (2017). https://doi.org/10.1109/CVPR.2017.240

  30. Scott, C., Blanchard, G., Handy, G.: Classification with asymmetric label noise: consistency and maximal denoising. In: Annual Conference on Learning Theory, JMLR Workshop and Conference Proceedings, vol. 30, pp. 489–511. JMLR.org (2013)

  31. Tridon, D.B., et al.: The MAGIC-II gamma-ray stereoscopic telescope system. Nucl. Instrum. Methods Phys. Res. A 623(1), 437–439 (2010). https://doi.org/10.1016/j.nima.2010.03.028

  32. Wakely, S.P., Horan, D.: TeVCat: an online catalog for very high energy gamma-ray astronomy. In: International Cosmic Ray Conference, vol. 3, pp. 1341–1344 (2008)

  33. Xia, X., Liu, T., Wang, N., Han, B., Gong, C., et al.: Are anchor points really indispensable in label-noise learning? In: Advances in Neural Information Processing Systems, pp. 6835–6846 (2019)

  34. Yao, Y., Liu, T., Han, B., Gong, M., Deng, J., et al.: Dual T: reducing estimation error for transition matrix in label-noise learning. In: Advances in Neural Information Processing Systems (2020)

  35. Ye, N., Chai, K.M.A., Lee, W.S., Chieu, H.L.: Optimizing F-measure: a tale of two approaches. In: International Conference on Machine Learning. Omnipress (2012)

Acknowledgments

This research was partly funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr Institute for Machine Learning and Artificial Intelligence.

Author information

Corresponding author

Correspondence to Mirko Bunse.

Ethics declarations

Ethical Implications

Label noise, if not mitigated, can easily lead to the learning of incorrect prediction models, a particular danger for safety-critical applications. Moreover, label noise can perpetuate and amplify existing societal biases if appropriate countermeasures are not taken. These risks make research on the effects and the mitigation of different kinds of label noise essential. In this regard, we contribute a characterization and mitigation of PK-CCN, a novel instance of class-conditional label noise.

Successful mitigation techniques can tempt stakeholders to accept the risks of label noise even when alternative solutions exist. Indeed, we advocate the use of PK-CCN data in a use case where training data is otherwise obtained from simulations. Despite such alternatives, a careful consideration of all risks is morally required. In our use case, the risks of learning from simulations remain vague, whereas we have clearly described the effects of PK-CCN and have mitigated them through learning algorithms that are proven to be consistent. Our algorithms reduce computational requirements, which translates to a reduction in energy consumption, a desirable property for combating climate change. We emphasize that other cases of label noise can involve risks that require different considerations.

Astroparticle physics is a research field that is concerned with advancing our understanding of the cosmos and fundamental physics. While a deep understanding of the cosmos can inspire us to appreciate nature and take better care of our planet, the understanding of fundamental physics can eventually contribute to the development of technologies that improve the lives of many.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bunse, M., Pfahler, L. (2023). Class-Conditional Label Noise in Astroparticle Physics. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43427-3_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43426-6

  • Online ISBN: 978-3-031-43427-3

  • eBook Packages: Computer Science, Computer Science (R0)
