Class-Conditional Label Noise in Astroparticle Physics

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track (ECML PKDD 2023)

Abstract

Class-conditional label noise characterizes classification tasks in which the training set labels are randomly flipped versions of the actual ground-truth. The analysis of telescope data in astroparticle physics poses this problem with a novel condition: one of the class-wise label flip probabilities is known while the other is not. We address this condition with an objective function for optimizing the decision thresholds of existing classifiers. Our experiments on several imbalanced data sets demonstrate that accounting for the known label flip probability substantially improves the learning outcome over existing methods for learning under class-conditional label noise. In astroparticle physics, our proposal achieves an improvement in predictive performance and a considerable reduction in computational requirements. These achievements are a direct result of our proposal’s ability to learn from real telescope data, instead of relying on simulated data as is common practice in the field.
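
The threshold-tuning idea described in the abstract can be made concrete with a small, hedged sketch. The code below is not the paper's objective function; it only illustrates the standard class-conditional-noise identity eta_tilde(x) = (1 - rho_plus - rho_minus) * eta(x) + rho_minus, where eta_tilde is the posterior of the noisy label and eta the clean one (cf. Natarajan et al. [27] and Menon et al. [24]), and uses it to translate the clean decision threshold of 1/2 into a threshold on the scores of a classifier trained against noisy labels. All names and numbers in the sketch (rho_minus as the known flip rate, rho_plus as the unknown one, inject_ccn, corrupted_threshold, the data set sizes) are hypothetical choices for illustration.

    # Illustrative sketch (not the paper's method): adjusting a decision threshold
    # under class-conditional label noise when only one flip rate is known.
    #
    # Assumptions (hypothetical, for illustration only):
    #   * binary labels in {-1, +1}
    #   * rho_minus = P(noisy y = +1 | true y = -1) is known
    #   * rho_plus  = P(noisy y = -1 | true y = +1) is unknown and scanned over
    #   * a probabilistic classifier is trained directly on the noisy labels
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    def inject_ccn(y, rho_plus, rho_minus, rng):
        """Flip clean labels y in {-1, +1} with class-conditional probabilities."""
        flip = np.where(y == 1,
                        rng.random(y.size) < rho_plus,
                        rng.random(y.size) < rho_minus)
        return np.where(flip, -y, y)

    def corrupted_threshold(rho_plus, rho_minus):
        """Threshold on the corrupted posterior that is equivalent to eta(x) > 1/2."""
        return (1.0 - rho_plus - rho_minus) / 2.0 + rho_minus

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=20000, weights=[0.9], random_state=0)
    y = 2 * y - 1  # map {0, 1} to {-1, +1}
    X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, random_state=0)

    rho_plus_true, rho_minus = 0.3, 0.1        # rho_minus is the "known" rate
    y_noisy = inject_ccn(y_trn, rho_plus_true, rho_minus, rng)

    clf = RandomForestClassifier(random_state=0).fit(X_trn, y_noisy)
    scores = clf.predict_proba(X_tst)[:, clf.classes_ == 1].ravel()

    # scan candidate values of the unknown flip rate rho_plus
    for rho_plus in [0.0, 0.1, 0.2, 0.3, 0.4]:
        t = corrupted_threshold(rho_plus, rho_minus)
        y_hat = np.where(scores > t, 1, -1)
        acc_pos = np.mean(y_hat[y_tst == 1] == 1)    # clean labels, for evaluation only
        acc_neg = np.mean(y_hat[y_tst == -1] == -1)
        print(f"rho_plus={rho_plus:.1f}  t={t:.2f}  "
              f"balanced acc={(acc_pos + acc_neg) / 2:.3f}")

In this sketch, rho_minus plays the role of the flip probability that is assumed to be known, while scanning rho_plus stands in for handling the unknown rate; the authors' actual objective function and implementation are available in the pkccn repository linked in the notes below.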

Notes

  1. http://tevcat.uchicago.edu/, catalog version 3.400 by Wakely and Horan [32].

  2. https://github.com/mirkobunse/pkccn.

  3. https://imbalanced-learn.org/stable/datasets/.

  4. https://factdata.app.tu-dortmund.de/.

  5. https://github.com/fact-project/open_crab_sample_analysis.

References

  1. Acharya, B.S., et al.: Science with the Cherenkov Telescope Array. World Scientific (2018)

  2. Actis, M., Agnetta, G., Aharonian, F., Akhperjanian, A., Aleksić, J., et al.: Design concepts for the Cherenkov Telescope Array CTA: an advanced facility for ground-based high-energy gamma-ray astronomy. Exper. Astron. 32(3), 193–316 (2011). https://doi.org/10.1007/s10686-011-9247-0

  3. Aharonian, F., Akhperjanian, A., Bazer-Bachi, A., Beilicke, M., Benbow, W., et al.: Observations of the Crab nebula with HESS. Astron. Astrophys. 457(3), 899–915 (2006)

  4. Anderhub, H., Backes, M., Biland, A., Boccone, V., Braun, I., et al.: Design and operation of FACT, the first G-APD Cherenkov telescope. J. Instrum. 8(06) (2013). https://doi.org/10.1088/1748-0221/8/06/p06008

  5. Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Conference on Computational Learning Theory, pp. 92–100. ACM (1998). https://doi.org/10.1145/279943.279962

  6. Bock, R., et al.: Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instrum. Methods Phys. Res. A 516(2–3), 511–528 (2004). https://doi.org/10.1016/j.nima.2003.08.157

  7. Bockermann, C., et al.: Online analysis of high-volume data streams in astroparticle physics. In: Bifet, A., et al. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 100–115. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23461-8_7

  8. Buja, A., Stuetzle, W., Shen, Y.: Loss functions for binary class probability estimation and classification: structure and applications. Technical report, University of Pennsylvania (2005)

  9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)

  10. Engel, R., Heck, D., Pierog, T.: Extensive air showers and hadronic interactions at high energy. Ann. Rev. Nucl. Part. Sci. 61(1), 467–489 (2011). https://doi.org/10.1146/annurev.nucl.012809.104544

  11. Falkenburg, B., Rhode, W.: From Ultra Rays to Astroparticles: A Historical Introduction to Astroparticle Physics. Springer (2012). https://doi.org/10.1007/978-94-007-5422-5

  12. Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98074-4

  13. Funk, S.: Ground- and space-based gamma-ray astronomy. Ann. Rev. Nucl. Part. Sci. 65(1), 245–277 (2015). https://doi.org/10.1146/annurev-nucl-102014-022036

  14. Ghosh, A., Manwani, N., Sastry, P.S.: Making risk minimization tolerant to label noise. Neurocomputing 160, 93–107 (2015). https://doi.org/10.1016/j.neucom.2014.09.081

  15. Han, B., et al.: Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems, pp. 8536–8546 (2018)

  16. Koyejo, O., Natarajan, N., Ravikumar, P., Dhillon, I.S.: Consistent binary classification with generalized performance metrics. In: Advances in Neural Information Processing Systems, pp. 2744–2752 (2014)

  17. Lemaitre, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 17:1–17:5 (2017)

  18. Li, T.P., Ma, Y.Q.: Analysis methods for results in gamma-ray astronomy. Astrophys. J. 272, 317–324 (1983)

  19. Li, X., Liu, T., Han, B., Niu, G., Sugiyama, M.: Provably end-to-end label-noise learning without anchor points. In: International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 139, pp. 6403–6413. PMLR (2021)

  20. Ma, X., Huang, H., Wang, Y., Romano, S., Erfani, S.M., Bailey, J.: Normalized loss functions for deep learning with noisy labels. In: International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 119, pp. 6543–6553. PMLR (2020)

  21. Martschei, D., Feindt, M., Honc, S., Wagner-Kuhr, J.: Advanced event reweighting using multivariate analysis. J. Phys.: Conf. Ser. 368 (2012). https://doi.org/10.1088/1742-6596/368/1/012028

  22. Menon, A.K., Narasimhan, H., Agarwal, S., Chawla, S.: On the statistical consistency of algorithms for binary classification under class imbalance. In: International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, vol. 28, pp. 603–611 (2013)

  23. Menon, A.K., van Rooyen, B., Natarajan, N.: Learning from binary labels with instance-dependent noise. Mach. Learn. 107(8–10), 1561–1595 (2018). https://doi.org/10.1007/s10994-018-5715-3

  24. Menon, A.K., van Rooyen, B., Ong, C.S., Williamson, R.C.: Learning from corrupted binary labels via class-probability estimation. In: International Conference on Machine Learning, JMLR Workshop and Conference Proceedings, vol. 37, pp. 125–134 (2015)

  25. Mithal, V., Nayak, G., Khandelwal, A., Kumar, V., Oza, N.C., Nemani, R.R.: RAPT: rare class prediction in absence of true labels. IEEE Trans. Knowl. Data Eng. 29(11), 2484–2497 (2017). https://doi.org/10.1109/TKDE.2017.2739739

  26. Narasimhan, H., Vaish, R., Agarwal, S.: On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In: Advances in Neural Information Processing Systems, pp. 1493–1501 (2014)

  27. Natarajan, N., Dhillon, I.S., Ravikumar, P., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems, pp. 1196–1204 (2013)

  28. Northcutt, C.G., Jiang, L., Chuang, I.L.: Confident learning: estimating uncertainty in dataset labels. J. Artif. Intell. Res. 70, 1373–1411 (2021). https://doi.org/10.1613/jair.1.12125

  29. Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Conference on Computer Vision and Pattern Recognition, pp. 2233–2241. IEEE (2017). https://doi.org/10.1109/CVPR.2017.240

  30. Scott, C., Blanchard, G., Handy, G.: Classification with asymmetric label noise: consistency and maximal denoising. In: Annual Conference on Learning Theory, JMLR Workshop and Conference Proceedings, vol. 30, pp. 489–511. JMLR.org (2013)

  31. Tridon, D.B., et al.: The MAGIC-II gamma-ray stereoscopic telescope system. Nucl. Instrum. Methods Phys. Res. A 623(1), 437–439 (2010). https://doi.org/10.1016/j.nima.2010.03.028

  32. Wakely, S.P., Horan, D.: TeVCat: an online catalog for very high energy gamma-ray astronomy. In: International Cosmic Ray Conference, vol. 3, pp. 1341–1344 (2008)

  33. Xia, X., Liu, T., Wang, N., Han, B., Gong, C., et al.: Are anchor points really indispensable in label-noise learning? In: Advances in Neural Information Processing Systems, pp. 6835–6846 (2019)

  34. Yao, Y., Liu, T., Han, B., Gong, M., Deng, J., et al.: Dual T: reducing estimation error for transition matrix in label-noise learning. In: Advances in Neural Information Processing Systems (2020)

  35. Ye, N., Chai, K.M.A., Lee, W.S., Chieu, H.L.: Optimizing F-measure: a tale of two approaches. In: International Conference on Machine Learning. Omnipress (2012)

Acknowledgments

This research was partly funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr Institute for Machine Learning and Artificial Intelligence.

Author information

Corresponding author

Correspondence to Mirko Bunse.

Ethics declarations

Ethical Implications

Label noise, if not mitigated, can easily lead to the learning of incorrect prediction models, a particular danger for safety-critical applications. Moreover, label noise can perpetuate and amplify existing societal biases if appropriate countermeasures are not taken. These risks make research on the effects and the mitigation of different kinds of label noise essential. In this regard, we contribute a characterization and mitigation of PK-CCN, a novel instance of class-conditional label noise.

Successful mitigation techniques can tempt stakeholders to accept the risks of label noise even when alternative solutions exist. Indeed, we advocate the use of PK-CCN data in a use case where training data is otherwise obtained from simulations. Despite such alternatives, a careful consideration of all risks is morally required. In our use case, the risks of learning from simulations remain vague, whereas we have clearly described the effects of PK-CCN and have mitigated them through learning algorithms that are proven to be consistent. Our algorithms reduce computational requirements, which translates to a reduction in energy consumption, a desirable property for combating climate change. We emphasize that other cases of label noise can involve risks that require different considerations.

Astroparticle physics is a research field that is concerned with advancing our understanding of the cosmos and fundamental physics. While a deep understanding of the cosmos can inspire us to appreciate nature and take better care of our planet, the understanding of fundamental physics can eventually contribute to the development of technologies that improve the lives of many.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bunse, M., Pfahler, L. (2023). Class-Conditional Label Noise in Astroparticle Physics. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science(), vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_2

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-43427-3_2

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43426-6

  • Online ISBN: 978-3-031-43427-3

  • eBook Packages: Computer Science, Computer Science (R0)
