Abstract
Class-conditional label noise characterizes classification tasks in which the training set labels are randomly flipped versions of the actual ground-truth. The analysis of telescope data in astroparticle physics poses this problem with a novel condition: one of the class-wise label flip probabilities is known while the other is not. We address this condition with an objective function for optimizing the decision thresholds of existing classifiers. Our experiments on several imbalanced data sets demonstrate that accounting for the known label flip probability substantially improves the learning outcome over existing methods for learning under class-conditional label noise. In astroparticle physics, our proposal achieves an improvement in predictive performance and a considerable reduction in computational requirements. These achievements are a direct result of our proposal’s ability to learn from real telescope data, instead of relying on simulated data as is common practice in the field.
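The noise model named in the abstract can be illustrated with a minimal sketch. It assumes a +1/-1 label encoding and uses the hypothetical names `flip_labels`, `p_plus`, and `p_minus`. The flip probabilities (0.2 and 0.05) and the 10% positive rate are illustrative values, not taken from the paper. The sketch only simulates class-conditional label flips; it does not implement the proposed threshold objective. In the setting described above, only one of the two flip probabilities would be known to the learner.

```python
import random


def flip_labels(y_true, p_plus, p_minus, seed=0):
    """Flip each +1 label with probability p_plus and each -1 label with p_minus."""
    rng = random.Random(seed)
    noisy = []
    for y in y_true:
        p_flip = p_plus if y == 1 else p_minus  # flip probability depends only on the class
        noisy.append(-y if rng.random() < p_flip else y)
    return noisy


# Imbalanced ground truth: roughly 10% positives (illustrative value).
data_rng = random.Random(42)
y_true = [1 if data_rng.random() < 0.1 else -1 for _ in range(100_000)]
y_noisy = flip_labels(y_true, p_plus=0.2, p_minus=0.05)

# Empirical class-wise flip rates, which should be close to p_plus and p_minus.
n_pos = sum(1 for y in y_true if y == 1)
n_neg = len(y_true) - n_pos
rate_plus = sum(1 for t, n in zip(y_true, y_noisy) if t == 1 and n == -1) / n_pos
rate_minus = sum(1 for t, n in zip(y_true, y_noisy) if t == -1 and n == 1) / n_neg
print(f"flip rate of positives: {rate_plus:.3f}, of negatives: {rate_minus:.3f}")
```

Because the flip probability depends only on the true class and not on the features, the two empirical rates fully describe the corruption in this sketch.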
Notes
- 1. http://tevcat.uchicago.edu/, catalog version 3.400 by Wakely and Horan [32].
References
Acharya, B.S., et al.: Science with the Cherenkov telescope array. World Sci. (2018)
Actis, M., Agnetta, G., Aharonian, F., Akhperjanian, A., Aleksić, J., et al.: Design concepts for the Cherenkov telescope array CTA: an advanced facility for ground-based high-energy gamma-ray astronomy. Exper. Astron. 32(3), 193–316 (2011). https://doi.org/10.1007/s10686-011-9247-0
Aharonian, F., Akhperjanian, A., Bazer-Bachi, A., Beilicke, M., Benbow, W., et al.: Observations of the crab nebula with HESS. Astron. Astrophys. 457(3), 899–915 (2006)
Anderhub, H., Backes, M., Biland, A., Boccone, V., Braun, I., et al.: Design and operation of FACT-the first G-APD Cherenkov telescope. J. Inst. 8(06) (2013). https://doi.org/10.1088/1748-0221/8/06/p06008
Blum, A., Mitchell, T.M.: Combining labeled and unlabeled data with co-training. In: Conference on Computational Learning Theory, pp. 92–100. ACM (1998). https://doi.org/10.1145/279943.279962
Bock, R., et al.: Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Inst. and Meth. in Phys. Res. Sec. A 516(2–3), 511–528 (2004). https://doi.org/10.1016/j.nima.2003.08.157
Bockermann, C., et al.: Online analysis of high-volume data streams in astroparticle physics. In: Bifet, A., et al. (eds.) ECML PKDD 2015. LNCS (LNAI), vol. 9286, pp. 100–115. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-23461-8_7
Buja, A., Stuetzle, W., Shen, Y.: Loss functions for binary class probability estimation and classification: structure and applications. Tech. rep., University of Pennsylvania (2005)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Engel, R., Heck, D., Pierog, T.: Extensive air showers and hadronic interactions at high energy. Ann. Rev. Nucl. Part. Sci. 61(1), 467–489 (2011). https://doi.org/10.1146/annurev.nucl.012809.104544
Falkenburg, B., Rhode, W.: From ultra rays to astroparticles: a historical introduction to astroparticle physics. Springer (2012). https://doi.org/10.1007/978-94-007-5422-5
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98074-4
Funk, S.: Ground- and space-based gamma-ray astronomy. Ann. Rev. Nucl. Part. Sci. 65(1), 245–277 (2015). https://doi.org/10.1146/annurev-nucl-102014-022036
Ghosh, A., Manwani, N., Sastry, P.S.: Making risk minimization tolerant to label noise. Neurocomput. 160, 93–107 (2015). https://doi.org/10.1016/j.neucom.2014.09.081
Han, B., et al.: Co-teaching: robust training of deep neural networks with extremely noisy labels. Adv. in Neur. Inform. Process. Syst., 8536–8546 (2018)
Koyejo, O., Natarajan, N., Ravikumar, P., Dhillon, I.S.: Consistent binary classification with generalized performance metrics. Adv. in Neur. Inform. Process. Syst., 2744–2752 (2014)
Lemaitre, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18, 17:1–17:5 (2017)
Li, T.P., Ma, Y.Q.: Analysis methods for results in gamma-ray astronomy. Astrophys. J. 272, 317–324 (1983)
Li, X., Liu, T., Han, B., Niu, G., Sugiyama, M.: Provably end-to-end label-noise learning without anchor points. In: International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 139, pp. 6403–6413. PMLR (2021)
Ma, X., Huang, H., Wang, Y., Romano, S., Erfani, S.M., Bailey, J.: Normalized loss functions for deep learning with noisy labels. In: International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 6543–6553. PMLR (2020)
Martschei, D., Feindt, M., Honc, S., Wagner-Kuhr, J.: Advanced event reweighting using multivariate analysis. J. of Phys.: Conf. Ser. 368 (2012). https://doi.org/10.1088/1742-6596/368/1/012028
Menon, A.K., Narasimhan, H., Agarwal, S., Chawla, S.: On the statistical consistency of algorithms for binary classification under class imbalance. In: International Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol. 28, pp. 603–611 (2013)
Menon, A.K., van Rooyen, B., Natarajan, N.: Learning from binary labels with instance-dependent noise. Mach. Learn. 107(8–10), 1561–1595 (2018). https://doi.org/10.1007/s10994-018-5715-3
Menon, A.K., van Rooyen, B., Ong, C.S., Williamson, R.C.: Learning from corrupted binary labels via class-probability estimation. In: International Conference on Machine Learning. JMLR Workshop and Conference Proceedings, vol. 37, pp. 125–134 (2015)
Mithal, V., Nayak, G., Khandelwal, A., Kumar, V., Oza, N.C., Nemani, R.R.: RAPT: rare class prediction in absence of true labels. IEEE Trans. Knowl. Data Eng. 29(11), 2484–2497 (2017). https://doi.org/10.1109/TKDE.2017.2739739
Narasimhan, H., Vaish, R., Agarwal, S.: On the statistical consistency of plug-in classifiers for non-decomposable performance measures. In: Advances in Neural Information Processing Systems, pp. 1493–1501 (2014)
Natarajan, N., Dhillon, I.S., Ravikumar, P., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems, pp. 1196–1204 (2013)
Northcutt, C.G., Jiang, L., Chuang, I.L.: Confident learning: estimating uncertainty in dataset labels. J. Artif. Intell. Res. 70, 1373–1411 (2021). https://doi.org/10.1613/jair.1.12125
Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Conference on Computer Vision and Pattern Recognition, pp. 2233–2241. IEEE (2017). https://doi.org/10.1109/CVPR.2017.240
Scott, C., Blanchard, G., Handy, G.: Classification with asymmetric label noise: consistency and maximal denoising. In: Annual Conference on Learning Theory. JMLR Workshop and Conference Proceedings, vol. 30, pp. 489–511. JMLR.org (2013)
Tridon, D.B., et al.: The MAGIC-II gamma-ray stereoscopic telescope system. Nucl. Inst. Meth. in Phys. Res. Sec. A 623(1), 437–439 (2010). https://doi.org/10.1016/j.nima.2010.03.028
Wakely, S.P., Horan, D.: TeVCat: an online catalog for very high energy gamma-ray astronomy. In: International Cosmic Ray Conference, vol. 3, pp. 1341–1344 (2008)
Xia, X., Liu, T., Wang, N., Han, B., Gong, C., et al.: Are anchor points really indispensable in label-noise learning? In: Advances in Neural Information Processing Systems, pp. 6835–6846 (2019)
Yao, Y., Liu, T., Han, B., Gong, M., Deng, J., et al.: Dual T: Reducing estimation error for transition matrix in label-noise learning. In: Advances in Neural Information Processing Systems (2020)
Ye, N., Chai, K.M.A., Lee, W.S., Chieu, H.L.: Optimizing F-measure: a tale of two approaches. In: International Conference on Machine Learning. Omnipress (2012)
Acknowledgments
This research was partly funded by the Federal Ministry of Education and Research of Germany and the state of North Rhine-Westphalia as part of the Lamarr-Institute for Machine Learning and Artificial Intelligence.
Ethics declarations
Ethical Implications
Label noise, if not mitigated, can easily lead to the learning of incorrect prediction models, which is a particular danger for safety-critical applications. Moreover, label noise can perpetuate and amplify existing societal biases if appropriate countermeasures are not taken. These risks call for research on the effects and the mitigation of different kinds of label noise. In this regard, we contribute a characterization and mitigation of PK-CCN, a novel instance of class-conditional label noise.
Successful mitigation techniques can tempt stakeholders to take the risks of label noise even if alternative solutions exist. In fact, we advocate the employment of PK-CCN data in a use case where training data is otherwise obtained from simulations. In spite of such alternative solutions, a careful consideration of all risks is morally required. In our use case, the risks of learning from simulations are still vague while we have clearly described the effects of PK-CCN and have mitigated them through learning algorithms that are proven to be consistent. Our algorithms result in a reduction of computational requirements, which translates to a reduction in energy consumption. This improvement is a desirable property for combating climate change. We emphasize that other cases of label noise can involve risks that require different considerations.
Astroparticle physics is a research field that is concerned with advancing our understanding of the cosmos and fundamental physics. While a deep understanding of the cosmos can inspire us to appreciate nature and take better care of our planet, the understanding of fundamental physics can eventually contribute to the development of technologies that improve the lives of many.
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bunse, M., Pfahler, L. (2023). Class-Conditional Label Noise in Astroparticle Physics. In: De Francisci Morales, G., Perlich, C., Ruchansky, N., Kourtellis, N., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14174. Springer, Cham. https://doi.org/10.1007/978-3-031-43427-3_2
DOI: https://doi.org/10.1007/978-3-031-43427-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43426-6
Online ISBN: 978-3-031-43427-3
eBook Packages: Computer Science, Computer Science (R0)