Abstract
This paper presents the results of experiments with undersampling and oversampling applied to machine learning classifiers used to identify exoplanet transits in data with a low signal-to-noise ratio (SNR). We start with an overview of the most popular method for exoplanet detection, followed by an analysis of the Kepler Object of Interest (KOI) data set, an overview of the state-of-the-art machine learning models applied to this problem, and a discussion of how difficult it is to correctly identify exoplanets in low-SNR data. We then briefly describe our signal-to-noise ratio reduction procedure, used to generate the low-SNR data for our experiments. Finally, we use our low-SNR data set to train and evaluate several models in three scenarios (no sampling strategy, oversampling, and undersampling), using repeated holdout validation. Results show that current classifiers can identify transits in low-SNR data sets, with accuracy varying between 69% and 81%, and that sampling strategies can affect simpler classifiers, making them less conservative, but have no significant effect on more complex classifiers.
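The evaluation setup named in the abstract (random over- and undersampling combined with repeated holdout) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the function names `random_oversample`, `random_undersample` and `repeated_holdout`, the 20% test fraction, the number of repeats and the toy labels are all assumptions made for the example. The key design point is that resampling is applied to the training split only, so the held-out split keeps the original class distribution.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect samples into a dict keyed by class label."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_undersample(X, y, rng):
    """Randomly drop samples until every class matches the smallest class."""
    groups = _group_by_class(X, y)
    n_min = min(len(items) for items in groups.values())
    pairs = [(xi, label)
             for label, items in groups.items()
             for xi in rng.sample(items, n_min)]
    rng.shuffle(pairs)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def random_oversample(X, y, rng):
    """Randomly duplicate samples until every class matches the largest class."""
    groups = _group_by_class(X, y)
    n_max = max(len(items) for items in groups.values())
    pairs = []
    for label, items in groups.items():
        extra = [rng.choice(items) for _ in range(n_max - len(items))]
        pairs.extend((xi, label) for xi in items + extra)
    rng.shuffle(pairs)
    return [p[0] for p in pairs], [p[1] for p in pairs]


def repeated_holdout(X, y, sampler=None, n_repeats=5, test_fraction=0.2, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for several random splits.

    The optional sampler is applied to the training part only; the test
    part is left untouched (assumed setup, for illustration).
    """
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_test = int(round(test_fraction * len(y)))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        if sampler is not None:
            X_train, y_train = sampler(X_train, y_train, rng)
        yield X_train, y_train, [X[i] for i in test_idx], [y[i] for i in test_idx]


# Toy imbalanced data (hypothetical): 20 positives, 80 negatives.
X = list(range(100))
y = [1] * 20 + [0] * 80
for X_tr, y_tr, X_te, y_te in repeated_holdout(X, y, sampler=random_oversample, n_repeats=3):
    print(Counter(y_tr), len(y_te))
```

In a real experiment, a classifier would be fit on each resampled training split and scored on the corresponding untouched test split, and the scores would then be aggregated over the repeats.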
Notes
- 1.
MES is a significance metric derived, among other things, from the transit SNR, so that the greater a transit’s SNR, the greater its MES [15].
- 2.
Kepler’s KOI catalog has three possible classes for transit events. “CONFIRMED” is a transit that was identified as a PC and later confirmed by another method; “FALSE POSITIVE” is a transit confirmed to fall into one of the false positive categories from Sect. 1; and “CANDIDATE” is a transit identified as a PC by the pipeline but not yet confirmed by another method.
- 3.
The test was executed through the f_oneway function of the SciPy package [33].
- 4.
Calculated with the pairwise_tukeyhsd function of the statsmodels package [31], leading to AstroNet \(\times \) SIDRA \(t=-0.0561, p\ll 0.001\); AstroNet \(\times \) ExoplanetSVM \(t=-0.0908, p\ll 0.001\); SIDRA \(\times \) ExoplanetSVM \(t=-0.0347, p\ll 0.001\).
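The two statistical procedures mentioned in notes 3 and 4 can be sketched as follows: a one-way ANOVA with SciPy's `f_oneway`, followed by a Tukey HSD post-hoc comparison with statsmodels' `pairwise_tukeyhsd`. This is only an illustration. The group names `model_a`, `model_b` and `model_c`, the score values, the sample size and the 0.05 significance level are assumptions, and the scores are synthetic, not the paper's results.

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Synthetic per-run accuracy scores for three hypothetical models.
# These numbers are invented for the example and are NOT the paper's data.
scores = {
    "model_a": rng.normal(0.70, 0.01, 30),
    "model_b": rng.normal(0.75, 0.01, 30),
    "model_c": rng.normal(0.80, 0.01, 30),
}

# One-way ANOVA: do the group means differ at all?
f_stat, p_value = f_oneway(*scores.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.3g}")

# Tukey HSD: which pairs of groups differ?
values = np.concatenate(list(scores.values()))
labels = np.repeat(list(scores.keys()), [len(v) for v in scores.values()])
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

The summary table lists, for each pair of groups, the mean difference, the adjusted p-value, the confidence interval and whether the null hypothesis of equal means is rejected.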
References
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2016), pp. 265–283 (2016)
Alshehhi, R., Rodenbeck, K., Gizon, L., Sreenivasan, K.R.: Detection of exomoons in simulated light curves with a regularized convolutional neural network. Astron. Astrophys. 640, A41 (2020). https://doi.org/10.1051/0004-6361/201937059
Amin, R.A., et al.: Detection of exoplanet systems in kepler light curves using adaptive neuro-fuzzy system. In: 2018 International Conference on Intelligent Systems (IS), pp. 66–72. IEEE (2018)
Ansdell, M., et al.: Scientific domain knowledge improves exoplanet transit classification with deep learning. Astrophys. J. 869(1), L7 (2018). https://doi.org/10.3847/2041-8213/aaf23b
Armstrong, D.J., Gamper, J., Damoulas, T.: Exoplanet validation with machine learning: 50 new validated kepler planets (2020)
Armstrong, D.J.: Automatic vetting of planet candidates from ground-based surveys: machine learning with NGTS. Monthly Not. Roy. Astron. Soc. 478(3), 4225–4237 (2018)
Armstrong, D.J., Pollacco, D., Santerne, A.: Transit shapes and self organising maps as a tool for ranking planetary candidates: application to kepler and k2. Monthly Not. Roy. Astron. Soc., stw2881 (2016)
IAU General Assembly: Resolutions B5 and B6 on the definition of a planet in the solar system and Pluto (2014)
Battley, M.P., Pollacco, D., Armstrong, D.J.: A search for young exoplanets in sectors 1–5 of the tess full-frame images. Monthly Not. Roy. Astron. Soc. 496(2), 1197–1216 (2020)
Boss, A.P., et al.: Working group on extrasolar planets. Proc. Int. Astron. Union 1(T26A), 183–186 (2005)
Bugueno, M., Mena, F., Araya, M.: Refining exoplanet detection using supervised learning and feature engineering. In: 2018 XLIV Latin American Computer Conference (CLEI), pp. 278–287. IEEE (2018)
Caceres, G.A., et al.: Autoregressive planet search: application to the kepler mission. Astron. J. 158(2), 58 (2019)
Chaushev, A., et al.: Classifying exoplanet candidates with convolutional neural networks: application to the next generation transit survey. Monthly Not. Roy. Astron. Soc. 488(4), 5232–5250 (2019)
Chintarungruangchai, P., Jiang, G.: Detecting exoplanet transits through machine-learning techniques with convolutional neural networks. Publ. Astron. Soc. Pac. 131(1000), 064502 (2019)
Coughlin, J.L., et al.: Planetary candidates observed by kepler. vii. the first fully uniform catalog based on the entire 48-month data set (q1–q17 dr24). Astrophys. J. Suppl. Ser. 224(1), 12 (2016)
Dattilo, A., et al.: Identifying exoplanets with deep learning. ii. two new super-earths uncovered by a neural network in k2 data. Astron. J. 157(5), 169 (2019)
Grziwa, S., Pätzold, M.: Wavelet-based filter methods to detect small transiting planets in stellar light curves. arXiv preprint arXiv:1607.08417 (2016)
Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2011)
Hinners, T.A., Tat, K., Thorp, R.: Machine learning techniques for stellar light curve classification. Astron. J. 156(1), 7 (2018)
Hippke, M., Heller, R.: Optimized transit detection algorithm to search for periodic transits of small planets. Astron. Astrophys. 623, A39 (2019)
Jara-Maldonado, M., Alarcon-Aquino, V., Rosas-Romero, R., Starostenko, O., Ramirez-Cortes, J.M.: Transiting exoplanet discovery using machine learning techniques: a survey (2020)
Jenkins, J.M., et al.: Overview of the kepler science processing pipeline. Astrophys. J. Lett. 713(2), L87 (2010)
Jenkins, J.M., et al.: Auto-vetting transiting planet candidates identified by the kepler pipeline. Proc. Int. Astron. Union 8(S293), 94–99 (2012)
Lemaître, G., Nogueira, F., Aridas, C.K.: Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18(1), 559–563 (2017)
McCauliff, S.D., et al.: Automatic classification of kepler planetary transit candidates. Astrophys. J. 806(1), 6 (2015)
Mislis, D., Bachelet, E., Alsubai, K., Bramich, D., Parley, N.: Sidra: a blind algorithm for signal detection in photometric surveys. Monthly Not. Roy. Astron. Soc. 455(1), 626–633 (2016)
Osborn, H.P., et al.: Rapid classification of tess planet candidates with convolutional neural networks. Astron. Astrophys. 633, A53 (2020)
Pearson, K.A., Palafox, L., Griffith, C.A.: Searching for exoplanets using artificial intelligence. Monthly Not. Roy. Astron. Soc. 474(1), 478–491 (2018)
Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Schanche, N., et al.: Machine-learning approaches to exoplanet transit detection and candidate validation in wide-field ground-based surveys. Monthly Not. Roy. Astron. Soc. 483(4), 5534–5547 (2019)
Seabold, S., Perktold, J.: Statsmodels: econometric and statistical modeling with python. In: Proceedings of the 9th Python in Science Conference, vol. 57, p. 61. Austin, TX (2010)
Shallue, C.J., Vanderburg, A.: Identifying exoplanets with deep learning: a five-planet resonant chain around kepler-80 and an eighth planet around kepler-90. Astron. J. 155(2), 94 (2018)
Virtanen, P., et al.: Scipy 1.0: fundamental algorithms for scientific computing in python. Nat. Methods 17(3), 261–272 (2020)
Weiss, L.M., Petigura, E.A.: The kepler peas in a pod pattern is astrophysical. Astrophys. J. Lett. 893(1), L1 (2020)
Armstrong, D.J., Gamper, J., Damoulas, T.: Exoplanet validation with machine learning: 50 new validated kepler planets (2020)
Yu, L., et al.: Identifying exoplanets with deep learning. iii. automated triage and vetting of tess candidates. Astron. J. 158(1), 25 (2019)
Zucker, S., Giryes, R.: Shallow transits-deep learning. i. feasibility study of deep learning to detect periodic transits of exoplanets. Astron. J. 155(4), 147 (2018)
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Braga, F.C., Roman, N.T., Falceta-Gonçalves, D. (2022). The Effects of Under and Over Sampling in Exoplanet Transit Identification with Low Signal-to-Noise Ratio Data. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science, vol 13653. Springer, Cham. https://doi.org/10.1007/978-3-031-21686-2_8
DOI: https://doi.org/10.1007/978-3-031-21686-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21685-5
Online ISBN: 978-3-031-21686-2
eBook Packages: Computer Science (R0)