Abstract
Advancements in space telescopes have opened new avenues for gathering vast amounts of data on exoplanet atmosphere spectra. However, accurately extracting chemical and physical properties from these spectra poses significant challenges due to the non-linear nature of the underlying physics.
This paper presents novel machine learning models developed by the AstroAI team (https://astroai.cfa.harvard.edu/), hosted by the Center for Astrophysics | Harvard & Smithsonian, for the Ariel Data Challenge 2023 (https://www.ariel-datachallenge.space/), where one of the models took first place among 293 competitors. Using Normalizing Flows, our models predict the posterior probability distribution of atmospheric parameters under different atmospheric assumptions.
Moreover, we introduce an alternative model that exhibits higher performance potential than the winning model, despite scoring lower in the challenge. These findings highlight the need to reevaluate the evaluation metric and prompt further exploration of more efficient and accurate approaches for exoplanet atmosphere spectra analysis.
Finally, we present recommendations to enhance the challenge and models, providing valuable insights for future applications on real observational data. These advancements pave the way for more effective and timely analysis of exoplanet atmospheric properties, advancing our understanding of these distant worlds.
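To make the idea of predicting a posterior with a normalizing flow concrete, the following is a minimal sketch and not the authors' architecture. It assumes a hypothetical one-parameter simulator (theta drawn from a standard normal, observation x equal to theta plus Gaussian noise) and a single conditional affine transform, the simplest possible flow. The names simulate, log_prob, sample, and fit are illustrative only. The density comes from the change-of-variables formula, and training minimizes the negative log-likelihood of simulated (theta, x) pairs.

```python
import math
import random


def simulate(n, rng, noise_sd=0.5):
    """Toy simulator (an assumption): theta ~ N(0, 1), x = theta + noise."""
    pairs = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        x = theta + rng.gauss(0.0, noise_sd)
        pairs.append((theta, x))
    return pairs


def log_prob(theta, x, params):
    """log q(theta | x) for the affine flow z = (theta - mu(x)) / sigma."""
    a, b, c = params  # mu(x) = a * x + b, sigma = exp(c)
    z = (theta - (a * x + b)) * math.exp(-c)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # standard normal base
    return log_base - c  # change of variables: log|dz/dtheta| = -c


def sample(x, params, rng):
    """Draw theta ~ q(theta | x) by pushing base noise through the inverse map."""
    a, b, c = params
    return (a * x + b) + math.exp(c) * rng.gauss(0.0, 1.0)


def fit(pairs, steps=1000, lr=0.05):
    """Full-batch gradient descent on the mean negative log-likelihood."""
    a = b = c = 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_a = grad_b = grad_c = 0.0
        inv_sigma = math.exp(-c)
        for theta, x in pairs:
            z = (theta - (a * x + b)) * inv_sigma
            grad_a += -z * x * inv_sigma  # d(NLL)/da
            grad_b += -z * inv_sigma      # d(NLL)/db
            grad_c += 1.0 - z * z         # d(NLL)/dc
        a -= lr * grad_a / n
        b -= lr * grad_b / n
        c -= lr * grad_c / n
    return a, b, c


if __name__ == "__main__":
    rng = random.Random(0)
    params = fit(simulate(500, rng))
    print("fitted (a, b, log sigma):", params)
    print("posterior draw for x = 1.0:", sample(1.0, params, rng))
```

For this toy simulator the exact posterior is Gaussian with mean 0.8 x and variance 0.2, so the fitted parameters can be checked against it. A realistic retrieval would replace the affine map with a deeper flow conditioned on the full spectrum, which this sketch does not attempt.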
Acknowledgments
This team was put together, led and supervised by AstroAI at the Center for Astrophysics | Harvard & Smithsonian. Mayeul Aubin and Cecilia Garraffo were partially supported by the Director’s Office at the Center for Astrophysics | Harvard & Smithsonian. AstroAI thanks the Harvard Data Science Initiative for their support. Jeremy J. Drake was supported by NASA contract NAS8-03060 to the Chandra X-ray Center during the course of this research. Iouli Gordon, Robert Hargreaves, and Vladimir Makhnev were supported by NASA PDART grant 80NSSC20K1059 throughout this work.
We thank the NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI) for providing computational resources through the Faculty of Arts and Science Research Cluster (FASRC) of Harvard.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Aubin, M. et al. (2025). Simulation-Based Inference for Exoplanet Atmospheric Retrieval: Insights from Winning the Ariel Data Challenge 2023 Using Normalizing Flows. In: Meo, R., Silvestri, F. (eds) Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2023. Communications in Computer and Information Science, vol 2137. Springer, Cham. https://doi.org/10.1007/978-3-031-74643-7_10
DOI: https://doi.org/10.1007/978-3-031-74643-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-74642-0
Online ISBN: 978-3-031-74643-7
eBook Packages: Artificial Intelligence (R0)