Abstract
This article develops an accurate air quality prediction model to help address Jakarta's air pollution challenges. The study uses standard air pollutant index data recorded at air quality monitoring stations. In the research phase, the data are explored, SMOTE is applied to correct class imbalance, and an XGBoost model is trained with tuned parameters. In the evaluation stage, the model achieves an accuracy of 99.516%, an F1-score of 99.528%, and a recall of 99.509%, indicating a strong ability to classify and predict air quality levels. The study also investigates the importance of the individual variables in predicting air quality, using the weight, gain, total gain, and cover measures. Although SO2 contributes to the prediction, PM2.5 dominates several of these measures, indicating a substantial influence. By combining advanced analytical approaches with an accurate model, the study contributes to a better understanding of the complex dynamics of air quality prediction. This knowledge is useful for developing targeted solutions that address air pollution and promote healthier urban environments.
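The abstract does not reproduce the authors' code. As a rough illustration of how SMOTE balances classes, the sketch below builds synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, its parameters, and the pollutant readings are hypothetical and chosen only for this example; a real workflow would more likely rely on an established library implementation.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        u = rng.random()
        synthetic.append(
            tuple(b + u * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical (PM2.5, SO2) readings for an under-represented class;
# these values are invented for illustration, not taken from the study.
minority_class = [(80.0, 30.0), (85.0, 28.0), (90.0, 35.0), (95.0, 33.0)]
new_points = smote_oversample(minority_class, n_new=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the range already covered by that class rather than duplicating records exactly.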
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wibowo, W., Al Azies, H., Wilujeng, S.A., Abdul-Rahman, S. (2024). Harnessing the XGBoost Ensemble for Intelligent Prediction and Identification of Factors with a High Impact on Air Quality: A Case Study of Urban Areas in Jakarta Province, Indonesia. In: Bee Wah, Y., Al-Jumeily OBE, D., Berry, M.W. (eds) Data Science and Emerging Technologies. DaSET 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-97-0293-0_24
DOI: https://doi.org/10.1007/978-981-97-0293-0_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0292-3
Online ISBN: 978-981-97-0293-0
eBook Packages: Intelligent Technologies and Robotics (R0)