Harnessing the XGBoost Ensemble for Intelligent Prediction and Identification of Factors with a High Impact on Air Quality: A Case Study of Urban Areas in Jakarta Province, Indonesia

  • Conference paper
Data Science and Emerging Technologies (DaSET 2023)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 191)

Abstract

This article aims to develop an accurate air quality prediction model to help address Jakarta’s air pollution challenges. The study uses conventional air pollution index data recorded by air quality monitoring stations. The research workflow comprises data exploration, handling class imbalance with SMOTE (Synthetic Minority Over-sampling Technique), and building an XGBoost model with tuned hyperparameters. In the evaluation stage, the model achieved an accuracy of 99.516%, an F1-score of 99.528%, and a recall of 99.509%. These performance indicators show the model’s strong ability to classify and predict air quality levels. Furthermore, this study investigates the importance of the individual variables in predicting air quality. Comparing the weight, gain, total gain, and cover importance measures reveals how much each variable contributes. Although SO2 contributes to the prediction, PM2.5 ranks prominently across several of these measures, indicating a substantial influence. By employing advanced analytical approaches and accurate models, this study contributes to a better understanding of the complex dynamics of air quality prediction. This knowledge is useful for developing targeted solutions to air pollution and for promoting healthier urban environments.
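
The abstract names SMOTE as the class-imbalance step but does not give its settings. The sketch below is a minimal pure-Python illustration of the core SMOTE idea, not the authors’ code: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its k nearest minority-class neighbours. The helper names `smote` and `balance`, the default `k=5`, and the policy of oversampling every class up to the size of the largest one are assumptions made for illustration. In practice a library implementation such as the `SMOTE` class of the imbalanced-learn package (via `fit_resample`) would normally be used, and oversampling is usually applied to the training split only so that synthetic points do not leak into the evaluation data.

```python
import random
from collections import Counter

# Hypothetical helpers for illustration only; not the paper's implementation.


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples for one minority class.

    Each synthetic point lies at a random position on the line segment
    between a randomly chosen minority sample and one of its k nearest
    minority-class neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two samples in the class")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Brute-force k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbour = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class with SMOTE until it matches the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out
```

Because every synthetic point is a convex combination of two real minority samples, it always stays within the region spanned by that class, which is what distinguishes SMOTE from simply duplicating rows.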
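
The abstract compares the weight, gain, total gain, and cover importance measures. In the xgboost Python package these are returned by `Booster.get_score(importance_type=...)`, where `importance_type` is one of `"weight"`, `"gain"`, `"cover"`, `"total_gain"`, or `"total_cover"`. The sketch below recomputes the same quantities from a hypothetical list of per-split records to show how they relate: weight counts the splits that use a feature, total gain and total cover sum those splits’ gain and cover, and gain and cover are the corresponding per-split averages. The function names, the `(feature, gain, cover)` record format, and the example numbers are assumptions for illustration; they are not results from the paper.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; xgboost computes these internally.


def importance_scores(splits):
    """Aggregate per-split statistics into XGBoost-style importance measures.

    `splits` is an iterable of (feature, gain, cover) tuples, one per internal
    (split) node across all trees; leaf nodes are excluded.
    """
    weight = defaultdict(int)
    total_gain = defaultdict(float)
    total_cover = defaultdict(float)
    for feature, gain, cover in splits:
        weight[feature] += 1           # number of splits using the feature
        total_gain[feature] += gain    # summed loss reduction of those splits
        total_cover[feature] += cover  # summed coverage of those splits
    return {
        f: {
            "weight": weight[f],
            "gain": total_gain[f] / weight[f],    # average gain per split
            "total_gain": total_gain[f],
            "cover": total_cover[f] / weight[f],  # average cover per split
            "total_cover": total_cover[f],
        }
        for f in weight
    }


def rank(scores, measure):
    """Feature names ordered from most to least important under one measure."""
    return sorted(scores, key=lambda f: scores[f][measure], reverse=True)
```

With toy records such as `[("PM2.5", 10.0, 50.0), ("PM2.5", 6.0, 30.0), ("SO2", 3.0, 40.0), ("PM10", 9.0, 20.0)]`, PM2.5 ranks first by weight and total gain while PM10 ranks first by average gain. Measures can disagree in this way, which is why comparing several of them gives a fuller picture than relying on any single one.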

Author information

Corresponding author

Correspondence to Wahyu Wibowo.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wibowo, W., Al Azies, H., Wilujeng, S.A., Abdul-Rahman, S. (2024). Harnessing the XGBoost Ensemble for Intelligent Prediction and Identification of Factors with a High Impact on Air Quality: A Case Study of Urban Areas in Jakarta Province, Indonesia. In: Bee Wah, Y., Al-Jumeily OBE, D., Berry, M.W. (eds) Data Science and Emerging Technologies. DaSET 2023. Lecture Notes on Data Engineering and Communications Technologies, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-97-0293-0_24
