Spatial Transfer Learning for Estimating PM$_{2.5}$ in Data-Poor Regions

Gupta, Shrey; Park, Yongbee; Bi, Jianzhao; Gupta, Suyash; Züfle, Andreas; Wildani, Avani; Liu, Yang

doi:10.1007/978-3-031-70378-2_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14949)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Air pollution, especially particulate matter 2.5 (PM$_{2.5}$), is a pressing concern for public health and is difficult to estimate in developing countries (data-poor regions) due to a lack of ground sensors. Transfer learning models can address this problem because they gain knowledge from alternate data sources (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and the target domains. We recognize this transfer problem as spatial transfer learning and propose a new feature named Latent Dependency Factor (LDF) that captures spatial and semantic dependencies of both domains and is subsequently added to the feature spaces of the domains. We generate LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using LDF improve on the baselines by $19.34\%$. We additionally support our experiments with qualitative findings.
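The idea of adding a learned dependency feature to both feature spaces can be illustrated with a toy sketch. This is not the paper's two-stage autoencoder: as a loudly labeled assumption, it groups each sample with its nearest source and target neighbours and compresses the neighbourhood summary with a linear projection (first principal component) standing in for a learned encoder. The function names, the neighbourhood size `k`, and the random data are all hypothetical.

```python
import numpy as np

def toy_latent_feature(source_X, target_X, k=3):
    """Toy stand-in for an LDF-style feature (NOT the paper's autoencoder)."""
    combined = np.vstack([source_X, target_X])
    # Standardize so that distances are comparable across features.
    mu = combined.mean(axis=0)
    sigma = combined.std(axis=0) + 1e-8
    Z = (combined - mu) / sigma
    # Pairwise Euclidean distances over source and target samples together.
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # "Cluster" of each sample: its k most similar samples from either domain.
    nn_idx = np.argsort(dist, axis=1)[:, :k]
    neighbourhood = Z[nn_idx].mean(axis=1)  # shape (n_samples, n_features)
    # Assumption: a linear projection replaces the learned encoder.
    centered = neighbourhood - neighbourhood.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    latent = centered @ vt[0]  # one latent value per sample
    n_src = len(source_X)
    return latent[:n_src], latent[n_src:]

def append_feature(X, latent):
    """Add the latent value as one extra column of the feature space."""
    return np.hstack([X, latent[:, None]])

# Hypothetical data: 20 source samples and 5 target samples, 4 features each.
rng = np.random.default_rng(0)
source_X = rng.normal(size=(20, 4))
target_X = rng.normal(size=(5, 4))

src_latent, tgt_latent = toy_latent_feature(source_X, target_X, k=3)
src_aug = append_feature(source_X, src_latent)
tgt_aug = append_feature(target_X, tgt_latent)
print(src_aug.shape, tgt_aug.shape)  # (20, 5) (5, 5)
```

The augmented source and target matrices could then be passed to any transfer learning model; which models benefit, and by how much, is what the paper's experiments evaluate.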

S. Gupta and Y. Park contributed equally to this work.

Author information

Authors and Affiliations

Emory University, Atlanta, USA
Shrey Gupta, Andreas Züfle, Avani Wildani & Yang Liu
University of Washington, Seattle, USA
Jianzhao Bi
University of California, Berkeley, Berkeley, USA
Suyash Gupta
Ingkle, Cheonan-si, South Korea
Yongbee Park
Cloudflare, San Francisco, USA
Avani Wildani

Corresponding authors

Correspondence to Shrey Gupta or Yang Liu.

Editor information

Editors and Affiliations

LTCI, Télécom Paris, Palaiseau Cedex, France
Albert Bifet
Faculty of Informatics, Vytautas Magnus University, Akademija, Lithuania
Tomas Krilavičius
Stockholm University, Kista, Sweden
Ioanna Miliou
School of Information Technology, Halmstad University, Halmstad, Sweden
Slawomir Nowaczyk

Cite this paper

Gupta, S. et al. (2024). Spatial Transfer Learning for Estimating PM$_{2.5}$ in Data-Poor Regions. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14949. Springer, Cham. https://doi.org/10.1007/978-3-031-70378-2_24

DOI: https://doi.org/10.1007/978-3-031-70378-2_24
Published: 22 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70377-5
Online ISBN: 978-3-031-70378-2
eBook Packages: Computer Science, Computer Science (R0)
