Abstract
Air pollution, especially fine particulate matter (PM\(_{2.5}\)), is a pressing public health concern and is difficult to estimate in developing countries (data-poor regions) because ground sensors are scarce. Transfer learning models can address this problem by drawing knowledge from alternate data sources (i.e., data from data-rich regions). However, current transfer learning methodologies do not account for dependencies between the source and target domains. We frame this problem as spatial transfer learning and propose a new feature, the Latent Dependency Factor (LDF), which captures spatial and semantic dependencies of both domains and is added to the feature space of each domain. We generate the LDF using a novel two-stage autoencoder model that learns from clusters of similar source and target domain data. Our experiments show that transfer learning models using the LDF achieve a \(19.34\%\) improvement over the baselines. We additionally support our experiments with qualitative findings.
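To make the idea of augmenting both feature spaces with a dependency feature concrete, the following is a minimal sketch and not the authors' implementation. It assumes synthetic data, uses nearest-neighbor groups of labeled source samples as the "clusters", and swaps the two-stage autoencoder for a linear encoder (PCA computed via SVD). All function names, the choice of `k`, and the data are hypothetical.

```python
import numpy as np

def neighbor_clusters(X, X_src, y_src, k, exclude_self=False):
    """For each row of X, flatten its k nearest labeled source rows (features + label)."""
    # Pairwise squared Euclidean distances, shape (len(X), len(X_src)).
    d2 = ((X[:, None, :] - X_src[None, :, :]) ** 2).sum(axis=2)
    if exclude_self:
        # When X is the source set itself, drop each row's own entry
        # so its own label does not leak into its cluster.
        np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    neigh = np.concatenate([X_src[idx], y_src[idx][..., None]], axis=2)
    return neigh.reshape(len(X), -1)

def fit_linear_encoder(Z, n_latent=1):
    """Linear stand-in for an autoencoder: top principal directions of Z."""
    mean = Z.mean(axis=0)
    _, _, vt = np.linalg.svd(Z - mean, full_matrices=False)
    return mean, vt[:n_latent].T

def encode(Z, encoder):
    mean, W = encoder
    return (Z - mean) @ W

# Synthetic stand-ins for a data-rich source region and a data-poor target region.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(200, 4))
y_src = X_src @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=200)
X_tgt = rng.normal(loc=0.5, size=(20, 4))

k = 5
Z_src = neighbor_clusters(X_src, X_src, y_src, k, exclude_self=True)
Z_tgt = neighbor_clusters(X_tgt, X_src, y_src, k)

# Fit the encoder on clusters from both domains, then append the
# one-dimensional code to each domain's features as an LDF-like column.
encoder = fit_linear_encoder(np.vstack([Z_src, Z_tgt]))
X_src_aug = np.hstack([X_src, encode(Z_src, encoder)])
X_tgt_aug = np.hstack([X_tgt, encode(Z_tgt, encoder)])
print(X_src_aug.shape, X_tgt_aug.shape)
```

Any downstream transfer learning model could then be trained on `X_src_aug` and applied to `X_tgt_aug`. The linear encoder is used here only to keep the sketch dependency-free; a learned nonlinear encoder would take its place in practice.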
S. Gupta and Y. Park contributed equally to this work.
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gupta, S. et al. (2024). Spatial Transfer Learning for Estimating PM\(_{2.5}\) in Data-Poor Regions. In: Bifet, A., Krilavičius, T., Miliou, I., Nowaczyk, S. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2024. Lecture Notes in Computer Science, vol 14949. Springer, Cham. https://doi.org/10.1007/978-3-031-70378-2_24
DOI: https://doi.org/10.1007/978-3-031-70378-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70377-5
Online ISBN: 978-3-031-70378-2
eBook Packages: Computer Science, Computer Science (R0)