Assessing the Role of Temporal Information in Modelling Short-Term Air Pollution Effects Based on Traffic and Meteorological Conditions: A Case Study in Wrocław

  • Conference paper
New Trends in Databases and Information Systems (ADBIS 2019)

Abstract

Temporal aspects often play an important role in information extraction. Because of the peculiarities of temporal data, managing them typically requires dedicated algorithms, which make the overall data mining process more complex, especially when a dataset contains both temporal and atemporal information. Typical solutions in such cases either combine different algorithms that handle the temporal and atemporal parts independently, or rely on an encoding of the temporal data (for example, lagged variables) that makes it possible to apply classical machine learning algorithms. This work investigates the management of temporal information in an environmental problem: assessing the relationships between the concentrations of the pollutants \(NO_2\), \(NO_X\), and \(PM_{2.5}\) and a set of independent variables that include meteorological conditions and traffic flow in the city of Wrocław (Poland). We show that taking temporal information into account by means of lagged variables leads to better results than atemporal models. More importantly, even higher performance can be achieved with a recently proposed decision tree model, J48SS, which can handle heterogeneous datasets consisting of static (i.e., categorical and numerical) attributes as well as sequential and time series data. This outcome highlights the importance of proper temporal data modelling.
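To make the lagged-variable encoding mentioned above concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `add_lagged_features`, the column names, the chosen lags, and the toy hourly values are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence


def add_lagged_features(
    rows: Sequence[Dict[str, float]],
    columns: Sequence[str],
    lags: Sequence[int],
) -> List[Dict[str, float]]:
    """Extend each record with the values its predictors had `k` steps earlier.

    The first max(lags) records are dropped because they have no
    complete history to draw on.
    """
    max_lag = max(lags)
    out = []
    for t in range(max_lag, len(rows)):
        record = dict(rows[t])  # copy, so the input rows stay untouched
        for col in columns:
            for k in lags:
                record[f"{col}_lag{k}"] = rows[t - k][col]
        out.append(record)
    return out


# Toy hourly observations (hypothetical values, not taken from the paper).
hourly = [
    {"traffic": 100, "wind": 2.0, "no2": 30.0},
    {"traffic": 150, "wind": 1.5, "no2": 42.0},
    {"traffic": 200, "wind": 1.0, "no2": 55.0},
    {"traffic": 180, "wind": 3.0, "no2": 48.0},
]

lagged = add_lagged_features(hourly, columns=["traffic", "wind"], lags=[1, 2])
for record in lagged:
    print(record)
```

Each output record is an ordinary flat row, so it could be passed to any classical (atemporal) learner. The cost is that the history window is fixed in advance by the choice of lags.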

Notes

  1. We use the term timestamp to refer to the kind of variables that identify a specific time instant, to distinguish them from those we will consider to be proper temporal features, i.e., the ones that encode historical values.

  2. Note that the European directives identify the two relevant values of 40 and 200 for \(NO_2\) concentrations. However, we chose to rely on different interval boundaries here, since the considered data contain just 4 instances with values over 200. Although this choice is rather arbitrary, it does not compromise the goal of the work, namely, assessing the role played by temporal information in the overall classification task.
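As an illustration of how interval boundaries turn a concentration into a class label, and why a nearly empty top interval is a problem, here is a small sketch in plain Python. The boundaries `HYPOTHETICAL_BOUNDS`, the labels, and the sample values are assumptions, since the boundaries actually used are not given in this excerpt. The weighting formula is the "balanced" heuristic `n_samples / (n_classes * count)`, the same one used by scikit-learn's `compute_class_weight` cited in the references; whether the authors applied it in exactly this way is not confirmed here.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical class boundaries and labels (not the paper's actual values).
HYPOTHETICAL_BOUNDS = [50.0, 100.0]
LABELS = ["low", "medium", "high"]


def to_class(value, bounds=HYPOTHETICAL_BOUNDS, labels=LABELS):
    """Map a concentration to the label of the interval it falls into.

    A value equal to a boundary is assigned to the upper interval.
    """
    return labels[bisect_right(bounds, value)]


def balanced_weights(classes):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(classes)
    n_samples, n_classes = len(classes), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy concentrations (hypothetical).
values = [12.0, 50.0, 55.5, 99.9, 100.0, 130.0]
classes = [to_class(v) for v in values]
weights = balanced_weights(classes)

print(classes)
print(weights)
```

A class with very few instances receives a very large weight under this scheme, which is one practical reason to prefer boundaries that leave every interval reasonably populated.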

References

  1. European Union air quality standards. http://ec.europa.eu/environment/air/quality/standards.htm. Accessed 21 May 2019

  2. NOx level objectives. http://www.icopal-noxite.co.uk/nox-problem/nox-level-objectives.aspx. Accessed 21 May 2019

  3. Scikit-learn’s compute_class_weight function. https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html. Accessed 22 May 2019

  4. Scikit-learn’s RandomForestClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. Accessed 22 May 2019

  5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)

  6. Brunello, A., Marzano, E., Montanari, A., Sciavicco, G.: J48SS: a novel decision tree approach for the handling of sequential and time series data. Computers 8(1), 21 (2019)

  7. Deters, J.K., Zalakeviciute, R., Gonzalez, M., Rybarczyk, Y.: Modeling \(PM_{2.5}\) urban pollution using machine learning and selected meteorological parameters. J. Electr. Comput. Eng. 2017, 5106045:1–5106045:14 (2017)

  8. Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning (ICML), pp. 144–151. Morgan Kaufmann (1998)

  9. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)

  10. Kamińska, J.A.: The use of random forests in modelling short-term air pollution effects based on traffic and meteorological conditions: A case study in Wrocław. J. Environ. Manag. 217, 164–174 (2018)

  11. Mbarak, A., Yetis, Y., Jamshidi, M.: Data-based pollution forecasting via machine learning: case of Northwest Texas. In: Proceedings of the 2018 World Automation Congress (WAC), pp. 1–6 (2018)

  12. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)

  13. Sasaki, Y.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007)

  14. Shang, Z., Deng, T., He, J., Duan, X.: A novel model for hourly \(PM_{2.5}\) concentration prediction based on CART and EELM. Sci. Total Environ. 651, 3043–3052 (2019)

  15. Wilkins, A.S.: To lag or not to lag?: Re-evaluating the use of lagged dependent variables in regression analysis. Polit. Sci. Res. Methods 6(2), 393–411 (2018)

  16. Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2016)

  17. Xie, J., et al.: The characteristics of hourly wind field and its impacts on air quality in the Pearl River Delta region during 2013–2017. Atmos. Res. 227, 112–124 (2019)

Author information

Corresponding author

Correspondence to Andrea Brunello.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Brunello, A., Kamińska, J., Marzano, E., Montanari, A., Sciavicco, G., Turek, T. (2019). Assessing the Role of Temporal Information in Modelling Short-Term Air Pollution Effects Based on Traffic and Meteorological Conditions: A Case Study in Wrocław. In: Welzer, T., et al. New Trends in Databases and Information Systems. ADBIS 2019. Communications in Computer and Information Science, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-030-30278-8_45

  • DOI: https://doi.org/10.1007/978-3-030-30278-8_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30277-1

  • Online ISBN: 978-3-030-30278-8

  • eBook Packages: Computer Science, Computer Science (R0)
