Assessing the Role of Temporal Information in Modelling Short-Term Air Pollution Effects Based on Traffic and Meteorological Conditions: A Case Study in Wrocław

Brunello, Andrea; Kamińska, Joanna; Marzano, Enrico; Montanari, Angelo; Sciavicco, Guido; Turek, Tomasz

doi:10.1007/978-3-030-30278-8_45

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1064)

Included in the following conference series: European Conference on Advances in Databases and Information Systems

Abstract

Temporal aspects often play an important role in information extraction. Given the peculiarities of temporal data, managing them typically requires dedicated algorithms, which complicates the overall data mining process, especially when a dataset contains both temporal and atemporal information. In such situations, typical solutions either combine different algorithms that handle the temporal and atemporal parts independently, or rely on an encoding of the temporal data (for example, lagged variables) that makes it possible to apply classical machine learning algorithms. This work investigates the management of temporal information in an environmental problem: assessing the relationships between the concentrations of the pollutants \(NO_2\), \(NO_X\), and \(PM_{2.5}\) and a set of independent variables that include meteorological conditions and traffic flow in the city of Wrocław (Poland). We show that taking temporal information into account by means of lagged variables leads to better results than atemporal models. More importantly, even higher performance may be achieved with a recently proposed decision tree model, J48SS, which can handle heterogeneous datasets consisting of static (i.e., categorical and numerical) attributes as well as sequential and time series data. This outcome highlights the importance of proper temporal data modelling.
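To make the lagged-variable encoding mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `add_lagged_features`, the column names, the lag depth, and all the hourly values are assumptions made up for illustration. Each record is extended with the values that selected variables took in the preceding hours, so that a classical, atemporal learner can see recent history as ordinary attributes.

```python
from typing import Dict, List


def add_lagged_features(
    rows: List[Dict[str, float]], columns: List[str], max_lag: int
) -> List[Dict[str, float]]:
    """Extend each row with <column>_lag<k> for k = 1..max_lag.

    Rows are assumed to be equally spaced in time and sorted chronologically.
    The first max_lag rows are dropped because they lack a full history.
    """
    lagged_rows = []
    for t in range(max_lag, len(rows)):
        new_row = dict(rows[t])  # copy, so the input is left untouched
        for column in columns:
            for k in range(1, max_lag + 1):
                new_row[f"{column}_lag{k}"] = rows[t - k][column]
        lagged_rows.append(new_row)
    return lagged_rows


# Hypothetical hourly records; the numbers are invented for this example.
hourly = [
    {"traffic": 1200.0, "wind_speed": 2.1, "no2": 48.0},
    {"traffic": 1850.0, "wind_speed": 1.7, "no2": 61.5},
    {"traffic": 2400.0, "wind_speed": 1.2, "no2": 79.0},
    {"traffic": 2100.0, "wind_speed": 3.4, "no2": 66.2},
    {"traffic": 1500.0, "wind_speed": 4.0, "no2": 52.3},
]

lagged = add_lagged_features(hourly, ["traffic", "wind_speed"], max_lag=2)
for row in lagged:
    print(row)
```

With `max_lag=2`, the five input records yield three output records, each carrying four extra attributes (`traffic_lag1`, `traffic_lag2`, `wind_speed_lag1`, `wind_speed_lag2`). The trade-off of this encoding is that the attribute count grows with the number of lagged columns times the lag depth, and the history is flattened into independent attributes rather than kept as a sequence.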

Notes

1. We use the term timestamp for variables that identify a specific time instant, to distinguish them from what we consider proper temporal features, i.e., those that encode historical values.
2. Note that the European directives identify the two relevant values of 0 and 200 for \(NO_2\) concentrations. However, we chose to rely on different interval boundaries here, since the considered data contain just 4 instances with values over 200. Although this choice is rather arbitrary, it does not compromise the goal of the work, namely, assessing the role played by temporal information in the overall classification task.
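The effect described in note 2, where a threshold leaves one class almost empty, can be illustrated with a short sketch. Everything in it is an assumption for illustration only: the concentration values are invented, and the boundaries 40.0 and 80.0 are hypothetical and are not the intervals used in the paper. The sketch discretises concentrations into ordinal classes and counts the members of each class under two choices of boundaries.

```python
from bisect import bisect_right
from collections import Counter
from typing import List


def discretise(values: List[float], boundaries: List[float]) -> List[int]:
    """Map each value to the index of the interval it falls into.

    boundaries must be sorted in ascending order. A value equal to a
    boundary is assigned to the upper interval.
    """
    return [bisect_right(boundaries, value) for value in values]


# Invented concentrations, for illustration only.
concentrations = [12.0, 47.5, 88.0, 35.2, 210.0, 64.1, 19.9, 102.3]

# A single high threshold leaves the upper class nearly empty.
high_threshold = discretise(concentrations, [200.0])

# Hypothetical alternative boundaries giving more balanced classes.
custom = discretise(concentrations, [40.0, 80.0])

print(Counter(high_threshold))
print(Counter(custom))
```

On this toy data the single threshold at 200 puts one instance in the upper class and seven in the lower one, while the hypothetical boundaries split the instances 3, 2, and 3. A class with only a handful of instances gives a classifier very little to learn from, which is the practical reason for choosing boundaries that fit the data at hand.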

References

European Union air quality standards. http://ec.europa.eu/environment/air/quality/standards.htm. Accessed 21 May 2019
NOx level objectives. http://www.icopal-noxite.co.uk/nox-problem/nox-level-objectives.aspx. Accessed 21 May 2019
Scikit-learn’s compute_class_weight function. https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html. Accessed 22 May 2019
Scikit-learn’s RandomForestClassifier. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. Accessed 22 May 2019
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Brunello, A., Marzano, E., Montanari, A., Sciavicco, G.: J48SS: a novel decision tree approach for the handling of sequential and time series data. Computers 8(1), 21 (2019)
Deters, J.K., Zalakeviciute, R., Gonzalez, M., Rybarczyk, Y.: Modeling \(PM_{2.5}\) urban pollution using machine learning and selected meteorological parameters. J. Electr. Comput. Eng. 2017, 5106045:1–5106045:14 (2017)
Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning (ICML), pp. 144–151. Morgan Kaufmann (1998)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20(8), 832–844 (1998)
Kamińska, J.A.: The use of random forests in modelling short-term air pollution effects based on traffic and meteorological conditions: A case study in Wrocław. J. Environ. Manag. 217, 164–174 (2018)
Mbarak, A., Yetis, Y., Jamshidi, M.: Data-based pollution forecasting via machine learning: case of Northwest Texas. In: Proceedings of the 2018 World Automation Congress (WAC), pp. 1–6 (2018)
Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
Sasaki, Y.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007)
Shang, Z., Deng, T., He, J., Duan, X.: A novel model for hourly \(PM_{2.5}\) concentration prediction based on CART and EELM. Sci. Total Environ. 651, 3043–3052 (2019)
Wilkins, A.S.: To lag or not to lag?: Re-evaluating the use of lagged dependent variables in regression analysis. Polit. Sci. Res. Methods 6(2), 393–411 (2018)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington (2016)
Xie, J., et al.: The characteristics of hourly wind field and its impacts on air quality in the Pearl River Delta region during 2013–2017. Atmos. Res. 227, 112–124 (2019)

Author information

Authors and Affiliations

University of Udine, Via delle Scienze 206, 33100, Udine, Italy
Andrea Brunello & Angelo Montanari
University of Ferrara, Via Giuseppe Saragat 1, 44122, Ferrara, Italy
Guido Sciavicco
Gap Srlu, Via Tricesimo 246, 33100, Udine, Italy
Enrico Marzano
Wrocław University of Environmental and Life Sciences, ul. Grunwaldzka 53, 50-357, Wrocław, Poland
Joanna Kamińska
Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370, Wrocław, Poland
Tomasz Turek

Corresponding author

Correspondence to Andrea Brunello.

Editor information

Editors and Affiliations

University of Maribor, Maribor, Slovenia
Tatjana Welzer
Alpen Adria University Klagenfurt, Klagenfurt am Wörthersee, Austria
Johann Eder
University of Maribor, Maribor, Slovenia
Vili Podgorelec
Poznań University of Technology, Poznań, Poland
Robert Wrembel
University of Novi Sad, Novi Sad, Serbia
Mirjana Ivanović
Free University of Bozen-Bolzano, Bolzano, Italy
Johann Gamper
Poznań University of Technology, Poznań, Poland
Mikołaj Morzy
University of Thessaly, Lamia, Greece
Theodoros Tzouramanis
Université Lumière Lyon 2, Lyon, France
Jérôme Darmont
University of Maribor, Maribor, Slovenia
Aida Kamišalić Latifić

Cite this paper

Brunello, A., Kamińska, J., Marzano, E., Montanari, A., Sciavicco, G., Turek, T. (2019). Assessing the Role of Temporal Information in Modelling Short-Term Air Pollution Effects Based on Traffic and Meteorological Conditions: A Case Study in Wrocław. In: Welzer, T., et al. New Trends in Databases and Information Systems. ADBIS 2019. Communications in Computer and Information Science, vol 1064. Springer, Cham. https://doi.org/10.1007/978-3-030-30278-8_45

DOI: https://doi.org/10.1007/978-3-030-30278-8_45
Published: 01 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30277-1
Online ISBN: 978-3-030-30278-8
eBook Packages: Computer Science, Computer Science (R0)
