Autoregressive model-based fuzzy clustering and its application for detecting information redundancy in air pollution monitoring networks

  • Original Paper
  • Soft Computing

Abstract

Fuzzy clustering allows an object to belong to two or more clusters simultaneously. This is particularly pertinent for time series, whose patterns often change over time: a series may belong to different clusters over different periods, and crisp clustering cannot capture this multi-cluster membership. In this paper, we adopt a Fuzzy C-Medoids approach to clustering time series based on the estimated autoregressive coefficients of models fitted to the time series. We illustrate the very good performance of this approach in a range of simulation studies. By means of two applications, we also show the usefulness of this clustering approach in air pollution monitoring, considering CO, CO2 and NO time series monitored on world and urban scales. In particular, we show that using the autoregressive representation of these air pollution time series in the clustering process makes it possible to detect possible information redundancy in the monitoring networks. The number of monitoring stations can then be decreased, which reduces monitoring costs and increases the monitoring efficiency of the networks.
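The idea summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions of ours: a fixed AR order estimated by least squares, squared Euclidean distance between coefficient vectors, fuzzifier m = 2, and a simple deterministic max-min initialisation of the medoids. The function names `simulate_ar1`, `fit_ar` and `fuzzy_c_medoids` are hypothetical, and the data are simulated.

```python
import numpy as np

def simulate_ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) series (toy data, for illustration only)."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def fit_ar(x, p):
    """Least-squares AR(p) coefficient estimates (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Column k holds lag k, aligned with the response x[p:], x[p+1], ...
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef

def memberships(D, medoids, m):
    """Fuzzy memberships of every object in every cluster, given the medoids."""
    d = np.maximum(D[:, medoids], 1e-12)  # floor avoids division by zero
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_medoids(D, c, m=2.0, max_iter=100):
    """Fuzzy C-Medoids on an (n, n) matrix D of squared distances."""
    # Assumed initialisation: most central object, then farthest-first.
    medoids = [int(D.sum(axis=1).argmin())]
    while len(medoids) < c:
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        U = memberships(D, medoids, m)
        # Medoid of cluster k: object j minimising sum_i U[i, k]^m * D[i, j].
        new_medoids = ((U ** m).T @ D).argmin(axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return memberships(D, medoids, m), medoids

# Toy data: five series with phi = 0.8 and five with phi = -0.5.
rng = np.random.default_rng(42)
series = [simulate_ar1(0.8, 300, rng) for _ in range(5)]
series += [simulate_ar1(-0.5, 300, rng) for _ in range(5)]

# Represent each series by its AR(2) coefficients and compare those.
coefs = np.array([fit_ar(x, p=2) for x in series])
D = ((coefs[:, None, :] - coefs[None, :, :]) ** 2).sum(axis=-1)

U, medoids = fuzzy_c_medoids(D, c=2)
labels = U.argmax(axis=1)  # crisp labels, only for a quick check
print(np.round(U, 3))
print("medoid series:", medoids)
```

Each row of `U` sums to one and gives the degree to which a series belongs to each cluster. Because the cluster prototypes are medoids, each one is an observed series; in a monitoring setting this would correspond to an actual station, which is one reason a medoid-based method is convenient when looking for redundant stations.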

Notes

  1. http://gaw.kishou.go.jp/cgi-bin/wdcgg/catalogue.cgi.

  2. http://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions.

  3. http://earthtrends.wri.org/text/climate-atmosphere/variable-814.html.

Acknowledgments

The authors thank the editor and the referees for their useful comments and suggestions, which helped to improve the quality and presentation of this manuscript. We wish to acknowledge the contributors of the CO and CO2 data sets from the various countries that were used in Application 1. These data sets appear on the website of the World Data Centre for Greenhouse Gases. For the CO and NO data sets used in Application 2, we wish to acknowledge the Italian Environmental Protection Agency.

Author information

Corresponding author

Correspondence to Pierpaolo D’Urso.

About this article

Cite this article

D’Urso, P., Di Lallo, D. & Maharaj, E.A. Autoregressive model-based fuzzy clustering and its application for detecting information redundancy in air pollution monitoring networks. Soft Comput 17, 83–131 (2013). https://doi.org/10.1007/s00500-012-0905-6

  • DOI: https://doi.org/10.1007/s00500-012-0905-6
