Evolutionary Computing in Statistical Data Analysis

  • Chapter
Foundations of Computational Intelligence Volume 3

Part of the book series: Studies in Computational Intelligence (SCI, volume 203)

Abstract

Evolutionary computing methods are being applied across a wide range of domains with increasing confidence and encouraging results. We illustrate how these techniques have influenced statistical theory and practice in multivariate data analysis, time series model building, and optimization methods for computing statistical estimates and for inference in complex systems. These topics share three distinctive features: a large number of alternative models to choose from, parametrization over high-dimensional discrete spaces, and the lack of convenient properties that can be assumed to hold, at least approximately, for the data generating process. Evolutionary computing has proved to offer a valuable framework for dealing with complicated problems in statistical data analysis and time series analysis, and we present a wide, though by no means exhaustive, list of topics of interest in statistics that have been successfully handled by evolutionary computing procedures. Specific issues include variable selection in linear regression models, nonlinear regression, time series model identification and estimation, detection of outlying observations in time series with regard to both location and type identification, and cluster analysis and grouping problems, including clusters of directional data and clusters of time series. Simulated examples and applications to real data are used for illustration throughout the chapter.
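
As a rough illustration of the first topic listed in the abstract, variable selection in linear regression, the following sketch runs a small genetic algorithm over binary inclusion masks. It is not taken from the chapter: the BIC fitness, the operators (tournament selection, uniform crossover, bit-flip mutation, elitism), the function names `rss`, `bic` and `ga_select`, and all parameter values are assumptions chosen only to keep the example short and dependency-free.

```python
import math
import random


def rss(X, y, subset):
    """Residual sum of squares of an OLS fit on the chosen columns plus an intercept."""
    cols = [[1.0] * len(y)] + [[row[j] for row in X] for j in subset]
    k = len(cols)
    # Augmented normal equations [Z'Z | Z'y]
    A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k + 1):
                A[r][c] -= f * A[i][c]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][c] * beta[c] for c in range(i + 1, k))
        beta[i] = (A[i][k] - s) / A[i][i]
    return sum((yi - sum(b * col[t] for b, col in zip(beta, cols))) ** 2
               for t, yi in enumerate(y))


def bic(X, y, mask):
    """Gaussian BIC (up to a constant) of the model that includes columns with mask bit 1."""
    subset = [j for j, bit in enumerate(mask) if bit]
    n = len(y)
    return n * math.log(rss(X, y, subset) / n) + (len(subset) + 1) * math.log(n)


def ga_select(X, y, pop_size=20, generations=20, p_mut=0.05, seed=0):
    """Search inclusion masks with a simple generational GA; lower BIC is fitter."""
    rng = random.Random(seed)
    p = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:  # each distinct mask is fitted only once
            cache[key] = bic(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(new_pop) < pop_size:
            a = min(rng.sample(pop, 2), key=fitness)  # binary tournament
            b = min(rng.sample(pop, 2), key=fitness)
            child = [u if rng.random() < 0.5 else v for u, v in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]   # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fitness)
    return best


# Simulated data: only columns 0 and 3 carry signal (hypothetical setup).
data_rng = random.Random(1)
n, p = 80, 8
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + data_rng.gauss(0, 0.5) for row in X]
best = ga_select(X, y)
print("selected columns:", [j for j, bit in enumerate(best) if bit])
```

With only 8 candidate variables the 256 possible masks could be enumerated directly; a search of this kind becomes useful when the number of candidates makes exhaustive enumeration impractical, which is the setting the abstract describes.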

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Baragona, R., Battaglia, F. (2009). Evolutionary Computing in Statistical Data Analysis. In: Abraham, A., Hassanien, AE., Siarry, P., Engelbrecht, A. (eds) Foundations of Computational Intelligence Volume 3. Studies in Computational Intelligence, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01085-9_12

  • DOI: https://doi.org/10.1007/978-3-642-01085-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01084-2

  • Online ISBN: 978-3-642-01085-9

  • eBook Packages: Engineering, Engineering (R0)
