Skip to main content

RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10196))

Abstract

Genetic programming (GP) has been shown to be a powerful tool for automatic modeling and program induction. It is often used to solve difficult symbolic regression tasks, with many examples in real-world domains. However, the robustness of GP-based approaches has not been substantially studied. In particular, the present work deals with the issue of outliers, data in the training set that represent severe errors in the measuring process. In general, a datum is considered an outlier when it sharply deviates from the true behavior of the system of interest. GP practitioners know that such data points usually bias the search and produce inaccurate models. Therefore, this work presents a hybrid methodology based on the RAndom SAmpling Consensus (RANSAC) algorithm and GP, which we call RANSAC-GP. RANSAC is an approach to deal with outliers in parameter estimation problems, widely used in computer vision and related fields. On the other hand, this work presents the first application of RANSAC to symbolic regression with GP, with impressive results. The proposed algorithm is able to deal with extreme amounts of contamination in the training set, evolving highly accurate models even when the amount of outliers reaches 90%.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    All results presented in this work are based on a standard GP implementation, using a tree representation, subtree crossover and subtree mutation.

  2. 2.

    An extensive discussion of these methods is beyond the scope of this work.

  3. 3.

    For all practitioners, it is intuitively evident that 10% of outliers can have drastic effects in the modeling process.

References

  1. Alfons, A., Croux, C., Gelper, S., et al.: Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Annals Appl. Stat. 7(1), 226–248 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  2. Derpanis, K.G.: Overview of the RANSAC algorithm. Image Rochester NY 4(1), 2–3 (2010)

    Google Scholar 

  3. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

    Article  MathSciNet  Google Scholar 

  4. Fortin, F.A., De Rainville, F.M., Gardner, M.A., Parizeau, M., Gagné, C.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)

    MathSciNet  MATH  Google Scholar 

  5. Giloni, A., Padberg, M.: Least trimmed squares regression, least median squares regression, and mathematical programming. Math. Comput. Model. 35(9), 1043–1060 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  6. Gonçalves, I., Silva, S.: Balancing learning and overfitting in genetic programming with interleaved sampling of training data. In: Krawiec, K., Moraglio, A., Hu, T., Etaner-Uyar, A.Ş., Hu, B. (eds.) EuroGP 2013. LNCS, vol. 7831, pp. 73–84. Springer, Heidelberg (2013). doi:10.1007/978-3-642-37207-0_7

    Chapter  Google Scholar 

  7. Hast, A., Nysjö, J., Marchetti, A.: Optimal RANSAC-towards a repeatable algorithm for finding the optimal set. J. WSCG 21(1), 21–30 (2013)

    Google Scholar 

  8. Kotanchek, M.E., Vladislavleva, E.Y., Smits, G.F.: Symbolic regression via genetic programming as a discovery engine: insights on outliers and prototypes. In: Riolo, R., O’Reilly, U.-M., McConaghy, T. (eds.) Genetic Programming Theory and Practice VII, pp. 55–72. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  9. La Cava, W., Spector, L., Danai, K.: Epsilon-lexicase selection for regression. In: GECCO 2016 Proceedings of the Genetic and Evolutionary Computation Conference 2016, pp. 741–748. ACM, New York (2016)

    Google Scholar 

  10. Lacey, A., Pinitkarn, N., Thacker, N.A.: An evaluation of the performance of RANSAC algorithms for stereo camera calibrarion. In: BMVC, pp. 1–10 (2000)

    Google Scholar 

  11. Martínez, Y., Trujillo, L., Naredo, E., Legrand, P.: A comparison of fitness-case sampling methods for symbolic regression with genetic programming. In: Tantar, A.-A. (ed.) EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation V. AISC, vol. 288, pp. 201–212. Springer, Heidelberg (2014). doi:10.1007/978-3-319-07494-8_14

    Google Scholar 

  12. Martnez, Y., Naredo, E., Trujillo, L., Legrand, P., Lpez, U.: A comparison of fitness-case sampling methods for genetic programming. Journal of Experimental and Theoretical Artificial Intelligence (accepted to appear 2016)

    Google Scholar 

  13. McDermott, J., White, D.R., Luke, S., Manzoni, L., Castelli, M., Vanneschi, L., Jaskowski, W., Krawiec, K., Harper, R., De Jong, K., O’Reilly, U.M.: Genetic programming needs better benchmarks. In: GECCO 2012 Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference, pp. 791–798. ACM, New York (2012)

    Google Scholar 

  14. Nunkesser, R., Morell, O.: An evolutionary algorithm for robust regression. Comput. Stat. & Data Anal. 54(12), 3242–3248 (2010)

    Article  MathSciNet  MATH  Google Scholar 

  15. Pearson, R.K.: Mining imperfect data: dealing with contamination and incomplete records. SIAM (2005)

    Google Scholar 

  16. Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984)

    Article  MathSciNet  MATH  Google Scholar 

  17. Spector, L.: Assessment of problem modality by differential performance of lexicase selection in genetic programming: a preliminary report. In: GECCO Companion 2012 Proceedings of the Fourteenth International Conference on Genetic and Evolutionary Computation Conference Companion, pp. 401–408. ACM (2012)

    Google Scholar 

  18. Tarsha-Kurdi, F., Landes, T., Grussenmeyer, P., et al.: Hough-transform and extended RANSAC algorithms for automatic detection of 3D building roof planes from Lidar data. In: Proceedings of the ISPRS Workshop on Laser Scanning. vol. 36, pp. 407–412 (2007)

    Google Scholar 

  19. Torr, P.H., Zisserman, A.: MLESAC: a new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78(1), 138–156 (2000)

    Article  Google Scholar 

  20. Zuliani, M.: RANSAC for Dummies. Vision Research Lab, University of California, Santa Barbara (2009)

    Google Scholar 

Download references

Acknowledgments

First author was supported by CONACYT (México) scholarships No. 573397. This research was partially supported by CONACYT Basic Science Research Project No. 178323, CONACYT Fronteras de la Ciencia 2015-2 No. 944, as well as by FP7- Marie Curie-IRSES 2013 European Commission program with project ACoBSEC with contract No. 612689. Sara Silva acknowledges project PERSEIDS (PTDC/EMS-SIS/0642/2014) and BioISI RD unit, UID/MULTI/04046/2013, funded by FCT/MCTES/PIDDAC, Portugal.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Leonardo Trujillo .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

López, U., Trujillo, L., Martinez, Y., Legrand, P., Naredo, E., Silva, S. (2017). RANSAC-GP: Dealing with Outliers in Symbolic Regression with Genetic Programming. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds) Genetic Programming. EuroGP 2017. Lecture Notes in Computer Science(), vol 10196. Springer, Cham. https://doi.org/10.1007/978-3-319-55696-3_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-55696-3_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55695-6

  • Online ISBN: 978-3-319-55696-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics