
Least Squares Approach for Multivariate Split Selection in Regression Trees

  • Conference paper
  • In: Intelligent Data Engineering and Automated Learning – IDEAL 2020

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12489)

Abstract

In the context of Industry 4.0, a growing number of data-driven models are used to improve industrial processes. These models need to be both accurate and interpretable. Regression Trees can fulfill these requirements, especially if their model flexibility is increased by multivariate splits that adapt to the process function. In this paper, a novel approach for multivariate split selection is presented. The direction of each split is determined by a first-order least squares model that adapts to the gradient of the process function in a local area. A forward selection method mitigates the curse of dimensionality, maintains interpretability, and produces a generalized split. The approach is implemented in CART as an extension of the existing algorithm, yielding the Least Squares Regression Tree (LSRT). In an extensive experimental evaluation, LSRT produces much smaller trees and higher prediction accuracy than univariate CART, and additionally shows low sensitivity to noise and performance improvements for high-dimensional input spaces and small data sets.
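To make the core idea concrete, the following is a minimal sketch, not the authors' implementation, of how a least-squares-based multivariate split could look. It assumes a simple setting: a first-order model is fitted to the data reaching a node, its coefficient vector is taken as the split direction (an estimate of the local gradient), and a threshold along that direction is chosen by minimizing the sum of squared errors. The paper's forward selection over input variables is omitted for brevity, and the function name least_squares_split is hypothetical.

```python
import numpy as np

def least_squares_split(X, y):
    """Hypothetical sketch: derive a multivariate split direction from a
    first-order least squares fit, then pick the best threshold along it."""
    n, d = X.shape
    # First-order model y ~ X @ w + b; the coefficient vector w
    # approximates the local gradient of the process function.
    A = np.hstack([X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = coef[:d]
    # Project the samples onto the gradient direction.
    z = X @ w
    order = np.argsort(z)
    z_sorted, y_sorted = z[order], y[order]
    # Scan candidate thresholds: the split "w . x <= t" sends samples
    # left/right, each side predicts its mean, and we minimize total SSE.
    best_t, best_sse = None, np.inf
    for i in range(1, n):
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best_t = 0.5 * (z_sorted[i - 1] + z_sorted[i])
    return w, best_t, best_sse
```

A node would then route a sample x to the left child if w . x <= t. Because each split is a single linear combination (restricted further by forward selection in the paper), it stays far more interpretable than an arbitrary oblique split.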



Acknowledgements

This work was supported by the EFRE-NRW funding programme "Forschungsinfrastrukturen" (grant no. 34.EFRE–0300180).

Author information


Corresponding author

Correspondence to Marvin Schöne.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Schöne, M., Kohlhase, M. (2020). Least Squares Approach for Multivariate Split Selection in Regression Trees. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. IDEAL 2020. Lecture Notes in Computer Science, vol 12489. Springer, Cham. https://doi.org/10.1007/978-3-030-62362-3_5


  • DOI: https://doi.org/10.1007/978-3-030-62362-3_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-62361-6

  • Online ISBN: 978-3-030-62362-3

  • eBook Packages: Computer Science, Computer Science (R0)
