Abstract
Random Forests are one of the most reliable and robust general-purpose machine learning algorithms, providing very competitive baselines for more complex methods. Recently, a new algorithm has been introduced into the family of decision tree learners – Similarity Forests – which aims to mitigate some of the well-known deficiencies of Random Forests. In this paper we extend the originally proposed Similarity Forests algorithm to one-class classification, multi-class classification, regression, and metric learning tasks. We also introduce two new criteria for split evaluation in regression learning. Our experimental results show that Similarity Forests can be a competitive alternative to Random Forests, in particular when a high-quality data representation is difficult to obtain.
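As background for the extensions discussed in the paper, the core node-splitting step of Similarity Forests (Sathe and Aggarwal, 2017) can be sketched as follows: draw two exemplars from different classes, project every sample onto a one-dimensional "direction" given by the difference of its similarities to the two exemplars, and pick the threshold minimising weighted Gini impurity. This is a minimal illustrative sketch, not the authors' implementation; the function names and the choice of the dot product as the similarity are our own assumptions.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_similarity_split(X, y, sim, rng=None):
    """Best split at one node of a similarity tree (illustrative sketch).

    Draws one exemplar from each of two classes, projects each sample x
    onto sim(x, x_j) - sim(x, x_i), and scans sorted projections for the
    threshold with minimal weighted Gini impurity. `sim(a, b)` can be any
    similarity function; no feature-space representation is needed beyond it.
    """
    if rng is None:
        rng = np.random.default_rng()
    classes = np.unique(y)
    i = rng.choice(np.flatnonzero(y == classes[0]))
    j = rng.choice(np.flatnonzero(y == classes[1]))
    # 1-D projection induced by the exemplar pair
    proj = np.array([sim(x, X[j]) - sim(x, X[i]) for x in X])
    order = np.argsort(proj)
    best_score, best_thr = np.inf, None
    for k in range(1, len(X)):
        if proj[order[k]] == proj[order[k - 1]]:
            continue  # tied projections admit no threshold between them
        left, right = y[order[:k]], y[order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_score = score
            best_thr = 0.5 * (proj[order[k]] + proj[order[k - 1]])
    return best_score, best_thr
```

A forest is then grown by repeating this step recursively on bootstrap samples, exactly as in Random Forests, but with the similarity-based projection replacing axis-parallel feature splits.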
Notes
1. Although it should be noted that methodological objections have been raised [22] regarding this often-cited study.
2. www.github.com/anonymous: anonymized for blind review.
References
Archer, K.J., Kimes, R.V.: Empirical characterization of random forest variable importance measures. Comp. Stat. Data Anal. 52(4), 2249–2260 (2008)
Arlot, S., Genuer, R.: Analysis of purely Random Forests bias. arXiv:1407.3939 (2014)
Atkinson, A.B., et al.: On the measurement of inequality. J. Econ. Theory 2(3), 244–263 (1970)
Biau, G.: Analysis of a random forests model. J. Mach. Learn. Res. 13, 1063–1095 (2012)
Biau, G., Scornet, E., Welbl, J.: Neural random forests. Sankhya 81(2), 347–386 (2019)
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press, Boca Raton (1984)
Chen, X., Ishwaran, H.: Random forests for genomic data analysis. Genomics 99(6), 323–329 (2012)
Denil, M., Matheson, D.: Consistency of online random forests. Tech. rep. (2013)
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181 (2014)
Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42 (2006)
Haghiri, S., Ghoshdastidar, D., von Luxburg, U.: Comparison based nearest neighbor search. arXiv:1704.01460 (2017)
Hara, S., Hayashi, K.: Making tree ensembles interpretable. arXiv:1606.05390 (2016)
Hariri, S., Kind, M.C.: Extended isolation forest. arXiv:1811.02141 (2018)
Ishwaran, H., Lu, M.: Random survival forests. In: Wiley StatsRef: Statistics Reference Online, pp. 1–13. John Wiley & Sons Ltd, Chichester, UK (2019)
Liaw, A.: Classification and regression by random forests. Tech. rep. (2002)
Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 413–422. IEEE (2008)
Lucas, B., et al.: Proximity forest: an effective and scalable distance-based classifier for time series. Data Mining Knowl. Discov. 33(3), 607–635 (2019)
Sathe, S., Aggarwal, C.C.: Similarity forests. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. vol. Part F1296, pp. 395–403 (2017)
Schölkopf, B.: The kernel trick for distances. Adv. Neural Inf. Process. Syst., pp. 301–307 (2001)
Tyralis, H., Papacharalampous, G.: Variable selection in time series forecasting using random forests. Algorithms 10(4), 114 (2017)
Wainberg, M., Alipanahi, B., Frey, B.J.: Are random forests truly the best classifiers? J. Mach. Learn. Res. 17, 1–5 (2016)
Xu, B., Guo, X., Ye, Y., Cheng, J.: An improved random forest classifier for text categorization. J. Comput. (2012)
Acknowledgements
This work is supported by the National Science Centre, Poland, decision no. DEC-2016/23/B/ST6/03962.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Czekalski, S., Morzy, M. (2021). Similarity Forests Revisited: A Swiss Army Knife for Machine Learning. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol. 12713. Springer, Cham. https://doi.org/10.1007/978-3-030-75765-6_4
DOI: https://doi.org/10.1007/978-3-030-75765-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75764-9
Online ISBN: 978-3-030-75765-6