Abstract
We propose a generative classification model that extends Quadratic Discriminant Analysis (QDA) (Cox in J R Stat Soc Ser B (Methodol) 20:215–242, 1958) and Linear Discriminant Analysis (LDA) (Fisher in Ann Eugen 7:179–188, 1936; Rao in J R Stat Soc Ser B 10:159–203, 1948) to the Bayesian nonparametric setting, providing a competitor to MclustDA (Fraley and Raftery in J Am Stat Assoc 97:611–631, 2002). The approach models the data distribution of each class with a multivariate Polya tree and achieves strong results in simulations and real data analyses. The flexibility gained by relaxing the distributional assumptions of QDA can greatly improve classification of new observations when the data deviate severely from parametric assumptions, while the method still performs well when those assumptions hold. The proposed method is fast compared to other supervised classifiers and simple to implement, requiring no kernel tricks or initialization steps, which makes it one of the more user-friendly approaches to supervised learning. This simplicity matters because suboptimal tuning can greatly hamper classification performance; SVMs fit with non-optimal kernels, for example, perform significantly worse.
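To make the generative setup concrete, the sketch below implements the parametric baseline the paper generalizes: each class density is estimated from training data, and a new observation is assigned to the class maximizing prior times density. Here the per-class densities are Gaussian (i.e., QDA); the proposed method replaces these with multivariate Polya tree density estimates. All function names are illustrative, not from the paper's implementation.

```python
import numpy as np

def fit_qda(X, y):
    """Estimate class priors, means, and covariances (the QDA baseline)."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X),            # class prior
                     Xc.mean(axis=0),             # class mean
                     np.cov(Xc, rowvar=False))    # class covariance
    return params

def log_gauss(x, mu, Sigma):
    """Log density of a multivariate normal at x."""
    d = len(mu)
    diff = x - mu
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.log(np.linalg.det(Sigma))
                   + diff @ np.linalg.solve(Sigma, diff))

def predict(params, x):
    # Bayes rule: argmax over classes of log prior + log class density.
    # A Polya tree classifier swaps log_gauss for a nonparametric density.
    scores = {c: np.log(p) + log_gauss(x, mu, S)
              for c, (p, mu, S) in params.items()}
    return max(scores, key=scores.get)

# Toy two-class example
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = fit_qda(X, y)
print(predict(params, np.array([3.0, 3.0])))  # point near the class-1 mean
```

The paper's contribution is precisely that the density estimate plugged into this Bayes rule need not be Gaussian, so heavy tails, skewness, or multimodality within a class no longer degrade the classifier.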
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, Kudlur M, Levenberg J, Monga R, Moore S, Murray DG, Steiner B, Tucker PA, Vasudevan V, Warden P, Wicke M, Yu Y, Zhang X (2016) Tensorflow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283
Alpaydin E (2014) Introduction to machine learning (adaptive computation and machine learning). The MIT Press, Cambridge
Anderson JA, Rosenfeld E (eds) (1988) Neurocomputing: foundations of research. MIT Press, Cambridge
Bensmail H, Celeux G (1996) Regularized Gaussian discriminant analysis through eigenvalue decomposition. J Am Stat Assoc 91:1743–1748
Bergé L, Bouveyron C, Girard S (2012) HDclassif: an R package for model-based clustering and discriminant analysis of high-dimensional data. J Stat Softw 46(6):1–29
Beygelzimer A, Kakadet S, Langford J, Arya S, Mount D, Li S (2013) FNN: fast nearest neighbor search algorithms and applications. R package version 1.1
Blackwell D, MacQueen JB (1973) Ferguson distributions via Polya urn schemes. Ann Stat 1:353–355
Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the fifth annual workshop on computational learning theory. ACM, pp 144–152
Bouveyron C, Girard S, Schmid C (2007) High-dimensional discriminant analysis. Commun Stat Theory Methods 36:2607–2623
Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
Breiman L (2001) Random forests. Mach Learn 45:5–32
Burges CJ (1998) A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 2:121–167
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40:16–28
Cipolli W, Hanson T (2017) Computationally tractable approximate and smoothed Polya trees. Stat Comput 27(1):39–51
Cipolli W, Hanson T, McLain A (2016) Bayesian nonparametric multiple testing. Comput Stat Data Anal 101:64–79
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Cox DR (1958) The regression analysis of binary sequences. J R Stat Soc Ser B (Methodol) 20:215–242
Cox DR (1966) Some procedures associated with the logistic qualitative response curve. Wiley, New York
Deng H (2014) Interpreting tree ensembles with intrees. arXiv preprint arXiv:1408.5456
Duan K, Keerthi SS (2005) Which is the best multiclass SVM method? An empirical study. In: Proceedings of the sixth international workshop on multiple classifier systems, pp 278–285
Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York
Dudani SA (1976) The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern 6:325–327
Ferguson TS (1974) Prior distributions on spaces of probability measures. Ann Stat 2:615–629
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugen 7:179–188
Florida R (2011) America’s great passport divide. http://www.theatlantic.com/national/archive/2011/03/americas-great-passport-divide/72399/. Accessed 15 Mar 2011
Fraley C, Raftery AE (2002) Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc 97:611–631
Friedman JH (1989) Regularized discriminant analysis. J Am Stat Assoc 84:165–175
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore
Hannah LA, Blei DM, Powell WB (2011) Dirichlet process mixtures of generalized linear models. J Mach Learn Res 12:1923–1953
Hanson T (2006) Inference for mixtures of finite Polya tree models. J Am Stat Assoc 101:1548–1565
Hanson T, Branscum A, Gardner I (2008) Multivariate mixtures of Polya trees for modelling ROC data. Stat Model 8:81–96
Hanson T, Chen Y (2014) Bayesian nonparametric k-sample tests for censored and uncensored data. Comput Stat Data Anal 71:335–346
Hanson T, Monteiro J, Jara A (2011) The Polya tree sampler: towards efficient and automatic independent Metropolis-Hastings proposals. J Comput Graph Stat 20:41–62
Hastie T, Tibshirani R (1996) Discriminant analysis by Gaussian mixtures. J R Stat Soc Series B (Methodol) 58:155–176
Hastie T, Tibshirani R (1998) Classification by pairwise coupling. Ann Stat 26:451–471
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning: data mining, inference, and prediction. Springer, Berlin
Ho TK (1995) Random decision forests. In: Third international conference on document analysis and recognition, ICDAR 1995, August 14–15, 1995, Montreal, Canada. Vol I, pp 278–282
Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441
Izenman AJ (1991) Recent developments in nonparametric density estimation. J Am Stat Assoc 86:205–224
Jara A, Hanson T, Lesaffre E (2009) Robustifying generalized linear mixed models using a new class of mixtures of multivariate Polya trees. J Comput Graph Stat 18:838–860
Jiang L, Wang D, Cai Z, Yan X (2007) Survey of improving naive Bayes for classification. In: Proceedings of the 3rd international conference on advanced data mining and applications. Springer, pp 134–145
Karsoliya S (2012) Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture. Int J Eng Trends Technol 12:714–717
Kotsiantis SB (2007) Supervised machine learning: a review of classification. Informatica 31:249–268
Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, Lozano JA, Armañanzas R, Santafé G, Pérez A (2006) Machine learning in bioinformatics. Brief Bioinform 17:86–112
Lavine M (1992) Some aspects of Polya tree distributions for statistical modelling. Ann Stat 20:1222–1235
Lavine M (1994) More aspects of Polya tree distributions for statistical modelling. Ann Stat 22:1161–1176
Ledl T (2004) Kernel density estimation: theory and application in discriminant analysis. Austrian J Stat 33:267–279
Leisch F, Dimitriadou E (2015) mlbench: machine learning benchmark problems. R package version 2.1-1
Liaw A, Wiener M (2002) Classification and regression by random forest. R News 2:18–22
Ma J, Yu MK, Fong S, Ono K, Sage E, Demchak B, Sharan R, Ideker T (2018) Using deep learning to model the hierarchical structure and function of a cell. Nat Methods 15:290–298
Ma Y, Guo G (2014) Support vector machines applications. Springer, Berlin
Mantel N (1966) Models for complex contingency tables and polychotomous dosage response curves. Biometrics 22:83–95
Marzio M, Taylor CC (2005) On boosting kernel density methods for multivariate data: density estimation and classification. Stat Methods Appl 14:163–178
Mauldin RD, Sudderth WD, Williams SC (1992) Polya trees and random distributions. Ann Stat 20:1203–1221
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2015) e1071: Misc functions of the department of statistics, probability theory group (Formerly: E1071), TU Wien. R package version 1.6-7
Migration Policy Institute (2014). State immigration data profiles. http://www.migrationpolicy.org/programs/data-hub/state-immigration-data-profiles. Accessed 13 Mar 2016
Mohri M, Rostamizadeh A, Talwalkar A (2012) Foundations of machine learning. The MIT Press, Cambridge
Montavon G, Lapuschkin S, Binder A, Samek W, Müller K-R (2017) Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognit 65:211–222
Montavon G, Samek W, Müller K-R (2018) Methods for interpreting and understanding deep neural networks. Digit Sig Process 73:1–15
Mukhopadhyay S, Ghosh A (2011) Bayesian multiscale smoothing in supervised and semi-supervised kernel discriminant analysis. Comput Stat Data Anal 55:2344–2353
Müller P, Rodriguez A (2013) Chapter 4: Polya Trees, volume 9 of NSF-CBMS regional conference series in probability and statistics. Institute of Mathematical Statistics and American Statistical Association, pp 43–51
National Archives and Records Administration (2012) Historical election results. http://www.archives.gov/federal-register/electoral-college/historical.html. Accessed 13 Mar 2016
Ng AY, Jordan MI (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in neural information processing systems, pp 841–848
Paddock S, Ruggeri F, Lavine M, West M (2003) Randomised Polya tree models for nonparametric Bayesian inference. Statistica Sinica 13:443–460
Pati D, Bhattacharya A, Pillai NS, Dunson D (2014) Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann Stat 42(3):1102–1130
Plastria F, De Bruyne S, Carrizosa E (2008) Dimensionality reduction for classification. In: International conference on advanced data mining and applications. Springer, pp 411–418
R Core Team (2014) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Rao CR (1948) The utilization of multiple measurements in problems of biological classification. J R Stat Soc Ser B 10:159–203
Ripley BD (2007) Pattern recognition and neural networks. Cambridge University Press, Cambridge
Rish I (2001) An empirical study of the naive Bayes classifier. Technical report, IBM
Rojas R (1996) Neural networks: a systematic introduction. Springer, New York
Runcie DE, Mukherjee S (2013) Dissecting high-dimensional phenotypes with Bayesian sparse factor analysis of genetic covariance matrices. Genetics 194(3):753–767
Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117
Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge
Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 8(1):205–233
Shahbaba B, Neal R (2009) Nonlinear models using Dirichlet process mixtures. J Mach Learn Res 10:1829–1850
Steinwart I, Christmann A (2008) Support vector machines. Springer, Berlin
Tax Foundation (2007). Federal taxes paid vs. federal spending received by state, 1981–2005. http://taxfoundation.org/article/federal-taxes-paid-vs-federal-spending-received-state-1981-2005. Accessed 13 Mar 2016
Tsang IW, Kwok JT, Cheung P-M (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
United States Census Bureau (2010) American community survey, education attainment for states, percent with high school diploma and with bachelor’s degree: 2010. https://www.census.gov/newsroom/releases/xls/cb12-33table1states.xls. Accessed 13 Mar 2016
United States Census Bureau (2014) State median income. https://www.census.gov/hhes/www/income/data/statemedian/. Accessed 13 Mar 2016
United States Department of State Bureau of Consular Affairs (2015) U.S. passports and international travel: passport statistics. https://travel.state.gov/content/passports/en/passports/statistics.html. Accessed 13 Mar 2016
Vapnik VN (1979) Estimation of dependences based on empirical data. Nauka, USSR (in Russian)
Vapnik VN, Chervonenkis A (1963) A note on one class of perceptrons. Autom Remote Control 25:774–780
Vapnik VN, Lerner A (1962) Pattern recognition using generalized portrait method. Autom Remote Control 24:709–715
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. ISBN 0-387-95457-0. http://www.stats.ox.ac.uk/pub/MASS4
Wong WH, Ma L (2010) Optional Polya tree and Bayesian inference. Ann Stat 38:1433–1459
Yegnanarayana B (2004) Artificial neural networks. Prentice-Hall, New Jersey
Zambom AZ, Dias R (2013) A review of kernel density estimation with applications to econometrics. Int Econ Rev (IER) 5:20–42
Cipolli, W., Hanson, T. Supervised learning via smoothed Polya trees. Adv Data Anal Classif 13, 877–904 (2019). https://doi.org/10.1007/s11634-018-0344-z