Abstract
We consider statistical procedures for feature selection defined by a family of regularization problems with convex piecewise linear loss functions and penalties of ℓ1 type. Many known statistical procedures (e.g. quantile regression and support vector machines with ℓ1-norm penalty) are subsumed under this category. Computationally, the regularization problems are linear programming (LP) problems indexed by a single parameter, known as ‘parametric cost LP’ or ‘parametric right-hand-side LP’ in optimization theory. Exploiting this connection with LP theory, we lay out general algorithms, namely the simplex algorithm and its variant, for generating regularized solution paths for the feature selection problems. The significance of such algorithms is that they allow a complete exploration of the model space along the paths and provide a broad view of persistent features in the data. The implications of the general path-finding algorithms are outlined for several statistical procedures, and they are illustrated with numerical examples.
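To make the LP connection concrete, the following is a minimal sketch of ℓ1-penalized quantile regression posed as a single LP, solved here with an off-the-shelf solver. The function name and the use of `scipy.optimize.linprog` are illustrative assumptions, not the authors' implementation; the article's path algorithms instead trace the solution over all penalty values at once.

```python
import numpy as np
from scipy.optimize import linprog

def l1_quantile_regression(X, y, tau=0.5, lam=0.0):
    """Solve min_b sum_i rho_tau(y_i - x_i'b) + lam * ||b||_1 as an LP.

    Decision variables (all >= 0): b+, b- (p each), r+, r- (n each),
    with b = b+ - b- and residual y - Xb = r+ - r-.
    The check loss is rho_tau(u) = tau * u+ + (1 - tau) * u-.
    """
    n, p = X.shape
    # Cost vector: lam on |b| parts, tau / (1 - tau) on residual parts.
    c = np.concatenate([lam * np.ones(2 * p),
                        tau * np.ones(n), (1.0 - tau) * np.ones(n)])
    # Equality constraint: X(b+ - b-) + r+ - r- = y.
    A_eq = np.hstack([X, -X, np.eye(n), -np.eye(n)])
    res = linprog(c, A_eq=A_eq, b_eq=y,
                  bounds=[(0, None)] * (2 * p + 2 * n), method="highs")
    return res.x[:p] - res.x[p:2 * p]
```

With an intercept-only design, `tau=0.5`, and no penalty, the fit reduces to the sample median, which gives a quick sanity check of the formulation.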
Acknowledgements
The authors thank the Editor and anonymous referees for helpful comments and additional references.
Lee’s research was supported in part by National Security Agency grant H98230-10-1-0202 and National Science Foundation grant DMS-12-09194.
Appendix
Lemma 1
Suppose that \(\mathcal {B}^{l+1}:=\mathcal {B}^{l}\cup\{j^{l}\}\setminus\{i^{l}\}\), where \(i^{l}:=k^{l}_{{i^{l}_{*}}}\). Let be defined as in (13). Then .
Proof
First observe that
Without loss of generality, the \({i^{l}_{*}}\)th column vector \({{\bf A}}_{i^{l}}\) of \({{\bf A}}_{\mathcal {B}^{l}}\) is replaced with \({{\bf A}}_{j^{l}}\) to give \({{\bf A}}_{\mathcal {B}^{l+1}}\). For the \({{\bf A}}_{\mathcal {B}^{l+1}}\),
where . Thus, we have
Then it immediately follows that . Hence, . □
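The inverse update at the heart of Lemma 1 is the standard product-form (eta-matrix) update for a basis matrix whose \({i^{l}_{*}}\)th column is replaced by the entering column. A generic sketch (the function name is hypothetical, and this is an illustration of the technique rather than the article's code):

```python
import numpy as np

def update_basis_inverse(B_inv, entering_col, i_star):
    """Product-form (eta) update: if the basis matrix B has column i_star
    replaced by `entering_col`, return the inverse of the updated basis."""
    u = B_inv @ entering_col            # entering column in current basis coords
    E = np.eye(len(u))                  # eta matrix: identity except column i_star
    E[:, i_star] = -u / u[i_star]
    E[i_star, i_star] = 1.0 / u[i_star]
    return E @ B_inv                    # (B_new)^{-1} = E @ B^{-1}
```

This costs O(m^2) per pivot instead of the O(m^3) of refactorizing from scratch, which is what makes path-following with repeated one-column basis changes cheap.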
8.1 Proof of (15)
For l=0,…,J−1, consider the following difference
By the intermediate calculation in Lemma 1, we can show that the difference is \(\kappa_{\lambda_{l}}{{\bf A}}^{\top}({{\bf A}}_{\mathcal {B}^{l}}^{-1} )^{\top} {{\bf e}}_{{i^{l}_{*}}}\), where
Since \(\lambda_{l}:= -\check{{{c}}}^{l}_{j^{l}}/ {\check{{{a}}}^{l}_{j^{l}}}\), \(\kappa_{\lambda_{l}}=0\), which proves (15).
8.2 Proof of Theorem 3
Let \({\mathfrak {B}}^{l}:=\mathcal {B}^{l}\cup\{j^{l}\}\) for \(l=0,\dots,J-1\), and \({\mathfrak {B}}^{J}:=\mathcal {B}^{J}\cup\{N+1\}\), where \(\mathcal {B}^{l}\), \(\mathcal {B}^{J}\), and \(j^{l}\) are as defined in the simplex algorithm. We will show that, for any fixed \(s\in[s_{l},s_{l+1})\) (or \(s\geq s_{J}\)), \({\mathfrak {B}}^{l}\) (or \({\mathfrak {B}}^{J}\)) is an optimal basic index set for the LP problem in (10).
For simplicity, let j J:=N+1, c N+1:=0, , and a N+1:=1. The inverse of
is given by
for l=0,…,J.
First, we show that \({\mbox {$\mathbb {A}$}}_{{{\mathfrak {B}}^{l}}}\) is a feasible basic index set of (10) for \(s\in[s_{l},s_{l+1}]\), i.e.
Recalling that , \(z^{l}_{j^{l}}=0\), , , and \(d^{l}_{j^{l}}=1\), we have
From and
it can be shown that
Thus, (30) is a convex combination of and for \(s\in[s_{l},s_{l+1}]\), and hence it is non-negative. This proves the feasibility of \({\mbox {$\mathbb {A}$}}_{{\mathfrak {B}}^{l}}\) for \(s\in[s_{l},s_{l+1}]\) and \(l=0,\dots,J-1\). For \(s\geq s_{J}\), we have
Next, we prove that \({\mbox {$\mathbb {A}$}}_{{\mathfrak {B}}^{l}}\) is an optimal basic index set of (10) for \(s\in[s_{l},s_{l+1}]\) by showing . For \(i=1,\dots,N\), the \(i\)th element of is
Similarly, for \(s\geq s_{J}\),
Clearly, the optimality condition holds by the non-negativity of all the elements as defined in the simplex algorithm. This completes the proof.
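The optimality certificate used in this proof — non-negativity of the reduced costs \({{\bf c}}^{\top}-{{\bf c}}_{\mathcal {B}}^{\top}{{\bf A}}_{\mathcal {B}}^{-1}{{\bf A}}\) for a minimization LP — can be sketched generically as follows (an illustration of the criterion, not the article's code; the function name is hypothetical):

```python
import numpy as np

def reduced_costs(c, A, basis):
    """Reduced costs c - y A with simplex multipliers y = c_B B^{-1}.

    For a min-cost LP in standard form, a basic feasible solution is
    optimal when every reduced cost is non-negative; the entries at the
    basic columns are exactly zero by construction.
    """
    B_inv = np.linalg.inv(A[:, basis])
    y = c[basis] @ B_inv        # simplex multipliers (dual variables)
    return c - y @ A
```

In the path setting the same check is applied over an interval of the parameter \(s\) at once: since the reduced costs are affine in \(s\), non-negativity at both endpoints certifies optimality of one basis on the whole interval.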
8.3 Proof of Theorem 4
(i) By (28), we can update the pivot rows of the tableau as follows:
If \(u^{l}_{i}=0\), the \(i\)th pivot row for \(\mathcal {B}^{l+1}\) is the same as the \(i\)th pivot row for \(\mathcal {B}^{l}\). For \(i={i^{l}_{*}}\), the \(i\)th pivot row for \({\mathcal {B}^{l+1}}\) is \((1/{u}^{l}_{i^{l}_{*}})\) times the \({i^{l}_{*}}\)th pivot row for \(\mathcal {B}^{l}\). If \(i\neq{i^{l}_{*}}\) and \({u}^{l}_{i}<0\), which implies \(-(u^{l}_{i}/{u}^{l}_{{i^{l}_{*}}})>0\), the \(i\)th pivot row for \(\mathcal {B}^{l+1}\) is lexicographically positive, since the sum of any two lexicographically positive vectors is still lexicographically positive. According to the tableau update algorithm, \({u}^{l}_{{i^{l}_{*}}}>0\), where \({i^{l}_{*}}\) is the index of the lexicographically smallest pivot row among all the pivot rows for \(\mathcal {B}^{l}\) with \({u}^{l}_{i}>0\). For \(i\neq{i^{l}_{*}}\) and \({u}^{l}_{i}>0\), by the definition of \({i^{l}_{*}}\), \(\mbox{(the ${i^{l}_{*}}$th pivot row of ${\mathcal {B}^{l}}$)}/{u}^{l}_{i^{l}_{*}}\overset{L}{<} \mbox{(the $i$th pivot row of ${\mathcal {B}^{l}}$)}/{u}^{l}_{i}\). This implies
Therefore, all the updated pivot rows are lexicographically positive.
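The leaving-row choice invoked in part (i) is the lexicographic ratio test. A small sketch of the rule (the helper name is hypothetical; `rows` stands for the pivot rows of the current tableau and `u` for the entering column's coordinates):

```python
import numpy as np

def lex_ratio_test(rows, u):
    """Among rows i with u[i] > 0, return the index i* whose scaled row
    rows[i] / u[i] is lexicographically smallest. Choosing this leaving
    row keeps every pivot row lexicographically positive after the pivot,
    which is the classical anti-cycling guarantee."""
    candidates = [i for i in range(len(u)) if u[i] > 0]
    best = candidates[0]
    for i in candidates[1:]:
        diff = rows[i] / u[i] - rows[best] / u[best]
        nz = np.flatnonzero(diff)
        if nz.size and diff[nz[0]] < 0:   # rows[i]/u[i] is lex-smaller
            best = i
    return best
```

Rows with \(u^{l}_{i}\leq 0\) are simply excluded from the comparison, matching the case analysis in the proof.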
Remark 2
If \(z^{l}_{{i^{l}}}=0\), (31) implies that \(z^{l}_{k^{l}_{i}}=z^{l+1}_{k^{l}_{i}}\) for \(i\neq{i^{l}_{*}}\), \(i\in \mathcal {M}\), and \(z^{l+1}_{j^{l}}=0\). Hence . On the other hand, if \(z^{l}_{i^{l}}>0\), then \(z^{l+1}_{j^{l}} =(z^{l}_{i^{l}}/{u}^{l}_{j^{l}})>0\) while \(z^{l}_{j^{l}}=0\) since \(j^{l}\notin \mathcal {B}^{l}\). This implies . Therefore, if and only if \(z_{i^{l}}^{l}=0\).
(ii) When the basic index set \(\mathcal {B}^{l}\) is updated to \(\mathcal {B}^{l+1}\), \(\check{{{c}}}^{l}_{j^{l}}<0\). Since \(j^{l}\in \mathcal {B}^{l+1}\), \(\check{{{c}}}^{l+1}_{j^{l}}=0\). Then, \(({{\bf c}}_{j^{l}}-{{\bf c}}_{\mathcal {B}^{l+1}}^{\top} {{\bf A}}_{\mathcal {B}^{l+1}}^{-1}{{\bf A}}_{j^{l}}) -({{\bf c}}_{j^{l}}-{{\bf c}}_{\mathcal {B}^{l}}^{\top} {{\bf A}}_{\mathcal {B}^{l}}^{-1}{{\bf A}}_{j^{l}}) =(\check{{{c}}}^{l+1}_{j^{l}}-\check{{{c}}}^{l}_{j^{l}})>0\).
As in the proof of (15),
where . \({{\bf e}}_{{i^{l}_{*}}}^{\top} {{\bf A}}_{\mathcal {B}^{l}}^{-1}{{\bf A}}\) is the \({i^{l}_{*}}\)th pivot row for \(\mathcal {B}^{l}\), which is lexicographically positive. Since the \(j^{l}\)th entry of \({{\bf e}}_{{i^{l}_{*}}}^{\top} {{\bf A}}_{\mathcal {B}^{l}}^{-1}{{\bf A}}\) is strictly positive, that of \(({{\bf c}}^{\top}-{{\bf c}}_{\mathcal {B}^{l+1}}^{\top} {{\bf A}}_{\mathcal {B}^{l+1}}^{-1}{{\bf A}})- ({{\bf c}}^{\top}-{{\bf c}}_{\mathcal {B}^{l}}^{\top} {{\bf A}}_{\mathcal {B}^{l}}^{-1}{{\bf A}})\) must share the same sign as \(\kappa^{l}\). Thus, we have \(\kappa^{l}>0\). Then the updated cost row is given as
Clearly, the cost row for \(\mathcal {B}^{l+1}\) is lexicographically greater than that for \(\mathcal {B}^{l}\).
Cite this article
Yao, Y., Lee, Y. Another look at linear programming for feature selection via methods of regularization. Stat Comput 24, 885–905 (2014). https://doi.org/10.1007/s11222-013-9408-2