Abstract
This paper studies four families of polyhedral norms parametrized by a single parameter. The first two families consist of the CVaR norm (which is equivalent to the D-norm, or the largest-\(k\) norm) and its dual norm, while the other two families consist of the convex combination of the \(\ell _1\)- and \(\ell _\infty \)-norms, referred to as the deltoidal norm, and its dual norm. These families contain the \(\ell _1\)- and \(\ell _\infty \)-norms as special limiting cases. All of these norms can be represented via linear programming (LP), and the size of the LP formulations is independent of the norm parameter. The purpose of this paper is to establish a relation between the considered LP-representable norms and the \(\ell _p\)-norm and to demonstrate their potential in optimization. Using the ratio of the tight lower and upper bounds on the ratio of two norms as a proximity measure, we show that in each dual pair, the primal and dual norms approximate the \(\ell _p\)- and \(\ell _q\)-norms, respectively, equally well for \(p,q\in [1,\infty ]\) satisfying \(1/p+1/q=1\). In addition, the deltoidal norm and its dual norm are shown to have better proximity to the \(\ell _p\)-norm than the CVaR norm and its dual. Numerical examples demonstrate that LP solutions with optimized parameters attain better approximations of the \(\ell _2\)-norm than the \(\ell _1\)- and \(\ell _\infty \)-norms do.
Notes
In many practical situations, the distance between two points is measured by the length of the line segment connecting them, i.e., by the \(\ell _2\)-norm. In statistics, random variables are usually evaluated via the \(\ell _2\)-norm (e.g., variance, \(\chi ^2\) value, and the sum of squared errors).
For example, it is employed in machine learning (e.g., the lasso of [24] and the elastic net of [27]) and compressed sensing (e.g., [7]) because its minimization often yields a sparse solution, i.e., a solution having many zero elements, in contrast to \(\ell _2\)-norm minimization, which usually results in a dense solution.
For example, with a slight modification, minimizing the absolute deviation from the expected value in place of the variance can be viewed as an application of the approximation of the \(\ell _2\)-norm by the \(\ell _1\)-norm, and it is often employed in various contexts (e.g., [1] for statistical regression; [15] for agricultural planning; [17] for financial optimization).
Also, its counterparts are used in various contexts (e.g., in gauging the distance between two cumulative distribution functions for the Kolmogorov-Smirnov test).
MATLAB and R on a 64-bit PC output ‘Inf’ (i.e., \(\infty \)), which is clearly a wrong answer. On the other hand, computer algebra systems such as SageMathCloud output a reasonable value.
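The reported ‘Inf’ is a double-precision overflow rather than a property of the norm itself. A minimal MATLAB sketch of our own (the exact expression evaluated in this note is not reproduced here) illustrates the failure of the naive power-sum evaluation of an \(\ell _p\)-norm and the standard rescaling remedy:

x = 1e3 * ones(10, 1);  p = 200;
naive = sum(abs(x).^p)^(1/p)             % abs(x).^p exceeds realmax: returns Inf
m = max(abs(x));                         % rescale so that the largest ratio is 1
stable = m * sum((abs(x)/m).^p)^(1/p)    % finite: 1e3 * 10^(1/200)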
Another promising example appears in a recent paper of the authors. Indeed, [12] reports that by applying the cross-validation method for calibrating the norm parameter, a support vector machine (SVM) with the CVaR norm regularizer attained better out-of-sample performance than the standard SVM with the \(\ell _2\) regularizer, within a comparable amount of computation time.
It is known that \(p\)OCP can be efficiently approximated by a single LP via the polyhedral approximations of [3, 25]. Also, we cannot assert that replacing \(p\)OCP with an LP-representable norm approximation always improves the computational efficiency. Indeed, there are cases which need cautious treatment. For example, consider least squares regression via variable selection. It is known to be formulated as a 0-1 mixed integer quadratic programming (MIQP) problem, and some studies propose a 0-1 mixed integer LP (MILP) alternative obtained by replacing the squared error with the absolute error. However, when we compared these two formulations on data sets from the UCI repository by using IBM ILOG CPLEX 12.5, we could not find any advantage of the MILP approach. In fact, the MILP approach spent much more time than the MIQP one. This may be because the additional constraints excessively loosened the relaxations of the subproblems generated in the branch-and-bound scheme.
Indeed, several recent software packages, such as Portfolio Safeguard [21] and CVX [13], include the \(\ell _1\)- and \(\ell _\infty \)-norms as built-in functions. These examples show a potential usability of the parametrized families of norms. Such packages can easily incorporate user-defined functions built from simple operations such as ‘\(+\)’ and ‘max.’ Since the dual CVaR norm and the deltoidal norm can be represented with the \(\ell _1\)- and \(\ell _\infty \)-norms and those operations, as will be described later, the two norms are as readily available as the \(\ell _2\)-norm.
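For instance, assuming the closed-form representations referred to here (with \(\kappa =n(1-\alpha )\), the dual CVaR norm is \(\max \{\Vert \varvec{x}\Vert _\infty ,\Vert \varvec{x}\Vert _1/\kappa \}\)), the two norms can be written as one-line MATLAB anonymous functions; the handle names are our own:

dual_cvar = @(x, kappa) max(norm(x, Inf), norm(x, 1) / kappa);               % dual CVaR norm
deltoidal = @(x, lambda) (1 - lambda) * norm(x, 1) + lambda * norm(x, Inf);  % deltoidal norm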
In this sense, \(\langle \!\langle \cdot \rangle \!\rangle _\alpha \) may be called the knapsack norm.
The complexity of calculating the CVaR norm itself is determined by the complexity of a selection algorithm (and summation), i.e., \(O(n)\).
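For concreteness, a sorting-based MATLAB sketch of our own (\(O(n\log n)\); replacing the sort with a linear-time selection algorithm attains the \(O(n)\) bound) can be written as follows, assuming the largest-\(\lfloor \kappa \rfloor \) sum plus fractional-remainder form of the norm with \(\kappa =n(1-\alpha )\):

function v = cvar_norm(x, alpha)
% CVaR norm <<x>>_alpha: sum of the floor(kappa) largest |x_i| plus
% (kappa - floor(kappa)) times the next largest, where kappa = n*(1-alpha).
n = numel(x);
kappa = n * (1 - alpha);
s = sort(abs(x), 'descend');
k = floor(kappa);
if k < n
    v = sum(s(1:k)) + (kappa - k) * s(k+1);
else
    v = sum(s);   % alpha = 0 recovers the l1-norm
end
end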
In [20], ‘\(p\)’ is used in place of \(\kappa \). In order to avoid a mix-up with the parameter ‘\(p\)’ of the \(\ell _p\)-norm, we employ \(\kappa \) instead. This replacement may also be convenient to remind the reader that for \(\alpha =\frac{j}{n},j=0,1,\ldots ,n-1\), the CVaR norm is equivalent to the largest-‘\(k\)’ norm. Indeed, the use of \(\kappa \) suggests that it may take non-integer values, whereas \(k\) typically denotes an integer.
Throughout the paper the notation \(\Vert \cdot \Vert \) (or \(\Vert \cdot \Vert '\)) is reserved for any norm, while \(\Vert \cdot \Vert _p\), \(\langle \!\langle \cdot \rangle \!\rangle _{\alpha }\), \((\!(\cdot )\!)_\lambda \) and \(|\!|\!|\cdot |\!|\!|_\kappa \) are for the named norms.
We write vectors in \({\mathbb {R}}^n\) as column vectors in the inner products.
Note that the dual CVaR norm formula implies the duality between the \(\ell _1\)-norm and the \(\ell _\infty \)-norm, i.e., \(\Vert \cdot \Vert _1^*=\Vert \cdot \Vert _\infty \) and \(\Vert \cdot \Vert _\infty ^*=\Vert \cdot \Vert _1\).
Indeed, it can be shown that if we have \(L\le \Vert \varvec{x}\Vert /\Vert \varvec{x}\Vert _2\le U\) for a norm \(\Vert \cdot \Vert \), then we also have \(L\le \Vert \varvec{x}\Vert _2/\Vert \varvec{x}\Vert ^*\le \Vert \varvec{x}\Vert /\Vert \varvec{x}\Vert _2\le U\).
Note that this ratio is independent of the locations of the norms in the ratio. Indeed, \(L\le \Vert \varvec{x}\Vert /\Vert \varvec{x}\Vert '\le U\) is equivalent to \(1/U\le \Vert \varvec{x}\Vert '/\Vert \varvec{x}\Vert \le 1/L\), and the ratio of the bounds remains \(U/L\). Note also that \(U/L\ge 1\) for any pair of norms \(\Vert \cdot \Vert ,\Vert \cdot \Vert '\), and \(U/L=1\) implies \(\Vert \cdot \Vert =\Vert \cdot \Vert '\). Further, the value \(\log (U/L)\) defines a metric (or distance) on the norms of \({\mathbb {R}}^n\). From the approximation viewpoint, the closer \(U/L\) is to \(1\) (or the closer \(\log (U/L)\) is to \(0\)), the better.
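As a simple illustration, the classical bounds \(\Vert \varvec{x}\Vert _\infty \le \Vert \varvec{x}\Vert _1\le n\Vert \varvec{x}\Vert _\infty \) give \(L=1\) and \(U=n\) for the pair \((\Vert \cdot \Vert _1,\Vert \cdot \Vert _\infty )\), so the distance between the \(\ell _1\)- and \(\ell _\infty \)-norms in this metric is \(\log (U/L)=\log n\), which grows with the dimension.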
By construction, the ratio \(U/L\) evaluates the proximity in a worst-case sense.
The function (11) is well-defined and satisfies \(f_{n,p}(\kappa )\ge 1\) for any \(\kappa \in [1,n]\).
These values are computed by MATLAB.
With the \(\ell _p\)-norm, we have \(1\le \Vert \varvec{x}\Vert _2/\Vert \varvec{x}\Vert _p\le n^{\frac{1}{2}-\frac{1}{p}}\) for \(p\ge 2\) and \(1\le \Vert \varvec{x}\Vert _p/\Vert \varvec{x}\Vert _2\le n^{\frac{1}{p}-\frac{1}{2}}\) for \(p\le 2\), which can be shown by using Hölder’s inequality.
An Archimedean solid is a highly symmetric, semi-regular, three-dimensional convex polytope in which two or more types of regular polygons meet at each vertex, while a Catalan solid is the polytope dual to an Archimedean solid, whose faces are not regular polygons.
The LP representation of the dual deltoidal norm of \(\varvec{x}\in {\mathbb {R}}^n\) is given by the dual of (43):
$$\begin{aligned} \begin{array}{r|ll} (\!(\varvec{x})\!)^*_\lambda \equiv &\underset{\varvec{z}}{\text{minimize}} &\varvec{1}_{n}^\top \varvec{z}\\ &\text{subject to} &\left\{ \lambda \varvec{I}_n+(1-\lambda )\varvec{1}_{n}\varvec{1}_{n}^\top \right\} \varvec{z}\ge |\varvec{x}|,~\varvec{z}\ge \varvec{0}. \end{array} \end{aligned}$$

The deltoidal norm and its dual can be written by using the CVaR norm. Indeed, we have \( (\!(\cdot )\!)_\lambda \equiv (1-\lambda )\langle \!\langle \cdot \rangle \!\rangle _{0}+\lambda \langle \!\langle \cdot \rangle \!\rangle _{\frac{n-1}{n}}\) and \((\!(\cdot )\!)^*_\lambda \equiv \max \left\{ \langle \!\langle \cdot \rangle \!\rangle _{\frac{n-1}{n}}, \frac{1}{2-\lambda }\langle \!\langle \cdot \rangle \!\rangle _{\frac{n-2}{n}},\ldots , \frac{1}{n-(n-1)\lambda }\langle \!\langle \cdot \rangle \!\rangle _{0} \right\} \). This representation can be useful if the CVaR operation is readily available as a built-in function.
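For example, combined with the cvar_norm sketch given in an earlier note, both norms can be evaluated in a few lines of MATLAB for a given vector x and parameter lambda (the \(j\)-th term of the dual uses the largest-\(j\) norm, i.e., \(\alpha =(n-j)/n\)); this is our own illustration:

n = numel(x);
delt = (1 - lambda) * cvar_norm(x, 0) + lambda * cvar_norm(x, (n-1)/n);
dual = 0;
for j = 1:n
    dual = max(dual, cvar_norm(x, (n-j)/n) / (j - (j-1)*lambda));
end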
The function (21) is well-defined and satisfies \(h_{n,p}(\lambda )\ge 1\) for any \(\lambda \in [0,1]\).
In contrast, when \(n=2\), the CVaR norm and the deltoidal norm coincide and the smallest value of \(h_{2,2}\) is equal to that of \(f_{2,2}\).
The parameter \(\alpha ^\star \) is scaled by \(\frac{n}{n-1}\) so that the range \([0,\frac{n-1}{n}]\) of \(\alpha \) is mapped onto \([0,1]\), the range of \(\lambda \).
Note that minimization of the (seemingly more general) quadratic function of the form \(\frac{1}{2}\varvec{\xi }^\top \varvec{D}\varvec{\xi }+\varvec{d}^\top \varvec{\xi }\) subject to \(\varvec{H}\varvec{\xi }\le \varvec{h}\) with a positive definite matrix \(\varvec{D}\) can be reduced to the above form by setting
$$\begin{aligned} \varvec{Q}=\frac{1}{2}\left( \begin{array}{c@{\quad }c}\varvec{D}&\varvec{d}\\ \varvec{d}^\top &d\end{array}\right) ,~~~ \varvec{x}=\left( \begin{array}{c}\varvec{\xi }\\ \xi '\end{array}\right) ,~~~ \varvec{A}=\left( \begin{array}{c@{\quad }c}\varvec{H}&\varvec{0}\\ \varvec{0}^\top &1\\ \varvec{0}^\top &-1\end{array}\right) ,~~~ \varvec{b}=\left( \begin{array}{c} \varvec{h}\\ 1\\ -1\end{array}\right) , \end{aligned}$$

for a constant \(d>0\) such that \(\varvec{D}-\varvec{d}\varvec{d}^\top /d\) is positive semidefinite.
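To see why the reduction works, note that the last two rows of \(\varvec{A}\) and \(\varvec{b}\) force \(\xi '=1\), so that

$$\begin{aligned} \varvec{x}^\top \varvec{Q}\varvec{x}=\frac{1}{2}\varvec{\xi }^\top \varvec{D}\varvec{\xi }+\varvec{d}^\top \varvec{\xi }+\frac{d}{2}, \end{aligned}$$which equals the original objective up to the constant \(d/2\); moreover, by the Schur complement condition, \(\varvec{Q}\succeq \varvec{0}\) if and only if \(\varvec{D}-\varvec{d}\varvec{d}^\top /d\succeq \varvec{0}\) (given \(d>0\)).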
Note that \(\phi ^\star \equiv \Vert \varvec{y}^\star \Vert '\), where \((\varvec{x}^\star ,\varvec{y}^\star )\) is an optimal solution to (22).
Equation (27) explains why minimizing the ratio \(U/L\) makes sense for obtaining a better approximation.
Instances were generated as follows. We set \(\bar{\varvec{x}}=\varvec{0}\) and \(\varvec{b}=\varvec{1}_m\). The matrices \(\varvec{A}\) were randomly generated so that the reciprocal of each element followed a uniform distribution over \((0,1)\), i.e., \(1/a_{ij}\sim \mathrm{U}(0,1)\). The computation ran over 1000 instances for each combination of \((n,m)\in \{(2,1),\) \((5,2),\) \((10,3),\) \((20,4),\) \((50,7),\) \((100,10)\}\), where \(m=\lfloor \sqrt{n}\rfloor \) was chosen so that optimal solutions would not be crowded in a particular region. In the following, only the results for \(n=2,20,100\) are presented to save space. See the discussion paper version [11] for the results for \(n=5,10,50\).
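As an illustration of this setup, the following MATLAB sketch of our own generates one instance and computes the \(\ell _1\)-norm projection of \(\bar{\varvec{x}}=\varvec{0}\) by LP; it uses MATLAB's linprog in place of CPLEX and assumes the feasible region \(\{\varvec{x}:\varvec{A}\varvec{x}\ge \varvec{b}\}\):

n = 20;  m = floor(sqrt(n));
A = 1 ./ rand(m, n);                   % 1/a_ij ~ U(0,1)
f = [zeros(n,1); ones(n,1)];           % minimize 1'z over (x, z)
Aineq = [ eye(n), -eye(n);             %  x - z <= 0
         -eye(n), -eye(n);             % -x - z <= 0
          -A, zeros(m,n)];             %  A*x >= 1_m
bineq = [zeros(2*n,1); -ones(m,1)];
xz = linprog(f, Aineq, bineq);
x1 = xz(1:n);                          % the l1-norm projection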
Computations in Sections 5.1 and 5.2 were performed in MATLAB on a laptop PC running Windows 7. We used IBM ILOG CPLEX 12.5 for solving the QPs (\(\ell _2\)-norm) and the LPs (six LP-representable norms).
To make the minimizers comparable within each dual pair (e.g., the \(\ell _1\)-norm minimizer and the \(\ell _\infty \)-norm minimizer), the positions of the numerators and denominators are arranged so that the ratio is greater than 1.
The large-scale case study on projection with different norms is available at http://www.ise.ufl.edu/uryasev/research/testproblems/advanced-statistics/case-study-projection-on-polyhedron-with-cvar-absolute-norm-4/.
With \(m\) historical returns \(\varvec{R}_1,\ldots ,\varvec{R}_m\in {\mathbb {R}}^n\), \(\varvec{r}\) is often estimated by \((\varvec{R}_1,\ldots ,\varvec{R}_m)\varvec{1}_m/m\) and \(\varvec{\varSigma }\) is estimated by \(\varvec{M}^\top \varvec{M}\) with \(\varvec{M}=(\varvec{R}_1-\varvec{r},\ldots ,\varvec{R}_m-\varvec{r})^\top /\sqrt{m}\). Note that the number of elements in \(\varvec{y}\) is equal to \(m\), which is not necessarily equal to \(n\).
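In MATLAB, these estimators take one line each; the sketch below (our own) assumes the returns are stacked as the columns of an \(n\times m\) matrix R:

r = R * ones(m,1) / m;                 % expected return vector
M = (R - r * ones(1,m))' / sqrt(m);    % m-by-n matrix with Sigma = M'*M
Sigma = M' * M;                        % covariance estimate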
The covariance matrix \(\varvec{\varSigma }\) and the expected return vector \(\varvec{r}\) were estimated based on the “25 Portfolios Formed on Size and Book-to-Market Daily” data downloaded from [9]. The matrix \(\varvec{M}\) was obtained by the eigenvalue decomposition of the estimated covariance matrix \(\varvec{\varSigma }\). We solved nine instances, each corresponding to \(d=\underline{d}+(\overline{d}-\underline{d})i/10,i=1,\ldots ,9\), where \(\underline{d}:=\min \{r_1,\ldots ,r_{25}\}\) and \(\overline{d}:=\max \{r_1,\ldots ,r_{25}\}\).
Formulation (37) is motivated by a robust optimization of an uncertain continuous knapsack problem:
$$\begin{aligned} \begin{array}{|ll} \underset{\varvec{x}}{\text{maximize}} &\varvec{1}_n^\top \varvec{x} \\ \text{subject to} &\tilde{\varvec{c}}^\top \varvec{x}\le C,~\varvec{0}\le \varvec{x}\le \varvec{1}_n, \end{array} \end{aligned}$$

where \(C\) represents the capacity of a knapsack and \(\tilde{\varvec{c}}\) represents the volumes of \(n\) (liquid) items. In this formulation, \(\tilde{\varvec{c}}\) is assumed to belong to an uncertainty set \(\mathcal U\) of the form \(\mathcal{U}:=\{\varvec{u}\in {\mathbb {R}}^n:\sqrt{(\varvec{u}-\varvec{c})^\top \varvec{\varSigma }^{-1}(\varvec{u}-\varvec{c})}\le \delta \},\) where \(\varvec{\varSigma }^{-1}\) is a positive definite matrix determining the shape of \(\mathcal U\) and \(\delta \) defines the size of \(\mathcal U\).
We randomly generated 1000 instances of (37), each of size \(n=100\), as follows. Generating \(c_i\sim \mathrm{U}(0,1),i=1,\ldots ,n\), we set \(\sigma _i=0.1c_i\) and \(C=\lfloor \varvec{c}^\top \varvec{1}_n/2\rfloor \). (This means that the nominal volume, \(c_i\), of each item ranges over \((0,1)\), its standard deviation is 1/10 of the nominal volume, and the capacity is about a half of the aggregate total, \(\varvec{c}^\top \varvec{1}_n\).) We also set the covariance matrix as \(\varvec{\varSigma }=\mathrm{diag}(\varvec{\sigma })\{(1-\rho )\varvec{I}_n+\rho \varvec{1}_n\varvec{1}_n^\top \}\mathrm{diag}(\varvec{\sigma })\) so that each pair of items has correlation coefficient \(\rho \). (Note that \(\varvec{\varSigma }\) is positive definite as long as \(\rho >-1/(n-1)\).) Also, we set \(\delta =10.8857\). To solve the norm constrained problems, we used CVX, a package for specifying and solving convex programs [13, 14].
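For reference, the worst case of the uncertain budget constraint over \(\mathcal U\) is \(\varvec{c}^\top \varvec{x}+\delta \Vert \varvec{\varSigma }^{1/2}\varvec{x}\Vert _2\le C\). A minimal CVX sketch of our own of the resulting robust problem reads as follows (Sigma_half denotes a matrix square root of \(\varvec{\varSigma }\); replacing the \(\ell _2\)-norm with one of the LP-representable norms gives the approximations compared in the paper):

Sigma_half = sqrtm(Sigma);
cvx_begin
    variable x(n)
    maximize( sum(x) )
    subject to
        c' * x + delta * norm(Sigma_half * x, 2) <= C;
        x >= 0;
        x <= 1;
cvx_end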
References
Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981)
Aulbach, S., Falk, M., Hofmann, M.: On max-stable processes and the functional D-norm. Extremes 16, 255–283 (2013)
Ben-Tal, A., Nemirovski, A.: On polyhedral approximations of the second-order cone. Math. Oper. Res. 26, 193–205 (2001)
Bertsimas, D., Pachamanova, D., Sim, M.: Robust linear optimization under general norms. Oper. Res. Lett. 32, 510–516 (2004)
Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52, 35–53 (2004)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Candès, E.J., Wakin, M.B.: An introduction to compressive sampling. IEEE Signal Process. Mag. 25, 21–30 (2008)
Dattorro, J.: Convex Optimization & Euclidean Distance Geometry. Meboo Publishing, USA (2005)
French, K.: Data Library. http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html. Accessed 14 May 2013
Ge, D., Jiang, X., Ye, Y.: A note on the complexity of \(L_p\) minimization. Math. Program. 129, 285–299 (2011)
Gotoh, J., Uryasev, S.: Two pairs of families of polyhedral norms versus \(\ell _p\)-norms: proximity and applications in optimization. Research Report 2013–3, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL. http://www.ise.ufl.edu/uryasev/files/2013/10/TwoPairsOfPolyhedralNormsVersusLpNorms_20131025_UFISEPD.pdf (2013)
Gotoh, J., Uryasev, S.: Support vector machines based on convex risk functionals and general norms. Research Report 2013–6, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL
Grant, M., Boyd, S.: CVX: MATLAB software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx (2012)
Grant, M., Boyd, S.: Graph implementations for nonsmooth convex programs. In: Blondel, V., Boyd, S., Kimura, H. (eds.) Recent Advances in Learning and Control, pp. 95–110. Springer (2008). http://stanford.edu/~boyd/graph_dcp.html
Hazell, P.B.R.: A linear alternative to quadratic and semivariance programming for farm planning under uncertainty. Am. J. Agr. Econ. 53, 53–62 (1971)
Kloft, M., Brefeld, U., Sonnenburg, S., Zien, A.: \(\ell _p\)-norm multiple kernel learning. J. Mach. Learn. Res. 12, 953–997 (2011)
Konno, H., Yamazaki, H.: Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market. Manage. Sci. 37, 519–531 (1991)
Krokhmal, P.: Higher moment risk measures. Quant. Financ. 7, 373–387 (2007)
Mafusalov, A., Uryasev, S.: Conditional value-at-risk (CVaR) norm: stochastic case. Research Report 2013–5, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL. http://www.ise.ufl.edu/uryasev/files/2014/01/CVaR_norm_stochastic_case.pdf (2013)
Pavlikov, K., Uryasev, S.: CVaR norm and applications in optimization. Optim. Lett. 8, 1–22 (2014)
Portfolio Safeguard: http://www.aorda.com/aod/psg.action
Rockafellar, R.T., Uryasev, S.: Optimization of conditional value-at-risk. J. Risk 2, 21–41 (2000)
Rockafellar, R.T., Uryasev, S.: Conditional value-at-risk for general loss distributions. J. Bank. Financ. 26, 1443–1471 (2002)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B Met. 58, 267–288 (1996)
Vinel, A., Krokhmal, P.: Polyhedral approximations in \(p\)-order cone programming. Optim. Methods Softw. 29, 1210–1237 (2014)
Xue, G., Ye, Y.: An efficient algorithm for minimizing a sum of \(p\)-norms. SIAM J. Optim. 10, 551–579 (1997)
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B Met. 67, 301–320 (2005)
Acknowledgments
This research was done during the visit of the first author to the University of Florida, USA. The first author appreciates the financial support of Chuo University and the hospitality of the Department of Industrial and Systems Engineering of the University of Florida. The research of the first author is supported in part by MEXT Grant-in-Aid for Young Scientists (B) 23710176. The research of the second author was partially supported by the AFOSR grant FA9550-11-1-0258, “New Developments in Uncertainty: Linking Risk Management, Reliability, Statistics and Stochastic Optimization.” The authors also thank Matt Norton for his help with proofreading.
Appendix: Proof of Propositions
1.1 Proof of Proposition 1
To show the lower bound of (6), consider the following maximization problem:
By symmetry, the optimal value is equal to
Since this is a convex maximization over a polytope, an optimal solution is attained at some vertex of the polytope. The constraints consist of one equality and \(\lfloor \kappa \rfloor +1\) inequalities while the dimension is \(\lfloor \kappa \rfloor +1\), and therefore, at most one inequality can be nonbinding. Besides, due to the first (equality) constraint, it is impossible that all inequalities hold with equality.
-
[Case 1: \(x_{\lfloor \kappa \rfloor +1}>0\)] The corresponding vertex satisfies \(x_1=\cdots =x_{\lfloor \kappa \rfloor +1}=\frac{1}{\kappa }\), and its objective value is \(\frac{n}{\kappa ^p}\).
-
[Case 2: \(x_{j}=x_{\lfloor \kappa \rfloor +1}=0\) for \(j\ne \hat{j}\) and \(x_{\hat{j}}>0\) for some \(\hat{j}\in \{1,\ldots ,\lfloor \kappa \rfloor \}\)] In this case, \(x_{\hat{j}}=1\) and the vertices attain the objective value of \(1\).
Therefore, we have \(\Vert \varvec{x}\Vert _p/\langle \!\langle \varvec{x}\rangle \!\rangle _\alpha \le \max \{1,n^\frac{1}{p}\kappa ^{-1}\}\), and hence \(\langle \!\langle \varvec{x}\rangle \!\rangle _\alpha /\Vert \varvec{x}\Vert _p\ge \min \{1,\kappa n^{-\frac{1}{p}}\}\).
To show the upper bound of (6), consider the following minimization problem:
By using the symmetry, the optimal value is equal to
and we can consider the reduced problem having \(\lfloor \kappa \rfloor +1\) variables. A solution with \(x_j=\{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{-1}\), \(j=1,\ldots ,\lfloor \kappa \rfloor \), and \(x_{\lfloor \kappa \rfloor +1}=(\kappa -\lfloor \kappa \rfloor )^{\frac{1}{p-1}}\{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{-1}\) (and \(x_j=0,j=\lfloor \kappa \rfloor +2,\ldots ,n\)) satisfies the Karush-Kuhn-Tucker (KKT) condition. Since the problem is a convex minimization, this solution is optimal and the optimal value is \(\{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{1-p}\). Consequently, \(\Vert \varvec{x}\Vert _p/\langle \!\langle \varvec{x}\rangle \!\rangle _\alpha \ge \{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{-\frac{p-1}{p}}\) and \(\langle \!\langle \varvec{x}\rangle \!\rangle _\alpha /\Vert \varvec{x}\Vert _p\le \{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{\frac{p-1}{p}}\).\(\square \)
1.2 Proof of Proposition 2
Similarly to the proof of Proposition 1, to prove the upper bound of (7), consider the following maximization problem:
Note that due to the symmetry in the \(n\) elements of \(\varvec{x}\), the optimal value of (39) can be obtained by the following convex maximization program:
where \(\kappa =n(1-\alpha )\). Obviously, an optimal solution can be obtained by setting \(x_1=\cdots =x_{\lfloor \kappa \rfloor }=1\), \(x_{\lfloor \kappa \rfloor +1}=\kappa -\lfloor \kappa \rfloor \), and \(x_{\lfloor \kappa \rfloor +2}=\cdots =x_n=0\), and its optimal value is \(\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^p\). Consequently, \(\Vert \varvec{x}\Vert _p/\langle \!\langle \varvec{x}\rangle \!\rangle _\alpha ^*\le \{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^p\}^\frac{1}{p}\).
Next, we prove the lower bound of (7). Consider
which is equivalent to the minimum of the following two convex optimization problems:
It is easy to see that the solution \(x_1=\cdots =x_n=\kappa /n\) is a KKT solution to (41) and the optimal value is \(n(\kappa /n)^p\). On the other hand, \(x_1=1,x_2=\cdots =x_n=0\) is a KKT solution to (42) and the optimal value is \(1\). Consequently, the optimal value of (40) is given by \(\min \{\kappa n^{-1+\frac{1}{p}},1\}\), and we have \( \Vert \varvec{x}\Vert _p/\langle \!\langle \varvec{x}\rangle \!\rangle ^*_{\alpha }\ge \min \{n^{\frac{1}{p}}(1-\alpha ),1\} \). \(\square \)
1.3 Proof of Lemma 1
We here prove the continuity of the function at \(\kappa \in \mathbb {Z}\), since its continuity and differentiability at \(\kappa \not \in \mathbb {Z}\) are evident.
-
[Case: \(\kappa \ge n^{\frac{1}{p}}\)] We have \(f_{n,p}(\kappa )=\{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{\frac{p-1}{p}}\) and \(\lim \limits _{\varepsilon \rightarrow 0+} f_{n,p}(\kappa -\varepsilon )=\) \(\lim \limits _{\varepsilon \rightarrow 0+} [\kappa -1+\{\kappa -\varepsilon -(\kappa -1)\}^{\frac{p}{p-1}}]^{\frac{p-1}{p}} =\kappa ^{\frac{p-1}{p}}. \) On the other hand, \( \lim \limits _{\varepsilon \rightarrow 0+} f_{n,p}(\kappa +\varepsilon ) =\lim \limits _{\varepsilon \rightarrow 0+} \{\kappa +(\kappa +\varepsilon -\kappa )^{\frac{p}{p-1}}\}^{\frac{p-1}{p}} =\kappa ^{\frac{p-1}{p}}. \)
-
[Case: \(\kappa \le n^{\frac{1}{p}}\)] We have \(f_{n,p}(\kappa )=\frac{n^{\frac{1}{p}}}{\kappa }\{\lfloor \kappa \rfloor +(\kappa -\lfloor \kappa \rfloor )^{\frac{p}{p-1}}\}^{\frac{p-1}{p}}\) and \( \lim \limits _{\varepsilon \rightarrow 0+} f_{n,p}(\kappa -\varepsilon ) =\) \(\lim \limits _{\varepsilon \rightarrow 0+} \frac{n^{\frac{1}{p}}[\kappa -1+\{\kappa -\varepsilon -(\kappa -1)\}^{\frac{p}{p-1}}]^{\frac{p-1}{p}}}{\kappa -\varepsilon } =\left( \frac{n}{\kappa }\right) ^{\frac{1}{p}}. \) On the other hand, \( \lim \limits _{\varepsilon \rightarrow 0+} f_{n,p}(\kappa +\varepsilon ) =\lim \limits _{\varepsilon \rightarrow 0+} \frac{n^\frac{1}{p}\{\kappa +(\kappa +\varepsilon -\kappa )^{\frac{p}{p-1}}\}^{\frac{p-1}{p}}}{\kappa +\varepsilon } =\left( \frac{n}{\kappa }\right) ^{\frac{1}{p}}. \)
1.4 Proof of Proposition 3
-
[Case: \(\kappa \ge n^{\frac{1}{p}}\)] Write \(C:=\lfloor \kappa \rfloor \). For any \(\kappa \not \in \mathbb {Z}\), the function is differentiable and \(f'_{n,p}(\kappa )=\frac{(\kappa -C)^{\frac{1}{p-1}}}{\{C+(\kappa -C)^{\frac{p}{p-1}}\}^{\frac{1}{p}}}>0\). Since \(f_{n,p}(\kappa )\) is continuous at any \(\kappa \in \mathbb {Z}\), it is increasing for \(\kappa \ge n^{\frac{1}{p}}\).
-
[Case: \(\kappa \le n^{\frac{1}{p}}\)] For any \(\kappa \not \in \mathbb {Z}\), it is differentiable and \(f'_{n,p}(\kappa )=\frac{n^{\frac{1}{p}}C\{(\kappa -C)^{\frac{p}{p-1}}-(\kappa -C)\}}{\kappa ^2(\kappa -C)\{(\kappa -C)^{\frac{p}{p-1}}+C\}^{\frac{1}{p}}}<0\). Since \(f_{n,p}(\kappa )\) is continuous at any \(\kappa \in \mathbb {Z}\), it is decreasing for \(\kappa \le n^{\frac{1}{p}}\). \(\square \)
1.5 Proof of Proposition 4
Note that the dual norm of \((\!(\varvec{x})\!)_\lambda \) is written as an LP:
where \(|\varvec{x}|:=(|x_1|,\ldots ,|x_n|)^\top \). It is easy to see that any extreme point of the feasible region of the above LP is described as a solution to a system of \(n\) equalities of the form

$$\begin{aligned} \left\{ \lambda \varvec{I}_{|S|}+(1-\lambda )\varvec{1}_{|S|}\varvec{1}_{|S|}^\top \right\} \varvec{y}_{S}=\varvec{1}_{|S|},\qquad y_i=0,~i\not \in S, \end{aligned}$$
where \(\varvec{y}_{S}\) denotes the vector \((y_i)_{i\in S}\) with \(S\subset \{1,\ldots ,n\}\). With the Sherman-Morrison-Woodbury formula, the explicit expression of the solution is given by \(\varvec{y}_S=\frac{1}{|S|-(|S|-1)\lambda }\varvec{1}_{|S|}\) and \(y_i=0,i\not \in S\), and its objective value is \( \frac{1}{|S|-(|S|-1)\lambda }\sum _{i\in S}|x_i|. \) Note that this solution is feasible to (43). It is easy to see that given \(|S|\), the maximum objective value is \( \frac{1}{|S|-(|S|-1)\lambda }\sum _{i=1}^{|S|}|x_{(i)}|. \) Consequently, the optimal value is given by the formula which is to be proved.\(\square \)
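Concretely, the Sherman-Morrison-Woodbury formula gives

$$\begin{aligned} \left\{ \lambda \varvec{I}_{|S|}+(1-\lambda )\varvec{1}_{|S|}\varvec{1}_{|S|}^\top \right\} ^{-1} =\frac{1}{\lambda }\left\{ \varvec{I}_{|S|}-\frac{(1-\lambda )\varvec{1}_{|S|}\varvec{1}_{|S|}^\top }{\lambda +(1-\lambda )|S|}\right\} , \end{aligned}$$so that \(\varvec{y}_S=\frac{1}{\lambda +(1-\lambda )|S|}\varvec{1}_{|S|}=\frac{1}{|S|-(|S|-1)\lambda }\varvec{1}_{|S|}\).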
1.6 Proof of Proposition 5
In order to obtain the lower bound for the deltoidal norm, we consider the following optimization problem.
Taking into account the symmetry of \(n\) variables, it is sufficient to solve the following optimization problem:
Since this is a convex maximization over a polytope, we can find an optimal solution at an extreme point of the feasible region. Noting that the feasible region of (44) is described by one equality and \(n\) inequalities, at least \(n-1\) inequalities hold with equality at any extreme point. It is not hard to see that any such extreme point is of the form \( x_1=\cdots =x_k=\frac{1}{(1-\lambda )k+\lambda }>0,~x_{k+1}=\cdots =x_n=0, \) for some \(k\in \{1,2,\ldots ,n\}\), and the objective value is \( f_p(k):=\frac{k}{\{(1-\lambda )k+\lambda \}^p}. \) The derivative of \(f_p\) is \( f_p'(k)=\frac{\{(1-\lambda )k+\lambda \}^{p-1}\{(1-p)(1-\lambda )k+\lambda \}}{\{(1-\lambda )k+\lambda \}^{2p}}, \) so \(f_p\) is increasing for \(k<\frac{\lambda }{(p-1)(1-\lambda )}=:\hat{k}\) and decreasing for \(k>\hat{k}\).
-
[Case: \(\lambda \in [0,1-\frac{1}{p})\)] Note that this condition is equivalent to \(\lfloor \hat{k}\rfloor +1\le 1\) (or \(\hat{k}<1\)). In this case, the maximum of \(f_p\) over \(k\in \{1,\ldots ,n\}\) is attained at \(k=1\) and its value is 1.
-
[Case: \(\lambda \in [1-\frac{1}{p},\frac{(p-1)n}{(p-1)n+1})\)] This condition is equivalent to \(1<\lfloor \hat{k}\rfloor +1\le n\) (or \(1\le \hat{k}<n\)).
The maximum is attained at \(k=\lfloor \hat{k}\rfloor \) or \(k=\lfloor \hat{k}\rfloor +1\), and the objective value is
$$\begin{aligned} \max \left\{ \frac{\lfloor \hat{k}\rfloor }{\{(1-\lambda )\lfloor \hat{k}\rfloor +\lambda \}^p}, \frac{\lfloor \hat{k}\rfloor +1}{\{(1-\lambda )(\lfloor \hat{k}\rfloor +1)+\lambda \}^p} \right\} . \end{aligned}$$
-
[Case: \(\lambda \in [\frac{(p-1)n}{(p-1)n+1},1]\)] This condition is equivalent to \(n\le \lfloor \hat{k}\rfloor \) (or \(\hat{k}\ge n\)). The maximum is attained at \(k=n\) and its objective is \( \frac{n}{\{(1-\lambda )n+\lambda \}^p}. \)
By taking the \(p\)-th root of the result above and taking the reciprocal, we obtain the lower bound, \(g_{n,p}(\lambda )\), of \((\!(\varvec{x})\!)_\lambda /\Vert \varvec{x}\Vert _p\).
In order to obtain the upper bound, we consider the following optimization problem:
Taking into account the symmetry of \(n\) variables, it is sufficient to solve the following optimization problem:
It is easy to see that the solution \( x_1=\frac{1}{(n-1)(1-\lambda )^2+1},~~x_i=\frac{1-\lambda }{(n-1)(1-\lambda )^2+1},i=2,\ldots ,n, \) satisfies the KKT condition. Since (45) is a convex program, this is an optimal solution, and the objective value is \( \frac{(n-1)(1-\lambda )^p+1}{\{(n-1)(1-\lambda )^2+1\}^p}. \) Therefore, we get \( \Vert \varvec{x}\Vert _p/(\!(\varvec{x})\!)_\lambda \ge \{(n-1)(1-\lambda )^p+1\}^{\frac{1}{p}}/\{(n-1)(1-\lambda )^2+1\}. \) By taking the reciprocal of the result above, we obtain the upper bound of \((\!(\varvec{x})\!)_\lambda /\Vert \varvec{x}\Vert _p\).
Next, we prove the bounds of the dual norm. Similarly to the proof for the dual CVaR norm, we consider the maximization of \(\Vert \varvec{x}\Vert _{p}\) over the constraint \((\!(\varvec{x})\!)^*_\lambda \le 1\) in order to show the upper bound.
Based on the explicit formula (17) of \((\!(\varvec{x})\!)^*_\lambda \), this is equivalent to
which is reduced to a convex maximization over a polytope
Let us observe that any optimal solution satisfies the first \(n\) inequality constraints with equality. (Suppose on the contrary that there exists a smallest \(k\) such that \(x_1+\cdots +x_k<k-(k-1)\lambda \) at an optimal solution \(\varvec{x}\). Noting that \(x_1=1,x_2=\cdots =x_{k-1}=1-\lambda \) holds, this assumption implies that \(x_{k}<1-\lambda \), and the objective value can be increased by setting \(x_k=1-\lambda \).) Therefore, the solution of the system of the \(n\) equalities, \(x_1=1,x_2=\cdots =x_{n}=1-\lambda \), is the unique optimal solution, and the optimal value is then given by \(1+(n-1)(1-\lambda )^p\). Consequently, \(\Vert \varvec{x}\Vert _p/(\!(\varvec{x})\!)^*_\lambda \le \{1+(n-1)(1-\lambda )^p\}^\frac{1}{p}\).
In order to obtain the lower bound for the dual deltoidal norm, we consider the following optimization problem.
Based on the explicit formula (17) of \((\!(\varvec{x})\!)^*_\lambda \), this is equivalent to
whose optimal value is equal to \(\min \{F_p(1),\ldots ,F_p(n)\}\) where \(F_p(k)\) can be computed by the following LP:
In place of (47), consider the following (relaxed) LP for some \(k\in \{1,\ldots ,n\}\):
It is easy to observe that the solution such that \(x_1=\cdots =x_k=\frac{k-(k-1)\lambda }{k}\) and \(x_{k+1}=\cdots =x_n=0\) satisfies the KKT condition. Since (48) is a convex program, such a KKT solution is optimal to it. Also, note that this KKT solution satisfies the inequalities \(x_{1}+\cdots +x_{k'}\le k'-(k'-1)\lambda \) for any \(k'\ne k\), which implies that this KKT solution is also optimal to (47), i.e., \(F_p(k)=\hat{F}_p(k)\). Therefore, the optimal value of (46) can be obtained by \(\min \{\hat{F}_p(1),\ldots ,\hat{F}_p(n)\}\), where \( \hat{F}_p(k)=\frac{\{(1-\lambda )k+\lambda \}^p}{k^{p-1}}. \) Since the first-order derivative of \(\hat{F}_p(k)\) is given by \( \hat{F}_p'(k)=\frac{\{(k-1)(1-\lambda )+1\}^{p-1}}{k^p}\times \{(1-\lambda )k+\lambda (1-p)\}, \) we see that \(\hat{F}_p(k)\) is increasing (decreasing) if \(k>\frac{\lambda (p-1)}{1-\lambda }\) (\(k<\frac{\lambda (p-1)}{1-\lambda }\)) for \(\lambda \in (0,1)\) and \(p>1\), and therefore, it attains its minimum at \(\tilde{k}:=\frac{\lambda (p-1)}{1-\lambda }\).
-
[Case: \(\lambda \in [0,\frac{1}{p})\)] Note that this condition is equivalent to \(\lfloor \tilde{k}\rfloor +1\le 1\) (or \(\tilde{k}<1\)). The minimum of \(F_p\) over \(k\in \{1,\ldots ,n\}\) is attained at \(k=1\) and its value is 1.
-
[Case: \(\lambda \in [\frac{1}{p},\frac{n}{n+p-1})\)] This condition is equivalent to \(1<\lfloor \tilde{k}\rfloor +1\le n\) (or \(1\le \tilde{k}<n\)). The minimum is attained at \(k=\lfloor \tilde{k}\rfloor \) or \(k=\lfloor \tilde{k}\rfloor +1\), and the objective value is then
$$\begin{aligned} \min \{F_p(\lfloor \tilde{k}\rfloor ),F_p(\lfloor \tilde{k}\rfloor +1)\}= \min \left\{ \frac{\{(1-\lambda )\lfloor \tilde{k}\rfloor +\lambda \}^p}{\lfloor \tilde{k}\rfloor ^{p-1}}, \frac{\{(1-\lambda )(\lfloor \tilde{k}\rfloor +1)+\lambda \}^p}{(\lfloor \tilde{k}\rfloor +1)^{p-1}} \right\} . \end{aligned}$$
-
[Case: \(\lambda \in [\frac{n}{n+p-1},1]\)] This condition is equivalent to \(n\le \lfloor \tilde{k}\rfloor \) (or \(\tilde{k}\ge n\)). The minimum is attained at \(k=n\) and its objective value is \( \frac{\{(1-\lambda )n+\lambda \}^p}{n^{p-1}}. \)
By taking the \(p\)-th root, we complete the proof of the lower bound. \(\square \)
1.7 Proof of Lemma 2
Note that on the interval \((\hat{\lambda }_{k-1},\hat{\lambda }_{k})\), the function \(h_{n,p}(\lambda )\) is explicitly written as
and its first derivative is
Noting that the sign of \(h_{n,p,k}'(\lambda )\) depends only on \(k-1-(n-1)(1-\lambda )^\frac{1}{p-1}\), we see that for each \(k\in \{1,\ldots ,n-1\}\), \(h_{n,p,k}(\lambda )\) uniquely attains its minimum over \([0,1]\) at \(1-\left( \frac{k-1}{n-1}\right) ^{p-1}\). Besides, we have
From these facts, we see that \(h_{n,p}(\lambda )\) is quasiconvex and has the unique minimum. \(\square \)
Keywords
- \(\ell _p\)-norm
- CVaR norm
- Deltoidal norm
- Linear programming (LP)
- \(p\)th order cone programming (\(p\)OCP)