Abstract
Lexicographic linear programs are fixed-priority multiobjective linear programs that arise as models of biological systems in flux balance analysis and in goal programming. The optimal objective function values of a lexicographic linear program are a nonsmooth function of its right-hand side. This work derives generalized derivative information for lexicographic linear programs using lexicographic directional derivatives to obtain elements of the Bouligand subdifferential (limiting Jacobian). It is shown that elements of the limiting Jacobian can be obtained by solving related linear programs. A nonsmooth equation-solving problem is solved to illustrate the benefits of using elements of the limiting Jacobian of lexicographic linear programs.
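For readers who want to experiment with these objects computationally, the following is a minimal sketch, assuming standard-form data \((\mathbf {A}, \mathbf {b}, \mathbf {C})\) with the rows of \(\mathbf {C}\) ordered by priority, of how the optimal-value vector \(\mathbf {h}(\mathbf {b})\) of LPs (1) and (2) can be evaluated by solving the priority levels sequentially. The helper name llp_value and the choice of scipy.optimize.linprog are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): evaluate the optimal-value vector
# h(b) of the LLP  min c_i^T v  s.t.  A v = b, v >= 0,  i = 0, ..., n_h,
# by solving the priority levels in order and fixing each attained optimal
# value as an extra equality constraint, as in LPs (1) and (2).
import numpy as np
from scipy.optimize import linprog

def llp_value(A, b, C):
    """A: m x n, b: m, C: (n_h+1) x n objective rows in priority order."""
    A_eq = np.asarray(A, dtype=float)
    b_eq = np.asarray(b, dtype=float)
    h = []
    for c in C:
        res = linprog(c, A_eq=A_eq, b_eq=b_eq)  # default bounds give v >= 0
        if not res.success:
            raise RuntimeError(res.message)
        h.append(res.fun)
        A_eq = np.vstack([A_eq, c])      # fix this level's optimal value
        b_eq = np.append(b_eq, res.fun)
    return np.array(h)
```

In finite precision, fixing \(\mathbf {c}_i^\text {T}\mathbf {v}\) exactly can leave later priority levels marginally infeasible, so practical implementations typically relax the fixed values by a small tolerance.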
References
Harwood, S.M., Höffner, K., Barton, P.I.: Efficient solution of ordinary differential equations with a parametric lexicographic linear program embedded. Numer. Math. 133(4), 623–653 (2016)
Höffner, K., Harwood, S.M., Barton, P.I.: A reliable simulator for dynamic flux balance analysis. Biotechnol. Bioeng. 110(3), 792–802 (2013)
Ignizio, J.P.: Linear Programming in Single- & Multiple-Objective Systems. Prentice Hall, Englewood Cliffs (1982)
Höffner, K., Barton, P.I.: Design of microbial consortia for industrial biotechnology. Comput. Aided Chem. Eng. 34, 65–74 (2014)
Von Stackelberg, H.: The Theory of the Market Economy. Oxford University Press, Oxford (1952)
Ferris, M.C., Pang, J.S.: Engineering and economic applications of complementarity problems. SIAM Rev. 39(4), 669–713 (1997)
Nesterov, Y.: Lexicographic differentiation of nonsmooth functions. Math. Program. 104(2–3), 669–700 (2005)
Khan, K.A., Barton, P.I.: A vector forward mode of automatic differentiation for generalized derivative evaluation. Optim. Methods Softw. 30(6), 1185–1212 (2015)
Scholtes, S.: Introduction to Piecewise Differentiable Equations. Springer, New York (2012)
Bertsimas, D., Tsitsiklis, J.N.: Introduction to Linear Optimization. Athena Scientific, Belmont (1997)
Khan, K.A., Barton, P.I.: Generalized derivatives for solutions of parametric ordinary differential equations with non-differentiable right-hand sides. J. Optim. Theory Appl. 163(2), 355–386 (2014)
Barton, P.I., Khan, K.A., Stechlinski, P., Watson, H.A.: Computationally relevant generalized derivatives: theory, evaluation and applications. Optim. Methods Softw. (2017). https://doi.org/10.1080/10556788.2017.1374385
Clarke, F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)
Bonnans, J.F., Shapiro, A.: Perturbation Analysis of Optimization Problems. Springer, New York (2013)
Höffner, K., Khan, K.A., Barton, P.I.: Generalized derivatives of dynamic systems with a linear program embedded. Automatica 63, 198–208 (2016)
Gomez, J.A., Höffner, K., Barton, P.I.: DFBAlab: a fast and reliable MATLAB code for dynamic flux balance analysis. BMC Bioinform. 15, 409 (2014)
IBM ILOG CPLEX: 12.7 User’s Manual. https://www.ibm.com/support/knowledgecenter/SSSA5P_12.7.0/ilog.odms.studio.help/pdf/usrcplex.pdf (2017)
Gurobi Optimization: Gurobi optimizer reference manual. http://www.gurobi.com/documentation/ (2017)
Chen, J., Gomez, J.A., Höffner, K., Barton, P.I., Henson, M.A.: Metabolic modeling of synthesis gas fermentation in bubble column reactors. Biotechnol. Biofuels 8, 89 (2015)
Chen, J., Gomez, J.A., Höffner, K., Phalak, P., Barton, P.I., Henson, M.A.: Spatiotemporal modeling of microbial metabolism. BMC Syst. Biol. 10, 21 (2016)
Qi, L., Sun, J.: A nonsmooth version of Newton’s method. Math. Program. 58(1–3), 353–367 (1993)
Boyd, S., Vandenberghe, L.: Subgradients: Notes for EE364b, Stanford University, Winter 2006-07. https://see.stanford.edu/materials/lsocoee364b/01-subgradients_notes.pdf (2008)
Acknowledgements
The authors would like to acknowledge Dr. Stuart Harwood for the insightful discussions that took place while preparing the paper.
Communicated by Alexander Martin.
Appendices
Appendix A
Proof of Proposition 2.1
First we show that if \(\text {int}(F) \ne \emptyset \), then \(\mathbf {A}\) has full row rank. Assume \(\mathbf {A}\) does not have full row rank and \(\text {int}(F) \ne \emptyset \). Then at least one row of \(\mathbf {A}\) is linearly dependent on the others; without loss of generality, let it be the last row. Let \(\mathbf {b}_0 \in \text {int}(F)\). Consistency of \(\mathbf {A}\mathbf {v}= \mathbf {b}\) then requires the last component of \(\mathbf {b}\) to equal the same linear combination of its remaining components, so any small perturbation of the last component of \(\mathbf {b}_0\), with all other components held fixed, renders LP (1) infeasible. Hence \(\mathbf {b}_0 \notin \text {int}(F)\), a contradiction.
For the converse, assume \(\mathbf {A}\) has full row rank. Let \(\mathbf {v}= \mathbf {1}\) and \(\mathbf {b}= \mathbf {A}\mathbf {v}\); since \(\mathbf {v}\ge \mathbf {0}\), \(\mathbf {b}\in F\). Let \(\mathbf {d}\in \mathbb {R}^m\) with \(\Vert \mathbf {d}\Vert = 1\) (without loss of generality) and let \(\epsilon > 0\). We seek \(\Delta \mathbf {v}\) satisfying \(\mathbf {A}(\mathbf {v}+ \Delta \mathbf {v}) = \mathbf {b}+ \epsilon \mathbf {d}\). Since \(\mathbf {A}\mathbf {v}= \mathbf {b}\), we need \(\mathbf {A}\Delta \mathbf {v}= \epsilon \mathbf {d}\). Since \(\mathbf {A}\) has full row rank, its columns span \(\mathbb {R}^m\); choose m columns of \(\mathbf {A}\) that form a basis \(\tilde{\mathbf {A}}\) of \(\mathbb {R}^m\) and let \(\Delta {\tilde{\mathbf {v}}} = \epsilon \tilde{\mathbf {A}}^{-1}\mathbf {d}\). Without loss of generality, let the columns of \(\tilde{\mathbf {A}}\) be the first m columns of \(\mathbf {A}\), and let \(\Delta v_j = \Delta {\tilde{v}}_j\) if \(j \in \{1,\ldots ,m\}\) and \(\Delta v_j = 0\) otherwise. The \(\Delta \mathbf {v}\) constructed in this way satisfies \(\mathbf {A}(\mathbf {v}+ \Delta \mathbf {v}) = \mathbf {b}+ \epsilon \mathbf {d}\). Moreover, \(\Vert \Delta \mathbf {v}\Vert = \Vert \Delta {\tilde{\mathbf {v}}} \Vert \le \epsilon \Vert \tilde{\mathbf {A}}^{-1} \Vert \Vert \mathbf {d}\Vert = \epsilon \Vert \tilde{\mathbf {A}}^{-1} \Vert \), so for small enough \(\epsilon >0\), \(\Vert \Delta \mathbf {v}\Vert < 1\). Then every component satisfies \(| \Delta v_j |<1\), so \(\mathbf {v}+ \Delta \mathbf {v}> \mathbf {0}\) and \(\mathbf {b}+ \epsilon \mathbf {d}\in F\). Since \(F = \{\mathbf {A}\mathbf {v}{:}\,\mathbf {v}\ge \mathbf {0} \}\) [10] is a nonempty convex set, \(\mathbf {b}+ \delta \mathbf {d}\in F\) for all \(\delta \in ]0,\epsilon [\). Since \(\mathbf {d}\) was arbitrary, \(\mathbf {b}\in \text {int}(F)\), so \(\text {int}(F) \ne \emptyset \). \(\square \)
Proof of Proposition 2.3
Notice that \(\varvec{\rho }(\mathbf {x}+ \epsilon \mathbf {d}) = \varvec{\theta }(\mathbf {x}+ \epsilon \mathbf {d})\) holds for all \(\epsilon \in [0,\delta ^*_\mathbf {d}[\). From Lemma 2.1 there exists \(\delta >0\) such that \(\varvec{\theta }(\mathbf {x}+ \epsilon \mathbf {d}) = \varvec{\theta }(\mathbf {x}) + \varvec{\theta }^\prime (\mathbf {x}; \epsilon \mathbf {d})\) for all \(\epsilon \) with \(\Vert \epsilon \mathbf {d}\Vert < \delta \). Let \(\delta _\mathbf {d}= \min (\delta ,\delta ^*_\mathbf {d})\). Given that \(\varvec{\rho }(\mathbf {x}+ \epsilon \mathbf {d}) = \varvec{\theta }(\mathbf {x}+ \epsilon \mathbf {d})\) for all \(\epsilon \in ]0,\delta _\mathbf {d}[\), it follows that \(\varvec{\rho }(\mathbf {x}+ \epsilon \mathbf {d}) = \varvec{\rho }(\mathbf {x}) + \varvec{\theta }^\prime (\mathbf {x}; \epsilon \mathbf {d})\) for all \(\epsilon \in ]0,\delta _\mathbf {d}[\). Now let us show that \(\varvec{\rho }^\prime (\mathbf {x};\mathbf {d})\) exists and equals \(\varvec{\theta }^\prime (\mathbf {x}; \mathbf {d})\). Consider any sequence \(\{\tau _i\}^\infty _{i=0}\), \(\tau _i \in ]0,\delta _\mathbf {d}[\), \(\tau _i \downarrow 0\). From the definition of the directional derivative: \(\begin{aligned} {\mathop {\lim }\nolimits _{i \rightarrow \infty }}\frac{\varvec{\theta }(\mathbf {x}+\tau _i\mathbf {d}) - \varvec{\theta }(\mathbf {x})}{\tau _i} =\varvec{\theta }^\prime (\mathbf {x};\mathbf {d}). \end{aligned}\) Also, for all \(\tau _i\), \(\begin{aligned} \frac{\varvec{\theta }(\mathbf {x}+\tau _i\mathbf {d}) - \varvec{\theta }(\mathbf {x})}{\tau _i} = \frac{\varvec{\rho }(\mathbf {x}+\tau _i\mathbf {d}) - \varvec{\rho }(\mathbf {x})}{\tau _i}. \end{aligned}\) Therefore, \(\varvec{\rho }^\prime (\mathbf {x};\mathbf {d})\) exists and \(\varvec{\rho }^\prime (\mathbf {x};\mathbf {d}) = \varvec{\theta }^\prime (\mathbf {x}; \mathbf {d})\).
From Eq. (4), \(\varvec{\rho }^\prime (\mathbf {x};\epsilon \mathbf {d}) = \epsilon \varvec{\rho }^\prime (\mathbf {x};\mathbf {d}) = \epsilon \varvec{\theta }^\prime (\mathbf {x}; \mathbf {d}) = \varvec{\theta }^\prime (\mathbf {x}; \epsilon \mathbf {d})\). Hence for all \(\epsilon \in ]0,\delta _\mathbf {d}[\), \(\varvec{\rho }(\mathbf {x}+ \epsilon \mathbf {d}) = \varvec{\rho }(\mathbf {x}) + \varvec{\rho }^\prime (\mathbf {x}; \epsilon \mathbf {d})\). \(\square \)
Proof of Theorem 2.1
Since \(\varvec{\rho }\) is l-smooth, it is in particular Lipschitz continuous near \(\mathbf {x}\). Since \(\varvec{\zeta }\) is locally Lipschitz continuous, the composition \(\varvec{\sigma }:= \varvec{\zeta }\circ \varvec{\rho }\) is Lipschitz continuous near \(\mathbf {x}\). First, we show that \(\varvec{\sigma }\) is directionally differentiable at \(\mathbf {x}\).
We need to show that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\sigma }(\mathbf {x}+ \tau \mathbf {d})-\varvec{\sigma }(\mathbf {x})}{\tau } = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau } \end{aligned}\) exists in \(\mathbb {R}^p\) for all \(\mathbf {d}\in \mathbb {R}^m\). First, \(\begin{aligned}{\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d})-\varvec{\rho }(\mathbf {x}) - \tau \varvec{\rho }'(\mathbf {x};\mathbf {d})}{\tau }=\mathbf {0}\end{aligned}\) from the definition of the directional derivative. Next, given \(\mathbf {d}\in \mathbb {R}^m\), \(\begin{aligned} \varvec{\zeta }^\prime (\varvec{\rho }(\mathbf {x});\varvec{\rho }'(\mathbf {x};\mathbf {d})) = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau } \end{aligned}\) exists in \(\mathbb {R}^p\) by assumption. We show that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau } = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau }, \end{aligned}\) which is equivalent to showing that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))}{\tau } = \mathbf {0}. \end{aligned}\) From the existence of the directional derivative \(\varvec{\zeta }^\prime (\varvec{\rho }(\mathbf {x});\varvec{\rho }'(\mathbf {x};\mathbf {d}))\), there exists \(\epsilon _1>0\) such that \( \varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d}) \in Z\) for all \(\tau \in ]0,\epsilon _1[\).
Since \(\varvec{\zeta }\) is locally Lipschitz continuous, there exist \(\epsilon _2>0\) and \(K>0\) such that \(\Vert \varvec{\zeta }(\mathbf {y}_1) - \varvec{\zeta }(\mathbf {y}_2) \Vert \le K \Vert \mathbf {y}_1 - \mathbf {y}_2 \Vert \) for all \(\mathbf {y}_1, \mathbf {y}_2\) within distance \(\epsilon _2\) of \(\varvec{\rho }(\mathbf {x})\). Shrinking \(\epsilon _1\) if necessary, \(\varvec{\rho }(\mathbf {x}+\tau \mathbf {d})\) and \(\varvec{\rho }(\mathbf {x})+\tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d})\) lie in this neighborhood for all \(\tau \in ]0,\epsilon _1[\), so that \(\Vert \varvec{\rho }(\mathbf {x}+\tau \mathbf {d}) - (\varvec{\rho }(\mathbf {x})+\tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d})) \Vert < \epsilon _2\) for all such \(\tau \). By the existence of \(\varvec{\rho }^\prime (\mathbf {x};\mathbf {d})\) it follows that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \left\| \frac{\varvec{\rho }(\mathbf {x}+\tau \mathbf {d}) - (\varvec{\rho }(\mathbf {x})+\tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d})) }{\tau } \right\| = 0, \end{aligned}\) and then \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \left\| \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d})) - \varvec{\zeta }(\varvec{\rho }(\mathbf {x}+\tau \mathbf {d}))}{\tau } \right\| \le K {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \left\| \frac{\varvec{\rho }(\mathbf {x})+\tau \varvec{\rho }^\prime (\mathbf {x};\mathbf {d}) - \varvec{\rho }(\mathbf {x}+\tau \mathbf {d})}{\tau } \right\| = 0. \end{aligned}\) Since \(\mathbf {d}\in \mathbb {R}^m\) was arbitrary, \(\varvec{\sigma }\) is directionally differentiable at \(\mathbf {x}\) and \(\varvec{\sigma }^\prime (\mathbf {x};\mathbf {d}) = \varvec{\zeta }^\prime (\varvec{\rho }(\mathbf {x});\varvec{\rho }^\prime (\mathbf {x};\mathbf {d}))\).
Now let us assume \(\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) = \varvec{\zeta }^{(j-1)}_{\varvec{\rho }(\mathbf {x}),\varvec{\rho }^\prime (\mathbf {x};\mathbf {M})}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j))\) for some \(j \in \{1,\ldots ,q\}\), for all \(\mathbf {M}\in \mathbb {R}^{m\times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\). Let \(\mathbf {y}:= \varvec{\rho }(\mathbf {x})\), \(\mathbf {Y}:= \varvec{\rho }^\prime (\mathbf {x};\mathbf {M})\), and \({\hat{\mathbf {m}}} := \mathbf {m}_{j+1}\). We need to show that the following limit exists in \(\mathbb {R}^p\) for all \(\mathbf {M}\in \mathbb {R}^{m \times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\): \(\begin{aligned} [\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}) - \varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j)}{\tau }. \end{aligned}\)
Given \(\mathbf {M}\in \mathbb {R}^{m\times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\), since \(\varvec{\zeta }\) is \(\varvec{\rho }\)-weakly l-smooth at \(\mathbf {x}\), we know the following limit exists in \(\mathbb {R}^p\): \(\begin{aligned} [\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}]^\prime (\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j); [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) - \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j))}{\tau }, \end{aligned}\) and \(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) \in G^{(j-1)}_{\mathbf {x}}\) for \(\tau >0\) small enough. We also know from the definition of l-smoothness that the directional derivative \([\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) = \varvec{\rho }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}})\) exists.
We will show that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) - \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j))}{\tau } = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}) - \varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j)}{\tau }, \end{aligned}\) which is equivalent to \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) - \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}))}{\tau } = \mathbf {0}, \end{aligned}\) since \(\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}) = \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}))\) by the inductive hypothesis.
From Corollary 2.2, \(\varvec{\zeta }^{(k)}_{\mathbf {y},\mathbf {Y}}\) is globally Lipschitz continuous for all \(k \in \{0,\ldots ,q\}\). Then, there exists \(\delta >0\) such that \(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) \in G^{(j-1)}_{\mathbf {x}}\) if \(\tau \in [0,\delta [\), and there exists \(K>0\) such that, for all \(\tau \in ]0,\delta [\), \(\begin{aligned} \left\| \frac{\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) - \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}))}{\tau } \right\| \le K \left\| \frac{\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) - \varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}})}{\tau } \right\| . \end{aligned}\)
By the existence of the directional derivative \([\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})\), \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \left\| \frac{\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}) - (\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}))}{\tau } \right\| = 0, \end{aligned}\) and then the right-hand side of the bound above vanishes as \(\tau \downarrow 0\).
Since \(\mathbf {M}\in \mathbb {R}^{m \times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\) were arbitrary, \(\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}\) is directionally differentiable at \(\mathbf {m}_j\) and \(\varvec{\sigma }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}}) = \varvec{\zeta }^{(j)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}}))\) for all \(\mathbf {M}\in \mathbb {R}^{m \times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\). Given that \(\varvec{\sigma }^{(0)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_1) = \varvec{\zeta }^{(0)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(0)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_1))\) for any \(\mathbf {M}\in \mathbb {R}^{m\times q}\), the rest of the LD-derivatives follow by induction. \(\square \)
Proof of Theorem 2.2
First we show the limit for the directional derivative. Given \(\mathbf {d}\in \mathbb {R}^m\), we want to show that the following limit exists in \(\mathbb {R}^p\): \(\begin{aligned} \varvec{\sigma }^\prime (\mathbf {x};\mathbf {d}) = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\sigma }(\mathbf {x}+ \tau \mathbf {d})-\varvec{\sigma }(\mathbf {x})}{\tau } = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau }. \end{aligned}\) We know that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau } \end{aligned}\) exists in \(\mathbb {R}^p\). If \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau } = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}))}{\tau }, \end{aligned}\) which is equivalent to showing that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \; \frac{\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))-\varvec{\zeta }(\varvec{\rho }(\mathbf {x}+ \tau \mathbf {d}))}{\tau } = \mathbf {0} \end{aligned}\) (14), then the proof is complete. Notice that \(\varvec{\zeta }(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}))\) is well defined by the assumption that \(\varvec{\rho }(\mathbf {x}) + \tau \varvec{\rho }'(\mathbf {x};\mathbf {d}) \in Z\) for all \(\tau \in ]0,\delta _\mathbf {d}[\). Since \(\varvec{\zeta }\) is locally Lipschitz, the proof of Theorem 2.1 establishes (14). Also notice that \(\varvec{\rho }^\prime (\mathbf {x};\mathbf {d}) = \varvec{\rho }^{(0)}_{\mathbf {x},\mathbf {M}}(\mathbf {d})\) for any \(\mathbf {M}\in \mathbb {R}^{m \times q}\).
Given that \(\mathbf {d}\in \mathbb {R}^m\) was arbitrary, the directional derivative of \(\varvec{\zeta }\) at \(\varvec{\rho }(\mathbf {x})\) exists in all directions \(\varvec{\rho }^{(0)}_{\mathbf {x},\mathbf {M}}(\mathbf {d})\) and the first function of sequence (7) is well defined with its corresponding domain \(G^{(0)}_{\mathbf {x}}= \{\varvec{\rho }^{(0)}_{\mathbf {x},\mathbf {Z}}(\mathbf {z}_1){:}\,\mathbf {Z} \in \mathbb {R}^{m \times q}, \mathbf {z}_{q+1} \in \mathbb {R}^m \}\). By Proposition 2.5, \(\varvec{\zeta }^{(0)}_{\varvec{\rho }(\mathbf {x}),\varvec{\rho }^\prime (\mathbf {x};\mathbf {M})}\) is globally Lipschitz on \(G^{(0)}_{\mathbf {x}}\).
Now let us assume that for \(j \in \{1,\ldots ,q\}\) and for all \(\mathbf {M}\in \mathbb {R}^{m \times q}\) and \(\mathbf {m}_{q+1} \in \mathbb {R}^m\), \(\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) = \varvec{\zeta }^{(j-1)}_{\varvec{\rho }(\mathbf {x}),\varvec{\rho }^\prime (\mathbf {x};\mathbf {M})}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j)),\) and that the function \(\varvec{\zeta }^{(j-1)}_{\varvec{\rho }(\mathbf {x}),\varvec{\rho }^\prime (\mathbf {x};\mathbf {M})}\) is well defined in the sense of (7) and is globally Lipschitz on its domain \(\begin{aligned} G^{(j-1)}_{\mathbf {x}}=\{\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {Z}}(\mathbf {z}_{j}){:}\,\mathbf {Z} \in \mathbb {R}^{m \times q}, \mathbf {z}_{q+1} \in \mathbb {R}^m \}. \end{aligned}\)
For brevity, let \(\mathbf {y}:= \varvec{\rho }(\mathbf {x})\) and \(\mathbf {Y}:= \varvec{\rho }^\prime (\mathbf {x};\mathbf {M})\). Also, let \({\hat{\mathbf {m}}} := \mathbf {m}_{j+1}\). We want to show that \(\begin{aligned} [\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) = {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}) - \varvec{\sigma }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j)}{\tau } \end{aligned}\) exists in \(\mathbb {R}^p\). We know that the limits \([\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}]^\prime (\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j); [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}))\) and \([\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}) = \varvec{\rho }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}})\) exist in \(\mathbb {R}^p\).
Analogous to the proof of Theorem 2.1, the proof is complete if we show that \(\begin{aligned} {\mathop {\lim }\nolimits _{\tau \downarrow 0}} \frac{\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j) + \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}})) - \varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j + \tau {\hat{\mathbf {m}}}))}{\tau } = \mathbf {0}. \end{aligned}\) (15)
For \(\tau >0\) small enough, \(\mathbf {m}_{q+1} \in \mathbb {R}^m\), and \(\mathbf {M}\in \mathbb {R}^{m \times q}\), \(\varvec{\zeta }^{(j-1)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}(\mathbf {m}_j)+ \tau [\varvec{\rho }^{(j-1)}_{\mathbf {x},\mathbf {M}}]^\prime (\mathbf {m}_j;{\hat{\mathbf {m}}}))\) is well defined by assumption. Then, the proof of Theorem 2.1 establishes (15) and \(\varvec{\sigma }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}}) = \varvec{\zeta }^{(j)}_{\mathbf {y},\mathbf {Y}}(\varvec{\rho }^{(j)}_{\mathbf {x},\mathbf {M}}({\hat{\mathbf {m}}})).\) Since \(\mathbf {M}\) and \(\mathbf {m}_{q+1}\) are arbitrary, the domain of \(\varvec{\zeta }^{(j)}_{\mathbf {y},\mathbf {Y}}\) is \(\begin{aligned} G^{(j)}_{\mathbf {x}}=\{\varvec{\rho }^{(j)}_{\mathbf {x},\mathbf {Z}}(\mathbf {z}_{j+1}){:}\,\mathbf {Z} \in \mathbb {R}^{m \times q}, \mathbf {z}_{q+1} \in \mathbb {R}^m \},\end{aligned}\) and by Proposition 2.5, \(\varvec{\zeta }^{(j)}_{\mathbf {y},\mathbf {Y}}\) is globally Lipschitz on \(G^{(j)}_{\mathbf {x}}\). Since the case \(j=0\) was established, the proof follows by induction. \(\square \)
Proof of Proposition 3.3
Notice that \(\mathbf {z}_0\) is not required to be in the interior of F. g is a convex and piecewise linear function on F [10] of the form \(g(\mathbf {z}) = \underset{\varvec{\lambda }\in \varLambda }{\max } \;\;\mathbf {z}^\text {T}\varvec{\lambda }\), where \(\varLambda \) is the finite set of all extreme points of the polyhedron \(\{\varvec{\lambda }{:}\,\mathbf {A}^\text {T}\varvec{\lambda }\le \mathbf {c}_0\}\). \(\varLambda \) is nonempty because \(\mathbf {A}\) has full row rank (Theorem 2.6 in [10]). Let \({\tilde{g}}{:}\,\mathbb {R}^{m} \rightarrow \mathbb {R}{:}\,\mathbf {z}\mapsto {\mathop {\max }\nolimits _{\varvec{\lambda }\in \varLambda }} \;\; \mathbf {z}^\text {T}\varvec{\lambda }\) and let \(\mathbf {z}\in \mathbb {R}^m\). From Proposition 2.2.7 in [13], since \({\tilde{g}}\) is a convex function that is Lipschitz near \(\mathbf {z}\), the generalized gradient of \({\tilde{g}}\) at \(\mathbf {z}\) (\(\partial {\tilde{g}}(\mathbf {z})\)) coincides with the subdifferential at \(\mathbf {z}\) and, for all \(\mathbf {d}\in \mathbb {R}^{m}\), \({\tilde{g}}^\prime (\mathbf {z};\mathbf {d}) = {\tilde{g}}^\circ (\mathbf {z};\mathbf {d})\) (where the second quantity is the generalized directional derivative defined in Section 2 of [13]). Let \({\tilde{J}}(\mathbf {z}) := \{ \varvec{\lambda }\in \varLambda {:}\,{\tilde{g}}(\mathbf {z}) = \mathbf {z}^\text {T}\varvec{\lambda }\}\) and \(J(\mathbf {z}) := \{ \varvec{\lambda }\in \varLambda {:}\,g(\mathbf {z}) = \mathbf {z}^\text {T}\varvec{\lambda }\}\); it is clear that \({\tilde{J}}(\mathbf {z}) = J(\mathbf {z}), \; \forall \mathbf {z}\in F\). Given that \({\tilde{g}}\) is the pointwise maximum of convex differentiable functions, \(\partial {\tilde{g}}(\mathbf {z}) = \text {co}({\tilde{J}}(\mathbf {z}))\) (Section 3.4 in [22]) and, for \(\mathbf {z}\in F\), \(\begin{aligned} \partial {\tilde{g}}(\mathbf {z}) = \text {co}(J(\mathbf {z})) = \{\varvec{\lambda }{:}\,\mathbf {A}^\text {T}\varvec{\lambda }\le \mathbf {c}_0, \; \mathbf {z}^\text {T}\varvec{\lambda }= g(\mathbf {z})\}. \end{aligned}\)
Finally, \(\{\varvec{\lambda }{:}\,\mathbf {A}^\text {T}\varvec{\lambda }\le \mathbf {c}_0, \mathbf {z}^\text {T}\varvec{\lambda }= g(\mathbf {z})\} =\{\varvec{\lambda }{:}\,\mathbf {A}^\text {T}\varvec{\lambda }\le \mathbf {c}_0, -\mathbf {z}^\text {T}\varvec{\lambda }\le -g(\mathbf {z}) \}\). For \(\mathbf {z}\in F\), \({\tilde{g}}^\prime (\mathbf {z};\mathbf {d}) = {\tilde{g}}^\circ (\mathbf {z};\mathbf {d}) = \max \{\mathbf {d}^\text {T}\varvec{\lambda }{:}\,\varvec{\lambda }\in \partial {\tilde{g}}(\mathbf {z})\}\) by Proposition 2.1.2 in [13], and therefore \(\max \{\mathbf {d}^\text {T}\varvec{\lambda }{:}\,\varvec{\lambda }\in \partial {\tilde{g}}(\mathbf {z})\} = \max \{\mathbf {d}^\text {T}\varvec{\lambda }{:}\,\mathbf {A}^\text {T}\varvec{\lambda }\le \mathbf {c}_0, -\mathbf {z}^\text {T}\varvec{\lambda }\le -g(\mathbf {z})\}\). For \(\mathbf {z}\in F\) and \(\mathbf {d}\in \mathbb {R}^{m}\) such that there exists \(\delta >0\) with \(\mathbf {z}+ \epsilon \mathbf {d}\in F\) for all \(\epsilon \in [0,\delta [\), it follows that \({\tilde{g}}^\prime (\mathbf {z};\mathbf {d}) = g^\prime (\mathbf {z};\mathbf {d})\). Then, \(g^\prime (\mathbf {z}_0;\mathbf {d})\) is given by LP (8). \(\square \)
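To make the role of LP (8) concrete, here is a minimal sketch under the same standard-form assumptions as before: it computes \(g(\mathbf {z})\) from the dual and then maximizes \(\mathbf {d}^\text {T}\varvec{\lambda }\) over the dual optimal set. The function name is illustrative, and scipy.optimize.linprog is assumed as the solver.

```python
# Minimal sketch: the directional derivative g'(z; d) of the LP optimal
# value g(z), computed as in LP (8) by maximizing d^T lambda over the dual
# optimal set {lambda : A^T lambda <= c0, z^T lambda = g(z)}.
import numpy as np
from scipy.optimize import linprog

def lp_value_dir_deriv(A, c0, z, d):
    A = np.asarray(A, dtype=float)
    free = [(None, None)] * A.shape[0]        # dual variables are sign-free
    # g(z) from the dual: max z^T lambda  s.t.  A^T lambda <= c0
    res = linprog(-np.asarray(z, dtype=float), A_ub=A.T, b_ub=c0, bounds=free)
    g = -res.fun
    # LP (8): max d^T lambda, pinning optimality via -z^T lambda <= -g(z)
    A_ub = np.vstack([A.T, -np.asarray(z, dtype=float)])
    b_ub = np.append(c0, -g)
    res = linprog(-np.asarray(d, dtype=float), A_ub=A_ub, b_ub=b_ub, bounds=free)
    return -res.fun
```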
Proof of Proposition 3.6
Under Assumption 2.1, F is nonempty. If \(h^E_{-1}(\mathbf {z}) = 0\), then \(\mathbf {z}\in F\) because there exists \(\mathbf {v}\ge \mathbf {0}\) such that \(\mathbf {A}\mathbf {v}= \mathbf {z}\). Then LP (1) is feasible, and by Proposition 2.2 LP (2) is also feasible for all i. Moreover, if \(h^E_{-1}(\mathbf {z}) = 0\), then \(\mathbf {s}_+ = \mathbf {s}_- = \mathbf {0}\), and these variables can be removed from LP (12) together with the constraint \(\sum _{i=1}^m (s_{+i} + s_{-i}) = h^E_{-1}(\mathbf {z})\), resulting in \(h^E_i(\mathbf {z}) = h_i(\mathbf {z})\) for all \(i \in \{0,\ldots ,n_h\}\). By Proposition 2.1, F is nonempty and, for \(\mathbf {z}\in F\), \(\mathbf {h}^E(\mathbf {z}) \in \mathbb {R}^{n_h+2}\). This implies that the dual LPs of (11) and (12) are always feasible. Now let us show that (11) and (12) satisfy Assumption 3.1. Let \(n_h^p := n_h + 1\) and \(\mathbf {A}^p := \begin{bmatrix} \mathbf {A}&\mathbf {I}_m&-\mathbf {I}_m \end{bmatrix}\). Let \((\mathbf {c}_{0}^p)^\text {T}:= \begin{bmatrix}\mathbf {0}^\text {T}&\mathbf {1}^\text {T}&\mathbf {1}^\text {T}\end{bmatrix}\) and, for \(i \in \{0,\ldots ,n_h\}\), let \((\mathbf {c}_{i+1}^p)^\text {T}:= \begin{bmatrix}\mathbf {c}_i^\text {T}&\mathbf {0}^\text {T}&\mathbf {0}^\text {T}\end{bmatrix}\). Then LPs (11) and (12) can be expressed in the format of LPs (1) and (2). Since the dual LPs of (11) and (12) are always feasible, \(\mathbf {A}^p\) and \(\mathbf {c}_i^p\) for \(i \in \{0,\ldots ,n_h^p\}\) are such that \(\mathbf {h}^E(\mathbf {z})>-\varvec{\infty }\) for all \(\mathbf {z}\in \mathbb {R}^m\). Since \(\mathbf {A}\) has full row rank, \(\mathbf {A}^p\) has full row rank. Then LPs (11) and (12) satisfy Assumption 3.1, and by Proposition 3.1, \(\mathbf {h}^E\) is l-smooth at any \(\mathbf {z}\in \mathbb {R}^m\). \(\square \)
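The penalty construction in this proof is mechanical, and a minimal sketch (illustrative names, not the authors' code) of building \(\mathbf {A}^p\) and the objective rows \(\mathbf {c}^p_i\) from \((\mathbf {A}, \mathbf {C})\) follows. Combined with the llp_value sketch given after the abstract, it evaluates \(\mathbf {h}^E(\mathbf {z})\) for any \(\mathbf {z}\in \mathbb {R}^m\).

```python
# Minimal sketch (illustrative names): the penalty embedding of the proof,
#   A^p = [A  I  -I],  c_0^p = [0  1  1],  c_{i+1}^p = [c_i  0  0],
# whose top-priority objective minimizes sum(s+ + s-); h^E(z) agrees with
# h(z) whenever that penalty is zero.
import numpy as np

def penalty_llp_data(A, C):
    m, n = A.shape
    A_p = np.hstack([A, np.eye(m), -np.eye(m)])           # A v + s+ - s- = z
    c0_p = np.concatenate([np.zeros(n), np.ones(2 * m)])  # penalty objective
    C_p = np.vstack([c0_p] +
                    [np.concatenate([c, np.zeros(2 * m)]) for c in C])
    return A_p, C_p
```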
Appendix B
Proof that Proposition 4.12 in [14] Is Not Applicable
This proposition refers to optimization problems of the form \(\begin{aligned} \min _{\mathbf {x}\in \varPhi } \; f(\mathbf {x},\mathbf {u}), \end{aligned}\) (16) where \(\mathbf {u}\in U\) is a parameter vector and \(\varPhi \subset X\) is nonempty and closed. Under certain conditions, this proposition calculates the directional derivative of the optimal value function. To apply Proposition 4.12 in [14], we need to consider the dual of each LP in (1) and (2), denoted LP (17). For \(i \in \{0,\ldots ,n_h\}\), let \(\mathscr {D}^i\subset \mathbb {R}^{m+i}\) be the feasible set of LP (17). Since Proposition 4.12 in [14] considers a minimization problem, \(f^i(\varvec{\lambda },\mathbf {z}) = -[\mathbf {q}^i(\mathbf {z})]^\text {T}\varvec{\lambda }\). Notice that the feasible set of LP (17) is independent of \(\mathbf {z}\) and is nonempty under Assumption 2.1 for \(i \in \{0,\ldots ,n_h\}\). The application of Proposition 4.12 in [14] requires that the inf-compactness condition be satisfied at \(\mathbf {z}\in F\). The following propositions show that the inf-compactness condition cannot be satisfied by LLPs.
Definition B.1
Consider the optimization problem (16). The inf-compactness condition holds at \(\mathbf {u}_0 \in U\) if there exists \(\alpha \in \mathbb {R}\) and a compact set \(C \subset X\) such that for every \(\mathbf {u}\) near \(\mathbf {u}_0\), the level set \(\text {lev}_\alpha f(\cdot ,\mathbf {u}) := \{ \mathbf {x}\in \varPhi {:}\,f(\mathbf {x},\mathbf {u}) \le \alpha \}\) is nonempty and contained in C.
Proposition B.1
Let Assumption 2.1 hold and consider LP (17). Then for all i, the inf-compactness condition is not satisfied at \(\mathbf {z}_0\) if \(\mathbf {q}^i(\mathbf {z}_0) \in \text {bnd}(F_i)\).
Proof
\(F_i\) is closed [1]. Assume \(\mathbf {q}^i(\mathbf {z}_0) \in \text {bnd}(F_i)\). Since \(\mathbf {q}^i(\mathbf {z}_0) \in F_i\), the solution set of LP (17) is nonempty, and since it can be described as a polyhedron, it is closed and convex. Proposition 3.5 in [15] implies that \(\mathbf {q}^i(\mathbf {z}_0) \in \text {int}(F_i)\) if and only if the solution set of LP (17) is nonempty, convex, and compact. Since \(\mathbf {q}^i(\mathbf {z}_0) \in \text {bnd}(F_i)\), the solution set of LP (17) cannot be compact and must therefore be unbounded. Since the optimal level set is unbounded and it is the smallest nonempty level set, there is no nonempty bounded level set at \(\mathbf {z}_0\), and the inf-compactness condition is not satisfied. \(\square \)
Proposition B.2
Let Assumption 2.1 hold. For all \(i>0\), \(\mathbf {q}^i(F) \subset \text {bnd}(F_i)\).
Proof
By Proposition 2.2, \(\mathbf {q}^i(F) \subset F_i\). In addition, \(q^i_{m+i}(\mathbf {z}) = h_{i-1}(\mathbf {z})\). Let \(\mathbf {d}= -\mathbf {e}_{m+i}\). For any \(\epsilon >0\), \(\mathbf {q}^i(\mathbf {z}) + \epsilon \mathbf {d}\notin F_i\): the \((m+i)\)th component of \(\mathbf {q}^i(\mathbf {z})\) is the optimal value \(h_{i-1}(\mathbf {z})\), and requiring any smaller value results in an infeasible LP. Then \(\mathbf {q}^i(\mathbf {z}) \in \text {bnd}(F_i)\), and thus \(\mathbf {q}^i(F) \subset \text {bnd}(F_i)\). \(\square \)
Hence, by Propositions B.1 and B.2, Proposition 4.12 in [14] cannot be applied to LLPs, and an alternative method to calculate the LD-derivatives of the optimal-value function is required.
Appendix C
Let \(\mathbf {h}{:}\,\mathbb {R}^2_+ \rightarrow \mathbb {R}^2\) map \(\mathbf {b}\) to the vector of lexicographic optimal objective values of \(\begin{aligned} \min _{\mathbf {v}} \; \mathbf {C}\mathbf {v}\;\; \text {s.t.} \;\; v_1 \le b_1, \; v_1 + v_2 \le b_2, \; \mathbf {v}\ge \mathbf {0}, \end{aligned}\) (18)
with \(\mathbf {C} = -\mathbf {I}_2\), so that first \(v_1\) is maximized and then \(v_2\) is maximized. Introducing slack variables \(v_3\) and \(v_4\), LP (18) can be transformed into standard form to obtain the form of LLP (1) and (2): \(\begin{aligned} \min _{\hat{\mathbf {v}}} \; {\hat{\mathbf {C}}}\hat{\mathbf {v}} \;\; \text {s.t.} \;\; \begin{bmatrix} 1&0&1&0 \\ 1&1&0&1\end{bmatrix}\hat{\mathbf {v}} = \mathbf {b}, \; \hat{\mathbf {v}} \ge \mathbf {0}, \end{aligned}\) (19)
where \({\hat{\mathbf {C}}} = \begin{bmatrix} -1&0&0&0 \\ 0&-1&0&0\end{bmatrix}\). For LP (18), \(F = \left\{ \mathbf {b}{:}\,\mathbf {b}\ge \mathbf {0} \right\} \). Consider \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\). Clearly, \(\mathbf {b}_0 \in \text {int}(F)\) and \(\mathbf {h}(\mathbf {b}_0) = [-1 \;\; 0]^\text {T}\). \(\mathbf {h}\) is not differentiable at \(\mathbf {b}_0\). Given \(\mathbf {M}\in \mathbb {R}^{2 \times 2}\) and LP (19), \(\mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M})\) can be computed with Proposition 3.4. Consider \(\mathbf {M}_1 = \mathbf {I}_2\), \(\mathbf {M}_2 = \begin{bmatrix} 1&0 \\ 0&-1\end{bmatrix}\), \(\mathbf {M}_3 = \begin{bmatrix} -1&0 \\ 0&1\end{bmatrix}\), and \(\mathbf {M}_4 = -\mathbf {I}_2\). The resulting LD-derivatives are: \(\mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M}_1) = \begin{bmatrix} 0&-1 \\ 0&0\end{bmatrix}\), \(\mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M}_2) = \begin{bmatrix} 0&1 \\ 0&0\end{bmatrix}\), \(\mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M}_3) = \begin{bmatrix} 1&0 \\ -1&-1\end{bmatrix}\), and \(\mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M}_4)= \begin{bmatrix} 1&0 \\ -1&1\end{bmatrix}\) (Fig. 1).
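As a numerical sanity check (not the method of Proposition 3.4), the sketch below evaluates \(\mathbf {h}\) for LP (19) as written above, using the llp_value helper from the sketch after the abstract, and probes one-sided slopes of \(\mathbf {h}\) at \(\mathbf {b}_0\) by finite differences; the step size and printed comparisons are illustrative.

```python
# Sanity check for LP (19): evaluate h(b0) with two sequential LPs and probe
# one-sided slopes of h at b0; compare with the first columns of the
# LD-derivatives h'(b0; M_i) listed below.
import numpy as np

A_hat = np.array([[1., 0., 1., 0.],
                  [1., 1., 0., 1.]])
C_hat = np.array([[-1., 0., 0., 0.],
                  [0., -1., 0., 0.]])
b0 = np.array([1., 1.])
print(llp_value(A_hat, b0, C_hat))          # -> [-1.  0.]
t = 1e-3
for d in ([1., 0.], [-1., 0.], [0., 1.], [0., -1.]):
    slope = (llp_value(A_hat, b0 + t * np.array(d), C_hat)
             - llp_value(A_hat, b0, C_hat)) / t
    print(d, slope)   # e.g. d = [-1, 0] gives [1, -1], the first column of
                      # h'(b0; M_3) and h'(b0; M_4)
```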
Different elements of the lexicographic subdifferential are obtained from the linear system \(\mathbf {J}_\text {L}\mathbf {h}(\mathbf {b};\mathbf {M})\mathbf {M}= \mathbf {h}^\prime (\mathbf {b};\mathbf {M})\). The resulting l-derivatives are: \(\begin{aligned} \mathbf {J}_\text {L}\mathbf {h}(\mathbf {b}_0;\mathbf {M}_1) = \mathbf {J}_\text {L}\mathbf {h}(\mathbf {b}_0;\mathbf {M}_2) = \begin{bmatrix} 0&-\,1 \\ 0&0\end{bmatrix}, \quad \mathbf {J}_\text {L}\mathbf {h}(\mathbf {b}_0;\mathbf {M}_3) = \mathbf {J}_\text {L}\mathbf {h}(\mathbf {b}_0;\mathbf {M}_4) = \begin{bmatrix} -\,1&0 \\ 1&-\,1\end{bmatrix}. \end{aligned}\)
Fig. 1: This figure shows graphically how \(\mathbf {M}_1,\mathbf {M}_2,\mathbf {M}_3\), and \(\mathbf {M}_4\) result in different LD-derivatives at \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\). a Feasible set (gray) and optimal solution point (red dot) for LP (18). If the first column of the matrix of directions is \([1 \; \; 0]^\text {T}\), b is obtained. In b, \(b_1\) is increased and the optimal solution point does not change. Then, the second column of the matrix of directions can be \([0 \; \; 1]^\text {T}\) or \([0 \; \; -1]^\text {T}\), resulting in c and d, respectively. In c, the solution point changes such that \(h_0\) decreases, and in d it changes such that \(h_0\) increases. If the first column of the matrix of directions is \([-1 \; \; 0]^\text {T}\), e is obtained. In e, the solution point changes such that \(h_0\) increases and \(h_1\) decreases. Then, the second column of the matrix of directions can be \([0 \; \; 1]^\text {T}\) or \([0 \; \; -1]^\text {T}\), resulting in f and g, respectively. In f, the solution point changes such that \(h_1\) decreases, and in g it changes such that \(h_1\) increases
Notice that \(\mathbf {M}_1\) and \(\mathbf {M}_2\) result in the same l-derivative matrix, as do \(\mathbf {M}_3\) and \(\mathbf {M}_4\). These two l-derivatives form the B-subdifferential of \(\mathbf {h}\) at \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\). Proposition 2.6.2 in [13] shows that for a nonscalar function \(\mathbf {h}\) evaluated at \(\mathbf {b}_0\), Clarke’s generalized Jacobian is a subset of the Cartesian product of the generalized gradients of each component of \(\mathbf {h}\). In this example, the Cartesian product of the generalized gradients of \(h_0\) and \(h_1\) at \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\) results in the convex hull of \(\left\{ \begin{bmatrix} 0&-\,1 \\ 0&0 \end{bmatrix}, \begin{bmatrix} 0&-\,1 \\ 1&-\,1 \end{bmatrix}, \begin{bmatrix} -\,1&0 \\ 0&0 \end{bmatrix}, \begin{bmatrix} -\,1&0 \\ 1&-\,1 \end{bmatrix} \right\} \). However, the kinks in the functions \(h_0\) and \(h_1\) are lined up such that the B-subdifferential of \(\mathbf {h}\) at \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\) contains only two matrices. The LD-derivatives are guaranteed to find at most these two matrices (Fig. 2). The results of this example can be easily verified by expressing \(\mathbf {h}\) as: \(\begin{aligned} \mathbf {h}(\mathbf {b}) = \begin{bmatrix} -\min (b_1, b_2) \\ \min (b_1 - b_2, 0) \end{bmatrix}. \end{aligned}\)
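The two l-derivatives above can be recovered mechanically from the linear system \(\mathbf {J}_\text {L}\mathbf {h}(\mathbf {b}_0;\mathbf {M}_i)\mathbf {M}_i = \mathbf {h}^\prime (\mathbf {b}_0;\mathbf {M}_i)\); a small sketch with the values transcribed from this appendix:

```python
# Recover the l-derivatives from J_L h(b0; M) M = h'(b0; M) using the four
# direction matrices and LD-derivatives stated in this appendix.
import numpy as np

M  = [np.eye(2), np.diag([1., -1.]), np.diag([-1., 1.]), -np.eye(2)]
Hp = [np.array([[0., -1.], [0., 0.]]), np.array([[0., 1.], [0., 0.]]),
      np.array([[1., 0.], [-1., -1.]]), np.array([[1., 0.], [-1., 1.]])]
for Mi, Hi in zip(M, Hp):
    print(Hi @ np.linalg.inv(Mi))   # J_L = h'(b0; M) M^{-1}
# M1 and M2 yield [[0, -1], [0, 0]]; M3 and M4 yield [[-1, 0], [1, -1]]:
# the two elements of the B-subdifferential of h at b0.
```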
Fig. 2: Surface plots of \(\mathbf {h}\) with respect to \(\mathbf {b}\). The red dots indicate the point \(\mathbf {b}_0 = [1 \; \; 1]^\text {T}\). Notice that both components of \(\mathbf {h}\) can be divided into two regions of differentiability with two different gradients. In particular, \(\nabla h_0(\mathbf {b}) = [-1 \; \; 0]^\text {T}\) or \(\nabla h_0(\mathbf {b}) = [0 \; \; -1]^\text {T}\), and \(\nabla h_1(\mathbf {b}) = [0 \; \; 0]^\text {T}\) or \(\nabla h_1(\mathbf {b}) = [1 \; \; -1]^\text {T}\). The four matrices of directions \(\mathbf {M}_1,\mathbf {M}_2,\mathbf {M}_3\), and \(\mathbf {M}_4\) probe possible combinations of these gradients at \(\mathbf {b}_0\), resulting in two different l-derivative matrices. In fact, these two matrices form the B-subdifferential of \(\mathbf {h}\) at \(\mathbf {b}_0\). In this case, the generalized Jacobian of \(\mathbf {h}\) at \(\mathbf {b}_0\) is a strict subset of the Cartesian product of the generalized gradients of each component of \(\mathbf {h}\) at \(\mathbf {b}_0\)
Appendix D
Model Description of Example 4.1
(a) Mass balance of biomass of C. ljungdahlii:
$$\begin{aligned}&\frac{\partial X}{\partial t}(z,t) = \mu (z,t) X(z,t) - \frac{u_L}{\epsilon _L}\frac{\partial X}{\partial z}(z,t) + D_A \frac{\partial ^2 X}{\partial z^2}(z,t), \\&u_L X(0,t) - \epsilon _L D_A \frac{\partial X}{\partial z}(0,t) = 0, \;\; \frac{\partial X}{\partial z}(L,t) = 0, \; X(z,0) = X_0, \end{aligned}$$where X is the concentration of biomass, t is time, \(\mu \) is the growth rate, \(u_L\) is the liquid velocity, z is the spatial position, \(D_A\) is the diffusivity, \(\epsilon _L\) is the liquid volume fraction in the reactor, and \(X_0\) is the initial biomass concentration.
(b) Mole balances of liquid-phase CO, H\(_2\), CO\(_2\), ethanol, and acetate:
$$\begin{aligned}&\frac{\partial G_L}{\partial t}(z,t) = v_G(z,t) X(z,t) + \frac{k_{m,G}}{\epsilon _L}(G^*(z,t) - G_L(z,t)) \\&\quad -\,\frac{u_L}{\epsilon _L} \frac{\partial G_L}{\partial z}(z,t) + D_A \frac{\partial ^2 G_L}{\partial z^2}(z,t), \\&u_L G_L(0,t) - \epsilon _L D_A \frac{\partial G_L}{\partial z}(0,t) = u_L G_{gF} H_G, \; \frac{\partial G_L}{\partial z}(L,t) = 0, \; G_L(z,0) = G_{L0}, \end{aligned}$$where G stands for CO, H\(_2\), CO\(_2\), ethanol, or acetate, \(v_G\) refers to the exchange flux rate for species G, \(k_{m,G}\) the liquid mass transfer coefficient for species G, \(G_L\) the liquid concentration of species G, \(G^*\) the liquid concentration in equilibrium with the gas concentration of species G, \(G_{gF}\) the gas concentration of species G in the feed, \(G_{L0}\) the initial concentration of species G in the liquid, and \(H_G\) Henry’s constant for species G.
(c) Mole balances of gas-phase CO, H\(_2\), and CO\(_2\):
$$\begin{aligned} \frac{\partial G_g}{\partial t}(z,t)&= - \frac{k_{m,G}}{\epsilon _g}(G^*(z,t) - G_L(z,t)) - \frac{u_g}{\epsilon _g} \frac{\partial G_g}{\partial z}(z,t), \\ G_g(0,t)&= G_{gF}, G_g(z,0) = G_{g0}, \end{aligned}$$where \(G_g\) is the concentration of species G in the gas, \(\epsilon _g\) is the gas volume fraction, \(u_g\) is the gas velocity, and \(G_{g0}\) is the initial gas concentration of species G.
(d) Column pressure profile: \(P(z) = P_L + \rho _L g(L-z)\), where P is the pressure as a function of position in the column, \(P_L\) is the pressure at the top of the column, \(\rho _L\) is the density of the liquid, L is the length of the column, and g is gravitational acceleration.
The spatial dimension of the bubble column was discretized using a finite volume strategy with 100 nodes. The state vector for each finite volume collects the nine states above, \(\begin{aligned} {[}X, \text {CO}_L, \text {H}_{2,L}, \text {CO}_{2,L}, E, A, \text {CO}_g, \text {H}_{2,g}, \text {CO}_{2,g}]^\text {T}, \end{aligned}\) where A stands for acetate and E for ethanol. To obtain the growth rate \(\mu \) and the exchange flux rates \(v_G\), this problem is transformed into a DFBA problem using LP (13).
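For concreteness, a minimal sketch of a first-order upwind finite-volume semi-discretization of the biomass balance (a) is given below; the ghost-cell boundary treatment and all names are illustrative assumptions rather than the authors' implementation, and the nodal growth rates \(\mu \) would come from the embedded LLP (13).

```python
# Minimal sketch (illustrative, not DFBAlab): first-order upwind finite-
# volume semi-discretization of the biomass balance (a) on a uniform grid,
# with Danckwerts inlet and zero-gradient outlet imposed via ghost cells.
import numpy as np

def biomass_rhs(X, mu, uL, epsL, DA, dz):
    """dX/dt for each finite volume; X and mu are arrays over the nodes."""
    a = epsL * DA / dz
    X_in = X[0] * a / (uL + a)      # ghost cell from u_L X - eps_L D_A X' = 0
    Xg = np.concatenate(([X_in], X, [X[-1]]))     # zero-gradient outlet
    conv = -(uL / epsL) * (Xg[1:-1] - Xg[:-2]) / dz           # upwind d/dz
    diff = DA * (Xg[2:] - 2.0 * Xg[1:-1] + Xg[:-2]) / dz**2   # d2/dz2
    return mu * X + conv + diff
```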
The resulting nonsmooth system of equations was solved for fixed values of the model parameters, among them \(P_0\), the pressure at the bottom of the column; \(P_G\), the partial pressure of species G; T, the temperature in Kelvin; and R, the universal gas constant. This system comprises 901 equations (the last equation corresponds to the penalty) and 100 LLPs, each with 682 equality constraints, 1715 variables, and 7 objective functions.
Keywords
- Linear programming
- Lexicographic linear programming
- Lexicographic optimization
- Nonsmooth analysis
- Generalized derivatives
- Lexicographic differentiation
- Nonsmooth equation solving