Directory

  • 5.1 The Lagrange dual function
    • 5.1.1 The Lagrangian
    • 5.1.2 The Lagrange dual function
    • 5.1.3 Lower bounds on optimal value
    • 5.1.5 Examples
      • Least-squares solution of linear equations
    • 5.1.6 The Lagrange dual function and conjugate functions
  • 5.2 The Lagrange dual problem
    • 5.2.1 Making dual constraints explicit
      • A. Lagrange dual of standard form LP
      • B. Lagrange dual of inequality form LP
    • 5.2.2 Weak duality
    • 5.2.3 Strong duality & Slater's constraint qualification
    • 5.2.4 Examples
      • A. Lagrange dual of QCQP
      • B. A nonconvex quadratic problem with strong duality
  • 5.4 Saddle-point interpretation
    • 5.4.1 Max-min characterization of weak and strong duality
    • 5.4.2 Saddle-point interpretation
  • 5.5 Optimality conditions
    • 5.5.1 Certificate of suboptimality and stopping criteria
    • 5.5.2 Complementary slackness
    • 5.5.3 KKT optimality conditions
      • A. KKT conditions for nonconvex problems
      • B. KKT conditions for convex problems
        • Example 5.1
        • Example 5.2 Water-filling
    • 5.5.5 Solving the primal problem via the dual
      • Example 5.3 Entropy maximization
  • 5.7 Examples (reformulations)
    • 5.7.1 Introducing new variables and equality constraints
      • Example 5.5 Unconstrained geometric program
      • Example 5.6 Norm approximation problem
    • 5.7.2 Transforming the objective
      • Example 5.8
    • 5.7.3 Implicit constraints

5.1 The Lagrange dual function

5.1.1 The Lagrangian

An optimization problem in the standard form:
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \le 0, \quad i=1,\dots,m \\ & h_i(x) = 0, \quad i=1,\dots,p, \end{array}$$ with variable $x\in \mathbb{R}^n$. We assume its domain $\mathcal{D}=\bigcap_{i=0}^{m} \operatorname{dom} f_{i} \cap \bigcap_{i=1}^{p} \operatorname{dom} h_{i}$ is nonempty, and denote the optimal value of the problem by $p^*$. We do not assume the problem is convex.

The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions. We define the Lagrangian (function) $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ associated with the problem as
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x),$$ with $\operatorname{dom} L=\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x)\le 0$; similarly $\nu_i$ is the Lagrange multiplier associated with the $i$th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors associated with the problem.


5.1.2 The Lagrange dual function

We define the Lagrange dual function $g: \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbb{R}^m$, $\nu \in \mathbb{R}^p$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)=\inf _{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right).$$
When the Lagrangian is unbounded below in $x$, the dual function takes on the value $-\infty$. Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda,\nu)$, it is concave, even when the problem is not convex.


5.1.3 Lower bounds on optimal value

The dual function yields lower bounds on the optimal value $p^\star$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le p^*,$$ since for any feasible point $\tilde{x}$ we have $$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le L(\tilde{x}, \lambda, \nu)\le f_0(\tilde{x})$$ (because $\lambda_i f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$), and taking the infimum over all feasible $\tilde{x}$ gives $g(\lambda,\nu)\le p^*$.


5.1.5 Examples

Least-squares solution of linear equations

We consider the problem
$$\begin{array}{ll} \min & x^Tx \\ \text{s.t.} & Ax=b, \end{array}$$ where $A\in \mathbb{R}^{p \times n}$.
The Lagrangian is
$$L(x,\nu) = x^Tx + \nu^T(Ax-b),$$ with domain $\mathbb{R}^n \times \mathbb{R}^p$.
Since $L(x,\nu)$ is a convex quadratic function of $x$, we can find the minimizing $x$ from the optimality condition
$$\nabla_x L(x,\nu) = 2x + A^T\nu =0,$$ which yields $x=-\tfrac{1}{2}A^T\nu$. Therefore the dual function is
$$g(\nu)=L\left(-\tfrac{1}{2} A^{T} \nu,\, \nu \right)=-\tfrac{1}{4} \nu^{T} A A^{T} \nu-b^{T} \nu,$$ which is a concave quadratic function of $\nu$, with domain $\mathbb{R}^p$.
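As a quick numerical illustration (my own addition, not from the original text), the sketch below evaluates this dual function with NumPy on random data. The problem sizes, the random seed, and the closed-form dual maximizer $\nu^\star=-2(AA^T)^{-1}b$ used for the check are assumptions, obtained by maximizing the concave quadratic $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6                      # hypothetical sizes
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

AAt_inv = np.linalg.inv(A @ A.T)
x_star = A.T @ AAt_inv @ b       # least-norm solution of Ax = b
p_star = x_star @ x_star         # primal optimal value

def g(v):                        # dual function g(v) = -(1/4) v^T A A^T v - b^T v
    return -0.25 * v @ (A @ A.T) @ v - b @ v

# any v gives a lower bound on p_star (weak duality)
for _ in range(5):
    v = rng.standard_normal(p)
    assert g(v) <= p_star + 1e-9

# the dual maximizer v* = -2 (A A^T)^{-1} b attains the bound (strong duality)
v_star = -2 * AAt_inv @ b
print(p_star, g(v_star))         # equal up to rounding
```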


5.1.6 The Lagrange dual function and conjugate functions

The conjugate $f^*$ of a function $f: \mathbb{R}^n\rightarrow \mathbb{R}$ is given by
$$f^*(y) = \sup_{x\in \operatorname{dom} f} \big(y^Tx-f(x)\big).$$

Given the problem
$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & x=0, \end{array}$$ the Lagrangian is $L(x,\nu)=f(x)+\nu^Tx$, and the dual function is $$g(\nu)=\inf_x \big(f(x)+\nu^Tx\big)=-\sup_x\big((-\nu)^Tx-f(x)\big)=-f^*(-\nu).$$
More generally, consider an optimization problem with linear inequality and equality constraints,
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & Ax\preceq b,\\ & Cx=d. \end{array}$$
Using the conjugate of $f_0$, we can rewrite the dual function as follows:
$$\begin{aligned} g(\lambda,\nu)&=\inf_x \big(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\big) \\ &=-b^T\lambda-d^T\nu+\inf_x \big(f_0(x)+(A^T\lambda+C^T\nu)^Tx\big) \\ &=-b^T\lambda-d^T\nu-f_0^*(-A^T\lambda-C^T\nu). \end{aligned}$$
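As a sanity check (my own addition, not in the original), apply this formula to the equality-constrained least-squares problem of Section 5.1.5: take $f_0(x)=x^Tx$, no inequality constraints, $C=A$, $d=b$. Then
$$f_0^*(y)=\sup_x\big(y^Tx-x^Tx\big)=\tfrac{1}{4}y^Ty \quad(\text{maximizer } x=y/2),$$
$$g(\nu)=-d^T\nu-f_0^*(-C^T\nu)=-b^T\nu-\tfrac{1}{4}\nu^TAA^T\nu,$$
which matches the dual function derived directly above.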


5.2 The Lagrange dual problem

The Lagrange dual problem of a Lagrange dual problem is the primal problem.

For each pair $(\lambda,\nu)$ with $\lambda \succeq 0$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the optimization problem. The best lower bound that can be obtained from the Lagrange dual function is found by solving the optimization problem
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$ This problem is called the Lagrange dual problem. The term dual feasible, used to describe a pair $(\lambda,\nu)$ with $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, means, as the name implies, that $(\lambda,\nu)$ is feasible for the dual problem. We refer to $(\lambda^\star,\nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem, since the objective to be maximized is concave and the constraint is convex.

5.2.1 Making dual constraints explicit

The examples above show that it is not uncommon for the domain of the dual function, $\operatorname{dom} g = \{ (\lambda,\nu) \mid g(\lambda ,\nu)>-\infty \}$, to have dimension smaller than $m+p$, i.e., to be a proper subset of $\mathbb{R}^{m+p}$.

A. Lagrange dual of standard form LP

We found that the Lagrange dual function for the standard form LP
$$\begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax = b, \\ & x \succeq 0, \end{array}$$ is given by $$g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$ Strictly speaking, the Lagrange dual problem of the standard form LP is to maximize this dual function $g$ subject to $\lambda \succeq 0$, i.e., $$\begin{array}{ll} \max & g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise} \end{cases} \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Here, $g$ is finite only when $A^T\nu - \lambda+c=0$.
We can form an equivalent problem by making these equality constraints explicit: $$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu - \lambda + c = 0, \\ & \lambda \succeq 0. \end{array}$$
This problem, in turn, can be expressed as $$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu + c \succeq 0, \end{array}$$ which is an LP in inequality form.
Note that the first problem is the Lagrange dual of the standard form LP, and it is equivalent to the last two problems.
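The following sketch (my own addition) checks this primal–dual pair numerically with `scipy.optimize.linprog`; the random data generation and problem sizes are assumptions chosen so that the primal is feasible and bounded.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 3, 7                            # hypothetical sizes
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.1, 1.0, n)       # ensures a strictly feasible x exists
c = rng.uniform(0.5, 2.0, n)           # c > 0 keeps the problem bounded below

# primal: min c^T x  s.t. Ax = b, x >= 0   (linprog's default bounds are x >= 0)
primal = linprog(c, A_eq=A, b_eq=b)

# dual (inequality form): max -b^T v  s.t. A^T v + c >= 0,
# expressed for linprog as: min b^T v  s.t. -A^T v <= c, v free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=(None, None))

print(primal.fun, -dual.fun)           # equal up to solver tolerance (strong duality)
```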

B. Lagrange dual of inequality form LP

In a similar way, we can find the Lagrange dual problem of a linear program in inequality form
$$\begin{array}{lll} P0: &\min & c^Tx \\ &\text{s.t.} & Ax \preceq b. \end{array}$$ The Lagrangian is $$L(x,\lambda)=c^Tx+\lambda^T(Ax-b) = -b^T\lambda + (A^T\lambda+c)^Tx,$$ so the dual function is $$g(\lambda)=\inf_x L(x,\lambda) = -b^T \lambda + \inf_x (A^T\lambda + c)^T x.$$
The linear function $(A^T\lambda + c)^T x$ is unbounded below in $x$ unless it is identically zero, so the dual function is
$$g(\lambda) = \begin{cases} -b^T\lambda, & A^T\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
The dual variable $\lambda$ is dual feasible if $\lambda \succeq 0$ and $A^T \lambda + c=0$.
The Lagrange dual of the LP is to maximize $g$ over all $\lambda \succeq 0$. Again we can reformulate the Lagrange dual by explicitly including the dual feasibility conditions as constraints:
$$\begin{array}{lll} P1: &\max & -b^T\lambda \\ &\text{s.t.} & A^T \lambda + c = 0,\\ & & \lambda \succeq 0, \end{array}$$ which is an LP in standard form.
Note that the Lagrange dual of the problem $P1$ is (equivalent to) the primal problem $P0$.


5.2.2 Weak duality

The optimal value of the Lagrange dual problem, which we denote by $d^*$, is, by definition, the best lower bound on $p^*$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality $$d^*\le p^*,$$ called weak duality, which holds even if the original problem is not convex. The weak duality inequality also holds when $d^*$ and $p^*$ are infinite.
We refer to the difference $p^*-d^*$ as the optimal duality gap of the original problem, since it gives the gap between the optimal value of the primal problem and the best (i.e., greatest) lower bound on it that can be obtained from the Lagrange dual function.


5.2.3 Strong duality & Slater's constraint qualification

If the equality $d^* = p^*$ holds, i.e., the optimal duality gap is zero, then we say that strong duality holds.

Strong duality does not, in general, hold. But if the primal problem is convex, i.e., of the form
$$\begin{array}{lll} P0: & \min & f_0(x) \\ & \text{s.t.} & f_i(x) \leq 0, \quad i =1,\dots,m,\\ & & Ax=b, \end{array}$$ with $f_0,\dots,f_m$ convex, we usually (but not always) have strong duality.
Conditions on the problem under which strong duality holds are called constraint qualifications. One simple constraint qualification is Slater's condition: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that $$f_i(x)<0,\quad i=1,\dots,m, \qquad Ax = b.$$ Such a point is sometimes called strictly feasible, since the inequality constraints hold with strict inequalities. Slater's theorem states that strong duality holds if (1) Slater's condition holds and (2) the problem is convex.
Slater's condition can be refined when some of the inequality constraint functions $f_i$ are affine. If the first $k$ constraint functions $f_1,\dots,f_k$ are affine, then strong duality holds provided the following weaker condition holds: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that $$f_i(x)\leq 0,\quad i=1,\dots,k, \qquad f_i(x)<0,\quad i=k+1,\dots,m, \qquad Ax = b.$$


5.2.4 Examples

A. Lagrange dual of QCQP

We consider the QCQP
$$\begin{array}{lll} P0: & \min & \frac{1}{2}x^TP_0x+q^T_0x +r_0 \\ & \text{s.t.} & \frac{1}{2}x^TP_ix+q^T_ix +r_i \le 0, \quad i =1,\dots,m, \end{array}$$ with $P_0 \in \mathbf{S}_{++}^n$ and $P_i \in \mathbf{S}_{+}^n$, $i=1,\dots,m$.
The Lagrangian is $$\begin{aligned} L(x,\lambda) & = \frac{1}{2}x^TP_0x+q^T_0x +r_0 + \sum_{i=1}^{m} \lambda_i \Big( \frac{1}{2}x^TP_ix+q^T_ix +r_i \Big) \\ &= \frac{1}{2}x^TP(\lambda)x+q(\lambda)^Tx +r(\lambda), \end{aligned}$$ where $P(\lambda)=P_0 + \sum_{i=1}^m \lambda_i P_i$, $q(\lambda)=q_0 + \sum_{i=1}^m \lambda_i q_i$, and $r(\lambda)=r_0+\sum_{i=1}^m \lambda_i r_i$.
If $\lambda \succeq 0$, we have $P(\lambda) \succ 0$ and $$g(\lambda) = \inf_x L(x,\lambda) = - \frac{1}{2}q(\lambda)^T P(\lambda)^{-1} q(\lambda) + r(\lambda).$$ We can therefore express the dual problem as
$$\begin{array}{lll} P1: &\max & g(\lambda) \\ &\text{s.t.} & \lambda \succeq 0. \end{array}$$ Slater's condition says that strong duality between the primal problem $P0$ and the dual problem $P1$ holds if the quadratic inequality constraints are strictly feasible, i.e., there exists an $x$ with
$$\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0, \quad i=1,\dots,m.$$
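A small NumPy sketch (my own construction) illustrating weak duality for this QCQP: the data are random, $r_i<0$ is chosen so that $x=0$ is strictly feasible (so Slater's condition holds), and the closed-form expression for $g(\lambda)$ above is evaluated at a few random $\lambda \succeq 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3

def rand_psd(n, eps=0.5):
    M = rng.standard_normal((n, n))
    return M @ M.T + eps * np.eye(n)

P = [rand_psd(n) for _ in range(m + 1)]       # P_0 is strictly positive definite
q = [rng.standard_normal(n) for _ in range(m + 1)]
r = [1.0] + [-1.0] * m                        # r_i < 0 makes x = 0 strictly feasible

f = lambda i, x: 0.5 * x @ P[i] @ x + q[i] @ x + r[i]
x_feas = np.zeros(n)                          # strictly feasible point (Slater holds)

def g(lam):                                   # dual function g(lambda), closed form above
    Pl = P[0] + sum(l * Pi for l, Pi in zip(lam, P[1:]))
    ql = q[0] + sum(l * qi for l, qi in zip(lam, q[1:]))
    rl = r[0] + lam @ np.array(r[1:])
    return -0.5 * ql @ np.linalg.solve(Pl, ql) + rl

# weak duality: g(lambda) <= p* <= f_0(x_feas) for every lambda >= 0
for _ in range(5):
    lam = rng.uniform(0, 2, m)
    assert g(lam) <= f(0, x_feas) + 1e-9
print("weak duality checks passed")
```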

B. A nonconvex quadratic problem with strong duality

On rare occasions, strong duality obtains for a nonconvex problem. As an important example, we consider the problem of minimizing a nonconvex quadratic function over the unit ball,
$$\begin{array}{lll} P0: &\min & x^TAx + 2b^Tx \\ &\text{s.t.} & x^Tx \leq 1, \end{array}$$ where $A \in \mathbf{S}^n$ and $b\in\mathbf{R}^n$. When $A \nsucceq 0$, this is not a convex problem. This problem is sometimes called the trust region problem.
The Lagrangian is $$L(x,\lambda) = x^TAx + 2b^Tx + \lambda(x^Tx-1)=x^T(A+\lambda I)x + 2b^Tx - \lambda,$$
so the dual function is given by
$$g(\lambda) = \begin{cases} -b^T(A+\lambda I)^{\dagger} b -\lambda, & \text{if } A+\lambda I \succeq 0,\ b \in \mathcal{R}(A+\lambda I) \\ -\infty, & \text{otherwise,} \end{cases}$$
where $(A+\lambda I)^\dagger$ is the pseudo-inverse of $A+\lambda I$ and $\mathcal{R}(A+\lambda I)$ denotes its range. The Lagrange dual problem is thus
$$\begin{array}{lll} P1: & \max & -b^T(A+\lambda I)^{\dagger} b - \lambda \\ &\text{s.t.} & A + \lambda I \succeq 0, \quad b \in \mathcal{R}(A + \lambda I ), \end{array}$$ with variable $\lambda \in \mathbf{R}$.
The Lagrange dual problem is a convex optimization problem. In fact, it is readily solved since it can be expressed as
$$\begin{array}{ll} \max & -\sum_{i=1}^{n} \dfrac{(q_i^T b)^2}{\lambda_i+\lambda} - \lambda \\ \text{s.t.} & \lambda \geq - \lambda_{\min}(A), \end{array}$$ where $\lambda_i$ and $q_i$ are the eigenvalues and corresponding (orthonormal) eigenvectors of $A$, and we interpret $(q_i^Tb)^2 / 0$ as $0$ if $q_i^T b = 0$ and as $\infty$ otherwise.
Although the original problem $P0$ is not convex, strong duality still holds. In fact, a more general result holds: strong duality holds for any optimization problem with a quadratic objective and one quadratic inequality constraint, provided Slater's condition holds.
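The short NumPy experiment below (my own, with hand-picked data) illustrates this on a 2-variable instance: it grids the unit disk to estimate $p^\star$, maximizes the eigenvalue form of the dual over a grid of $\lambda \ge -\lambda_{\min}(A)$ to estimate $d^\star$, and prints both; they agree up to the grid resolution.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])            # indefinite, so the primal is nonconvex
b = np.array([0.5, 1.0])

# crude primal estimate: grid the unit disk
t = np.linspace(-1.0, 1.0, 801)
X, Y = np.meshgrid(t, t)
inside = X**2 + Y**2 <= 1.0
obj = A[0, 0]*X**2 + 2*A[0, 1]*X*Y + A[1, 1]*Y**2 + 2*(b[0]*X + b[1]*Y)
p_star = obj[inside].min()

# dual in eigen-coordinates (formula above), maximized over a 1-D grid
eigvals, Q = np.linalg.eigh(A)
qb = Q.T @ b
lam_grid = np.linspace(-eigvals.min() + 1e-6, -eigvals.min() + 50.0, 200_000)
g_vals = -np.sum(qb[:, None]**2 / (eigvals[:, None] + lam_grid[None, :]), axis=0) - lam_grid
d_star = g_vals.max()

print(p_star, d_star)                  # approximately equal: strong duality holds
```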


5.4 Saddle-point interpretation

5.4.1 Max-Min characterization of weak and strong duality

First note that
$$\sup_{\lambda \succeq 0} L(x,\lambda) = \sup_{\lambda \succeq 0 } \Big(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\Big) = \begin{cases} f_0(x), & \text{if } f_i(x)\le 0,\ i =1,\dots,m \\ \infty, & \text{otherwise.} \end{cases}$$
Suppose $x$ is not feasible, i.e., $f_i(x)>0$ for some $i$. Then $\sup_{\lambda \succeq 0} L(x,\lambda) = \infty$, as can be seen by choosing $\lambda_j = 0$ for $j \neq i$ and $\lambda_i \rightarrow \infty$. On the other hand, if $f_i(x)\le 0$, $i=1,\dots,m$, then the optimal choice of $\lambda$ is $\lambda = 0$ and $\sup_{\lambda \succeq 0} L(x,\lambda) = f_0(x)$. This means that we can express the optimal value of the primal problem as $$p^* = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
By the definition of the dual function, we also have the optimal value of the dual problem $$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda).$$
Thus, weak duality can be expressed as the inequality $$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda) \leq \inf_x \sup_{\lambda \succeq 0} L(x,\lambda) = p^* ,$$ and strong duality as the equality $$\sup_{\lambda \succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
Strong duality means that the order of the minimization over $x$ and the maximization over $\lambda \succeq 0$ can be switched without affecting the result.
In fact, the weak duality inequality does not depend on any properties of $L$: we have $$\sup_{z \in Z} \inf_{w \in W} f(w,z) \leq \inf_{w \in W} \sup_{z \in Z} f(w,z)$$
for any $f:\mathbf{R}^n \times \mathbf{R}^m \rightarrow \mathbf{R}$ (and any $W \subseteq \mathbf{R}^n$ and $Z \subseteq \mathbf{R}^m$). This general inequality is called the max-min inequality. When equality holds, i.e., $$\sup_{z \in Z} \inf_{w \in W} f(w,z) = \inf_{w \in W} \sup_{z \in Z} f(w,z),$$ we say that $f$ (and $W$ and $Z$) satisfy the strong max-min property or saddle-point property.
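A tiny NumPy check of the max-min inequality on finite sets (my own illustration): take $W$ and $Z$ to be finite grids and tabulate $f$ as a matrix $F_{ij} = f(w_i, z_j)$.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((50, 40))   # F[i, j] = f(w_i, z_j) for finite W (rows) and Z (columns)

sup_inf = F.min(axis=0).max()       # sup_z inf_w f(w, z)
inf_sup = F.max(axis=1).min()       # inf_w sup_z f(w, z)

assert sup_inf <= inf_sup           # the max-min inequality always holds
print(sup_inf, inf_sup)
```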


5.4.2 Saddle-Point Interpretation

We refer to a pair $\tilde{w} \in W$, $\tilde{z} \in Z$ as a saddle-point for $f$ (and $W$ and $Z$) if $$f(\tilde{w},z) \leq f(\tilde{w},\tilde{z}) \leq f(w,\tilde{z})$$ for all $w \in W$, $z \in Z$. In other words, $\tilde{w}$ minimizes $f(w,\tilde{z})$ (over $w \in W$) and $\tilde{z}$ maximizes $f(\tilde{w},z)$ (over $z \in Z$): $$f(\tilde{w},\tilde{z}) = \inf_{w \in W} f(w,\tilde{z}),\qquad f(\tilde{w},\tilde{z}) = \sup_{z \in Z} f(\tilde{w},z).$$ This implies that the strong max-min property holds, and that the common value is $f(\tilde{w},\tilde{z})$.
Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are respectively primal and dual optimal points for a problem in which strong duality obtains, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.


5.5 Optimality conditions

5.5.1 Certificate of suboptimality and stopping criteria

If we can find a dual feasible point $(\lambda,\nu)$, we establish a lower bound on the optimal value of the primal problem: $p^* \geq g(\lambda, \nu)$. Thus, a dual feasible point $(\lambda,\nu)$ provides a proof or certificate that $p^* \geq g(\lambda,\nu)$.


5.5.2 Complementary slackness

Let $x^\star$ be a primal optimal point and $(\lambda^\star,\nu^\star)$ a dual optimal point, and suppose strong duality holds (so the primal and dual optimal values are attained and equal). This means that
$$\begin{aligned} f_0(x^*) & = g(\lambda^*,\nu^*) \\ &= \inf_x \Big(f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{i=1}^p \nu_i^* h_i (x)\Big) \\ & \leq f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i (x^*) \\ & \leq f_0(x^*). \end{aligned}$$

  • The first line states that the optimal duality gap is zero.
  • The second line is the definition of the dual function.
  • The third line follows since the infimum of the Lagrangian over $x$ is less than or equal to its value at $x = x^*$.
  • The last inequality follows from $\lambda_i^* \geq 0$, $f_i(x^*)\leq 0$, $i=1,\dots,m$, and $h_i(x^*)=0$, $i=1,\dots,p$.

We conclude that the two inequalities in this chain (the third and fourth lines) hold with equality.
The first conclusion: since the inequality in the third line is an equality, $x^*$ minimizes $L(x,\lambda^*,\nu^*)$ over $x$.
The second conclusion (complementary slackness): $$\sum_{i=1}^m \lambda_i^* f_i(x^*) = 0.$$
Since each term in this sum is nonpositive, we conclude that $$\lambda_i^* f_i(x^*) = 0,\quad i =1,\dots,m.$$ This holds for any primal optimal $x^\star$ and any dual optimal $(\lambda^\star ,\nu^\star)$ (when strong duality holds).
We can express the complementary slackness condition as $$\lambda_i^* >0 ~\Rightarrow~ f_i(x^*)=0,$$ or, equivalently, $$f_i(x^*)<0 ~\Rightarrow~ \lambda_i^* = 0.$$ Roughly speaking, this means the $i$th optimal Lagrange multiplier is zero unless the $i$th constraint is active at the optimum.


5.5.3 KKT optimality conditions

We now assume that the functions $f_0,\dots,f_m,h_1,\dots,h_p$ are differentiable (and therefore have open domains), but we make no assumptions yet about convexity.

A. KKT conditions for nonconvex problems

As above, let $x^\star$ and $(\lambda^\star,\nu^\star)$ be any primal and dual optimal points with zero duality gap. Since $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$, it follows that its gradient must vanish at $x^\star$, i.e., $$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0.$$
Thus, we have
$$\begin{aligned} f_{i}(x^{\star}) & \leq 0, & i&=1, \ldots, m \\ h_{i}(x^{\star}) &=0, & i&=1, \ldots, p \\ \lambda_{i}^{\star} & \geq 0, & i&=1, \ldots, m \\ \lambda_{i}^{\star} f_{i}(x^{\star}) &=0, & i&=1, \ldots, m \\ \nabla f_{0}(x^{\star})+\sum_{i=1}^{m} \lambda_{i}^{\star} \nabla f_{i}(x^{\star})+\sum_{i=1}^{p} \nu_{i}^{\star} \nabla h_{i}(x^{\star}) &=0, \end{aligned}$$ which are called the Karush-Kuhn-Tucker (KKT) conditions.
To summarize, for any optimization problem with differentiable objective and differentiable constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.

B. KKT conditions for convex problems

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if the $f_i$ are convex and the $h_i$ are affine, and $\tilde{x},\tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions
$$\begin{aligned} f_{i}(\tilde{x}) & \leq 0, & i&=1, \ldots, m \\ h_{i}(\tilde{x}) &=0, & i&=1, \ldots, p \\ \tilde{\lambda}_{i} & \geq 0, & i&=1, \ldots, m \\ \tilde{\lambda}_{i} f_{i}(\tilde{x}) &=0, & i&=1, \ldots, m \\ \nabla f_{0}( \tilde{x})+\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} \nabla h_{i}( \tilde{x}) &=0, \end{aligned}$$ then $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal, with zero duality gap.

To see this, note that the first two conditions state that $\tilde{x}$ is primal feasible. Since $\tilde{\lambda}_i \geq 0$, $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$; the last KKT condition states that its gradient with respect to $x$ vanishes at $x = \tilde{x}$, so it follows that $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$ over $x$. From this we conclude that

$$\begin{aligned} g(\tilde{\lambda},\tilde{\nu}) & = L(\tilde{x},\tilde{\lambda},\tilde{\nu}) \\ &= f_0( \tilde{x} ) +\sum_{i=1}^{m} \tilde{\lambda}_{i} f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} h_{i}( \tilde{x}) \\ &= f_0( \tilde{x} ), \end{aligned}$$ where in the last line we use $h_i (\tilde{x}) = 0$ and $\tilde{\lambda}_i f_i (\tilde{x}) = 0$. This shows that $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ have zero duality gap, and therefore are primal and dual optimal.
In summary, for any convex optimization problem with differentiable objective and differentiable constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap.
If a convex optimization problem with differentiable objective and differentiable constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality: Slater's condition implies that the optimal duality gap is zero and the dual optimum is attained, so $x$ is optimal if and only if there are $(\lambda,\nu)$ that, together with $x$, satisfy the KKT conditions.
The KKT conditions play an important role in optimization. In a few special cases, it is possible to solve the KKT conditions analytically. More generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions.

Example 5.1

Equality constrained convex quadratic minimization. We consider the problem
$$\begin{array}{lll} P0: &\min & \frac{1}{2}x^TPx + q^Tx + r \\ &\text{s.t.} & Ax = b, \end{array}$$ where $P \in \mathbf{S}_{+}^n$.
The KKT conditions for this problem are
$$Ax^\star = b, \qquad Px^\star + q + A^T \nu^\star = 0,$$ which we can write as
$$\left[\begin{array}{cc} P & A^{T} \\ A & 0 \end{array}\right]\left[\begin{array}{l} x^{\star} \\ \nu^{\star} \end{array}\right]=\left[\begin{array}{c} -q \\ b \end{array}\right].$$
Solving this set of $m + n$ linear equations in the $m + n$ variables $x^\star, \nu^\star$ gives the optimal primal and dual variables for $P0$.
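A NumPy sketch of this calculation (my own; the random data and sizes are assumptions, with $P$ made positive definite by construction). It assembles the KKT matrix, solves for $(x^\star,\nu^\star)$, and checks that $x^\star$ is feasible and minimizes the objective over the feasible set by comparing against perturbations along $\operatorname{null}(A)$.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
n, m = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                  # P positive definite
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# KKT system  [P A^T; A 0] [x; nu] = [-q; b]
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

f = lambda x: 0.5 * x @ P @ x + q @ x    # the constant r is irrelevant for the comparison
assert np.allclose(A @ x_star, b)        # primal feasibility
N = null_space(A)                        # directions that keep Ax = b
for _ in range(5):
    x_other = x_star + N @ rng.standard_normal(N.shape[1])
    assert f(x_star) <= f(x_other) + 1e-9
print("x* solves the equality constrained QP")
```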

Example 5.2 Water-filling.

We consider the convex optimization problem
$$\begin{array}{lll} P0: &\min & -\sum_{i=1}^n \log (\alpha_i + x_i ) \\ &\text{s.t.} & x \succeq 0, \quad \mathbf{1}^T x = 1, \end{array}$$ where $\alpha_i > 0$. This problem arises in information theory, in allocating power to a set of $n$ communication channels. The variable $x_i$ represents the transmitter power allocated to the $i$th channel, and $\log(\alpha_i + x_i)$ gives the capacity or communication rate of the channel, so the problem is to allocate a total power of one to the channels, in order to maximize the total communication rate.
Introducing Lagrange multipliers $\lambda^\star \in \mathbb{R}^n$ for the inequality constraints $x \succeq 0$, and a multiplier $\nu^\star \in \mathbb{R}$ for the equality constraint $\mathbf{1}^T x = 1$, we obtain the KKT conditions
$$\begin{aligned} x^* &\succeq 0, \qquad \mathbf{1}^T x^* = 1, \\ \lambda^* &\succeq 0, \\ \lambda_i^* x_i^* &= 0, \quad i=1,\dots,n, \\ -\frac{1}{\alpha_i+x_i^*} - \lambda_i^* + \nu^* &= 0, \quad i=1,\dots,n. \end{aligned}$$ We can directly solve these equations to find $x^\star$, $\lambda^\star$, and $\nu^\star$. We start by noting that $\lambda^\star$ acts as a slack variable in the last equation, so it can be eliminated, leaving
$$\begin{aligned} x^* &\succeq 0, \qquad \mathbf{1}^T x^* = 1, \\ x_i^*\Big(\nu^* - \frac{1}{\alpha_i+x_i^*}\Big) &= 0, \quad i=1,\dots,n, \\ \nu^* & \geq \frac{1}{\alpha_i+x_i^*}, \quad i=1,\dots,n. \end{aligned}$$

  • If $\nu^\star < 1/\alpha_i$, this last condition can only hold if $x^\star_i > 0$, which by the third condition implies that $\nu^\star = \frac{1}{\alpha_i + x^\star_i}$.
  • Solving for $x^\star_i$, we conclude that $x^\star_i= \frac{1}{\nu^\star} - \alpha_i$ if $\nu^\star < \frac{1}{\alpha_i}$.
  • If $\nu^\star \geq 1/\alpha_i$, then $x^\star_i > 0$ is impossible, because it would imply $\nu^\star \geq \frac{1}{\alpha_i} > \frac{1}{\alpha_i + x^\star_i}$, which violates the complementary slackness condition.
  • Therefore, $x^\star_i = 0$ if $\nu^\star \geq 1/\alpha_i$.
    Thus we have
    $$x_i^* = \begin{cases} \frac{1}{\nu^*} - \alpha_i, & \text{if } \nu^* < \frac{1}{\alpha_i}\\ 0,& \text{if } \nu^* \geq \frac{1}{\alpha_i}, \end{cases}$$ or, put more simply, $x_i^* =\max \big\{0,\ \frac{1}{\nu^*} - \alpha_i \big\}$.
    Substituting this expression for $x^\star_i$ into the condition $\mathbf{1}^T x^\star = 1$, we obtain
    $$\sum_{i=1}^n \max \Big\{0,\ \frac{1}{\nu^*} - \alpha_i \Big\} = 1.$$ The lefthand side is a piecewise-linear increasing function of $1/\nu^\star$, with breakpoints at $\alpha_i$, so the equation has a unique solution which is readily determined, as illustrated in the numerical sketch below.
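A small NumPy sketch of that computation (my own; the $\alpha_i$ values are arbitrary): bisect on the water level $t = 1/\nu^\star$ until $\sum_i \max\{0, t-\alpha_i\} = 1$, then recover $x^\star$.

```python
import numpy as np

alpha = np.array([0.8, 1.0, 1.2, 2.5])        # hypothetical channel parameters, alpha_i > 0

def total_power(t):                           # sum_i max(0, t - alpha_i): piecewise linear, increasing
    return np.maximum(0.0, t - alpha).sum()

lo, hi = alpha.min(), alpha.max() + 1.0       # bracket: total_power(lo) = 0 < 1 <= total_power(hi)
for _ in range(100):                          # bisection on the water level t = 1 / nu*
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total_power(mid) < 1.0 else (lo, mid)

t = 0.5 * (lo + hi)
x = np.maximum(0.0, t - alpha)                # optimal power allocation
print(x, x.sum())                             # x sums to 1 (the total power budget)
```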

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x,\lambda^\star,\nu^\star)$. This fact sometimes allows us to compute a primal optimal solution from a dual optimal solution. More precisely, suppose we have strong duality, an optimal $(\lambda^\star,\nu^\star)$ is known, and the minimizer of $L(x,\lambda^\star,\nu^\star)$, i.e., the solution of $$\min \quad f_{0}(x)+\sum_{i=1}^{m} \lambda_{i}^{\star} f_{i}(x)+\sum_{i=1}^{p} \nu_{i}^{\star} h_{i}(x),$$ is unique. Then, if that minimizer is primal feasible, it must be primal optimal; if it is not primal feasible, we can conclude that the primal optimum is not attained.

Example 5.3 Entropy maximization.

We consider the entropy maximization problem
$$\begin{array}{ll} \min & f_{0}(x)=\sum_{i=1}^{n} x_{i} \log x_{i} \\ \text{s.t.} & A x \preceq b, \\ & \mathbf{1}^{T} x=1, \end{array}$$ with domain $\mathcal{D} = \mathbf{R}_{++}^n$, and its Lagrange dual problem
$$\begin{array}{ll} \max & -b^{T} \lambda-\nu-e^{-\nu-1} \sum_{i=1}^{n} e^{-a_{i}^{T} \lambda} \\ \text{s.t.} & \lambda \succeq 0, \end{array}$$ where $a_i$ are the columns of $A$. We assume that the weak form of Slater's condition holds, i.e., there exists an $x \succ 0$ with $Ax \preceq b$ and $\mathbf{1}^T x = 1$, so strong duality holds and an optimal solution $(\lambda^\star,\nu^\star)$ exists.
Suppose we have solved the dual problem. The Lagrangian at $(\lambda^\star,\nu^\star)$ is
$$L(x, \lambda^{\star}, \nu^{\star})=\sum_{i=1}^{n} x_{i} \log x_{i}+\lambda^{\star T}(A x-b)+\nu^{\star}(\mathbf{1}^{T} x-1),$$ which is strictly convex on $\mathcal{D}$ and bounded below, so it has a unique minimizer $x^\star$, given by
$$x^*_i = 1/ \exp (a_i^T \lambda^*+\nu^* + 1), \quad i=1,\dots,n.$$
If $x^\star$ is primal feasible, it must be the optimal solution of the primal problem. If $x^\star$ is not primal feasible, then we can conclude that the primal optimum is not attained.
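Below is a sketch of this recovery procedure (my own; the data, sizes, and the use of `scipy.optimize.minimize` with L-BFGS-B are assumptions). It solves the dual numerically, forms $x^\star$ from the closed-form expression above, and checks primal feasibility and the zero duality gap.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, m = 6, 3
A = rng.standard_normal((m, n))
p0 = rng.dirichlet(np.ones(n))           # a strictly feasible point (weak Slater condition)
b = A @ p0 + 0.05

def neg_dual(z):                         # negative of the dual objective in (lambda, nu)
    lam, nu = z[:m], z[m]
    return -(-b @ lam - nu - np.exp(-nu - 1.0) * np.sum(np.exp(-(A.T @ lam))))

bounds = [(0.0, None)] * m + [(None, None)]   # lambda >= 0, nu free
res = minimize(neg_dual, np.zeros(m + 1), method="L-BFGS-B", bounds=bounds)
lam, nu = res.x[:m], res.x[m]

x = 1.0 / np.exp(A.T @ lam + nu + 1.0)   # minimizer of the Lagrangian (closed form above)
f0 = np.sum(x * np.log(x))

print(x.sum())                           # ~1: the equality constraint holds
print(np.all(A @ x <= b + 1e-4))         # inequality constraints hold (up to solver tolerance)
print(f0, -res.fun)                      # primal value ~= dual value (zero duality gap)
```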


5.7 Examples (reformulations)

In this section, we show by example that simple equivalent reformulations of a problem can lead to very different dual problems. We consider the following types of reformulations:

  • Introducing new variables and associated equality constraints.
  • Replacing the objective with an increasing function of the original objective.
  • Making explicit constraints implicit, i.e., incorporating them into the domain of the objective.

5.7.1 Introducing new variables and equality constraints

Consider an unconstrained problem of the form
$$P0: \quad \min ~ f_0(Ax + b).$$ Its Lagrange dual function is the constant $p^\star$. So while we do have strong duality, i.e., $p^\star= d^\star$, the Lagrangian dual is neither useful nor interesting.

Now let us reformulate the problem as
$$\begin{array}{lll} P1: &\min & f_0(y) \\ &\text{s.t.} & Ax +b = y. \end{array}$$ Here we have introduced new variables $y$, as well as new equality constraints $Ax+b = y$. The problems $P0$ and $P1$ are clearly equivalent.
The Lagrangian of the reformulated problem is
$$L(x,y,\nu) = f_0(y) + \nu^T(Ax+b-y).$$ To find the dual function we minimize $L$ over $x$ and $y$. Minimizing over $x$, we find that $g(\nu) = -\infty$ unless $A^T\nu = 0$, in which case we are left with
$$g(\nu) = b^T \nu + \inf_y \big(f_0(y) - \nu^T y \big) = b^T \nu - f_0^*(\nu),$$ where $f_0^*$ is the conjugate of $f_0$. The dual problem of $P1$ can therefore be expressed as
$$\begin{array}{ll} \max & g(\nu)=b^T\nu-f_0^*(\nu) \\ \text{s.t.} & A^T \nu= 0. \end{array}$$ Thus, the dual of the reformulated problem $P1$ is considerably more useful than the dual of the original problem $P0$.

Example 5.5 Unconstrained geometric program.

Consider the unconstrained geometric program
$$\min~ \log \Big(\sum_{i=1}^m \exp (a_i^T x + b_i)\Big).$$ We first reformulate it by introducing new variables and equality constraints:
$$\begin{array}{lll} P1: &\min & f_0(y) = \log \big(\sum_{i=1}^m \exp y_i\big) \\ &\text{s.t.} & Ax + b = y, \end{array}$$ where $a_i^T$ are the rows of $A$. The conjugate of the log-sum-exp function is
$$f_0^*(\nu) = \begin{cases} \sum_{i=1}^m \nu_i \log \nu_i, & \text{if } \nu \succeq 0,\ \mathbf{1}^T\nu =1 \\ \infty, & \text{otherwise,} \end{cases}$$ so the dual of the reformulated problem can be expressed as $$\begin{array}{ll} \max & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \text{s.t.} & \mathbf{1}^T\nu =1,\\ & A^T \nu = 0, \\ & \nu \succeq 0, \end{array}$$ which is an entropy maximization problem.

Example 5.6 Norm approximation problem.

We consider the unconstrained norm approximation problem
$$P0: \quad \min~ \| Ax-b \|,$$ where $\|\cdot\|$ is any norm. Here too the Lagrange dual function is constant, equal to the optimal value of $P0$, and therefore not useful.
Once again we reformulate the problem as
$$\begin{array}{ll} \min &\| y \| \\ \text{s.t.} & Ax -b = y. \end{array}$$
The Lagrange dual problem is
$$\begin{array}{ll} \max & b^T \nu \\ \text{s.t.} & \| \nu \|_* \leq 1, \\ & A^T \nu = 0, \end{array}$$ where we use the fact that the conjugate of a norm is the indicator function of the dual norm unit ball.
The idea of introducing new equality constraints can be applied to the constraint functions as well. Consider, for example, the problem
$$\begin{array}{ll} \min & f_0(A_0 x + b_0) \\ \text{s.t.} & f_i(A_i x + b_i) \le 0, \quad i =1,\dots,m, \end{array}$$ where $A_i \in \mathbf{R}^{k_i \times n}$ and $f_i: \mathbf{R}^{k_i} \rightarrow \mathbf{R}$ are convex. We introduce a new variable $y_i \in \mathbf{R}^{k_i}$, for $i = 0,\dots,m$, and reformulate the problem as
$$\begin{array}{ll} \min & f_0(y_0) \\ \text{s.t.} & f_i(y_i) \le 0, \quad i =1,\dots,m, \\ & A_i x + b_i = y_i , \quad i =0,\dots,m. \end{array}$$
The Lagrangian for this problem is
$$L(x,y_0,\dots,y_m,\lambda,\nu_0,\dots,\nu_m) = f_0(y_0) + \sum_{i=1}^m \lambda_i f_i(y_i) + \sum_{i=0}^m \nu_i^T (A_i x + b_i - y_i).$$
To find the dual function, we minimize over $x$ and the $y_i$. The minimum over $x$ is $-\infty$ unless $$\sum_{i=0}^m A_i^T \nu_i = 0,$$ in which case we have, for $\lambda \succ 0$,
$$\begin{aligned} g(\lambda, \nu_{0}, \ldots, \nu_{m}) &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}, \ldots, y_{m}}\Big(f_{0}(y_{0})+\sum_{i=1}^{m} \lambda_{i} f_{i}(y_{i})-\sum_{i=0}^{m} \nu_{i}^{T} y_{i}\Big) \\ &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}}\big(f_{0}(y_{0})-\nu_{0}^{T} y_{0}\big)+\sum_{i=1}^{m} \lambda_{i} \inf _{y_{i}}\big(f_{i}(y_{i})-(\nu_{i} / \lambda_{i})^{T} y_{i}\big) \\ &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}(\nu_{0})-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}(\nu_{i} / \lambda_{i}). \end{aligned}$$
The last expression involves the perspective of the conjugate function, and is therefore concave in the dual variables. Finally, we address the question of what happens when $\lambda \succeq 0$ but some $\lambda_i$ are zero. If $\lambda_i = 0$ and $\nu_i \neq 0$, then the dual function is $-\infty$. If $\lambda_i = 0$ and $\nu_i = 0$, however, the terms involving $y_i$, $\nu_i$, and $\lambda_i$ are all zero. Thus, the expression above for $g$ is valid for all $\lambda \succeq 0$, if we take $\lambda_i f^*_i (\nu_i /\lambda_i ) = 0$ when $\lambda_i = 0$ and $\nu_i = 0$, and $\lambda_i f^*_i (\nu_i /\lambda_i ) = \infty$ when $\lambda_i = 0$ and $\nu_i \neq 0$.
Therefore we can express the dual of the problem as
$$\begin{array}{ll} \max & \sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}(\nu_{0})-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}(\nu_{i} / \lambda_{i})\\ \text{s.t.} & \lambda \succeq 0, \\ & \sum_{i=0}^m A_i^T \nu_i =0. \end{array}$$

5.7.2 Transforming the objective

If we replace the objective $f_0$ by an increasing function of $f_0$, the resulting problem is clearly equivalent. The dual of this equivalent problem, however, can be very different from the dual of the original problem.

Example 5.8

We consider again the minimum norm problem
$$\min \| Ax - b \|,$$ where $\| \cdot \|$ is some norm. We reformulate this problem as
$$\begin{aligned} \min ~~&\tfrac{1}{2} \| y \|^2 \\ \text{s.t.}~~& Ax -b = y. \end{aligned}$$ Here we have introduced new variables, and replaced the objective by half its square. Evidently it is equivalent to the original problem.
The dual of the reformulated problem is
$$\begin{aligned} \max ~~&-\tfrac{1}{2} \| \nu \|^2_* + b^T \nu \\ \text{s.t.}~~& A^T \nu = 0, \end{aligned}$$ where we use the fact that the conjugate of $\tfrac{1}{2}\|\cdot\|^2$ is $\tfrac{1}{2}\|\cdot\|^2_*$.
Note that this dual problem is not the same as the dual problem (Example 5.6) derived earlier.
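For the Euclidean norm this can be verified in closed form (my own check, not in the original): the primal is ordinary least squares, and the dual maximizer over $\{\nu : A^T\nu = 0\}$ is the least-squares residual $b - Ax_{\mathrm{ls}}$, i.e., the projection of $b$ onto $\operatorname{null}(A^T)$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# primal: min (1/2) ||Ax - b||_2^2  (least squares)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
p_star = 0.5 * np.linalg.norm(A @ x_ls - b) ** 2

# dual: max -(1/2)||nu||_2^2 + b^T nu  s.t. A^T nu = 0
# the maximizer is the projection of b onto null(A^T), i.e. the least-squares residual
nu_star = b - A @ x_ls
d_star = -0.5 * nu_star @ nu_star + b @ nu_star

assert np.allclose(A.T @ nu_star, 0)   # dual feasibility
print(p_star, d_star)                  # equal: strong duality for this reformulation
```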

5.7.3 Implicit constraints

The next simple reformulation we study is to include some of the constraints in the objective function, by modifying the objective function to be infinite when the constraint is violated.

