Directory

  • 5.1 The Lagrange dual function
    • 5.1.1 The Lagrangian
    • 5.1.2 The Lagrange dual function
    • 5.1.3 Lower bounds on optimal value
    • 5.1.5 Examples
      • Least-squares solution of linear equations
    • 5.1.6 The Lagrange dual function and conjugate functions
  • 5.2 The Lagrange dual problem
    • 5.2.1 Making dual constraints explicit
      • A. Lagrange dual of standard form LP
      • B. Lagrange dual of inequality form LP
    • 5.2.2 Weak duality
    • 5.2.3 Strong duality & Slater's constraint qualification
    • 5.2.4 Examples
      • A. Lagrange dual of QCQP
      • B. A nonconvex quadratic problem with strong duality
  • 5.4 Saddle-point interpretation
    • 5.4.1 Max-min characterization of weak and strong duality
    • 5.4.2 Saddle-point interpretation
  • 5.5 Optimality conditions
    • 5.5.1 Certificate of suboptimality and stopping criteria
    • 5.5.2 Complementary slackness
    • 5.5.3 KKT optimality conditions
      • A. KKT conditions for nonconvex problems
      • B. KKT conditions for convex problems
        • Example 5.1
        • Example 5.2 Water-filling
    • 5.5.5 Solving the primal problem via the dual
      • Example 5.3 Entropy maximization
  • 5.7 Examples (reformulations)
    • 5.7.1 Introducing new variables and equality constraints
      • Example 5.5 Unconstrained geometric program
      • Example 5.6 Norm approximation problem
    • 5.7.2 Transforming the objective
      • Example 5.8
    • 5.7.3 Implicit constraints

5.1 The Lagrange dual function

5.1.1 The Lagrangian

An optimization problem in the standard form:
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & f_i(x) \le 0, \quad i=1,\dots,m \\ & h_i(x) = 0, \quad i=1,\dots,p, \end{array}$$ with variable $x\in \mathbb{R}^n$. We assume its domain $\mathcal{D}=\bigcap_{i=0}^{m} \operatorname{dom} f_{i} \cap \bigcap_{i=1}^{p} \operatorname{dom} h_{i}$ is nonempty, and denote the optimal value of the problem by $p^*$. We do not assume the problem is convex.

The basic idea in Lagrangian duality is to take the constraints into account by augmenting the objective function with a weighted sum of the constraint functions. We define the Lagrangian (function) $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ associated with the problem as
$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^{m}\lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x),$$ with $\operatorname{dom} L=\mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$. We refer to $\lambda_i$ as the Lagrange multiplier associated with the $i$th inequality constraint $f_i(x)\le 0$; similarly $\nu_i$ is the Lagrange multiplier associated with the $i$th equality constraint $h_i(x) = 0$. The vectors $\lambda$ and $\nu$ are called the dual variables or Lagrange multiplier vectors associated with the problem.


5.1.2 The Lagrange dual function

We define the Lagrange dual function $g: \mathbb{R}^m \times \mathbb{R}^p \rightarrow \mathbb{R}$ as the minimum value of the Lagrangian over $x$: for $\lambda \in \mathbb{R}^m$, $\nu \in \mathbb{R}^p$,
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)=\inf _{x \in \mathcal{D}}\left(f_{0}(x)+\sum_{i=1}^{m} \lambda_{i} f_{i}(x)+\sum_{i=1}^{p} \nu_{i} h_{i}(x)\right).$$
When the Lagrangian is unbounded below in $x$, the dual function takes on the value $-\infty$. Since the dual function is the pointwise infimum of a family of affine functions of $(\lambda,\nu)$, it is concave, even when the problem is not convex.


5.1.3 Lower bounds on optimal value

The dual function yields lower bounds on the optimal value $p^\star$ of the problem: for any $\lambda \succeq 0$ and any $\nu$ we have
$$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le p^*,$$ since for any feasible point $\tilde{x}$ we have $$g(\lambda, \nu)=\inf _{x \in \mathcal{D}} L(x, \lambda, \nu)\le L(\tilde{x}, \lambda, \nu)\le f_0(\tilde{x})$$ (because $\lambda_i f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$), and taking the infimum over all feasible $\tilde{x}$ gives $g(\lambda,\nu)\le p^*$.


5.1.5 Examples

Least-squares solution of linear equations

We consider the problem
$$\begin{array}{ll} \min & x^Tx \\ \text{s.t.} & Ax=b, \end{array}$$ where $A\in \mathbb{R}^{p \times n}$.
The Lagrangian is
$$L(x,\nu) = x^Tx + \nu^T(Ax-b),$$ with domain $\mathbb{R}^n \times \mathbb{R}^p$.
Since $L(x,\nu)$ is a convex quadratic function of $x$, we can find the minimizing $x$ from the optimality condition
$$\nabla_x L(x,\nu) = 2x + A^T\nu =0,$$ which yields $x=-\tfrac{1}{2}A^T\nu$. Therefore the dual function is
$$g(\nu)=L\left(-\tfrac{1}{2} A^{T} \nu,\, \nu \right)=-\tfrac{1}{4} \nu^{T} A A^{T} \nu-b^{T} \nu,$$ which is a concave quadratic function of $\nu$, with domain $\mathbb{R}^p$.
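As a quick numerical illustration (my own addition, not from the original text), the sketch below evaluates this dual function with NumPy on random data. The problem sizes, the random seed, and the closed-form dual maximizer $\nu^\star=-2(AA^T)^{-1}b$ used for the check are assumptions, obtained by maximizing the concave quadratic $g$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 6                      # hypothetical sizes
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)

AAt_inv = np.linalg.inv(A @ A.T)
x_star = A.T @ AAt_inv @ b       # least-norm solution of Ax = b
p_star = x_star @ x_star         # primal optimal value

def g(v):                        # dual function g(v) = -(1/4) v^T A A^T v - b^T v
    return -0.25 * v @ (A @ A.T) @ v - b @ v

# any v gives a lower bound on p_star (weak duality)
for _ in range(5):
    v = rng.standard_normal(p)
    assert g(v) <= p_star + 1e-9

# the dual maximizer v* = -2 (A A^T)^{-1} b attains the bound (strong duality)
v_star = -2 * AAt_inv @ b
print(p_star, g(v_star))         # equal up to rounding
```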


5.1.6 The Lagrange dual function and conjugate functions

The conjugate $f^*$ of a function $f: \mathbb{R}^n\rightarrow \mathbb{R}$ is given by
$$f^*(y) = \sup_{x\in \operatorname{dom} f} \big(y^Tx-f(x)\big).$$

Given the problem
$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & x=0, \end{array}$$ the Lagrangian is $L(x,\nu)=f(x)+\nu^Tx$, and the dual function is $$g(\nu)=\inf_x \big(f(x)+\nu^Tx\big)=-\sup_x\big((-\nu)^Tx-f(x)\big)=-f^*(-\nu).$$
More generally, consider an optimization problem with linear inequality and equality constraints,
$$\begin{array}{ll} \min & f_0(x) \\ \text{s.t.} & Ax\preceq b,\\ & Cx=d. \end{array}$$
Using the conjugate of $f_0$, we can rewrite the dual function as follows:
$$\begin{aligned} g(\lambda,\nu)&=\inf_x \big(f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d)\big) \\ &=-b^T\lambda-d^T\nu+\inf_x \big(f_0(x)+(A^T\lambda+C^T\nu)^Tx\big) \\ &=-b^T\lambda-d^T\nu-f_0^*(-A^T\lambda-C^T\nu). \end{aligned}$$
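As a sanity check (my own addition, not in the original), apply this formula to the equality-constrained least-squares problem of Section 5.1.5: take $f_0(x)=x^Tx$, no inequality constraints, $C=A$, $d=b$. Then
$$f_0^*(y)=\sup_x\big(y^Tx-x^Tx\big)=\tfrac{1}{4}y^Ty \quad(\text{maximizer } x=y/2),$$
$$g(\nu)=-d^T\nu-f_0^*(-C^T\nu)=-b^T\nu-\tfrac{1}{4}\nu^TAA^T\nu,$$
which matches the dual function derived directly above.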


5.2 The Lagrange dual problem

The Lagrange dual problem of a Lagrange dual problem is the primal problem.

For each pair $(\lambda,\nu)$ with $\lambda \succeq 0$, the Lagrange dual function gives us a lower bound on the optimal value $p^*$ of the optimization problem. The best lower bound that can be obtained from the Lagrange dual function is found by solving the optimization problem
$$\begin{array}{ll} \max & g(\lambda,\nu) \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$ This problem is called the Lagrange dual problem. The term dual feasible, used to describe a pair $(\lambda,\nu)$ with $\lambda \succeq 0$ and $g(\lambda,\nu) > -\infty$, means, as the name implies, that $(\lambda,\nu)$ is feasible for the dual problem. We refer to $(\lambda^\star,\nu^\star)$ as dual optimal or optimal Lagrange multipliers if they are optimal for this problem. The Lagrange dual problem is a convex optimization problem, since the objective to be maximized is concave and the constraint is convex.

5.2.1 Making dual constraints explicit

The examples above show that it is not uncommon for the domain of the dual function, $\operatorname{dom} g = \{ (\lambda,\nu) \mid g(\lambda ,\nu)>-\infty \}$, to have dimension smaller than $m+p$, i.e., to be a proper subset of $\mathbb{R}^{m+p}$.

A. Lagrange dual of standard form LP

We found that the Lagrange dual function for the standard form LP
$$\begin{array}{ll} \min & c^Tx \\ \text{s.t.} & Ax = b, \\ & x \succeq 0, \end{array}$$ is given by $$g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$ Strictly speaking, the Lagrange dual problem of the standard form LP is to maximize this dual function $g$ subject to $\lambda \succeq 0$, i.e., $$\begin{array}{ll} \max & g(\lambda,\nu) = \begin{cases} -b^T\nu, & A^T\nu-\lambda + c = 0 \\ -\infty, & \text{otherwise} \end{cases} \\ \text{s.t.} & \lambda \succeq 0. \end{array}$$
Here, $g$ is finite only when $A^T\nu - \lambda+c=0$.
We can form an equivalent problem by making these equality constraints explicit: $$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu - \lambda + c = 0, \\ & \lambda \succeq 0. \end{array}$$
This problem, in turn, can be expressed as $$\begin{array}{ll} \max & -b^T\nu \\ \text{s.t.} & A^T \nu + c \succeq 0, \end{array}$$ which is an LP in inequality form.
Note that the first problem is the Lagrange dual of the standard form LP, and it is equivalent to the last two problems.
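The following sketch (my own addition) checks this primal–dual pair numerically with `scipy.optimize.linprog`; the random data generation and problem sizes are assumptions chosen so that the primal is feasible and bounded.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 3, 7                            # hypothetical sizes
A = rng.standard_normal((m, n))
b = A @ rng.uniform(0.1, 1.0, n)       # ensures a strictly feasible x exists
c = rng.uniform(0.5, 2.0, n)           # c > 0 keeps the problem bounded below

# primal: min c^T x  s.t. Ax = b, x >= 0   (linprog's default bounds are x >= 0)
primal = linprog(c, A_eq=A, b_eq=b)

# dual (inequality form): max -b^T v  s.t. A^T v + c >= 0,
# expressed for linprog as: min b^T v  s.t. -A^T v <= c, v free
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=(None, None))

print(primal.fun, -dual.fun)           # equal up to solver tolerance (strong duality)
```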

B. Lagrange dual of inequality form LP

In a similar way, we can find the Lagrange dual problem of a linear program in inequality form
$$\begin{array}{lll} P0: &\min & c^Tx \\ &\text{s.t.} & Ax \preceq b. \end{array}$$ The Lagrangian is $$L(x,\lambda)=c^Tx+\lambda^T(Ax-b) = -b^T\lambda + (A^T\lambda+c)^Tx,$$ so the dual function is $$g(\lambda)=\inf_x L(x,\lambda) = -b^T \lambda + \inf_x (A^T\lambda + c)^T x.$$
The linear function $(A^T\lambda + c)^T x$ is unbounded below in $x$ unless it is identically zero, so the dual function is
$$g(\lambda) = \begin{cases} -b^T\lambda, & A^T\lambda + c = 0 \\ -\infty, & \text{otherwise.} \end{cases}$$
The dual variable $\lambda$ is dual feasible if $\lambda \succeq 0$ and $A^T \lambda + c=0$.
The Lagrange dual of the LP is to maximize $g$ over all $\lambda \succeq 0$. Again we can reformulate the Lagrange dual by explicitly including the dual feasibility conditions as constraints:
$$\begin{array}{lll} P1: &\max & -b^T\lambda \\ &\text{s.t.} & A^T \lambda + c = 0,\\ & & \lambda \succeq 0, \end{array}$$ which is an LP in standard form.
Note that the Lagrange dual of the problem $P1$ is (equivalent to) the primal problem $P0$.


5.2.2 Weak duality

The optimal value of the Lagrange dual problem, which we denote by $d^*$, is, by definition, the best lower bound on $p^*$ that can be obtained from the Lagrange dual function. In particular, we have the simple but important inequality $$d^*\le p^*,$$ called weak duality, which holds even if the original problem is not convex. The weak duality inequality also holds when $d^*$ and $p^*$ are infinite.
We refer to the difference $p^*-d^*$ as the optimal duality gap of the original problem, since it gives the gap between the optimal value of the primal problem and the best (i.e., greatest) lower bound on it that can be obtained from the Lagrange dual function.


5.2.3 Strong duality & Slater's constraint qualification

If the equality $d^* = p^*$ holds, i.e., the optimal duality gap is zero, then we say that strong duality holds.

Strong duality does not, in general, hold. But if the primal problem is convex, i.e., of the form
$$\begin{array}{lll} P0: & \min & f_0(x) \\ & \text{s.t.} & f_i(x) \leq 0, \quad i =1,\dots,m,\\ & & Ax=b, \end{array}$$ with $f_0,\dots,f_m$ convex, we usually (but not always) have strong duality.
Conditions on the problem under which strong duality holds are called constraint qualifications. One simple constraint qualification is Slater's condition: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that $$f_i(x)<0,\quad i=1,\dots,m, \qquad Ax = b.$$ Such a point is sometimes called strictly feasible, since the inequality constraints hold with strict inequalities. Slater's theorem states that strong duality holds if (1) Slater's condition holds and (2) the problem is convex.
Slater's condition can be refined when some of the inequality constraint functions $f_i$ are affine. If the first $k$ constraint functions $f_1,\dots,f_k$ are affine, then strong duality holds provided the following weaker condition holds: there exists an $x \in \operatorname{relint} \mathcal{D}$ such that $$f_i(x)\leq 0,\quad i=1,\dots,k, \qquad f_i(x)<0,\quad i=k+1,\dots,m, \qquad Ax = b.$$


5.2.4 Examples

A. Lagrange dual of QCQP

We consider the QCQP
$$\begin{array}{lll} P0: & \min & \frac{1}{2}x^TP_0x+q^T_0x +r_0 \\ & \text{s.t.} & \frac{1}{2}x^TP_ix+q^T_ix +r_i \le 0, \quad i =1,\dots,m, \end{array}$$ with $P_0 \in \mathbf{S}_{++}^n$ and $P_i \in \mathbf{S}_{+}^n$, $i=1,\dots,m$.
The Lagrangian is $$\begin{aligned} L(x,\lambda) & = \frac{1}{2}x^TP_0x+q^T_0x +r_0 + \sum_{i=1}^{m} \lambda_i \Big( \frac{1}{2}x^TP_ix+q^T_ix +r_i \Big) \\ &= \frac{1}{2}x^TP(\lambda)x+q(\lambda)^Tx +r(\lambda), \end{aligned}$$ where $P(\lambda)=P_0 + \sum_{i=1}^m \lambda_i P_i$, $q(\lambda)=q_0 + \sum_{i=1}^m \lambda_i q_i$, and $r(\lambda)=r_0+\sum_{i=1}^m \lambda_i r_i$.
If $\lambda \succeq 0$, we have $P(\lambda) \succ 0$ and $$g(\lambda) = \inf_x L(x,\lambda) = - \frac{1}{2}q(\lambda)^T P(\lambda)^{-1} q(\lambda) + r(\lambda).$$ We can therefore express the dual problem as
$$\begin{array}{lll} P1: &\max & g(\lambda) \\ &\text{s.t.} & \lambda \succeq 0. \end{array}$$ Slater's condition says that strong duality between the primal problem $P0$ and the dual problem $P1$ holds if the quadratic inequality constraints are strictly feasible, i.e., there exists an $x$ with
$$\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0, \quad i=1,\dots,m.$$
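A small NumPy sketch (my own construction) illustrating weak duality for this QCQP: the data are random, $r_i<0$ is chosen so that $x=0$ is strictly feasible (so Slater's condition holds), and the closed-form expression for $g(\lambda)$ above is evaluated at a few random $\lambda \succeq 0$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3

def rand_psd(n, eps=0.5):
    M = rng.standard_normal((n, n))
    return M @ M.T + eps * np.eye(n)

P = [rand_psd(n) for _ in range(m + 1)]       # P_0 is strictly positive definite
q = [rng.standard_normal(n) for _ in range(m + 1)]
r = [1.0] + [-1.0] * m                        # r_i < 0 makes x = 0 strictly feasible

f = lambda i, x: 0.5 * x @ P[i] @ x + q[i] @ x + r[i]
x_feas = np.zeros(n)                          # strictly feasible point (Slater holds)

def g(lam):                                   # dual function g(lambda), closed form above
    Pl = P[0] + sum(l * Pi for l, Pi in zip(lam, P[1:]))
    ql = q[0] + sum(l * qi for l, qi in zip(lam, q[1:]))
    rl = r[0] + lam @ np.array(r[1:])
    return -0.5 * ql @ np.linalg.solve(Pl, ql) + rl

# weak duality: g(lambda) <= p* <= f_0(x_feas) for every lambda >= 0
for _ in range(5):
    lam = rng.uniform(0, 2, m)
    assert g(lam) <= f(0, x_feas) + 1e-9
print("weak duality checks passed")
```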

B. A nonconvex quadratic problem with strong duality

On rare occasions, strong duality obtains for a nonconvex problem. As an important example, we consider the problem of minimizing a nonconvex quadratic function over the unit ball,
$$\begin{array}{lll} P0: &\min & x^TAx + 2b^Tx \\ &\text{s.t.} & x^Tx \leq 1, \end{array}$$ where $A \in \mathbf{S}^n$ and $b\in\mathbf{R}^n$. When $A \nsucceq 0$, this is not a convex problem. This problem is sometimes called the trust region problem.
The Lagrangian is $$L(x,\lambda) = x^TAx + 2b^Tx + \lambda(x^Tx-1)=x^T(A+\lambda I)x + 2b^Tx - \lambda,$$
so the dual function is given by
$$g(\lambda) = \begin{cases} -b^T(A+\lambda I)^{\dagger} b -\lambda, & \text{if } A+\lambda I \succeq 0,\ b \in \mathcal{R}(A+\lambda I) \\ -\infty, & \text{otherwise,} \end{cases}$$
where $(A+\lambda I)^\dagger$ is the pseudo-inverse of $A+\lambda I$ and $\mathcal{R}(A+\lambda I)$ denotes its range. The Lagrange dual problem is thus
$$\begin{array}{lll} P1: & \max & -b^T(A+\lambda I)^{\dagger} b - \lambda \\ &\text{s.t.} & A + \lambda I \succeq 0, \quad b \in \mathcal{R}(A + \lambda I ), \end{array}$$ with variable $\lambda \in \mathbf{R}$.
The Lagrange dual problem is a convex optimization problem. In fact, it is readily solved since it can be expressed as
$$\begin{array}{ll} \max & -\sum_{i=1}^{n} \dfrac{(q_i^T b)^2}{\lambda_i+\lambda} - \lambda \\ \text{s.t.} & \lambda \geq - \lambda_{\min}(A), \end{array}$$ where $\lambda_i$ and $q_i$ are the eigenvalues and corresponding (orthonormal) eigenvectors of $A$, and we interpret $(q_i^Tb)^2 / 0$ as $0$ if $q_i^T b = 0$ and as $\infty$ otherwise.
Although the original problem $P0$ is not convex, strong duality still holds. In fact, a more general result holds: strong duality holds for any optimization problem with a quadratic objective and one quadratic inequality constraint, provided Slater's condition holds.
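The short NumPy experiment below (my own, with hand-picked data) illustrates this on a 2-variable instance: it grids the unit disk to estimate $p^\star$, maximizes the eigenvalue form of the dual over a grid of $\lambda \ge -\lambda_{\min}(A)$ to estimate $d^\star$, and prints both; they agree up to the grid resolution.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])            # indefinite, so the primal is nonconvex
b = np.array([0.5, 1.0])

# crude primal estimate: grid the unit disk
t = np.linspace(-1.0, 1.0, 801)
X, Y = np.meshgrid(t, t)
inside = X**2 + Y**2 <= 1.0
obj = A[0, 0]*X**2 + 2*A[0, 1]*X*Y + A[1, 1]*Y**2 + 2*(b[0]*X + b[1]*Y)
p_star = obj[inside].min()

# dual in eigen-coordinates (formula above), maximized over a 1-D grid
eigvals, Q = np.linalg.eigh(A)
qb = Q.T @ b
lam_grid = np.linspace(-eigvals.min() + 1e-6, -eigvals.min() + 50.0, 200_000)
g_vals = -np.sum(qb[:, None]**2 / (eigvals[:, None] + lam_grid[None, :]), axis=0) - lam_grid
d_star = g_vals.max()

print(p_star, d_star)                  # approximately equal: strong duality holds
```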


5.4 Saddle-point interpretation

5.4.1 Max-Min characterization of weak and strong duality

First note that
$$\sup_{\lambda \succeq 0} L(x,\lambda) = \sup_{\lambda \succeq 0 } \Big(f_0(x) + \sum_{i=1}^m \lambda_i f_i(x)\Big) = \begin{cases} f_0(x), & \text{if } f_i(x)\le 0,\ i =1,\dots,m \\ \infty, & \text{otherwise.} \end{cases}$$
Suppose $x$ is not feasible, i.e., $f_i(x)>0$ for some $i$. Then $\sup_{\lambda \succeq 0} L(x,\lambda) = \infty$, as can be seen by choosing $\lambda_j = 0$ for $j \neq i$ and $\lambda_i \rightarrow \infty$. On the other hand, if $f_i(x)\le 0$, $i=1,\dots,m$, then the optimal choice of $\lambda$ is $\lambda = 0$ and $\sup_{\lambda \succeq 0} L(x,\lambda) = f_0(x)$. This means that we can express the optimal value of the primal problem as $$p^* = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
By the definition of the dual function, we also have the optimal value of the dual problem $$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda).$$
Thus, weak duality can be expressed as the inequality $$d^* = \sup_{\lambda \succeq 0} \inf_x L(x,\lambda) \leq \inf_x \sup_{\lambda \succeq 0} L(x,\lambda) = p^* ,$$ and strong duality as the equality $$\sup_{\lambda \succeq 0} \inf_x L(x,\lambda) = \inf_x \sup_{\lambda \succeq 0} L(x,\lambda).$$
Strong duality means that the order of the minimization over $x$ and the maximization over $\lambda \succeq 0$ can be switched without affecting the result.
In fact, the weak duality inequality does not depend on any properties of $L$: we have $$\sup_{z \in Z} \inf_{w \in W} f(w,z) \leq \inf_{w \in W} \sup_{z \in Z} f(w,z)$$
for any $f:\mathbf{R}^n \times \mathbf{R}^m \rightarrow \mathbf{R}$ (and any $W \subseteq \mathbf{R}^n$ and $Z \subseteq \mathbf{R}^m$). This general inequality is called the max-min inequality. When equality holds, i.e., $$\sup_{z \in Z} \inf_{w \in W} f(w,z) = \inf_{w \in W} \sup_{z \in Z} f(w,z),$$ we say that $f$ (and $W$ and $Z$) satisfy the strong max-min property or saddle-point property.
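A tiny NumPy check of the max-min inequality on finite sets (my own illustration): take $W$ and $Z$ to be finite grids and tabulate $f$ as a matrix $F_{ij} = f(w_i, z_j)$.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((50, 40))   # F[i, j] = f(w_i, z_j) for finite W (rows) and Z (columns)

sup_inf = F.min(axis=0).max()       # sup_z inf_w f(w, z)
inf_sup = F.max(axis=1).min()       # inf_w sup_z f(w, z)

assert sup_inf <= inf_sup           # the max-min inequality always holds
print(sup_inf, inf_sup)
```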


5.4.2 Saddle-Point Interpretation

We refer to a pair $\tilde{w} \in W$, $\tilde{z} \in Z$ as a saddle-point for $f$ (and $W$ and $Z$) if $$f(\tilde{w},z) \leq f(\tilde{w},\tilde{z}) \leq f(w,\tilde{z})$$ for all $w \in W$, $z \in Z$. In other words, $\tilde{w}$ minimizes $f(w,\tilde{z})$ (over $w \in W$) and $\tilde{z}$ maximizes $f(\tilde{w},z)$ (over $z \in Z$): $$f(\tilde{w},\tilde{z}) = \inf_{w \in W} f(w,\tilde{z}),\qquad f(\tilde{w},\tilde{z}) = \sup_{z \in Z} f(\tilde{w},z).$$ This implies that the strong max-min property holds, and that the common value is $f(\tilde{w},\tilde{z})$.
Returning to our discussion of Lagrange duality, we see that if $x^\star$ and $\lambda^\star$ are respectively primal and dual optimal points for a problem in which strong duality obtains, they form a saddle-point for the Lagrangian. The converse is also true: if $(x,\lambda)$ is a saddle-point of the Lagrangian, then $x$ is primal optimal, $\lambda$ is dual optimal, and the optimal duality gap is zero.


5.5 Optimality conditions

5.5.1 Certificate of suboptimality and stopping criteria

If we can find a dual feasible point $(\lambda,\nu)$, we establish a lower bound on the optimal value of the primal problem: $p^* \geq g(\lambda, \nu)$. Thus, a dual feasible point $(\lambda,\nu)$ provides a proof or certificate that $p^* \geq g(\lambda,\nu)$.


5.5.2 Complementary slackness

Let $x^\star$ be a primal optimal point and $(\lambda^\star,\nu^\star)$ a dual optimal point, and suppose strong duality holds (so the primal and dual optimal values are attained and equal). This means that
$$\begin{aligned} f_0(x^*) & = g(\lambda^*,\nu^*) \\ &= \inf_x \Big(f_0(x) + \sum_{i=1}^{m} \lambda_i^* f_i(x) + \sum_{i=1}^p \nu_i^* h_i (x)\Big) \\ & \leq f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i (x^*) \\ & \leq f_0(x^*). \end{aligned}$$

  • The first line states that the optimal duality gap is zero.
  • The second line is the definition of the dual function.
  • The third line follows since the infimum of the Lagrangian over $x$ is less than or equal to its value at $x = x^*$.
  • The last inequality follows from $\lambda_i^* \geq 0$, $f_i(x^*)\leq 0$, $i=1,\dots,m$, and $h_i(x^*)=0$, $i=1,\dots,p$.

We conclude that the two inequalities in this chain (the third and fourth lines) hold with equality.
The first conclusion: since the inequality in the third line is an equality, $x^*$ minimizes $L(x,\lambda^*,\nu^*)$ over $x$.
The second conclusion (complementary slackness): $$\sum_{i=1}^m \lambda_i^* f_i(x^*) = 0.$$
Since each term in this sum is nonpositive, we conclude that $$\lambda_i^* f_i(x^*) = 0,\quad i =1,\dots,m.$$ This holds for any primal optimal $x^\star$ and any dual optimal $(\lambda^\star ,\nu^\star)$ (when strong duality holds).
We can express the complementary slackness condition as $$\lambda_i^* >0 ~\Rightarrow~ f_i(x^*)=0,$$ or, equivalently, $$f_i(x^*)<0 ~\Rightarrow~ \lambda_i^* = 0.$$ Roughly speaking, this means the $i$th optimal Lagrange multiplier is zero unless the $i$th constraint is active at the optimum.


5.5.3 KKT optimality conditions

We now assume that the functions $f_0,\dots,f_m,h_1,\dots,h_p$ are differentiable (and therefore have open domains), but we make no assumptions yet about convexity.

A. KKT conditions for nonconvex problems

As above, let $x^\star$ and $(\lambda^\star,\nu^\star)$ be any primal and dual optimal points with zero duality gap. Since $x^\star$ minimizes $L(x,\lambda^\star,\nu^\star)$ over $x$, it follows that its gradient must vanish at $x^\star$, i.e., $$\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0.$$
Thus, we have
$$\begin{aligned} f_{i}(x^{\star}) & \leq 0, & i&=1, \ldots, m \\ h_{i}(x^{\star}) &=0, & i&=1, \ldots, p \\ \lambda_{i}^{\star} & \geq 0, & i&=1, \ldots, m \\ \lambda_{i}^{\star} f_{i}(x^{\star}) &=0, & i&=1, \ldots, m \\ \nabla f_{0}(x^{\star})+\sum_{i=1}^{m} \lambda_{i}^{\star} \nabla f_{i}(x^{\star})+\sum_{i=1}^{p} \nu_{i}^{\star} \nabla h_{i}(x^{\star}) &=0, \end{aligned}$$ which are called the Karush-Kuhn-Tucker (KKT) conditions.
To summarize, for any optimization problem with differentiable objective and differentiable constraint functions for which strong duality obtains, any pair of primal and dual optimal points must satisfy the KKT conditions.

B. KKT conditions for convex problems

When the primal problem is convex, the KKT conditions are also sufficient for the points to be primal and dual optimal. In other words, if the $f_i$ are convex and the $h_i$ are affine, and $\tilde{x},\tilde{\lambda}, \tilde{\nu}$ are any points that satisfy the KKT conditions
$$\begin{aligned} f_{i}(\tilde{x}) & \leq 0, & i&=1, \ldots, m \\ h_{i}(\tilde{x}) &=0, & i&=1, \ldots, p \\ \tilde{\lambda}_{i} & \geq 0, & i&=1, \ldots, m \\ \tilde{\lambda}_{i} f_{i}(\tilde{x}) &=0, & i&=1, \ldots, m \\ \nabla f_{0}( \tilde{x})+\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} \nabla h_{i}( \tilde{x}) &=0, \end{aligned}$$ then $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ are primal and dual optimal, with zero duality gap.

To see this, note that the first two conditions state that $\tilde{x}$ is primal feasible. Since $\tilde{\lambda}_i \geq 0$, $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$; the last KKT condition states that its gradient with respect to $x$ vanishes at $x = \tilde{x}$, so it follows that $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$ over $x$. From this we conclude that

$$\begin{aligned} g(\tilde{\lambda},\tilde{\nu}) & = L(\tilde{x},\tilde{\lambda},\tilde{\nu}) \\ &= f_0( \tilde{x} ) +\sum_{i=1}^{m} \tilde{\lambda}_{i} f_{i}(\tilde{x})+\sum_{i=1}^{p} \tilde{\nu}_{i} h_{i}( \tilde{x}) \\ &= f_0( \tilde{x} ), \end{aligned}$$ where in the last line we use $h_i (\tilde{x}) = 0$ and $\tilde{\lambda}_i f_i (\tilde{x}) = 0$. This shows that $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\nu})$ have zero duality gap, and therefore are primal and dual optimal.
In summary, for any convex optimization problem with differentiable objective and differentiable constraint functions, any points that satisfy the KKT conditions are primal and dual optimal, and have zero duality gap.
If a convex optimization problem with differentiable objective and differentiable constraint functions satisfies Slater's condition, then the KKT conditions provide necessary and sufficient conditions for optimality: Slater's condition implies that the optimal duality gap is zero and the dual optimum is attained, so $x$ is optimal if and only if there are $(\lambda,\nu)$ that, together with $x$, satisfy the KKT conditions.
The KKT conditions play an important role in optimization. In a few special cases, it is possible to solve the KKT conditions analytically. More generally, many algorithms for convex optimization are conceived as, or can be interpreted as, methods for solving the KKT conditions.

Example 5.1

Equality constrained convex quadratic minimization. We consider the problem
$$\begin{array}{lll} P0: &\min & \frac{1}{2}x^TPx + q^Tx + r \\ &\text{s.t.} & Ax = b, \end{array}$$ where $P \in \mathbf{S}_{+}^n$.
The KKT conditions for this problem are
$$Ax^\star = b, \qquad Px^\star + q + A^T \nu^\star = 0,$$ which we can write as
$$\left[\begin{array}{cc} P & A^{T} \\ A & 0 \end{array}\right]\left[\begin{array}{l} x^{\star} \\ \nu^{\star} \end{array}\right]=\left[\begin{array}{c} -q \\ b \end{array}\right].$$
Solving this set of $m + n$ linear equations in the $m + n$ variables $x^\star, \nu^\star$ gives the optimal primal and dual variables for $P0$.
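A NumPy sketch of this calculation (my own; the random data and sizes are assumptions, with $P$ made positive definite by construction). It assembles the KKT matrix, solves for $(x^\star,\nu^\star)$, and checks that $x^\star$ is feasible and minimizes the objective over the feasible set by comparing against perturbations along $\operatorname{null}(A)$.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
n, m = 5, 2
M = rng.standard_normal((n, n))
P = M @ M.T + np.eye(n)                  # P positive definite
q = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# KKT system  [P A^T; A 0] [x; nu] = [-q; b]
K = np.block([[P, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

f = lambda x: 0.5 * x @ P @ x + q @ x    # the constant r is irrelevant for the comparison
assert np.allclose(A @ x_star, b)        # primal feasibility
N = null_space(A)                        # directions that keep Ax = b
for _ in range(5):
    x_other = x_star + N @ rng.standard_normal(N.shape[1])
    assert f(x_star) <= f(x_other) + 1e-9
print("x* solves the equality constrained QP")
```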

Example 5.2 Water-filling.

We consider the convex optimization problem
$$\begin{array}{lll} P0: &\min & -\sum_{i=1}^n \log (\alpha_i + x_i ) \\ &\text{s.t.} & x \succeq 0, \quad \mathbf{1}^T x = 1, \end{array}$$ where $\alpha_i > 0$. This problem arises in information theory, in allocating power to a set of $n$ communication channels. The variable $x_i$ represents the transmitter power allocated to the $i$th channel, and $\log(\alpha_i + x_i)$ gives the capacity or communication rate of the channel, so the problem is to allocate a total power of one to the channels, in order to maximize the total communication rate.
Introducing Lagrange multipliers $\lambda^\star \in \mathbb{R}^n$ for the inequality constraints $x \succeq 0$, and a multiplier $\nu^\star \in \mathbb{R}$ for the equality constraint $\mathbf{1}^T x = 1$, we obtain the KKT conditions
$$\begin{aligned} x^* &\succeq 0, \qquad \mathbf{1}^T x^* = 1, \\ \lambda^* &\succeq 0, \\ \lambda_i^* x_i^* &= 0, \quad i=1,\dots,n, \\ -\frac{1}{\alpha_i+x_i^*} - \lambda_i^* + \nu^* &= 0, \quad i=1,\dots,n. \end{aligned}$$ We can directly solve these equations to find $x^\star$, $\lambda^\star$, and $\nu^\star$. We start by noting that $\lambda^\star$ acts as a slack variable in the last equation, so it can be eliminated, leaving
$$\begin{aligned} x^* &\succeq 0, \qquad \mathbf{1}^T x^* = 1, \\ x_i^*\Big(\nu^* - \frac{1}{\alpha_i+x_i^*}\Big) &= 0, \quad i=1,\dots,n, \\ \nu^* & \geq \frac{1}{\alpha_i+x_i^*}, \quad i=1,\dots,n. \end{aligned}$$

  • If $\nu^\star < 1/\alpha_i$, this last condition can only hold if $x^\star_i > 0$, which by the third condition implies that $\nu^\star = \frac{1}{\alpha_i + x^\star_i}$.
  • Solving for $x^\star_i$, we conclude that $x^\star_i= \frac{1}{\nu^\star} - \alpha_i$ if $\nu^\star < \frac{1}{\alpha_i}$.
  • If $\nu^\star \geq 1/\alpha_i$, then $x^\star_i > 0$ is impossible, because it would imply $\nu^\star \geq \frac{1}{\alpha_i} > \frac{1}{\alpha_i + x^\star_i}$, which violates the complementary slackness condition.
  • Therefore, $x^\star_i = 0$ if $\nu^\star \geq 1/\alpha_i$.
    Thus we have
    $$x_i^* = \begin{cases} \frac{1}{\nu^*} - \alpha_i, & \text{if } \nu^* < \frac{1}{\alpha_i}\\ 0,& \text{if } \nu^* \geq \frac{1}{\alpha_i}, \end{cases}$$ or, put more simply, $x_i^* =\max \big\{0,\ \frac{1}{\nu^*} - \alpha_i \big\}$.
    Substituting this expression for $x^\star_i$ into the condition $\mathbf{1}^T x^\star = 1$, we obtain
    $$\sum_{i=1}^n \max \Big\{0,\ \frac{1}{\nu^*} - \alpha_i \Big\} = 1.$$ The lefthand side is a piecewise-linear increasing function of $1/\nu^\star$, with breakpoints at $\alpha_i$, so the equation has a unique solution which is readily determined, as illustrated in the numerical sketch below.
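A small NumPy sketch of that computation (my own; the $\alpha_i$ values are arbitrary): bisect on the water level $t = 1/\nu^\star$ until $\sum_i \max\{0, t-\alpha_i\} = 1$, then recover $x^\star$.

```python
import numpy as np

alpha = np.array([0.8, 1.0, 1.2, 2.5])        # hypothetical channel parameters, alpha_i > 0

def total_power(t):                           # sum_i max(0, t - alpha_i): piecewise linear, increasing
    return np.maximum(0.0, t - alpha).sum()

lo, hi = alpha.min(), alpha.max() + 1.0       # bracket: total_power(lo) = 0 < 1 <= total_power(hi)
for _ in range(100):                          # bisection on the water level t = 1 / nu*
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total_power(mid) < 1.0 else (lo, mid)

t = 0.5 * (lo + hi)
x = np.maximum(0.0, t - alpha)                # optimal power allocation
print(x, x.sum())                             # x sums to 1 (the total power budget)
```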

5.5.5 Solving the primal problem via the dual

If strong duality holds and a dual optimal solution $(\lambda^\star,\nu^\star)$ exists, then any primal optimal point is also a minimizer of $L(x,\lambda^\star,\nu^\star)$. This fact sometimes allows us to compute a primal optimal solution from a dual optimal solution. More precisely, suppose we have strong duality, an optimal $(\lambda^\star,\nu^\star)$ is known, and the minimizer of $L(x,\lambda^\star,\nu^\star)$, i.e., the solution of $$\min \quad f_{0}(x)+\sum_{i=1}^{m} \lambda_{i}^{\star} f_{i}(x)+\sum_{i=1}^{p} \nu_{i}^{\star} h_{i}(x),$$ is unique. Then, if that minimizer is primal feasible, it must be primal optimal; if it is not primal feasible, we can conclude that the primal optimum is not attained.

Example 5.3 Entropy maximization.

We consider the entropy maximization problem
$$\begin{array}{ll} \min & f_{0}(x)=\sum_{i=1}^{n} x_{i} \log x_{i} \\ \text{s.t.} & A x \preceq b, \\ & \mathbf{1}^{T} x=1, \end{array}$$ with domain $\mathcal{D} = \mathbf{R}_{++}^n$, and its Lagrange dual problem
$$\begin{array}{ll} \max & -b^{T} \lambda-\nu-e^{-\nu-1} \sum_{i=1}^{n} e^{-a_{i}^{T} \lambda} \\ \text{s.t.} & \lambda \succeq 0, \end{array}$$ where $a_i$ are the columns of $A$. We assume that the weak form of Slater's condition holds, i.e., there exists an $x \succ 0$ with $Ax \preceq b$ and $\mathbf{1}^T x = 1$, so strong duality holds and an optimal solution $(\lambda^\star,\nu^\star)$ exists.
Suppose we have solved the dual problem. The Lagrangian at $(\lambda^\star,\nu^\star)$ is
$$L(x, \lambda^{\star}, \nu^{\star})=\sum_{i=1}^{n} x_{i} \log x_{i}+\lambda^{\star T}(A x-b)+\nu^{\star}(\mathbf{1}^{T} x-1),$$ which is strictly convex on $\mathcal{D}$ and bounded below, so it has a unique minimizer $x^\star$, given by
$$x^*_i = 1/ \exp (a_i^T \lambda^*+\nu^* + 1), \quad i=1,\dots,n.$$
If $x^\star$ is primal feasible, it must be the optimal solution of the primal problem. If $x^\star$ is not primal feasible, then we can conclude that the primal optimum is not attained.
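Below is a sketch of this recovery procedure (my own; the data, sizes, and the use of `scipy.optimize.minimize` with L-BFGS-B are assumptions). It solves the dual numerically, forms $x^\star$ from the closed-form expression above, and checks primal feasibility and the zero duality gap.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, m = 6, 3
A = rng.standard_normal((m, n))
p0 = rng.dirichlet(np.ones(n))           # a strictly feasible point (weak Slater condition)
b = A @ p0 + 0.05

def neg_dual(z):                         # negative of the dual objective in (lambda, nu)
    lam, nu = z[:m], z[m]
    return -(-b @ lam - nu - np.exp(-nu - 1.0) * np.sum(np.exp(-(A.T @ lam))))

bounds = [(0.0, None)] * m + [(None, None)]   # lambda >= 0, nu free
res = minimize(neg_dual, np.zeros(m + 1), method="L-BFGS-B", bounds=bounds)
lam, nu = res.x[:m], res.x[m]

x = 1.0 / np.exp(A.T @ lam + nu + 1.0)   # minimizer of the Lagrangian (closed form above)
f0 = np.sum(x * np.log(x))

print(x.sum())                           # ~1: the equality constraint holds
print(np.all(A @ x <= b + 1e-4))         # inequality constraints hold (up to solver tolerance)
print(f0, -res.fun)                      # primal value ~= dual value (zero duality gap)
```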


5.7 Examples (reformulations)

In this section, we show by example that simple equivalent reformulations of a problem can lead to very different dual problems. We consider the following types of reformulations:

  • Introducing new variables and associated equality constraints.
  • Replacing the objective with an increasing function of the original objective.
  • Making explicit constraints implicit, i.e., incorporating them into the domain of the objective.

5.7.1 Introducing new variables and equality constraints

Consider an unconstrained problem of the form
$$P0: \quad \min ~ f_0(Ax + b).$$ Its Lagrange dual function is the constant $p^\star$. So while we do have strong duality, i.e., $p^\star= d^\star$, the Lagrangian dual is neither useful nor interesting.

Now let us reformulate the problem as
$$\begin{array}{lll} P1: &\min & f_0(y) \\ &\text{s.t.} & Ax +b = y. \end{array}$$ Here we have introduced new variables $y$, as well as new equality constraints $Ax+b = y$. The problems $P0$ and $P1$ are clearly equivalent.
The Lagrangian of the reformulated problem is
$$L(x,y,\nu) = f_0(y) + \nu^T(Ax+b-y).$$ To find the dual function we minimize $L$ over $x$ and $y$. Minimizing over $x$, we find that $g(\nu) = -\infty$ unless $A^T\nu = 0$, in which case we are left with
$$g(\nu) = b^T \nu + \inf_y \big(f_0(y) - \nu^T y \big) = b^T \nu - f_0^*(\nu),$$ where $f_0^*$ is the conjugate of $f_0$. The dual problem of $P1$ can therefore be expressed as
$$\begin{array}{ll} \max & g(\nu)=b^T\nu-f_0^*(\nu) \\ \text{s.t.} & A^T \nu= 0. \end{array}$$ Thus, the dual of the reformulated problem $P1$ is considerably more useful than the dual of the original problem $P0$.

Example 5.5 Unconstrained geometric program.

Consider the unconstrained geometric program
$$\min~ \log \Big(\sum_{i=1}^m \exp (a_i^T x + b_i)\Big).$$ We first reformulate it by introducing new variables and equality constraints:
$$\begin{array}{lll} P1: &\min & f_0(y) = \log \big(\sum_{i=1}^m \exp y_i\big) \\ &\text{s.t.} & Ax + b = y, \end{array}$$ where $a_i^T$ are the rows of $A$. The conjugate of the log-sum-exp function is
$$f_0^*(\nu) = \begin{cases} \sum_{i=1}^m \nu_i \log \nu_i, & \text{if } \nu \succeq 0,\ \mathbf{1}^T\nu =1 \\ \infty, & \text{otherwise,} \end{cases}$$ so the dual of the reformulated problem can be expressed as $$\begin{array}{ll} \max & b^T \nu - \sum_{i=1}^m \nu_i \log \nu_i \\ \text{s.t.} & \mathbf{1}^T\nu =1,\\ & A^T \nu = 0, \\ & \nu \succeq 0, \end{array}$$ which is an entropy maximization problem.

Example 5.6 Norm approximation problem.

We consider the unconstrained norm approximation problem
$$P0: \quad \min~ \| Ax-b \|,$$ where $\|\cdot\|$ is any norm. Here too the Lagrange dual function is constant, equal to the optimal value of $P0$, and therefore not useful.
Once again we reformulate the problem as
$$\begin{array}{ll} \min &\| y \| \\ \text{s.t.} & Ax -b = y. \end{array}$$
The Lagrange dual problem is
$$\begin{array}{ll} \max & b^T \nu \\ \text{s.t.} & \| \nu \|_* \leq 1, \\ & A^T \nu = 0, \end{array}$$ where we use the fact that the conjugate of a norm is the indicator function of the dual norm unit ball.
The idea of introducing new equality constraints can be applied to the constraint functions as well. Consider, for example, the problem
$$\begin{array}{ll} \min & f_0(A_0 x + b_0) \\ \text{s.t.} & f_i(A_i x + b_i) \le 0, \quad i =1,\dots,m, \end{array}$$ where $A_i \in \mathbf{R}^{k_i \times n}$ and $f_i: \mathbf{R}^{k_i} \rightarrow \mathbf{R}$ are convex. We introduce a new variable $y_i \in \mathbf{R}^{k_i}$, for $i = 0,\dots,m$, and reformulate the problem as
$$\begin{array}{ll} \min & f_0(y_0) \\ \text{s.t.} & f_i(y_i) \le 0, \quad i =1,\dots,m, \\ & A_i x + b_i = y_i , \quad i =0,\dots,m. \end{array}$$
The Lagrangian for this problem is
$$L(x,y_0,\dots,y_m,\lambda,\nu_0,\dots,\nu_m) = f_0(y_0) + \sum_{i=1}^m \lambda_i f_i(y_i) + \sum_{i=0}^m \nu_i^T (A_i x + b_i - y_i).$$
To find the dual function, we minimize over $x$ and the $y_i$. The minimum over $x$ is $-\infty$ unless $$\sum_{i=0}^m A_i^T \nu_i = 0,$$ in which case we have, for $\lambda \succ 0$,
$$\begin{aligned} g(\lambda, \nu_{0}, \ldots, \nu_{m}) &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}, \ldots, y_{m}}\Big(f_{0}(y_{0})+\sum_{i=1}^{m} \lambda_{i} f_{i}(y_{i})-\sum_{i=0}^{m} \nu_{i}^{T} y_{i}\Big) \\ &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}+\inf _{y_{0}}\big(f_{0}(y_{0})-\nu_{0}^{T} y_{0}\big)+\sum_{i=1}^{m} \lambda_{i} \inf _{y_{i}}\big(f_{i}(y_{i})-(\nu_{i} / \lambda_{i})^{T} y_{i}\big) \\ &=\sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}(\nu_{0})-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}(\nu_{i} / \lambda_{i}). \end{aligned}$$
The last expression involves the perspective of the conjugate function, and is therefore concave in the dual variables. Finally, we address the question of what happens when $\lambda \succeq 0$ but some $\lambda_i$ are zero. If $\lambda_i = 0$ and $\nu_i \neq 0$, then the dual function is $-\infty$. If $\lambda_i = 0$ and $\nu_i = 0$, however, the terms involving $y_i$, $\nu_i$, and $\lambda_i$ are all zero. Thus, the expression above for $g$ is valid for all $\lambda \succeq 0$, if we take $\lambda_i f^*_i (\nu_i /\lambda_i ) = 0$ when $\lambda_i = 0$ and $\nu_i = 0$, and $\lambda_i f^*_i (\nu_i /\lambda_i ) = \infty$ when $\lambda_i = 0$ and $\nu_i \neq 0$.
Therefore we can express the dual of the problem as
$$\begin{array}{ll} \max & \sum_{i=0}^{m} \nu_{i}^{T} b_{i}-f_{0}^{*}(\nu_{0})-\sum_{i=1}^{m} \lambda_{i} f_{i}^{*}(\nu_{i} / \lambda_{i})\\ \text{s.t.} & \lambda \succeq 0, \\ & \sum_{i=0}^m A_i^T \nu_i =0. \end{array}$$

5.7.2 Transforming the objective

If we replace the objective $f_0$ by an increasing function of $f_0$, the resulting problem is clearly equivalent. The dual of this equivalent problem, however, can be very different from the dual of the original problem.

Example 5.8

We consider again the minimum norm problem
$$\min \| Ax - b \|,$$ where $\| \cdot \|$ is some norm. We reformulate this problem as
$$\begin{aligned} \min ~~&\tfrac{1}{2} \| y \|^2 \\ \text{s.t.}~~& Ax -b = y. \end{aligned}$$ Here we have introduced new variables, and replaced the objective by half its square. Evidently it is equivalent to the original problem.
The dual of the reformulated problem is
$$\begin{aligned} \max ~~&-\tfrac{1}{2} \| \nu \|^2_* + b^T \nu \\ \text{s.t.}~~& A^T \nu = 0, \end{aligned}$$ where we use the fact that the conjugate of $\tfrac{1}{2}\|\cdot\|^2$ is $\tfrac{1}{2}\|\cdot\|^2_*$.
Note that this dual problem is not the same as the dual problem (Example 5.6) derived earlier.
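For the Euclidean norm this can be verified in closed form (my own check, not in the original): the primal is ordinary least squares, and the dual maximizer over $\{\nu : A^T\nu = 0\}$ is the least-squares residual $b - Ax_{\mathrm{ls}}$, i.e., the projection of $b$ onto $\operatorname{null}(A^T)$.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# primal: min (1/2) ||Ax - b||_2^2  (least squares)
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
p_star = 0.5 * np.linalg.norm(A @ x_ls - b) ** 2

# dual: max -(1/2)||nu||_2^2 + b^T nu  s.t. A^T nu = 0
# the maximizer is the projection of b onto null(A^T), i.e. the least-squares residual
nu_star = b - A @ x_ls
d_star = -0.5 * nu_star @ nu_star + b @ nu_star

assert np.allclose(A.T @ nu_star, 0)   # dual feasibility
print(p_star, d_star)                  # equal: strong duality for this reformulation
```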

5.7.3 Implicit constraints

The next simple reformulation we study is to include some of the constraints in the objective function, by modifying the objective function to be infinite when the constraint is violated.

