Introduction to Convex Optimization: Basic Concepts (Detailed)

Index of previous articles

Table of contents

Optimization problem
Optimization Categories
Different initialization brings different optimum (if not convex)
Affine sets
Affine combination
Affine hull
Convex Sets
Convex combination
Convex hull
Cones
Hyperplanes and halfspaces
Polyhedra
Linearly Independent vs. Affinely Independent
Simplexes
What is the key distinction between a convex hull and a simplex?
Convex Functions
First-order conditions
Second-order conditions
Examples of Convex and Concave Functions

Optimization problem

All optimization problems can be written as:
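
For reference, here is a sketch of the standard form used in the Boyd and Vandenberghe book cited at the end of this article. The symbols $f_0$ (objective), $f_i$ (inequality constraint functions), $h_j$ (equality constraint functions), and the counts $m$ and $p$ are that book's notation and are assumed here.

```latex
\begin{aligned}
\text{minimize} \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \leq 0, \quad i = 1, \ldots, m, \\
& h_j(x) = 0, \quad j = 1, \ldots, p.
\end{aligned}
```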

Optimization Categories

Convex vs. non-convex: deep neural networks are non-convex.
Continuous vs. discrete: most problems have continuous variables; tree structures are discrete.
Constrained vs. unconstrained: adding a prior turns the problem into a constrained one.
Smooth vs. non-smooth: most problems are smooth.

Different initialization brings different optimum (if not convex)

Idea: Give up on the global optimum and find a good local optimum.

Purpose of pre-training: Find a good initialization from which to start training, so that training reaches a better local optimum.
Relaxation: Convert to a convex optimization problem.
Brute force: If a problem is small, we can use brute force.
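
To make the dependence on initialization concrete, here is a minimal sketch of plain gradient descent on a one-dimensional non-convex function. The objective `f`, the step size, and the starting points are hypothetical choices made only for illustration; they are not from the original article.

```python
def f(x):
    # Hypothetical non-convex objective with two local minima.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent with a fixed step size.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_left = gradient_descent(-2.0)   # starts left of the local maximum
x_right = gradient_descent(2.0)   # starts right of the local maximum

print("from -2.0:", x_left, f(x_left))
print("from +2.0:", x_right, f(x_right))
```

The two runs settle in different stationary points, and only the run started on the left reaches the lower of the two minima. This is the situation that a good initialization is meant to help with.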

Affine sets

A set $C \subseteq \mathbf R^n$ is affine if the line through any two distinct points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and $\theta \in \mathbf R$, we have $\theta x_1 + (1-\theta) x_2 \in C$.

Note: The line passing through $x_1$ and $x_2$ is $y = \theta x_1 + (1-\theta) x_2$, with $\theta$ ranging over $\mathbf R$.
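
As a small numerical illustration of the definition, the sketch below takes the solution set of one linear equation in $\mathbf R^2$ (a hypothetical example, chosen here only for illustration) and checks that points $\theta x_1 + (1-\theta) x_2$ stay in the set even when $\theta$ lies outside $[0, 1]$.

```python
def affine_point(x1, x2, theta):
    # Point theta * x1 + (1 - theta) * x2 on the line through x1 and x2.
    return tuple(theta * a + (1 - theta) * b for a, b in zip(x1, x2))


def on_line(x, tol=1e-9):
    # Hypothetical affine set C = {x in R^2 | x_1 + 2 * x_2 = 4}.
    return abs(x[0] + 2 * x[1] - 4) <= tol


x1, x2 = (4.0, 0.0), (0.0, 2.0)          # two distinct points of C
thetas = [-1.5, 0.0, 0.25, 1.0, 3.0]      # includes values outside [0, 1]
results = [on_line(affine_point(x1, x2, t)) for t in thetas]
print(results)
```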

Affine combination

We refer to a point of the form $\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_k x_k$, where $\theta_1 + \theta_2 + \cdots + \theta_k = 1$, as an affine combination of the points $x_1, x_2, \ldots, x_k$. An affine set contains every affine combination of its points.

Affine hull

The set of all affine combinations of points in some set $C \subseteq \mathbf R^n$ is called the affine hull of $C$, and denoted $\mathbf{aff}\, C$:

$$\mathbf{aff}\, C = \{\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_k x_k \mid x_1, x_2, \ldots, x_k \in C,\ \theta_1 + \theta_2 + \cdots + \theta_k = 1\}.$$

The affine hull is the smallest affine set that contains $C$, in the following sense: if $S$ is any affine set with $C \subseteq S$, then $\mathbf{aff}\, C \subseteq S$.

Affine dimension: We define the affine dimension of a set CCC as the dimension of its affine hull.

Convex Sets

A set $C$ is convex if the line segment between any two points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and any $\theta$ with $0 \leq \theta \leq 1$, we have

$$\theta x_1 + (1-\theta) x_2 \in C.$$

Roughly speaking, a set is convex if every point in the set can be seen by every other
point. Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefore also the line segment between the points.
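
The converse fails: a convex set need not be affine. The sketch below uses the closed unit disk in $\mathbf R^2$ (a hypothetical example chosen here for illustration) to show that points on the segment between two members stay inside, while a point further along the same line does not.

```python
def combine(x1, x2, theta):
    # Point theta * x1 + (1 - theta) * x2.
    return tuple(theta * a + (1 - theta) * b for a, b in zip(x1, x2))


def in_unit_disk(x):
    # Hypothetical convex set: the closed unit disk in R^2.
    return x[0] ** 2 + x[1] ** 2 <= 1.0


x1, x2 = (0.8, 0.0), (0.0, 0.8)

# theta in [0, 1]: every sampled point of the segment is in the disk.
segment_inside = all(in_unit_disk(combine(x1, x2, k / 10)) for k in range(11))

# theta = 2 lies on the line through x1 and x2 but outside the segment.
line_point_inside = in_unit_disk(combine(x1, x2, 2.0))

print(segment_inside, line_point_inside)
```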

Convex combination

We call a point of the form $\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_k x_k$, where $\theta_1 + \theta_2 + \cdots + \theta_k = 1$ and $\theta_i \geq 0$, $i = 1, 2, \ldots, k$, a convex combination of the points $x_1, \ldots, x_k$.

Convex hull

The convex hull of a set $C$, denoted $\mathbf{conv}\, C$, is the set of all convex combinations of points in $C$:

$$\mathbf{conv}\, C = \{\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_k x_k \mid x_i \in C,\ \theta_i \geq 0,\ i = 1, \ldots, k,\ \theta_1 + \theta_2 + \cdots + \theta_k = 1\}.$$

The convex hull $\mathbf{conv}\, C$ is always convex. It is the smallest convex set that contains $C$: if $B$ is any convex set that contains $C$, then $\mathbf{conv}\, C \subseteq B$.

Cones

A set $C$ is called a cone, or nonnegative homogeneous, if for every $x \in C$ and $\theta \geq 0$ we have $\theta x \in C$. A set $C$ is a convex cone if it is convex and a cone, which means that for any $x_1, x_2 \in C$ and $\theta_1, \theta_2 \geq 0$, we have

$$\theta_1 x_1 + \theta_2 x_2 \in C.$$

Hyperplanes and halfspaces

A hyperplane is a set of the form

$$\{x \mid a^T x = b\},$$

where $a \in \mathbf R^n$, $a \neq 0$, and $b \in \mathbf R$.

Geometrically, the hyperplane is the set of points with a constant inner product with the vector $a$; equivalently, it is a hyperplane with normal vector $a$, whose offset from the origin is determined by $b$. This geometric interpretation can be understood by expressing the hyperplane in the form

$$\{x \mid a^T (x - x_0) = 0\},$$

where $x_0$ is any point in the hyperplane (i.e., any point with $a^T x_0 = b$).

A hyperplane divides $\mathbf R^n$ into two halfspaces. A (closed) halfspace is a set of the form

$$\{x \mid a^T x \leq b\},$$

where $a \neq 0$. Halfspaces are convex but not affine. The set $\{x \mid a^T x < b\}$ is called an open halfspace.
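
A minimal sketch of these definitions in code: the function below reports on which side of the hyperplane $\{x \mid a^T x = b\}$ a point lies, and the last lines check the equivalent form $a^T(x - x_0) = 0$. The helper names `dot` and `classify`, the tolerance, and the numerical values are hypothetical choices made for illustration.

```python
def dot(a, x):
    # Inner product a^T x.
    return sum(ai * xi for ai, xi in zip(a, x))


def classify(a, b, x, tol=1e-12):
    # Position of x relative to the hyperplane {x | a^T x = b}.
    value = dot(a, x) - b
    if abs(value) <= tol:
        return "on the hyperplane"
    return "a^T x < b" if value < 0 else "a^T x > b"


a, b = (1.0, 1.0), 1.0

print(classify(a, b, (0.0, 0.0)))
print(classify(a, b, (0.25, 0.75)))
print(classify(a, b, (2.0, 2.0)))

# Equivalent description: a^T (x - x0) = 0 for any x0 on the hyperplane.
x0 = (1.0, 0.0)
x = (0.25, 0.75)
offset_form = dot(a, tuple(xi - x0i for xi, x0i in zip(x, x0)))
print(offset_form)
```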

Polyhedra

A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:

$$P = \{x \mid a_j^T x \leq b_j,\ j = 1, \ldots, m,\ c_j^T x = d_j,\ j = 1, \ldots, p\}.$$

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. In compact notation:

$$P = \{x \mid Ax \preceq b,\ Cx = d\},$$

where $A$ is the matrix with rows $a_j^T$, $C$ is the matrix with rows $c_j^T$, and $\preceq$ denotes componentwise inequality.
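
The compact description translates directly into a membership test. The sketch below is one possible implementation; the function name `in_polyhedron`, the tolerance, and the example polyhedron $\{x \in \mathbf R^2 \mid x \succeq 0,\ x_1 + x_2 = 1\}$ are assumptions made for illustration.

```python
def dot(row, x):
    # Inner product of one matrix row with x.
    return sum(r * xi for r, xi in zip(row, x))


def in_polyhedron(x, A, b, C, d, tol=1e-9):
    # True if A x <= b componentwise and C x = d, up to the tolerance.
    inequalities_hold = all(dot(row, x) <= bi + tol for row, bi in zip(A, b))
    equalities_hold = all(abs(dot(row, x) - di) <= tol for row, di in zip(C, d))
    return inequalities_hold and equalities_hold


# Hypothetical example: {x in R^2 | x >= 0, x_1 + x_2 = 1},
# written as -x <= 0 (two halfspaces) and one hyperplane.
A = [[-1.0, 0.0], [0.0, -1.0]]
b = [0.0, 0.0]
C = [[1.0, 1.0]]
d = [1.0]

inside = in_polyhedron((0.25, 0.75), A, b, C, d)
negative_entry = in_polyhedron((1.5, -0.5), A, b, C, d)   # violates x >= 0
off_hyperplane = in_polyhedron((0.25, 0.25), A, b, C, d)  # violates the equality
print(inside, negative_entry, off_hyperplane)
```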

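Linearly Independent vs. Affinely Independent

Consider the vectors $(1,0)$, $(0,1)$ and $(1,1)$ in $\mathbf R^2$. These are affinely independent, but not linearly independent. Their affine hull is all of $\mathbf R^2$ (dimension two), but if you remove any one of them, the affine hull of the remaining two is a line (dimension one), so none of them is redundant as far as the affine hull is concerned. In contrast, the span of any two of them is already all of $\mathbf R^2$, so the third adds nothing to the span, and hence the three vectors are not linearly independent.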
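
The claims about these three vectors can be checked with $2 \times 2$ determinants. This is a sketch specific to $\mathbf R^2$; the helper names `det2` and `sub` are hypothetical.

```python
def det2(u, v):
    # Determinant of the 2x2 matrix with columns u and v.
    return u[0] * v[1] - u[1] * v[0]


def sub(u, v):
    # Componentwise difference u - v.
    return (u[0] - v[0], u[1] - v[1])


v0, v1, v2 = (1, 0), (0, 1), (1, 1)

# Affine independence: v1 - v0 and v2 - v0 are linearly independent.
affinely_independent = det2(sub(v1, v0), sub(v2, v0)) != 0

# Any two of the vectors already span R^2 (nonzero determinant) ...
pairs_span_plane = all(det2(u, v) != 0 for u, v in [(v0, v1), (v0, v2), (v1, v2)])

# ... so the three together are linearly dependent: v0 + v1 - v2 = 0.
dependence = (v0[0] + v1[0] - v2[0], v0[1] + v1[1] - v2[1])

print(affinely_independent, pairs_span_plane, dependence)
```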

Simplexes

Suppose the $k+1$ points $v_0, \ldots, v_k \in \mathbf R^n$ are affinely independent, which means $v_1 - v_0, \ldots, v_k - v_0$ are linearly independent. The simplex determined by them is given by

$$C = \mathbf{conv}\{v_0, \ldots, v_k\} = \{\theta_0 v_0 + \cdots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf 1^T \theta = 1\}.$$

Note:

The affine dimension of this simplex is $k$.

A 1-dimensional simplex is a line segment; a 2-dimensional simplex is a triangle (including its interior); and a 3-dimensional simplex is a tetrahedron.
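
For a triangle in $\mathbf R^2$, the coefficients $\theta_0, \theta_1, \theta_2$ in the definition are uniquely determined by the point (they are often called barycentric coordinates), and the point belongs to the simplex exactly when all three are nonnegative. The sketch below solves for them with Cramer's rule; the function names and the example triangle are hypothetical.

```python
def barycentric(x, v0, v1, v2):
    # Solve theta_1 (v1 - v0) + theta_2 (v2 - v0) = x - v0 by Cramer's rule,
    # then set theta_0 = 1 - theta_1 - theta_2.
    e1 = (v1[0] - v0[0], v1[1] - v0[1])
    e2 = (v2[0] - v0[0], v2[1] - v0[1])
    r = (x[0] - v0[0], x[1] - v0[1])
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("vertices are not affinely independent")
    t1 = (r[0] * e2[1] - e2[0] * r[1]) / det
    t2 = (e1[0] * r[1] - r[0] * e1[1]) / det
    return (1 - t1 - t2, t1, t2)


def in_simplex(x, v0, v1, v2, tol=1e-12):
    # x is in the simplex iff all coefficients are nonnegative.
    return all(t >= -tol for t in barycentric(x, v0, v1, v2))


v0, v1, v2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(barycentric((0.25, 0.25), v0, v1, v2), in_simplex((0.25, 0.25), v0, v1, v2))
print(barycentric((1.0, 1.0), v0, v1, v2), in_simplex((1.0, 1.0), v0, v1, v2))
```

If the three vertices are collinear, the determinant is zero and the function raises an error, which mirrors the requirement that the vertices be affinely independent.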

What is the key distinction between a convex hull and a simplex?

If the points on which the convex hull is defined are affinely independent, then the convex hull and the simplex defined on these points are the same set. Otherwise, a simplex cannot be defined on these points, but the convex hull still can.

Convex Functions

A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is convex if $\mathbf{dom}\, f$ is a convex set and if for all $x, y \in \mathbf{dom}\, f$, and $\theta$ with $0 \leq \theta \leq 1$, we have

$$f(\theta x + (1-\theta) y) \leq \theta f(x) + (1-\theta) f(y).$$

$f$ is strictly convex if strict inequality holds whenever $x \neq y$ and $0 < \theta < 1$. We say $f$ is concave if $-f$ is convex, and strictly concave if $-f$ is strictly convex.
A function is convex if and only if it is convex when restricted to any line that intersects its domain. In other words, $f$ is convex if and only if for all $x \in \mathbf{dom}\, f$ and all $v$, the function $g(t) = f(x + tv)$ is convex (on its domain, $\{t \mid x + tv \in \mathbf{dom}\, f\}$).
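
The defining inequality can be probed numerically on a grid of points. The sketch below searches for a violation; the function name `find_violation`, the grid, and the test functions are hypothetical choices. Finding no violation on a finite grid is only evidence of convexity, not a proof, whereas a single violation does prove non-convexity.

```python
import math


def find_violation(f, points, thetas, tol=1e-12):
    # Search for (x, y, theta) with f(theta x + (1-theta) y) > theta f(x) + (1-theta) f(y).
    for x in points:
        for y in points:
            for t in thetas:
                lhs = f(t * x + (1 - t) * y)
                rhs = t * f(x) + (1 - t) * f(y)
                if lhs > rhs + tol:
                    return (x, y, t)
    return None


points = [i / 2 for i in range(-6, 7)]    # grid on [-3, 3]
thetas = [k / 10 for k in range(11)]      # grid on [0, 1]

square_violation = find_violation(lambda x: x * x, points, thetas)
sine_violation = find_violation(math.sin, points, thetas)

print(square_violation)   # no violation found for the convex function x^2
print(sine_violation)     # a violating triple for the non-convex function sin
```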

First-order conditions

Suppose $f$ is differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and

$$f(y) \geq f(x) + \nabla f(x)^{T}(y-x)$$

holds for all $x, y \in \mathbf{dom}\, f$.

For a convex function, the first-order Taylor approximation is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.
The inequality shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it).
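
A short numerical check of the underestimator property, assuming the one-dimensional examples $f(x) = e^x$ (convex) and $f(x) = x^3$ (not convex on $\mathbf R$); both functions and the grid are illustrative choices. The gap $f(y) - f(x) - f'(x)(y - x)$ should never be negative for the convex function.

```python
import math


def tangent_gap(f, df, x, y):
    # f(y) minus the first-order Taylor approximation of f around x, evaluated at y.
    return f(y) - (f(x) + df(x) * (y - x))


grid = [i / 4 for i in range(-8, 9)]   # grid on [-2, 2]

# exp is convex: the tangent line at any x lies below the graph everywhere.
min_gap_exp = min(tangent_gap(math.exp, math.exp, x, y) for x in grid for y in grid)

# x^3 is not convex on R: some tangent lines rise above the graph.
min_gap_cubic = min(
    tangent_gap(lambda t: t**3, lambda t: 3 * t**2, x, y) for x in grid for y in grid
)

print(min_gap_exp, min_gap_cubic)
```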

Second-order conditions

Suppose that $f$ is twice differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and its Hessian is positive semidefinite: for all $x \in \mathbf{dom}\, f$,

$$\nabla^2 f(x) \succeq 0.$$

$f$ is concave if and only if $\mathbf{dom}\, f$ is convex and $\nabla^2 f(x) \preceq 0$ for all $x \in \mathbf{dom}\, f$.
If $\nabla^2 f(x) \succ 0$ for all $x \in \mathbf{dom}\, f$, then $f$ is strictly convex. The converse is not true. For example, $f(x) = x^4$ has zero second derivative at $x = 0$ but is strictly convex.
Quadratic functions: Consider the quadratic function $f: \mathbf{R}^n \rightarrow \mathbf{R}$, with $\mathbf{dom}\, f = \mathbf{R}^n$, given by

$$f(x) = (1/2) x^{T} P x + q^T x + r,$$

with $P \in \mathbf{S}^n$, $q \in \mathbf R^n$, and $r \in \mathbf{R}$. Since $\nabla^2 f(x) = P$ for all $x$, $f$ is convex if and only if $P \succeq 0$ (and concave if and only if $P \preceq 0$).
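
For a quadratic in two variables, the test on $P$ can be carried out by hand, because the eigenvalues of a symmetric $2 \times 2$ matrix have a closed form. The sketch below is restricted to that case to stay within the standard library; the function names and example matrices are hypothetical.

```python
import math


def eigenvalues_sym2(P):
    # Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]] in closed form.
    a, b, c = P[0][0], P[0][1], P[1][1]
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return (mean - radius, mean + radius)


def classify_quadratic(P, tol=1e-12):
    # Convexity of f(x) = (1/2) x^T P x + q^T x + r depends only on P.
    lo, hi = eigenvalues_sym2(P)
    if lo >= -tol and hi <= tol:
        return "convex and concave"   # P = 0, so f is affine
    if lo >= -tol:
        return "convex"               # P is positive semidefinite
    if hi <= tol:
        return "concave"              # P is negative semidefinite
    return "neither"                  # P is indefinite


print(classify_quadratic([[2.0, 0.0], [0.0, 1.0]]))
print(classify_quadratic([[1.0, 1.0], [1.0, 1.0]]))
print(classify_quadratic([[-1.0, 0.0], [0.0, -3.0]]))
print(classify_quadratic([[1.0, 2.0], [2.0, 1.0]]))
```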

Examples of Convex and Concave Functions

Exponential. $e^{ax}$ is convex on $\mathbf{R}$, for any $a \in \mathbf{R}$.
Powers. $x^a$ is convex on $\mathbf R_{++}$ when $a \geq 1$ or $a \leq 0$, and concave for $0 \leq a \leq 1$.
Powers of absolute value. $|x|^p$, for $p \geq 1$, is convex on $\mathbf R$.
Logarithm. $\log x$ is concave on $\mathbf R_{++}$.
Negative entropy. $x \log x$ (either on $\mathbf{R}_{++}$, or on $\mathbf R_+$, defined as $0$ for $x = 0$) is convex.
Norms. Every norm on $\mathbf{R}^n$ is convex.
Max function. $f(x) = \max\{x_1, \ldots, x_n\}$ is convex on $\mathbf R^n$.
Quadratic-over-linear function. The function $f(x,y) = x^2/y$, with $\mathbf{dom}\, f = \mathbf R \times \mathbf R_{++} = \{(x,y) \in \mathbf R^2 \mid y > 0\}$, is convex.
Log-sum-exp. The function $f(x) = \log(e^{x_1} + \cdots + e^{x_n})$ is convex on $\mathbf R^n$.
Geometric mean. The geometric mean $f(x) = (\prod_{i=1}^{n} x_i)^{1/n}$ is concave on $\mathbf{dom}\, f = \mathbf R^n_{++}$.
Log-determinant. The function $f(X) = \log \det X$ is concave on $\mathbf{dom}\, f = \mathbf S^n_{++}$.
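
Two of the vector examples can be spot-checked at a midpoint ($\theta = 1/2$). The sketch below does this for log-sum-exp (convex) and the geometric mean (concave); the test vectors are arbitrary illustrative choices, and a single midpoint check is a sanity check, not a proof. The log-sum-exp implementation subtracts the maximum entry before exponentiating, a common way to avoid overflow.

```python
import math


def log_sum_exp(x):
    # log(sum(exp(x_i))), shifted by max(x) for numerical stability.
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def geometric_mean(x):
    # (prod x_i)^(1/n) for strictly positive entries, computed via logs.
    return math.exp(sum(math.log(xi) for xi in x) / len(x))


def midpoint(x, y):
    return [(a + b) / 2 for a, b in zip(x, y)]


x = [1.0, 2.0, 3.0]
y = [3.0, 0.5, 1.0]
m = midpoint(x, y)

# Convexity: f(midpoint) <= average of f, so this gap should be >= 0.
lse_gap = 0.5 * log_sum_exp(x) + 0.5 * log_sum_exp(y) - log_sum_exp(m)

# Concavity: f(midpoint) >= average of f, so this gap should be >= 0.
gm_gap = geometric_mean(m) - (0.5 * geometric_mean(x) + 0.5 * geometric_mean(y))

print(lse_gap, gm_gap)
```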

Reference: Convex Optimization by Stephen Boyd and Lieven Vandenberghe.
