Convex and strictly convex
Strong convex

1. Convex and strictly convex

Two commonly used notions of convexity are convexity and strict convexity. Their definitions are as follows.

Definition 1: [convex]: $f(x)$ is said to be convex if the following holds $\forall x,y$ and $\forall \lambda\in[0,1]$:

$$f(\lambda x+(1-\lambda)y)\leq \lambda f(x)+(1-\lambda)f(y)$$

Definition 2: [strictly convex]: $f(x)$ is said to be strictly convex if the following holds $\forall x\neq y$ and $\forall \lambda\in(0,1)$:

$$f(\lambda x+(1-\lambda)y)< \lambda f(x)+(1-\lambda)f(y)$$
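As a rough numerical illustration of the two definitions, the sketch below samples random points and checks the sign of the gap between the two sides of the inequality. The helper `jensen_gap` and the test functions `square` and `absolute` are names chosen for this example only; a sampled check like this can expose a violation but does not prove convexity.

```python
import random

def jensen_gap(f, x, y, lam):
    """Return lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y)."""
    return lam * f(x) + (1 - lam) * f(y) - f(lam * x + (1 - lam) * y)

def square(x):
    return x * x

def absolute(x):
    return abs(x)

random.seed(0)
samples = [
    (random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(0, 1))
    for _ in range(1000)
]

# Convexity: the gap is never negative (a small tolerance absorbs rounding).
square_ok = all(jensen_gap(square, x, y, lam) >= -1e-9 for x, y, lam in samples)
abs_ok = all(jensen_gap(absolute, x, y, lam) >= -1e-9 for x, y, lam in samples)

# |x| is convex but not strictly convex: on a segment where it is linear
# the gap is exactly zero, while x^2 has a strictly positive gap there.
flat_gap = jensen_gap(absolute, 1.0, 3.0, 0.5)
strict_gap = jensen_gap(square, 1.0, 3.0, 0.5)

print(square_ok, abs_ok, flat_gap, strict_gap)
```

The zero gap for `absolute` on the segment from 1 to 3 is what separates a merely convex function from a strictly convex one.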

For differentiable functions there are equivalent characterizations in terms of derivatives:

Theorem 3. [first order condition(1)]: If $f(x)$ is differentiable, then $f(x)$ is convex iff $\forall x,y$

$$f(y) \geq f(x)+\nabla f(x)\cdot (y-x)$$
The same equivalence holds for strict convexity with $>$ in place of $\geq$ (for $x\neq y$).

proof:
necessary: If $f(x)$ is convex, then for $\lambda\in(0,1]$, rearranging the definition gives

$$\begin{aligned} f(x)&\geq \frac{f(\lambda x+(1-\lambda)y ) - (1-\lambda)f(y)}{\lambda}\\ &=f(y) + \frac{f(y+\lambda (x-y)) - f(y)}{\lambda} \end{aligned}$$
Letting $\lambda \rightarrow 0$, the difference quotient converges to the directional derivative $\nabla f(y)\cdot(x-y)$, so

$$f(x)\geq f(y) + \nabla f(y)\cdot(x-y)$$
which is the first order condition with the roles of $x$ and $y$ exchanged.
sufficient: If the first order condition is satisfied, apply it at the point $z=\lambda x+(1-\lambda)y$, using $x-z=(1-\lambda)(x-y)$ and $y-z=\lambda(y-x)$:

$$\begin{aligned} f(x)&\geq f(z)+\nabla f(z)\cdot (1-\lambda)(x-y)\\ f(y)&\geq f(z)+\nabla f(z)\cdot \lambda(y-x) \end{aligned}$$

Multiplying the first inequality by $\lambda$ and the second by $1-\lambda$ and adding them, the gradient terms cancel and we get:

$$\lambda f(x)+(1-\lambda)f(y) \geq f(\lambda x+(1-\lambda)y)$$
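The first order condition says that every tangent plane lies below the graph. A minimal sketch of a numerical check follows, assuming the example function $f(x_1,x_2)=x_1^2+x_1x_2+x_2^2$ (chosen here for illustration, it does not appear in the text above); the names `f`, `grad_f` and `tangent_gap` are hypothetical helpers.

```python
import random

def f(p):
    """Example convex quadratic f(x1, x2) = x1^2 + x1*x2 + x2^2."""
    x1, x2 = p
    return x1 * x1 + x1 * x2 + x2 * x2

def grad_f(p):
    """Gradient of the example quadratic."""
    x1, x2 = p
    return (2 * x1 + x2, x1 + 2 * x2)

def tangent_gap(x, y):
    """Return f(y) - [f(x) + grad f(x) . (y - x)]."""
    g = grad_f(x)
    linear = f(x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])
    return f(y) - linear

def random_point():
    return (random.uniform(-5, 5), random.uniform(-5, 5))

random.seed(1)
gaps = [tangent_gap(random_point(), random_point()) for _ in range(1000)]

# The tangent plane at x never exceeds f at y (tolerance for rounding).
first_order_ok = min(gaps) >= -1e-9

# One hand-checkable instance: x = (1, 0), y = (0, 1).
example_gap = tangent_gap((1.0, 0.0), (0.0, 1.0))

print(first_order_ok, example_gap)
```

For this quadratic the gap equals $\frac{1}{2}(y-x)^T\nabla^2 f\,(y-x)$, which is why it is never negative.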

Theorem 4. [first order condition(2)[monotone of $\nabla f(x)$]]: If $f(x)$ is differentiable, then $f(x)$ is convex iff $(\nabla f(x)-\nabla f(y))\cdot (x-y)\geq 0$ $\forall x,y$.
proof: necessary:
If $f(x)$ is convex, then by Theorem 3, $\forall x,y$, we have

$$\begin{aligned} &f(x)\geq f(y) +\nabla f(y)\cdot (x-y)\\ &f(y)\geq f(x)+\nabla f(x)\cdot (y-x) \end{aligned}$$
adding these two inequalities:

$$f(x)+f(y)\geq f(y)+f(x)+(\nabla f(y) - \nabla f(x))\cdot (x-y)$$
i.e.

$$(\nabla f(x)-\nabla f(y))\cdot (x-y)\geq 0$$
sufficient:
Let $g(t) = f(x +t(y-x))$. Then $g'(t) = \nabla f(x+t(y-x))\cdot (y-x)$, and for $t\in(0,1]$, applying the monotonicity condition to the points $x+t(y-x)$ and $x$ gives

$$\begin{aligned} g'(t) - g'(0) & = \nabla f(x+t(y-x))\cdot (y-x) - \nabla f(x)\cdot (y-x) \\ &= \frac{1}{t}(\nabla f(x+t(y-x)) - \nabla f(x))\cdot t(y-x)\\ &\geq 0 \end{aligned}$$
so $g'(t)\geq g'(0)$ for all $t\in[0,1]$. Then

$$\begin{aligned} g(1) &=g(0)+ \int_0^1 g'(t)\, dt\geq g(0)+g'(0) \\ \Rightarrow\ & f(y) \geq f(x) +\nabla f(x)\cdot (y-x) \end{aligned}$$
which is the first order condition of Theorem 3, so $f(x)$ is convex.
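The monotonicity condition can also be probed numerically. The sketch below assumes two example functions that are not from the text: the convex $e^{x_1}+x_2^2$ and the non-convex saddle $x_1^2-x_2^2$; `monotonicity` is a hypothetical helper computing $(\nabla f(x)-\nabla f(y))\cdot(x-y)$.

```python
import math
import random

def grad_convex(p):
    """Gradient of the convex example exp(x1) + x2^2."""
    return (math.exp(p[0]), 2 * p[1])

def grad_saddle(p):
    """Gradient of the non-convex example x1^2 - x2^2."""
    return (2 * p[0], -2 * p[1])

def monotonicity(grad, x, y):
    """Return (grad(x) - grad(y)) . (x - y)."""
    gx, gy = grad(x), grad(y)
    return (gx[0] - gy[0]) * (x[0] - y[0]) + (gx[1] - gy[1]) * (x[1] - y[1])

def random_point():
    return (random.uniform(-3, 3), random.uniform(-3, 3))

random.seed(2)
pairs = [(random_point(), random_point()) for _ in range(1000)]

# For the convex example the inner product is never negative.
convex_ok = all(monotonicity(grad_convex, x, y) >= -1e-9 for x, y in pairs)

# For the saddle, moving along the x2 axis violates the condition.
saddle_value = monotonicity(grad_saddle, (0.0, 1.0), (0.0, 0.0))

print(convex_ok, saddle_value)
```

A single pair with a negative value, as for the saddle, is enough to rule out convexity.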

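Theorem 5. [second order condition]: If $f(x)$ is twice differentiable, then $f(x)$ is convex iff $\forall x$

$$\nabla^2 f(x)\succeq 0$$
i.e. the Hessian is positive semidefinite. For strict convexity only one direction holds: $\nabla^2 f(x)\succ 0$ $\forall x$ implies that $f(x)$ is strictly convex, but not conversely (for example, $f(x)=x^4$ is strictly convex while $f''(0)=0$).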
proof:
For simplicity, we first prove the one-variable case: a twice differentiable function $h(x)$, $x\in \mathbb{R}$, is convex iff its second derivative satisfies $h''(x)\geq 0$.
sufficient:
From $h''(x)\geq 0$ and the Taylor expansion with Lagrange remainder, for some $\xi$ between $x$ and $y$ we have

$$h(y) = h(x)+h'(x)(y-x)+\frac{1}{2}h''(\xi)(y-x)^2 \geq h(x)+h'(x)(y-x)$$
and from Theorem 3, we know $h(x)$ is convex.
necessary:
$\forall x< z< y$, we have $z=\lambda x+(1-\lambda) y$ with $\lambda = \frac{y-z}{y-x}$, so

$$\begin{aligned} h(z) &= h(\lambda x+(1-\lambda)y)\\&\leq \lambda h(x)+(1-\lambda)h(y)\\&=\frac{y-z}{y-x} h(x)+\frac{z-x}{y-x}h(y) \end{aligned}$$

$$\Rightarrow (y-x)h(z)\leq (y-z)h(x)+(z-x)h(y)$$

$$\Rightarrow(y-z)(h(z)-h(x))\leq (z-x)(h(y)-h(z))$$

$$\Rightarrow\frac{h(z)-h(x)}{z-x}\leq \frac{h(y) - h(z)}{y-z}$$

So for $t_1< x< z< y< t_2$, we have

$$\frac{h(x)-h(t_1)}{x-t_1}\leq \frac{h(z)-h(x)}{z-x}\leq \frac{h(y) - h(z)}{y-z}\leq \frac{h(t_2) - h(y)}{t_2 - y}$$
letting $t_1\rightarrow x$ and $t_2 \rightarrow y$, we have

$$h'(x)\leq \frac{h(z)-h(x)}{z-x}\leq \frac{h(y) - h(z)}{y-z} \leq h'(y)$$

So $h'(x)\leq h'(y)$ whenever $x<y$, i.e. $h'(x)$ is increasing, which implies $h''(x)\geq 0$.
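The chain of slope inequalities can be checked on a concrete function. This is a small sketch assuming $h(x)=e^x$ (so $h'=h$) and the sample points $x=-1$, $z=0.5$, $y=2$, all chosen for illustration; `slope` is a hypothetical helper.

```python
import math

def slope(h, a, b):
    """Slope of the chord of h between a and b."""
    return (h(b) - h(a)) / (b - a)

x, z, y = -1.0, 0.5, 2.0

left = slope(math.exp, x, z)    # chord slope on [x, z]
right = slope(math.exp, z, y)   # chord slope on [z, y]

# h'(x) <= left chord slope <= right chord slope <= h'(y), with h' = exp.
chain_ok = math.exp(x) <= left <= right <= math.exp(y)

print(math.exp(x), left, right, math.exp(y), chain_ok)
```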

Now we prove the multivariable case. For a point $x$ and a direction $\ell$, let $g(t)=f(x+t\ell)$, a function of one variable.
necessary:
From convexity of $f(x)$,

$$\begin{aligned} g(\lambda t_1+(1-\lambda)t_2) &= f(\lambda (x+t_1\ell)+(1-\lambda)(x+t_2\ell))\\ &\leq \lambda f(x+t_1\ell)+(1-\lambda)f(x+t_2\ell)\\ &= \lambda g(t_1)+(1-\lambda)g(t_2) \end{aligned}$$
So $g(t)$ is convex as a one variable function. Then by the one-variable case

$$g''(t) = \ell^T\nabla^2 f(x+t\ell) \ell\geq 0$$
Taking $t=0$, this holds for every direction $\ell$, so

$$\nabla^2 f(x) \succeq 0$$
sufficient:
Let $g(t) = f(x+t(y-x))$, then

$$g''(t)=(y-x)^T \nabla^2 f(x+t(y-x)) (y-x)\geq 0$$
So $g(t)$ is convex by the one-variable case.

Then

$$\begin{aligned} f(\lambda x+(1-\lambda)y) &= f(x+(1-\lambda)(y-x))\\ &=g(1-\lambda) = g(\lambda \cdot 0 +(1-\lambda) \cdot 1)\\ &\leq \lambda g(0)+(1-\lambda)g(1)\\ &=\lambda f(x)+(1-\lambda)f(y) \end{aligned}$$
So $f(x)$ is convex.

From the proof, we see that convexity of a function on a convex set is a one-dimensional fact: $f(x)$ is convex iff its restriction to every line is convex.
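The second order condition and the reduction to one dimension can both be illustrated on small examples. The sketch below assumes the quadratic $f(x_1,x_2)=x_1^2+x_1x_2+x_2^2$ with constant Hessian $[[2,1],[1,2]]$ and the saddle $x_1^2-x_2^2$ with Hessian $[[2,0],[0,-2]]$, neither of which is from the text; `min_eigenvalue_2x2` and `g` are hypothetical helpers.

```python
import math

def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return (a + c) / 2 - math.sqrt(((a - c) / 2) ** 2 + b * b)

# Hessian [[2, 1], [1, 2]]: smallest eigenvalue is positive, so f is convex.
convex_min = min_eigenvalue_2x2(2.0, 1.0, 2.0)

# Hessian [[2, 0], [0, -2]]: a negative eigenvalue, so the saddle is not convex.
saddle_min = min_eigenvalue_2x2(2.0, 0.0, -2.0)

def f(p):
    """Example convex quadratic f(x1, x2) = x1^2 + x1*x2 + x2^2."""
    x1, x2 = p
    return x1 * x1 + x1 * x2 + x2 * x2

def g(t, x, ell):
    """Restriction of f to the line x + t*ell."""
    return f((x[0] + t * ell[0], x[1] + t * ell[1]))

x, ell, h = (1.0, -2.0), (3.0, 1.0), 0.5

# Central second difference of g at t = 0; exact for a quadratic up to rounding.
second_difference = (g(h, x, ell) - 2 * g(0.0, x, ell) + g(-h, x, ell)) / (h * h)

# ell^T H ell with H = [[2, 1], [1, 2]].
quadratic_form = 2 * ell[0] ** 2 + 2 * ell[0] * ell[1] + 2 * ell[1] ** 2

print(convex_min, saddle_min, second_difference, quadratic_form)
```

The agreement between `second_difference` and `quadratic_form` is the identity $g''(t)=\ell^T\nabla^2 f(x+t\ell)\,\ell$ used in the proof.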

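Intuition:

convex says the function lies above ($\geq$) each of its tangent linear approximations
strictly convex says the function lies strictly above ($>$) each of its tangent linear approximations, except at the point of tangency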

2. Strong convex

Definition 3: [strong convex]: $f(x)$ is said to be $m$-strongly convex, with $m>0$, if $f(x)-\frac{m}{2}\|x\|_2^2$ is convex.

Then, applying the results of the last section to $f(x)-\frac{m}{2}\|x\|_2^2$, whose gradient is $\nabla f(x)-mx$ and whose Hessian is $\nabla^2 f(x)-mI$, we have that:
first order condition (1):

$$f(y)\geq f(x)+\nabla f(x)\cdot (y-x)+\frac{m}{2}\|y-x\|_2^2$$
first order condition (2)[monotone of derivative]:

$$(\nabla f(x) - \nabla f(y)) \cdot (x-y)\geq m\|x-y\|_2^2$$
second order condition:

$$\nabla^2 f(x)\succeq m\cdot I$$

Intuition: strongly convex says the function lies above ($\geq$) a quadratic function with curvature $m$ that touches it at each point.
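A minimal numerical sketch of the two first order conditions for strong convexity follows. It assumes the example $f(x_1,x_2)=x_1^2+x_1x_2+x_2^2$, whose Hessian $[[2,1],[1,2]]$ has smallest eigenvalue $1$, so $m=1$ is used; the function, the value of $m$ and the helper names are illustrative choices, not part of the text.

```python
import random

M_PARAM = 1.0  # assumed strong convexity parameter: smallest Hessian eigenvalue

def f(p):
    """Example quadratic f(x1, x2) = x1^2 + x1*x2 + x2^2."""
    x1, x2 = p
    return x1 * x1 + x1 * x2 + x2 * x2

def grad_f(p):
    x1, x2 = p
    return (2 * x1 + x2, x1 + 2 * x2)

def sq_dist(x, y):
    """Squared Euclidean distance between x and y."""
    return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

def strong_gap(x, y, m):
    """f(y) - [f(x) + grad f(x).(y-x) + (m/2)*||y-x||^2]."""
    g = grad_f(x)
    linear = f(x) + g[0] * (y[0] - x[0]) + g[1] * (y[1] - x[1])
    return f(y) - linear - 0.5 * m * sq_dist(x, y)

def strong_monotone_gap(x, y, m):
    """(grad f(x) - grad f(y)).(x-y) - m*||x-y||^2."""
    gx, gy = grad_f(x), grad_f(y)
    inner = (gx[0] - gy[0]) * (x[0] - y[0]) + (gx[1] - gy[1]) * (x[1] - y[1])
    return inner - m * sq_dist(x, y)

def random_point():
    return (random.uniform(-5, 5), random.uniform(-5, 5))

random.seed(3)
pairs = [(random_point(), random_point()) for _ in range(1000)]

cond1_ok = all(strong_gap(x, y, M_PARAM) >= -1e-9 for x, y in pairs)
cond2_ok = all(strong_monotone_gap(x, y, M_PARAM) >= -1e-9 for x, y in pairs)

# m = 1 is tight along the direction (1, -1); m = 4 is too large.
tight = strong_gap((0.0, 0.0), (1.0, -1.0), M_PARAM)
too_large = strong_gap((0.0, 0.0), (1.0, 0.0), 4.0)

print(cond1_ok, cond2_ok, tight, too_large)
```

The zero value of `tight` shows that the quadratic lower bound touches $f$ along the eigenvector of the smallest eigenvalue, and the negative `too_large` shows that the bound fails once $m$ exceeds what the Hessian allows.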

Theorem: If a twice continuously differentiable function is strongly convex, then its first derivative is Lipschitz continuous on the sublevel set $S=\{x: f(x)\leq f(x^{(0)})\}$.
proof: Firstly, we claim that $S$ is closed and bounded. It is closed because $f(x)$ is continuous. Let $x^*$ be the minimizer of $f(x)$, so that $\nabla f(x^*)=0$. Then $\forall y \in S$, we have

$$\begin{aligned} &f(x^{(0)})\geq f(y) \geq f(x^*) +\nabla f(x^*)\cdot (y-x^*)+\frac{m}{2}\|y-x^*\|_2^2\\ &\Rightarrow \| y - x^*\|_2^2 \leq \frac{2}{m} \left(f(x^{(0)})-f(x^*)\right) \end{aligned}$$
so $S$ is bounded.

The maximum eigenvalue of $\nabla^2 f(x)$ is a continuous function of $x$ on the compact set $S$, so there exists an upper bound $M$ with $\nabla^2 f(x)\preceq M\cdot I$ on $S$, which says that $\nabla f(x)$ is Lipschitz continuous on $S$ with constant $M$.
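The role of the sublevel set $S$ in this argument can be seen on a one-dimensional example. The sketch below assumes $f(x)=x^4+x^2$, which is $2$-strongly convex since $f''(x)=12x^2+2\geq 2$, and the starting point $x^{(0)}=2$; both are illustrative choices. Because this $f$ is even and increasing in $|x|$, $S=[-2,2]$ and the bound on $f''$ over $S$ is $M=f''(2)=50$.

```python
import random

def f(x):
    """Example strongly convex function x^4 + x^2."""
    return x ** 4 + x ** 2

def df(x):
    """First derivative of f."""
    return 4 * x ** 3 + 2 * x

def d2f(x):
    """Second derivative of f."""
    return 12 * x ** 2 + 2

x0 = 2.0        # assumed starting point; S = [-x0, x0] for this f
M = d2f(x0)     # largest second derivative on S

random.seed(4)
pairs = [(random.uniform(-x0, x0), random.uniform(-x0, x0)) for _ in range(1000)]

# On S the derivative is Lipschitz with constant M (tolerance for rounding).
lipschitz_on_S = all(abs(df(a) - df(b)) <= M * abs(a - b) + 1e-9 for a, b in pairs)

# Outside S the same constant no longer bounds the difference quotient.
outside_ratio = abs(df(10.0) - df(9.0)) / abs(10.0 - 9.0)

print(M, lipschitz_on_S, outside_ratio)
```

The large `outside_ratio` indicates that the constant $M$ depends on the sublevel set and is not a global Lipschitz constant for this example.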
