Table of contents

  • Optimization problem
  • Optimization Categories
  • Different initialization brings different optimum (if not convex)
  • Affine sets
  • Affine combination
  • Affine hull
  • Convex Sets
  • Convex combination
  • Convex hull
  • Cones
  • Hyperplanes and halfspaces
  • Polyhedra
  • Linearly Independent vs. Affinely Independent
  • Simplexes
  • What is the key distinction between a convex hull and a simplex?
  • Convex Functions
  • First-order conditions
  • Second-order conditions
  • Examples of Convex and Concave Functions

Optimization problem

All optimization problems can be written as:
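
As a sketch, assuming the standard form used in the Boyd and Vandenberghe reference cited at the end of this article (the symbols $f_0$, $f_i$, $h_j$, $m$ and $p$ follow that book's notation and are not recovered from this page), such a problem can be written as:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \leq 0, \quad i = 1, \dots, m \\
& h_j(x) = 0, \quad j = 1, \dots, p
\end{array}
$$

Here $x \in \mathbf R^n$ is the optimization variable, $f_0$ is the objective function, the $f_i$ are inequality constraint functions, and the $h_j$ are equality constraint functions.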

Optimization Categories

  1. convex vs. non-convex
    Training a deep neural network is a non-convex problem.

  2. continuous vs. discrete
    Most problems have continuous variables; problems over tree structures are discrete.

  3. constrained vs. unconstrained
    Adding a prior turns an unconstrained problem into a constrained one.

  4. smooth vs. non-smooth
    Most problems are smooth.

Different initialization brings different optimum (if not convex)

Idea: Give up on finding the global optimum and look for a good local optimum instead.

  • Purpose of pre-training: Find a good initialization to start training from, so that training reaches a better local optimum.

  • Relaxation: Convert to a convex optimization problem.

  • Brute force: If a problem is small, we can use brute force.
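
The dependence on initialization can be illustrated with a minimal sketch. The function `f` below, the step size, and the two starting points are hypothetical choices made for illustration; they are not taken from the original article. Plain gradient descent on this non-convex function ends in a different local minimum depending on where it starts.

```python
def f(x):
    """A non-convex function with two local minima (hypothetical example)."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, step=0.01, iters=2000):
    """Plain gradient descent with a fixed step size, starting from x0."""
    x = x0
    for _ in range(iters):
        x -= step * grad_f(x)
    return x


# Two initializations on opposite sides of the local maximum near x = 0.17.
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)
print(f"start -2.0 -> x = {x_left:.4f}, f(x) = {f(x_left):.4f}")
print(f"start  2.0 -> x = {x_right:.4f}, f(x) = {f(x_right):.4f}")
```

Both runs stop at a point where the gradient vanishes, but only the run started from the left reaches the lower of the two minima.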

Affine sets

A set $C \subseteq \mathbf R^n$ is affine if the line through any two distinct points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and $\theta \in \mathbf R$, we have $\theta x_1 + (1-\theta) x_2 \in C$.

Note: The line passing through $x_1$ and $x_2$ consists of the points $y = \theta x_1 + (1-\theta) x_2$ with $\theta \in \mathbf R$.

Affine combination

We refer to a point of the form $\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k$, where $\theta_1 + \theta_2 + \dots + \theta_k = 1$, as an affine combination of the points $x_1, x_2, \dots, x_k$. An affine set contains every affine combination of its points.

Affine hull

The set of all affine combinations of points in some set $C \subseteq \mathbf R^n$ is called the affine hull of $C$, and denoted $\mathbf{aff}\, C$:

$$\mathbf{aff}\, C = \{\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k \mid x_1, x_2, \dots, x_k \in C,\ \theta_1 + \theta_2 + \dots + \theta_k = 1\}.$$

The affine hull is the smallest affine set that contains $C$, in the following sense: if $S$ is any affine set with $C \subseteq S$, then $\mathbf{aff}\, C \subseteq S$.

Affine dimension: We define the affine dimension of a set $C$ as the dimension of its affine hull.

Convex Sets

A set $C$ is convex if the line segment between any two points in $C$ lies in $C$, i.e., if for any $x_1, x_2 \in C$ and any $\theta$ with $0 \leq \theta \leq 1$, we have
$$\theta x_1 + (1-\theta) x_2 \in C.$$

Roughly speaking, a set is convex if every point in the set can be seen by every other point along an unobstructed straight path between them, where unobstructed means lying in the set. Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefore also the line segment between the points.

Convex combination

We call a point of the form $\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k$, where $\theta_1 + \theta_2 + \dots + \theta_k = 1$ and $\theta_i \geq 0$, $i = 1, 2, \dots, k$, a convex combination of the points $x_1, \dots, x_k$.
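
A convex set contains every convex combination of its points. The sketch below illustrates this numerically, taking the closed unit disk in $\mathbf R^2$ as the convex set; the disk, the helper names, and the sample sizes are assumptions made for illustration only. A random test like this can expose a violation but cannot prove convexity.

```python
import math
import random


def random_point_in_disk(rng):
    """Sample a point from the closed unit disk by rejection sampling."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            return (x, y)


def random_convex_weights(k, rng):
    """Nonnegative weights theta_1, ..., theta_k that sum to 1."""
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]


def convex_combination(points, weights):
    """Return sum_i weights[i] * points[i] for 2-D points."""
    return (
        sum(w * p[0] for w, p in zip(weights, points)),
        sum(w * p[1] for w, p in zip(weights, points)),
    )


rng = random.Random(0)
all_inside = True
for _ in range(1000):
    pts = [random_point_in_disk(rng) for _ in range(5)]
    theta = random_convex_weights(5, rng)
    cx, cy = convex_combination(pts, theta)
    # Small tolerance for floating-point rounding.
    if math.hypot(cx, cy) > 1.0 + 1e-12:
        all_inside = False
print(all_inside)
```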

Convex hull

The convex hull of a set $C$, denoted $\mathbf{conv}\, C$, is the set of all convex combinations of points in $C$:

$$\mathbf{conv}\, C = \{\theta_1 x_1 + \theta_2 x_2 + \dots + \theta_k x_k \mid x_i \in C,\ \theta_i \geq 0,\ i = 1, \dots, k,\ \theta_1 + \theta_2 + \dots + \theta_k = 1\}.$$

The convex hull $\mathbf{conv}\, C$ is always convex. It is the smallest convex set that contains $C$: if $B$ is any convex set that contains $C$, then $\mathbf{conv}\, C \subseteq B$.

Cones

A set $C$ is called a cone, or nonnegative homogeneous, if for every $x \in C$ and $\theta \geq 0$ we have $\theta x \in C$. A set $C$ is a convex cone if it is convex and a cone, which means that for any $x_1, x_2 \in C$ and $\theta_1, \theta_2 \geq 0$, we have

$$\theta_1 x_1 + \theta_2 x_2 \in C.$$

Hyperplanes and halfspaces

A hyperplane is a set of the form
$$\{ x \mid a^T x = b\},$$

where $a \in \mathbf R^n$, $a \neq 0$, and $b \in \mathbf R$.

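Geometrically, the hyperplane is the set of points with a constant inner product with the vector $a$, or equivalently a hyperplane with normal vector $a$, where the constant $b$ determines the offset from the origin. This geometric interpretation can be understood by expressing the hyperplane in the form

$$\{ x \mid a^T (x - x_0) = 0\},$$
where $x_0$ is any point in the hyperplane (i.e., any point that satisfies $a^T x_0 = b$).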

A hyperplane divides $\mathbf R^n$ into two halfspaces. A (closed) halfspace is a set of the form

$$\{x \mid a^T x \leq b \},$$

where $a \neq 0$. Halfspaces are convex but not affine. The set $\{x \mid a^T x < b \}$ is called an open halfspace.

Polyhedra

A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:

$$P = \{ x \mid a_j^T x \leq b_j,\ j = 1, \dots, m,\ c_j^T x = d_j,\ j = 1, \dots, p\}.$$

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. In compact notation:

$$P = \{ x \mid Ax \preceq b,\ Cx = d\},$$

where $\preceq$ denotes componentwise inequality between vectors.
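
The compact description translates directly into a membership test. The following is a minimal sketch; the function name `in_polyhedron`, the tolerance `tol`, and the example polyhedron (the probability simplex in $\mathbf R^3$, i.e. $x \succeq 0$ and $\mathbf 1^T x = 1$) are assumptions made for illustration.

```python
def in_polyhedron(x, A, b, C, d, tol=1e-9):
    """Check A x <= b (componentwise) and C x = d, up to tolerance tol."""

    def dot(row, v):
        return sum(r * vi for r, vi in zip(row, v))

    ineq_ok = all(dot(a_j, x) <= b_j + tol for a_j, b_j in zip(A, b))
    eq_ok = all(abs(dot(c_j, x) - d_j) <= tol for c_j, d_j in zip(C, d))
    return ineq_ok and eq_ok


# Hypothetical example: the probability simplex in R^3.
# -x_i <= 0 encodes x_i >= 0; the single equality encodes x_1 + x_2 + x_3 = 1.
A = [[-1, 0, 0], [0, -1, 0], [0, 0, -1]]
b = [0, 0, 0]
C = [[1, 1, 1]]
d = [1]

print(in_polyhedron([0.2, 0.3, 0.5], A, b, C, d))   # True
print(in_polyhedron([0.7, 0.5, -0.2], A, b, C, d))  # False: third entry is negative
```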

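Linearly Independent vs. Affinely Independent

Consider the vectors $(1,0)$, $(0,1)$ and $(1,1)$. These are affinely independent, but not linearly independent. Their affine hull is all of $\mathbf R^2$ (dimension two), and if you remove any one of them, the affine hull of the remaining two has dimension one, so none of the three is redundant for the affine hull. In contrast, the span of any two of them is already all of $\mathbf R^2$, so the third vector is a linear combination of the other two, and hence the three are not linearly independent.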
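
The two notions can be checked mechanically with a rank computation. The sketch below uses exact Gaussian elimination over fractions; the helper names `rank`, `linearly_independent` and `affinely_independent` are hypothetical. It relies on the fact that points $v_0, \dots, v_k$ are affinely independent exactly when $v_1 - v_0, \dots, v_k - v_0$ are linearly independent.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


def linearly_independent(vectors):
    return rank(vectors) == len(vectors)


def affinely_independent(points):
    """Test linear independence of the differences to the first point."""
    base = points[0]
    diffs = [[p_i - b_i for p_i, b_i in zip(p, base)] for p in points[1:]]
    return linearly_independent(diffs) if diffs else True


pts = [(1, 0), (0, 1), (1, 1)]
print(linearly_independent(pts))  # False
print(affinely_independent(pts))  # True
```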

Simplexes

Suppose the $k+1$ points $v_0, \dots, v_k \in \mathbf R^n$ are affinely independent, which means $v_1 - v_0, \dots, v_k - v_0$ are linearly independent. The simplex determined by them is given by

$$C = \mathbf{conv}\{ v_0, \dots, v_k\} = \{ \theta_0 v_0 + \dots + \theta_k v_k \mid \theta \succeq 0,\ \mathbf 1^T \theta = 1\}.$$

Note:

  • The affine dimension of this simplex is $k$.

A 1-dimensional simplex is a line segment; a 2-dimensional simplex is a triangle (including its interior); and a 3-dimensional simplex is a tetrahedron.
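
For a 2-dimensional simplex in $\mathbf R^2$, the weights $\theta$ of a point can be computed explicitly, which gives a membership test: the point lies in the triangle exactly when all weights are nonnegative. The sketch below solves the $2 \times 2$ linear system with Cramer's rule; the function names and the example triangle are assumptions made for illustration.

```python
def barycentric_coordinates(p, v0, v1, v2):
    """Solve p = t0*v0 + t1*v1 + t2*v2 with t0 + t1 + t2 = 1 for a 2-D triangle."""
    # Edge vectors v1 - v0 and v2 - v0; they are linearly independent iff det != 0.
    ax, ay = v1[0] - v0[0], v1[1] - v0[1]
    bx, by = v2[0] - v0[0], v2[1] - v0[1]
    det = ax * by - ay * bx
    if det == 0:
        raise ValueError("vertices are not affinely independent")
    px, py = p[0] - v0[0], p[1] - v0[1]
    # Cramer's rule for p - v0 = t1 * (v1 - v0) + t2 * (v2 - v0).
    t1 = (px * by - py * bx) / det
    t2 = (ax * py - ay * px) / det
    return (1 - t1 - t2, t1, t2)


def in_simplex(p, v0, v1, v2, tol=1e-12):
    """p is in the simplex iff all of its weights are nonnegative."""
    return all(t >= -tol for t in barycentric_coordinates(p, v0, v1, v2))


v0, v1, v2 = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
print(barycentric_coordinates((0.25, 0.25), v0, v1, v2))  # (0.5, 0.25, 0.25)
print(in_simplex((0.25, 0.25), v0, v1, v2))               # True
print(in_simplex((1.0, 1.0), v0, v1, v2))                 # False
```

If the three vertices are collinear, the determinant is zero and no simplex is defined, which is why the function raises an error in that case.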

What is the key distinction between a convex hull and a simplex?

If the elements of the set on which the convex hull is defined are affinely independent, then the convex hull and the simplex defined on this set are the same. Otherwise, a simplex cannot be defined on this set, but its convex hull still can.

Convex Functions

  • A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is convex if $\mathbf{dom}\, f$ is a convex set and if for all $x, y \in \mathbf{dom}\, f$, and $\theta$ with $0 \leq \theta \leq 1$, we have

$$f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta) f(y).$$

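  • We say $f$ is strictly convex if strict inequality holds in the definition above whenever $x \neq y$ and $0 < \theta < 1$. We say $f$ is concave if $-f$ is convex, and strictly concave if $-f$ is strictly convex.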

  • A function is convex if and only if it is convex when restricted to any line that intersects its domain. In other words, $f$ is convex if and only if for all $x \in \mathbf{dom}\, f$ and all $v$, the function $g(t) = f(x + tv)$ is convex (on its domain, $\{t \mid x + tv \in \mathbf{dom}\, f \}$).
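
The defining inequality suggests a simple numerical check: sample $x$, $y$ and $\theta$ at random and look for a violation. This is only a sketch under stated assumptions (the helper `violates_convexity`, the intervals, and the tolerance are hypothetical). Finding a violation shows that a function is not convex on the interval, while finding none is evidence and not a proof.

```python
import math
import random


def violates_convexity(f, lo, hi, trials=10000, seed=0, tol=1e-9):
    """Search for x, y, theta with f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False


# exp is convex on R, so no violation should be found.
print(violates_convexity(math.exp, -3, 3))
# sin is strictly concave on [0, pi], so a violation should be found quickly.
print(violates_convexity(math.sin, 0, math.pi))
```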

First-order conditions

  • Suppose $f$ is differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and
    $f(y) \geq f(x) + \nabla f(x)^{T}(y-x)$ holds for all $x, y \in \mathbf{dom}\, f$.

  • For a convex function, the first-order Taylor approximation is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.

  • The inequality shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it).
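
The underestimator property can be observed numerically. The sketch below builds the first-order Taylor approximation of $f(x) = e^x$ at a point and checks that it never exceeds $f$ on a grid; the choice of function, the expansion point `x0`, and the grid are assumptions made for illustration.

```python
import math


def tangent_line(f, df, x):
    """First-order Taylor approximation of f around x, as a function of y."""
    return lambda y: f(x) + df(x) * (y - x)


f, df = math.exp, math.exp  # f(x) = e^x is its own derivative
x0 = 0.5
approx = tangent_line(f, df, x0)

# Gap f(y) - [f(x0) + f'(x0)(y - x0)] on a grid over [-5, 5].
ys = [i / 10 for i in range(-50, 51)]
gaps = [f(y) - approx(y) for y in ys]
print(min(gaps) >= -1e-12)  # the tangent line stays below the function
```

The gap is zero at $y = x_0$ and grows as $y$ moves away from $x_0$, which is what a global underestimator that touches the function at one point looks like.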

Second-order conditions

  • Suppose that $f$ is twice differentiable. Then $f$ is convex if and only if $\mathbf{dom}\, f$ is convex and its Hessian is positive semidefinite: for all $x \in \mathbf{dom}\, f$,
    $\nabla^2 f(x) \succeq 0.$

  • $f$ is concave if and only if $\mathbf{dom}\, f$ is convex and $\nabla^2 f(x) \preceq 0$ for all $x \in \mathbf{dom}\, f$.

  • If $\nabla^2 f(x) \succ 0$ for all $x \in \mathbf{dom}\, f$, then $f$ is strictly convex. The converse is not true. For example, $f(x) = x^4$ has zero second derivative at $x = 0$ but is strictly convex.

  • Quadratic functions: Consider the quadratic function $f: \mathbf{R}^n \rightarrow \mathbf{R}$, with $\mathbf{dom}\, f = \mathbf{R}^n$, given by
    $f(x) = (1/2) x^{T} P x + q^T x + r,$
    with $P \in \mathbf{S}^n$, $q \in \mathbf R^n$, and $r \in \mathbf{R}$. Since $\nabla^2 f(x) = P$ for all $x$, $f$ is convex if and only if $P \succeq 0$ (and concave if and only if $P \preceq 0$).
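
For the quadratic case with $n = 2$, the condition $P \succeq 0$ can be checked by hand: a symmetric $2 \times 2$ matrix is positive semidefinite exactly when both diagonal entries and the determinant are nonnegative. The sketch below applies this test; the function names and the two example matrices are assumptions made for illustration.

```python
def is_psd_2x2(P, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff all its principal minors are nonnegative."""
    (a, b), (c, d) = P
    assert abs(b - c) <= tol, "P must be symmetric"
    return a >= -tol and d >= -tol and a * d - b * c >= -tol


def quadratic(P, q, r):
    """Return f(x) = (1/2) x^T P x + q^T x + r for x in R^2."""

    def f(x):
        Px = [P[0][0] * x[0] + P[0][1] * x[1], P[1][0] * x[0] + P[1][1] * x[1]]
        return 0.5 * (x[0] * Px[0] + x[1] * Px[1]) + q[0] * x[0] + q[1] * x[1] + r

    return f


P_convex = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
P_saddle = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues -1 and 3
print(is_psd_2x2(P_convex))  # True
print(is_psd_2x2(P_saddle))  # False

# For the indefinite matrix, the midpoint of (1, -1) and (-1, 1) violates convexity.
g = quadratic(P_saddle, [0.0, 0.0], 0.0)
print(g([0.0, 0.0]), 0.5 * g([1.0, -1.0]) + 0.5 * g([-1.0, 1.0]))  # 0.0 -1.0
```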

Examples of Convex and Concave Functions

  • Exponential. $e^{ax}$ is convex on $\mathbf{R}$, for any $a \in \mathbf{R}$.

  • Powers. $x^a$ is convex on $\mathbf R_{++}$ when $a \geq 1$ or $a \leq 0$, and concave for $0 \leq a \leq 1$.

  • Powers of absolute value. $|x|^p$, for $p \geq 1$, is convex on $\mathbf R$.

  • Logarithm. $\log x$ is concave on $\mathbf R_{++}$.

  • Negative entropy. $x \log x$ (either on $\mathbf{R}_{++}$, or on $\mathbf R_+$, defined as $0$ for $x = 0$) is convex.

  • Norms. Every norm on $\mathbf{R}^n$ is convex.

  • Max function. $f(x) = \max \{ x_1, \dots, x_n\}$ is convex on $\mathbf R^n$.

  • Quadratic-over-linear function. The function $f(x,y) = x^2/y$, with
    $\mathbf{dom}\, f = \mathbf R \times \mathbf R_{++} = \{ (x,y) \in \mathbf R^2 \mid y > 0\},$ is convex.

  • Log-sum-exp. The function $f(x) = \log (e^{x_1} + \cdots + e^{x_n})$ is convex on $\mathbf R^n$.

  • Geometric mean. The geometric mean $f(x) = (\prod^n_{i = 1} x_i)^{1/n}$ is concave on $\mathbf{dom}\, f = \mathbf R^n_{++}$.

  • Log-determinant. The function $f(X) = \log \det X$ is concave on $\mathbf{dom}\, f = \mathbf S^n_{++}$.
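
As an illustration of one entry in this list, the sketch below evaluates log-sum-exp and tests the convexity inequality on random inputs. Subtracting the maximum before exponentiating is a common way to avoid overflow and is an implementation choice made here, not something stated in the original article; the sampling ranges and tolerance are also assumptions.

```python
import math
import random


def log_sum_exp(x):
    """log(e^{x_1} + ... + e^{x_n}), shifted by max(x) to avoid overflow."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


rng = random.Random(0)
ok = True
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(4)]
    y = [rng.uniform(-5, 5) for _ in range(4)]
    theta = rng.random()
    z = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    # Convexity: f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y).
    if log_sum_exp(z) > theta * log_sum_exp(x) + (1 - theta) * log_sum_exp(y) + 1e-9:
        ok = False
print(ok)
```

The shift by `max(x)` does not change the value, since $\log \sum_i e^{x_i} = m + \log \sum_i e^{x_i - m}$ for any constant $m$, but it keeps every exponent at or below zero.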


Reference: Convex Optimization by Stephen Boyd and Lieven Vandenberghe.
