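Preface

This short post is the second set of notes on Chapter 2 in my series on "Foundations of Machine Learning" (2nd edition). I originally meant to split the PAC material evenly across two posts, but the first one ended up covering most of it, so there is not much left for this one. I recommend reading the first part before this post.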

Generalities

This section discusses some more general learning scenarios.

Deterministic versus stochastic scenarios

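In most supervised learning settings, the distribution $D$ is defined over $X \times Y$, and the training sample is drawn i.i.d. according to $D$:
$$S = ((x_1, y_1), \ldots, (x_m, y_m)).$$
The learning problem is to find a hypothesis $h \in H$ whose generalization error is as small as possible:
$$R(h) = \mathop{P}\limits_{(x,y)\sim D}[h(x) \neq y] = \mathop{E}\limits_{(x,y)\sim D}[1_{h(x) \neq y}].$$
This is called the stochastic scenario: the label is a probabilistic function of the input, so the same input point need not have a unique label. For example, if we predict a person's gender from height and weight, the label is not unique: for most inputs both male and female are possible, only with different probabilities.
Extending the PAC-learning framework to this setting gives what is called agnostic PAC-learning.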

Definition 1 Agnostic PAC-learning

Let $H$ be a hypothesis set. $A$ is an agnostic PAC-learning algorithm if there exists a polynomial function $poly(\cdot,\cdot,\cdot,\cdot)$ such that for any $\epsilon > 0$ and $\delta > 0$, and for all distributions $D$ over $X \times Y$, the following holds for any sample size $m \geq poly(1/\epsilon, 1/\delta, n, size(c))$:
$$\mathop{P}\limits_{S\sim D^m}\Big[R(h_S) - \min\limits_{h\in H} R(h) \leq \epsilon\Big] \geq 1 - \delta,$$
where $h_S$ denotes the hypothesis returned by $A$ on the sample $S$.
If $A$ further runs in time $poly(1/\epsilon, 1/\delta, n)$, it is said to be an efficient agnostic PAC-learning algorithm.
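
To make the quantity $R(h_S) - \min_{h\in H} R(h)$ concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and not taken from the book: a ten-point input space with uniform marginal, a made-up conditional probability table `ETA`, a finite hypothesis set of threshold classifiers, and empirical risk minimization as the learning algorithm. It estimates how often the excess risk stays below $\epsilon$ for a given sample size $m$.

```python
import random

# Hypothetical P[y = 1 | x] for x = 0..9; x is drawn uniformly (assumption).
ETA = [0.1] * 5 + [0.8] * 5
# Finite hypothesis set H: h_t(x) = 1 if x >= t, for t = 0..10.
THRESHOLDS = range(11)


def predict(t, x):
    return 1 if x >= t else 0


def true_risk(t):
    """Exact generalization error R(h_t) under the assumed distribution."""
    return sum((1 - ETA[x]) if predict(t, x) == 1 else ETA[x]
               for x in range(10)) / 10


def draw_sample(m, rng):
    """Draw m i.i.d. labeled points; labels are random given x."""
    sample = []
    for _ in range(m):
        x = rng.randrange(10)
        y = 1 if rng.random() < ETA[x] else 0
        sample.append((x, y))
    return sample


def erm(sample):
    """Return the threshold with the smallest empirical error on the sample."""
    def empirical_risk(t):
        return sum(predict(t, x) != y for x, y in sample) / len(sample)
    return min(THRESHOLDS, key=empirical_risk)


def excess_risk(m, rng):
    """R(h_S) - min_{h in H} R(h) for one random sample of size m."""
    best = min(true_risk(t) for t in THRESHOLDS)
    return true_risk(erm(draw_sample(m, rng))) - best


def success_rate(m, epsilon, trials=200, seed=0):
    """Fraction of trials in which the excess risk is at most epsilon."""
    rng = random.Random(seed)
    return sum(excess_risk(m, rng) <= epsilon for _ in range(trials)) / trials
```

Calling `success_rate(m, epsilon)` for increasing `m` gives an empirical estimate of the probability on the left-hand side of the definition. Note that the best hypothesis in this `H` still has non-zero error, which is exactly why the agnostic definition compares against $\min_{h\in H} R(h)$ and not against zero.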

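When the label of each point is unique and is determined by some measurable function $f: X \rightarrow Y$, the scenario is called deterministic. In that case it suffices to consider a distribution $D$ over the input space only. The training sample is obtained by drawing $(x_1, \ldots, x_m)$ according to $D$, and the labels are obtained through $f$: $y_i = f(x_i)$ for all $i$.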
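
The difference between the two scenarios comes down to how labels are generated. The following sketch contrasts the two sampling procedures; the function names and the callables `draw_x`, `f` and `p1_given_x` are hypothetical helpers introduced here for illustration, and labels are assumed to be binary.

```python
import random


def sample_deterministic(m, f, draw_x, rng):
    """Deterministic scenario: draw x from D over X, then set y = f(x)."""
    xs = [draw_x(rng) for _ in range(m)]
    return [(x, f(x)) for x in xs]


def sample_stochastic(m, p1_given_x, draw_x, rng):
    """Stochastic scenario: y is random given x, with P[y = 1 | x] = p1_given_x(x)."""
    sample = []
    for _ in range(m):
        x = draw_x(rng)
        y = 1 if rng.random() < p1_given_x(x) else 0
        sample.append((x, y))
    return sample
```

In the deterministic sampler the same `x` always comes back with the same label, whereas in the stochastic sampler repeated draws of the same `x` can carry different labels whenever `p1_given_x(x)` lies strictly between 0 and 1.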

Bayes error and noise

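By the definitions above, in the deterministic case there exists a target function $f$ with zero generalization error, $R(f) = 0$. In the stochastic case, there is a minimal non-zero error that no hypothesis can go below.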

Definition 2 Bayes error

Given a distribution $D$ over $X \times Y$, the Bayes error $R^*$ is defined as the infimum of the errors achieved by measurable functions $h: X \rightarrow Y$:
$$R^* = \inf\limits_{h\ \text{measurable}} R(h).$$
A hypothesis $h$ with $R(h) = R^*$ is called a Bayes hypothesis or Bayes classifier.

By definition, $R^* = 0$ in the deterministic case, while $R^* \neq 0$ in the stochastic case.
The Bayes classifier can also be defined in terms of conditional probabilities:
$$\forall x \in X, \quad h_{Bayes}(x) = \mathop{\mathrm{argmax}}\limits_{y\in\{0,1\}} P[y|x].$$

The average error made by $h_{Bayes}$ at a point $x \in X$ is therefore $\min\{P[0|x], P[1|x]\}$, which is the smallest possible error at that point. This leads to the following definition of noise:
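
As a small worked example of the argmax rule, the sketch below uses a made-up three-point input space with a hypothetical table `P1_GIVEN_X` of conditional probabilities $P[1|x]$. It builds the Bayes classifier and computes the conditional error of any deterministic rule at a point, so the two can be compared.

```python
# Hypothetical conditional probabilities P[y = 1 | x] (illustrative values only).
P1_GIVEN_X = {"a": 0.9, "b": 0.5, "c": 0.2}


def bayes_classifier(x):
    """Return argmax over y in {0, 1} of P[y | x]; ties are broken in favour of 1."""
    p1 = P1_GIVEN_X[x]
    return 1 if p1 >= 1 - p1 else 0


def pointwise_error(h, x):
    """P[h(x) != y | x] for a deterministic rule h."""
    p1 = P1_GIVEN_X[x]
    return 1 - p1 if h(x) == 1 else p1


def always_one(x):
    """A baseline rule that ignores x, used only for comparison."""
    return 1
```

For every `x`, `pointwise_error(bayes_classifier, x)` equals `min(p1, 1 - p1)`, and no other rule, such as `always_one`, achieves a smaller error at that point.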

Definition 3 Noise

Given a distribution $D$ over $X \times Y$, the noise at a point $x \in X$ is defined by
$$noise(x) = \min\{P[1|x], P[0|x]\},$$
that is, the error of a Bayes classifier at the point $x$.

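The average noise is $E[noise(x)]$, and it coincides with the Bayes error: $E[noise(x)] = R^*$. The noise is a characteristic of the learning task that indicates its level of difficulty. A point $x \in X$ whose $noise(x)$ is close to $1/2$ is sometimes called noisy; such points are very hard to learn and naturally limit the achievable prediction accuracy.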
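
The identity $E[noise(x)] = R^*$ can be checked in a few lines. The derivation below is a sketch for binary labels $Y = \{0, 1\}$, and it assumes that $h_{Bayes}$ is itself a measurable function so that the infimum is attained.

```latex
\begin{align*}
R(h) &= \mathop{E}\limits_{x}\big[\,P[h(x) \neq y \mid x]\,\big], \\
P[h(x) \neq y \mid x] &= P[1 \mid x]\, 1_{h(x) = 0} + P[0 \mid x]\, 1_{h(x) = 1}
  \;\geq\; \min\{P[0 \mid x], P[1 \mid x]\} = noise(x), \\
\text{hence}\quad R(h) &\geq E[noise(x)] \quad \text{for every measurable } h, \\
R(h_{Bayes}) &= E\big[\min\{P[0 \mid x], P[1 \mid x]\}\big] = E[noise(x)], \\
\text{so}\quad R^* &= \inf_{h\ \text{measurable}} R(h) = E[noise(x)].
\end{align*}
```

The second line is where the Bayes classifier enters: at each $x$ it picks the label whose indicator multiplies the smaller of the two conditional probabilities, which turns the inequality into an equality.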
