Page 11
$R(d)=\frac{1}{N}\sum_{n=1}^{N}X(d(x_n)\neq j_n)\qquad(1.8)$
where $X(\cdot)$ is the indicator function.
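As a quick check of (1.8), a minimal NumPy sketch (the label arrays are hypothetical):

```python
import numpy as np

def resubstitution_risk(y_true, y_pred):
    """R(d) = (1/N) * sum_n X(d(x_n) != j_n), eq. (1.8)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_pred != y_true)  # mean of the 0/1 indicator

# hypothetical labels: d misclassifies 2 of 5 cases, so R(d) = 0.4
print(resubstitution_risk([0, 1, 1, 0, 1], [0, 1, 0, 0, 0]))
```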

For large datasets, split the learning sample into two parts:
$L_1$: used to train the classifier
$L_2$: used to estimate its risk
$R^{ts}(d)=\frac{1}{N_2}\sum_{(x_n,j_n)\in L_2}X(d(x_n)\neq j_n)\qquad(1.9)$

For smaller datasets, V-fold cross-validation is used instead:
$L-L_v$: used to train the classifier $d^{(v)}(x)$
$R^{ts}(d^{(v)})=\frac{1}{N_v}\sum_{(x_n,j_n)\in L_v}X(d^{(v)}(x_n)\neq j_n)$
(using all the data except the $v$th fold)


$R^{cv}(d)=\frac{1}{V}\sum_{v=1}^{V}R^{ts}(d^{(v)})$
(using all the data)
A concrete worked example of this formula is on page 85.
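A minimal sketch of the V-fold estimate above, assuming a hypothetical `fit(X, y)` that trains a classifier and returns a predict callable, with `X` and `y` NumPy arrays:

```python
import numpy as np

def cv_risk(X, y, fit, V=10, seed=0):
    """R^cv(d) = (1/V) * sum_v R^ts(d^(v)) over V random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), V)
    risks = []
    for v in range(V):
        train = np.concatenate([folds[u] for u in range(V) if u != v])
        d_v = fit(X[train], y[train])                 # d^(v): trained on L - L_v
        risks.append(np.mean(d_v(X[folds[v]]) != y[folds[v]]))  # R^ts(d^(v))
    return np.mean(risks)
```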

2 Introduction to Tree Classification

$p(j,t)=\pi(j)\cdot\frac{N_j(t)}{N_j}\qquad(2.2)$
$p(t)=\sum_j p(j,t)\qquad(2.3)$
$p(j|t)=\frac{p(j,t)}{p(t)}\qquad(2.4)$
$\tilde{T}$: the current set of terminal nodes
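A sketch of (2.2)-(2.4) from counts, with all numbers hypothetical:

```python
import numpy as np

# hypothetical 3-class counts at one node t
N_j   = np.array([100, 50, 50])      # N_j: class sizes in the learning sample
N_jt  = np.array([30, 5, 10])        # N_j(t): class-j cases reaching node t
prior = np.array([0.5, 0.25, 0.25])  # pi(j)

p_jt        = prior * N_jt / N_j     # p(j,t), eq. (2.2)
p_t         = p_jt.sum()             # p(t),   eq. (2.3)
p_j_given_t = p_jt / p_t             # p(j|t), eq. (2.4)
```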

Page 35
$R(T)=\sum_{t\in\tilde{T}}R(t)$

$r(t)=\min_i\sum_j C(i|j)\,p(j|t)$

$R(T)=\sum_{t\in\tilde{T}}r(t)\cdot p(t)=\sum_{t\in\tilde{T}}R(t)$
(the probability that a case falls into node $t$, times the probability that it is misclassified there)
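A sketch of $r(t)$ under a cost matrix, where `C[i, j]` is taken to be $C(i|j)$, the cost of classifying a class-$j$ case as class $i$:

```python
import numpy as np

def node_risk(C, p_j_given_t):
    """r(t) = min_i sum_j C(i|j) p(j|t); returns r(t) and the minimizing class i."""
    expected_cost = C @ p_j_given_t          # entry i: sum_j C(i|j) p(j|t)
    i_star = int(np.argmin(expected_cost))
    return expected_cost[i_star], i_star

# with unit costs off the diagonal, r(t) reduces to 1 - max_j p(j|t)
C = 1.0 - np.eye(3)
r_t, label = node_risk(C, np.array([0.6, 0.3, 0.1]))  # r(t) = 0.4, class 0
```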

Page 36
Proposition 2.14
$R(t)\geq R(t_L)+R(t_R)$

3.3 MINIMAL COST-COMPLEXITY PRUNING

$R_{\alpha}(T)=R(T)+\alpha|\tilde{T}|$

DEFINITION:
For each value of $\alpha$, find the $T(\alpha)\leq T_{max}$ which minimizes $R_{\alpha}(T)$:
$R_{\alpha}(T(\alpha))=\min_{T\leq T_{max}}R_{\alpha}(T)$

complexity: $|\tilde{T}|$
$T_t$: any branch of $T_1$
$R(T_t)=\sum_{t'\in\tilde{T}_t}R(t')$
$R(t)>R(T_t)$
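These two quantities drive weakest-link pruning: since $R(t)>R(T_t)$, the critical value $g(t)=\frac{R(t)-R(T_t)}{|\tilde{T}_t|-1}$ is the $\alpha$ at which collapsing branch $T_t$ into node $t$ stops costing anything. A one-line sketch with hypothetical numbers:

```python
def weakest_link_alpha(R_t, R_Tt, n_leaves):
    """g(t) = (R(t) - R(T_t)) / (|T_t~| - 1): the alpha at which
    R_alpha(t) equals R_alpha(T_t), so pruning the branch becomes free."""
    return (R_t - R_Tt) / (n_leaves - 1)

# hypothetical branch: R(t) = 0.20, R(T_t) = 0.12, 5 terminal nodes
alpha_k = weakest_link_alpha(0.20, 0.12, 5)  # 0.02
```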

Page 73
$Q^{*}(i|j)=P(d(X)=i\mid Y=j)$
$Q^{*}(i|j)$ is the probability that a case in class $j$ is classified into class $i$ by $d$ (the misclassification probability). Define:

$R^{*}(j)=\sum_i C(i|j)\cdot Q^{*}(i|j)$
so that $R^{*}(j)$ is the expected cost of misclassification for class $j$ items.
Define:
$R^{*}(d)=\sum_j R^{*}(j)\cdot\pi(j)$
as the expected misclassification cost for the classifier $d$.


3.4.1 Test Sample Estimates
Page 74
Basic estimate:
$Q^{ts}(i|j)=\frac{N_{ij}^{(2)}}{N_j^{(2)}}$
That is, $Q^{*}(i|j)$ is estimated as the proportion of the test-sample class $j$ cases that the tree $T$ classifies as $i$ (set $Q^{ts}(i|j)=0$ if $N_j^{(2)}=0$).

$R^{ts}(j)=\sum_i C(i|j)\cdot Q^{ts}(i|j)$
(the expected misclassification cost for class $j$; here $C(i|j)$ is the cost of classifying a class $j$ case as class $i$)

$R^{ts}(T)=\sum_j R^{ts}(j)\cdot\pi(j)\qquad(3.13)$
If the priors are data estimated, use $L_2$ to estimate them as
$\pi(j)=\frac{N_j^{(2)}}{N^{(2)}}$ (the proportion of class $j$ cases in the test sample); in this case, (3.13) simplifies to
$R^{ts}(T)=\frac{1}{N^{(2)}}\sum_{i,j}C(i|j)\,N_{ij}^{(2)}$
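A sketch of the simplified form above, assuming a hypothetical test-sample count matrix `N2[i, j]` = $N_{ij}^{(2)}$ (class-$j$ cases that $T$ classifies as $i$):

```python
import numpy as np

def test_sample_risk(C, N2):
    """R^ts(T) = (1/N^(2)) * sum_{i,j} C(i|j) N_ij^(2),
    valid when the priors are estimated from the test sample."""
    return (C * N2).sum() / N2.sum()

C  = 1.0 - np.eye(2)            # unit misclassification costs
N2 = np.array([[40, 10],        # N2[i, j]: class-j cases labeled i
               [ 5, 45]])
print(test_sample_risk(C, N2))  # 15 errors out of 100 cases = 0.15
```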


Page 75
The test sample estimates can be used to select the right sized tree $T_{k_0}$ by the rule
$R^{ts}(T_{k_0})=\min_k R^{ts}(T_k)$

Page 75
3.4.2 Cross-Validation Estimates
3.4.3 Standard Errors and the 1 SE Rule

Page 78
$SE(R^{ts}(T))=\sqrt{\frac{R^{ts}(T)\,(1-R^{ts}(T))}{N_2}}$

$\hat{R}(T_k)=R^{ts}(T_k)$ or $R^{cv}(T_k)$
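A sketch of the binomial standard error for the unit-cost case, with hypothetical inputs:

```python
import math

def se_binomial(R_ts, N2):
    """SE(R^ts(T)) = sqrt(R^ts * (1 - R^ts) / N2)."""
    return math.sqrt(R_ts * (1.0 - R_ts) / N2)

print(se_binomial(0.15, 100))  # about 0.036
```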

Page 152
$R(T)=\sum_{t\in\tilde{T}}r(t)\cdot p(t)$
($r(t)$ is the probability that a case reaching node $t$ is misclassified; $p(t)$ is the probability that a case reaches node $t$)

$R^{cv}(d)=\frac{1}{N}\sum_{v=1}^{V}\sum_{(x_n,y_n)\in L_v}\big(y_n-d^{(v)}(x_n)\big)^2\qquad(8.7)$
(the book's double sum is correct: the outer sum runs over the $V$ folds and the inner one over the cases in each fold, so the average is over all $N$ cases; the same form reappears for $R^{cv}(T_k)$ on page 234)
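A sketch of (8.7) with its double sum, again assuming a hypothetical `fit(X, y)` that returns a predict callable:

```python
import numpy as np

def cv_mse(X, y, fit, V=10, seed=0):
    """R^cv(d) = (1/N) sum_v sum_{(x_n,y_n) in L_v} (y_n - d^(v)(x_n))^2."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sse = 0.0
    for fold in np.array_split(idx, V):
        train = np.setdiff1d(idx, fold)
        d_v = fit(X[train], y[train])             # d^(v): built without fold v
        sse += np.sum((y[fold] - d_v(X[fold])) ** 2)
    return sse / len(y)
```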

Page 225
$RE^{ts}(d)=\frac{R^{ts}(d)}{R^{ts}(\overline{y})}$
$RE^{cv}(d)=\frac{R^{cv}(d)}{R(\overline{y})}$
$\rho^2=1-RE(d)$
(this is the analogue of the familiar $R^2$ statistic: one minus the ratio of the model's MSE to that of predicting the mean)
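A sketch of the relative error, assuming hypothetical test arrays; the baseline $R^{ts}(\overline{y})$ uses the learning-sample mean as a constant predictor:

```python
import numpy as np

def relative_error(y_test, pred_test, y_train_mean):
    """RE^ts(d) = R^ts(d) / R^ts(ybar); 1 - RE is the R^2-style statistic."""
    mse_d    = np.mean((np.asarray(y_test) - np.asarray(pred_test)) ** 2)
    mse_mean = np.mean((np.asarray(y_test) - y_train_mean) ** 2)
    return mse_d / mse_mean
```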

$E(Y_1-d(X_1))^4\approx\frac{1}{N_2}\sum_{n=1}^{N_2}(Y_n-d(X_n))^4$

$E(Y_1-d(X_1))^2\approx\frac{1}{N_2}\sum_{n=1}^{N_2}(Y_n-d(X_n))^2=R^{ts}$

The following is my own derivation for page 226 (write $N=N_2$ and let $\mathrm{Var}$ denote variance):

$SE(R^{ts})=\sqrt{\mathrm{Var}(R^{ts})}$
$=\sqrt{\mathrm{Var}\Big(\frac{1}{N}\sum_{n=1}^{N}(Y_n-d(X_n))^2\Big)}$
$=\frac{1}{N}\sqrt{\mathrm{Var}\Big[\sum_{n=1}^{N}(Y_n-d(X_n))^2\Big]}$
$=\frac{1}{N}\sqrt{\sum_{n=1}^{N}\mathrm{Var}\big[(Y_n-d(X_n))^2\big]}$ (the test cases are independent)
$=\frac{1}{N}\sqrt{\sum_{n=1}^{N}\Big\{E(Y_n-d(X_n))^4-E^2\big[(Y_n-d(X_n))^2\big]\Big\}}$
$=\frac{1}{\sqrt{N}}\sqrt{E(Y_1-d(X_1))^4-E^2\big[(Y_1-d(X_1))^2\big]}$ (the test cases are identically distributed)
$\approx\frac{1}{\sqrt{N}}\sqrt{\Big[\frac{1}{N}\sum_{n=1}^{N}(Y_n-d(X_n))^4\Big]-(R^{ts})^2}$

where
$E\big[(Y_n-d(X_n))^2\big]\approx\frac{1}{N}\sum_{n=1}^{N}(Y_n-d(X_n))^2=R^{ts}$
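The last line of the derivation translates directly into code; a sketch with hypothetical test residuals:

```python
import numpy as np

def se_regression_risk(y_test, pred_test):
    """SE(R^ts) ~ (1/sqrt(N)) * sqrt(mean(e^4) - (R^ts)^2), e = y - d(x)."""
    e2 = (np.asarray(y_test) - np.asarray(pred_test)) ** 2
    R_ts = e2.mean()                   # R^ts = mean squared error
    return np.sqrt((np.mean(e2 ** 2) - R_ts ** 2) / len(e2))
```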

Page 230
$\overline{y}(t)=\frac{1}{N(t)}\sum_{x_n\in t}y_n$
$R(T)=\frac{1}{N}\sum_{t\in\tilde{T}}\sum_{x_n\in t}(y_n-\overline{y}(t))^2$

$R(t)=\frac{1}{N}\sum_{x_n\in t}(y_n-\overline{y}(t))^2$

$R(T)=\sum_{t\in\tilde{T}}R(t)$

Page 232
$s^2(t)=\frac{1}{N(t)}\sum_{x_n\in t}(y_n-\overline{y}(t))^2$

$R(T)=\sum_{t\in\tilde{T}}s^2(t)\cdot p(t)$
From the above we can deduce that
$p(t)=\frac{N(t)}{N}$
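A sketch of $R(T)=\sum_t s^2(t)\,p(t)$, assuming hypothetical arrays `y` and `leaf` (the terminal node each case falls into):

```python
import numpy as np

def regression_tree_risk(y, leaf):
    """R(T) = sum over leaves of s^2(t) * p(t), with p(t) = N(t)/N;
    equals (1/N) * sum_t sum_{x_n in t} (y_n - ybar(t))^2."""
    y, leaf = np.asarray(y, dtype=float), np.asarray(leaf)
    total = 0.0
    for t in np.unique(leaf):
        y_t = y[leaf == t]
        s2 = np.mean((y_t - y_t.mean()) ** 2)  # s^2(t): within-node variance
        total += s2 * len(y_t) / len(y)        # s^2(t) * p(t)
    return total
```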

Page 233
$R_{\alpha}(T)=R(T)+\alpha|\tilde{T}|$
Now minimal error-complexity pruning is done exactly as minimal cost-complexity pruning in classification.
The result is a decreasing sequence of trees:
$T_1>T_2>\cdots>\{t_1\}$
with $T_1\leq T_{max}$, and a corresponding increasing sequence of $\alpha$ values:
$0\leq\alpha_1\leq\alpha_2\leq\cdots$
such that for $\alpha\in[\alpha_k,\alpha_{k+1})$, $T_k$ is the smallest subtree of $T_{max}$ minimizing $R_{\alpha}(T)$.

Page 234
$R^{ts}(T_k)=\frac{1}{N_2}\sum_{(x_n,y_n)\in L_2}(y_n-d_k(x_n))^2$
$R^{cv}(T_k)=\frac{1}{N}\sum_{v=1}^{V}\sum_{(x_n,y_n)\in L_v}\big(y_n-d_k^{(v)}(x_n)\big)^2$
$\alpha_k'=\sqrt{\alpha_k\,\alpha_{k+1}}$
$RE^{cv}(T_k)=\frac{R^{cv}(T_k)}{R(\overline{y})}$
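If scikit-learn is available, its cost-complexity pruning path can be combined with the geometric-mean rule $\alpha_k'=\sqrt{\alpha_k\alpha_{k+1}}$; a sketch, not the book's own procedure:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
# evaluate each pruned tree at the geometric mean of consecutive alphas
alpha_primes = np.sqrt(path.ccp_alphas[:-1] * path.ccp_alphas[1:])
cv_mse = [-cross_val_score(
              DecisionTreeRegressor(ccp_alpha=a, random_state=0), X, y,
              cv=10, scoring="neg_mean_squared_error").mean()
          for a in alpha_primes]
best_alpha = alpha_primes[int(np.argmin(cv_mse))]
```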

Page 237
The selected $T_k$ was the smallest tree such that
$R^{cv}(T_k)\leq R^{cv}(T_{k_0})+SE$
where
$R^{cv}(T_{k_0})=\min_k R^{cv}(T_k)$
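A sketch of the 1 SE rule over a hypothetical pruning sequence (arrays ordered from the largest tree to the root):

```python
import numpy as np

def one_se_rule(R_cv, SE, n_leaves):
    """Pick the smallest tree whose CV risk is within one SE of the minimum."""
    R_cv, SE, n_leaves = map(np.asarray, (R_cv, SE, n_leaves))
    k0 = int(np.argmin(R_cv))                      # R^cv(T_k0) = min_k R^cv(T_k)
    ok = np.flatnonzero(R_cv <= R_cv[k0] + SE[k0])
    return ok[np.argmin(n_leaves[ok])]

# hypothetical sequence: picks the 5-leaf tree (0.26 <= 0.24 + 0.02)
k = one_se_rule(R_cv=[0.30, 0.25, 0.24, 0.26, 0.35],
                SE=[0.02] * 5, n_leaves=[20, 12, 8, 5, 1])
```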

Page 303
11.4 Test Samples
An obvious way to cure the overoptimistic tendency of the empirical estimate $R(d_k)$ of $R^{*}(d_k)$ is to base the estimate of $R^{*}(d_k)$ on data not used to construct $d_k$.
There are two ways to evaluate the SE:
the model 1 version (page 303, lower part)
the model 2 version (page 304, before "To evaluate the preceding formulas efficiently")

Page 306
11.5 Cross-Validation
The use of test samples to estimate the risk of tree-structured procedures requires that one set of sample data be used to construct the procedure and a disjoint set be used to evaluate it.
When the combined set of available data contains a thousand or more cases, this is a reasonable approach. But if only a few hundred cases or fewer are available in total, it can be inefficient in its use of the available data; cross-validation is then preferable.

Page 309
11.6 Final Tree Selection
Test samples or cross-validation can be used to select a particular procedure $d_k=d_{T_k}$ from among the candidates $d_k$.

In summary, we can use test samples or cross-validation to select the best tree:

$\text{select the best tree}=\begin{cases}\text{test samples with minimum MSE}\\ \text{test samples with the 1 SE rule}\\ \text{cross-validation}\end{cases}$

where

$\text{test samples with the 1 SE rule}=\begin{cases}\text{model 1 version}\\ \text{model 2 version}\end{cases}$

Selecting the best pruned tree with cross-validation has some defects; they are discussed at:
https://blog.csdn.net/appleyuchi/article/details/84957220
