Note:
CVP (Critical Value Pruning) is also called chi-square pruning (the chi-square test) in many materials.

The following is the contingency-table test [1]:

$$H_0:\ \frac{X_{ij}}{n}=\frac{N_{i\cdot}\,N_{\cdot j}}{n^2}$$
$$H_1:\ \frac{X_{ij}}{n}\ne\frac{N_{i\cdot}\,N_{\cdot j}}{n^2}$$
$$N_{ij}=X_{ij}$$
when
$$\sum_{i=1}^{r}\sum_{j=1}^{s}\frac{\left(N_{ij}-\frac{N_{i\cdot}\,N_{\cdot j}}{n}\right)^2}{\frac{N_{i\cdot}\,N_{\cdot j}}{n}}<\chi^2_{[(r-1)(s-1)],\,\alpha}=\text{critical value}$$
then $H_0$ is accepted and the decision tree is pruned,
where $\alpha$ can be set to 0.05, etc.
$n$ is the total number of samples in your dataset.
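To make the criterion concrete, here is a minimal Python sketch that computes the chi-square statistic of an observed contingency table and compares it against the critical value. It assumes numpy and scipy are available; the function name and the example counts are illustrative, not taken from the article.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_accepts_h0(table, alpha=0.05):
    """Return True when the chi-square statistic is below the critical value,
    i.e. H0 (rows and columns are independent) is accepted -> prune."""
    observed = np.asarray(table, dtype=float)          # N_ij, shape (r, s)
    n = observed.sum()                                 # total sample count n
    row_totals = observed.sum(axis=1, keepdims=True)   # N_i.
    col_totals = observed.sum(axis=0, keepdims=True)   # N_.j
    expected = row_totals @ col_totals / n             # N_i. * N_.j / n
    statistic = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)   # (r-1)(s-1)
    critical_value = chi2.ppf(1.0 - alpha, dof)        # upper-tail critical value
    return statistic < critical_value

# Hypothetical example: 2 branches x 2 classes
print(chi_square_accepts_h0([[30, 10], [28, 12]]))     # True -> prune
```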
The relationship between the contingency table and the decision tree is shown in the following table:

| Split node with Attribute $f$ | class $1$ | class $2$ | $\dots$ | class $s$ |
| --- | --- | --- | --- | --- |
| branch $1$ | $n_{L1}$ | $n_{L2}$ | $\dots$ | $n_{L}$ |
| branch $2$ | $n_{R1}$ | $n_{R2}$ | $\dots$ | $n_{R}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ |
| branch $r$ | $n_{1}$ | $n_{2}$ | $\dots$ | $n$ |

Now let's use the above table to read the following lecture slides [2].

In the slide above, some of the parameters are explained by the table given before it.

Let's go on…

Note that in the slides, CVP is used in both the pre-pruning and the post-pruning stage.
From the growing (pre-pruning) stage we can see that "less than the critical value" is the condition under which the current sub-tree is pruned.
So we can infer from the lecture slides that post-pruning follows the same rule.

How should we understand the above pruning criterion?

------------------------------------------
In the above table,
different branches
= different values (levels) of the attribute tested at the current decision node
(also called a split node; one split node tests one attribute of the dataset).

When $H_0$ is accepted, then
$$\frac{X_{ij}}{N_{i\cdot}}\approx\frac{N_{\cdot j}}{n}$$
which means:

the probability that an item belongs to class $j$ in each branch $i$ ($i\in[1,r]$)
≈ the probability that an item belongs to class $j$ in the whole dataset
=> merging (pruning) these branches into one leaf will not change the probability of "item belongs to class $j$" very much, which means the accuracy will not change much after pruning.

In conclusion, when the chi-square statistic does not reach the critical value, the branches (the different values of the split attribute of the decision tree) do not contribute much to increasing accuracy, so these branches can be pruned.

The above conclusion can be used directly when we implement the CVP (Critical Value Pruning) algorithm in Python.
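As a hedged sketch only, the snippet below shows how the criterion could drive a bottom-up post-pruning pass. The Node structure and its field names are assumptions made for illustration, not the article's original implementation, and it reuses the chi_square_accepts_h0 function sketched earlier.

```python
from collections import Counter

class Node:
    # Hypothetical decision-tree node; not from any specific library.
    def __init__(self, attribute=None, branches=None, labels=None):
        self.attribute = attribute      # split attribute f (None for a leaf)
        self.branches = branches or {}  # attribute value -> child Node
        self.labels = labels or []      # class labels of the samples reaching this node

def cvp_prune(node, classes, alpha=0.05):
    """Bottom-up Critical Value Pruning: replace a split node by a leaf when the
    chi-square statistic of its branch-vs-class table is below the critical value."""
    if not node.branches:               # already a leaf
        return node
    for value, child in node.branches.items():
        node.branches[value] = cvp_prune(child, classes, alpha)
    # r x s contingency table: rows = branches, columns = classes
    # (assumes every branch received at least one sample, so no row is all zeros)
    table = [[Counter(child.labels)[c] for c in classes]
             for child in node.branches.values()]
    if chi_square_accepts_h0(table, alpha):   # statistic < critical value -> prune
        return Node(labels=node.labels)       # leaf predicting the majority class of node.labels
    return node
```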

We can also see from the above analysis that CVP aims to simplify your decision tree without losing much accuracy.

References:
[1] http://www.maths.manchester.ac.uk/~saralees/pslect8.pdf
[2] https://www.docin.com/p1-2336928230.html
