Here are the formulas given in
"Improved Use of Continuous Attributes in C4.5",
Journal of Artificial Intelligence Research 4 (1996) 77-90:

$$Info(D)=-\sum_{j=1}^{C}p(D,j)\cdot\log_2(p(D,j))$$

$$Gain(D,T)=Info(D)-\sum_{i=1}^{k}\frac{|D_i|}{|D|}\cdot Info(D_i)$$

$$Split(D,T)=-\sum_{i=1}^{k}\frac{|D_i|}{|D|}\cdot\log_2\left(\frac{|D_i|}{|D|}\right)$$
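
To check my reading of these three formulas, here is a minimal Python sketch. The function names `info`, `gain` and `split_info` and the toy data are my own and do not come from the paper; a dataset is represented only by its list of class labels, and a test T by the list of label subsets it produces.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D): entropy in bits of the class labels in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain(labels, partition):
    """Gain(D,T): Info(D) minus the size-weighted Info of each subset D_i."""
    n = len(labels)
    return info(labels) - sum(len(d) / n * info(d) for d in partition if d)


def split_info(partition):
    """Split(D,T): entropy of the subset sizes |D_i| / |D|."""
    n = sum(len(d) for d in partition)
    return -sum(len(d) / n * log2(len(d) / n) for d in partition if d)


# Toy example (made-up data): a test that separates the two classes perfectly.
D = ["yes", "yes", "no", "no"]
parts = [["yes", "yes"], ["no", "no"]]

print(info(D), gain(D, parts), split_info(parts))
```

For this toy split all three quantities should come out as 1 bit, since both the class distribution and the subset sizes are 50/50 and each subset is pure.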

The following is my understanding:
------------------first change-----------------------------
The ordinary gain ratio is
$$Gain\_Ratio=\frac{Gain(D,T)}{Split(D,T)}$$

My understanding of the "first change" is that, for a continuous attribute with N distinct values in D, the gain is reduced by $\log_2(N-1)/|D|$ before the ratio is taken:
$$Gain\_Ratio\_adjusted=\frac{Gain(D,T)-\frac{\log_2(N-1)}{|D|}}{Split(D,T)}$$
Is this right?
Many thanks~
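
To make the question concrete, this is how I would compute the two ratios, assuming `Gain(D,T)` and `Split(D,T)` have already been evaluated. The function names and the numbers are made up for illustration, and I am assuming that N is the number of distinct values of the continuous attribute and |D| is the number of training cases.

```python
from math import log2


def gain_ratio(gain, split):
    """Ordinary gain ratio: Gain(D,T) / Split(D,T)."""
    return gain / split


def adjusted_gain_ratio(gain, split, n_distinct, n_cases):
    """Gain ratio with the gain reduced by log2(N - 1) / |D|.

    Assumes n_distinct (N) >= 2 and n_cases (|D|) > 0.
    """
    penalty = log2(n_distinct - 1) / n_cases
    return (gain - penalty) / split


# Made-up numbers: Gain = 0.5, Split = 1.0, N = 5 distinct values, |D| = 8.
plain = gain_ratio(0.5, 1.0)
adjusted = adjusted_gain_ratio(0.5, 1.0, n_distinct=5, n_cases=8)
print(plain, adjusted)
```

With these numbers the penalty is log2(4)/8 = 0.25, so the adjusted ratio is half of the plain one.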
--------------------second change---------------------------
The relevant passage on the "second change" in the article is:
"This seems to be an unnecessary complication, so the threshold t is chosen instead to maximize gain. Once the threshold is chosen, however, the final selection of the attribute to be used for the test is still made on the basis of the gain ratio criterion using the adjusted gain."
My understanding is:


1st step:
choose the threshold t that gives $Gain(D,T)_{max}$,
not $Gain\_Ratio_{max}$,
and not $(Gain(D,T)-\log_2(N-1)/|D|)_{max}$.
2nd step:
choose the best feature according to:
$$Gain\_Ratio(discrete\ feature)=\frac{Gain(D,T)}{Split(D,T)}$$
$$Gain\_Ratio\_adjusted(continuous\ feature)=\frac{Gain(D,T)-\frac{\log_2(N-1)}{|D|}}{Split(D,T)}$$
Finally, just choose the feature whose gain ratio (or adjusted gain ratio) is the largest.


Is this understanding right?
Many thanks~
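
Here is a rough Python sketch of the two steps as I understand them, for a single continuous attribute. It is only my interpretation, not code from the paper: the names `best_threshold` and `adjusted_gain_ratio` are my own, the candidate thresholds are taken to be every distinct value except the largest (with the test `value <= t`), and the attribute is assumed to have at least two distinct values.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D): entropy in bits of the class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Step 1: pick the threshold t that maximizes the plain Gain(D,T)."""
    n = len(labels)
    base = info(labels)
    best_t, best_gain = None, float("-inf")
    # Assumed candidates: each distinct value except the largest.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        g = base - len(left) / n * info(left) - len(right) / n * info(right)
        if g > best_gain:
            best_t, best_gain = t, g
    return best_t, best_gain


def adjusted_gain_ratio(values, labels):
    """Step 2: score the attribute with the penalized gain at the chosen t."""
    n = len(labels)
    t, g = best_threshold(values, labels)
    penalty = log2(len(set(values)) - 1) / n  # log2(N - 1) / |D|
    n_left = sum(1 for v in values if v <= t)
    fractions = [n_left / n, (n - n_left) / n]
    split = -sum(p * log2(p) for p in fractions)
    return t, (g - penalty) / split


# Made-up data: four cases, four distinct values, two classes.
values = [1, 2, 3, 4]
labels = ["a", "a", "b", "b"]
t, score = adjusted_gain_ratio(values, labels)
print(t, score)
```

In this sketch the threshold search never looks at the split information or the penalty; those only enter in the second step, where the resulting score would be compared with the ordinary gain ratio of the discrete features.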
