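1. What is a decision stump?

A decision stump, also called a single-level decision tree, is the simplest kind of decision tree: it classifies each sample by comparing one feature against a given threshold.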
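To make the definition concrete, here is a minimal sketch of a stump on a single feature. The sample values, the threshold 1.5, and the name stump are all made up for illustration; they do not come from the data used later in this post.

```matlab
% Hypothetical one-feature samples and threshold, chosen only for illustration
x = [0.8; 1.4; 1.9; 2.3];
thresh_val = 1.5;

% The whole classifier is one comparison: 1 above the threshold, 0 otherwise
stump = @(v) double(v > thresh_val);
predi_labels = stump(x);
disp(predi_labels');
```

The first two samples fall below the threshold and get label 0; the last two get label 1.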


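2. Key questions

  1. Which of the attributes should the decision stump (the weak learner) split on?
  2. What value should the stump's threshold take (1.75 in the original figure)?
  3. Should values below the threshold be classified as 1 (yes), or values above it? "Below" is the usual default.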
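The implementation in section 3 answers question 2 with a simple grid search: for each feature it tries max_iter evenly spaced thresholds starting at the column minimum. The sketch below reproduces that grid for the second feature column of the sample data; the variable names col and thresholds are mine.

```matlab
% Second feature column of the sample data used in this post
col = [2.1; 1.6; 1; 1; 1];
max_iter = 10;

% Same step rule as decision_stump: the range is split into max_iter steps
step_size = (max(col) - min(col)) / max_iter;
thresholds = min(col) + (0:max_iter - 1) * step_size;

fprintf('%.2f ', thresholds);
fprintf('\n');
```

The grid runs from 1.00 to 1.99, so it stops one step short of the column maximum (2.1). Questions 1 and 3 are handled by wrapping this grid in an outer loop over the feature columns and an inner loop over the two comparison directions.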

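3. MATLAB implementation

The main script test loads the data, calls the decision stump search, and prints the best stump together with its misclassification probability (the weighted error).

Key line: err = sum(weight .* (predi_labels ~= label));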
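As a worked example of that line, the sketch below evaluates the weighted error by hand for one candidate stump on the sample data from loadData: feature 1, threshold 1.3. The names predi_l, predi_g, err_l and err_g are mine. It also shows why the two directions always come in pairs: because the weights sum to 1 and the two directions give opposite predictions, their errors add up to 1.

```matlab
% Sample data and labels from loadData in this post
data = [1, 2.1; 1.5, 1.6; 1.3, 1; 1, 1; 2, 1];
label = [1; 1; 0; 0; 1];
weight = repmat(1 / 5, 5, 1);          % uniform weights that sum to 1

% Feature 1, threshold 1.3, direction 'l': values <= 1.3 are labeled 0
predi_l = double(data(:, 1) > 1.3);
err_l = sum(weight .* (predi_l ~= label));

% Direction 'g' flips every prediction, so it errs exactly where 'l' is right
predi_g = 1 - predi_l;
err_g = sum(weight .* (predi_g ~= label));

fprintf('err_l = %.3f, err_g = %.3f, sum = %.3f\n', err_l, err_g, err_l + err_g);
```

Only the first sample is misclassified under 'l', so err_l is 0.2 and err_g is 0.8, the same pair of values that appears for threshold 1.30 on dim 1 in the program output.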

function [classifier, min_error, best_labels] = decision_stump(data, weight, label)
% decision_stump  Find the best decision stump and return it as a struct.
% data: num_row-by-num_col; weight: num_row-by-1 (initially all 1/num_row);
% label: num_row-by-1 (each entry is 1 or 0).
% classifier fields: dim (the feature column with the lowest weighted error),
% thresh_val (the threshold that achieves it on that feature),
% thresh_ineq ('l' or 'g': which comparison against the threshold works best).
num_row = size(data, 1);
num_col = size(data, 2);
% Number of threshold steps to try per feature
max_iter = 10;
% Initialize the search state
min_error = Inf;
best_labels = ones(num_row, 1);
classifier.dim = 0;
classifier.thresh_val = 0;
classifier.thresh_ineq = '';
for i = 1:num_col
    cur_thresh_val = min(data(:, i));
    step_size = (max(data(:, i)) - min(data(:, i))) / max_iter;
    for j = 1:max_iter
        for k = ['l', 'g']
            thresh_val = cur_thresh_val + (j - 1) * step_size;
            predi_labels = decision(data, i, thresh_val, k);
            err = sum(weight .* (predi_labels ~= label));
            fprintf('iter %d dim %d, threshVal %.2f, thresh ineqal: %s, the weighted error is %.3f\n', j, i, thresh_val, k, err);
            if err < min_error
                % Record the new best stump
                min_error = err;
                best_labels = predi_labels;
                classifier.dim = i;
                classifier.thresh_val = thresh_val;
                classifier.thresh_ineq = k;
            end
        end
    end
end
end

% Classify samples as 1 or 0 using the stump's threshold
function predi_labels = decision(data, i_col, thresh_val, thresh_ineq)
% data: num_row-by-num_col; i_col: index of the feature column;
% thresh_val: the stump's threshold;
% thresh_ineq: 'l' or 'g', the side of the threshold that is labeled 0.
% predi_labels: num_row-by-1 labels predicted by the stump.
num_row = size(data, 1);
% Start with every prediction set to 1
predi_labels = ones(num_row, 1);
if thresh_ineq == 'l'
    % Values less than or equal to the threshold are labeled 0
    predi_labels(data(:, i_col) <= thresh_val) = 0;
elseif thresh_ineq == 'g'
    % Values greater than the threshold are labeled 0
    predi_labels(data(:, i_col) > thresh_val) = 0;
end
end


% Main script test
% Load the data [data, label] ([X, Y])
[data, label] = loadData();
% Initialize the relative weight of each sample: a num_row-by-1 vector with every entry 1/num_row
num_row = size(data, 1);
num_col = size(data, 2);
weight = repmat(1 / num_row, num_row, 1);
[classifier, min_error, best_labels] = decision_stump(data, weight, label);
% Print the parameters of the best decision stump
fprintf('dim %d, threshVal %.2f, thresh ineqal: %s, the weighted error is %.3f\n', ...
    classifier.dim, classifier.thresh_val, classifier.thresh_ineq, min_error);
disp(best_labels);

function [data, label] = loadData()
% data: 5-by-2; label: 5-by-1, with labels 0 or 1.
data = [1, 2.1; 1.5, 1.6; 1.3, 1; 1, 1; 2, 1];
label = [1; 1; 0; 0; 1];
end

Program output:

iter 1 dim 1, threshVal 1.00, thresh ineqal: l, the weighted error is 0.400
iter 1 dim 1, threshVal 1.00, thresh ineqal: g, the weighted error is 0.600
iter 2 dim 1, threshVal 1.10, thresh ineqal: l, the weighted error is 0.400
iter 2 dim 1, threshVal 1.10, thresh ineqal: g, the weighted error is 0.600
iter 3 dim 1, threshVal 1.20, thresh ineqal: l, the weighted error is 0.400
iter 3 dim 1, threshVal 1.20, thresh ineqal: g, the weighted error is 0.600
iter 4 dim 1, threshVal 1.30, thresh ineqal: l, the weighted error is 0.200
iter 4 dim 1, threshVal 1.30, thresh ineqal: g, the weighted error is 0.800
iter 5 dim 1, threshVal 1.40, thresh ineqal: l, the weighted error is 0.200
iter 5 dim 1, threshVal 1.40, thresh ineqal: g, the weighted error is 0.800
iter 6 dim 1, threshVal 1.50, thresh ineqal: l, the weighted error is 0.400
iter 6 dim 1, threshVal 1.50, thresh ineqal: g, the weighted error is 0.600
iter 7 dim 1, threshVal 1.60, thresh ineqal: l, the weighted error is 0.400
iter 7 dim 1, threshVal 1.60, thresh ineqal: g, the weighted error is 0.600
iter 8 dim 1, threshVal 1.70, thresh ineqal: l, the weighted error is 0.400
iter 8 dim 1, threshVal 1.70, thresh ineqal: g, the weighted error is 0.600
iter 9 dim 1, threshVal 1.80, thresh ineqal: l, the weighted error is 0.400
iter 9 dim 1, threshVal 1.80, thresh ineqal: g, the weighted error is 0.600
iter 10 dim 1, threshVal 1.90, thresh ineqal: l, the weighted error is 0.400
iter 10 dim 1, threshVal 1.90, thresh ineqal: g, the weighted error is 0.600
iter 1 dim 2, threshVal 1.00, thresh ineqal: l, the weighted error is 0.200
iter 1 dim 2, threshVal 1.00, thresh ineqal: g, the weighted error is 0.800
iter 2 dim 2, threshVal 1.11, thresh ineqal: l, the weighted error is 0.200
iter 2 dim 2, threshVal 1.11, thresh ineqal: g, the weighted error is 0.800
iter 3 dim 2, threshVal 1.22, thresh ineqal: l, the weighted error is 0.200
iter 3 dim 2, threshVal 1.22, thresh ineqal: g, the weighted error is 0.800
iter 4 dim 2, threshVal 1.33, thresh ineqal: l, the weighted error is 0.200
iter 4 dim 2, threshVal 1.33, thresh ineqal: g, the weighted error is 0.800
iter 5 dim 2, threshVal 1.44, thresh ineqal: l, the weighted error is 0.200
iter 5 dim 2, threshVal 1.44, thresh ineqal: g, the weighted error is 0.800
iter 6 dim 2, threshVal 1.55, thresh ineqal: l, the weighted error is 0.200
iter 6 dim 2, threshVal 1.55, thresh ineqal: g, the weighted error is 0.800
iter 7 dim 2, threshVal 1.66, thresh ineqal: l, the weighted error is 0.400
iter 7 dim 2, threshVal 1.66, thresh ineqal: g, the weighted error is 0.600
iter 8 dim 2, threshVal 1.77, thresh ineqal: l, the weighted error is 0.400
iter 8 dim 2, threshVal 1.77, thresh ineqal: g, the weighted error is 0.600
iter 9 dim 2, threshVal 1.88, thresh ineqal: l, the weighted error is 0.400
iter 9 dim 2, threshVal 1.88, thresh ineqal: g, the weighted error is 0.600
iter 10 dim 2, threshVal 1.99, thresh ineqal: l, the weighted error is 0.400
iter 10 dim 2, threshVal 1.99, thresh ineqal: g, the weighted error is 0.600
dim 1, threshVal 1.30, thresh ineqal: l, the weighted error is 0.200
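
The search reports dim 1 at threshold 1.30 even though several dim 2 candidates also reach a weighted error of 0.200. That follows from the strict comparison err < min_error: a later candidate that only ties the current best never replaces it. A minimal sketch of that behavior, using a made-up short list of errors (errs and best_idx are hypothetical names):

```matlab
% Weighted errors in the order a search might visit them (hypothetical values)
errs = [0.4, 0.2, 0.4, 0.2];

min_error = Inf;
best_idx = 0;
for n = 1:numel(errs)
    if errs(n) < min_error      % strict '<', as in decision_stump
        min_error = errs(n);
        best_idx = n;
    end
end
fprintf('best_idx = %d, min_error = %.3f\n', best_idx, min_error);
```

The loop keeps index 2, the first candidate to reach 0.2, and ignores the tie at index 4. Changing the comparison to <= would instead keep the last tied candidate.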

