1. What is a decision stump?

A decision stump, also called a one-level decision tree, is a simple decision tree that classifies samples with a single threshold test.

In practical terms, a decision stump decides the final class from a single test on one attribute, even though the object being classified actually has many attributes. This makes it well suited as a weak learner in ensemble learning: it performs at least a little better than random guessing, and it is cheap to compute.

2. Key questions

The underlying goal: choose a suitable decision stump (weak learner) so that the classification accuracy is as high as possible.

How do we choose a suitable decision stump?

  1. Which attribute, out of all the attributes, should the decision stump (weak learner) test?
  2. What value should that stump's threshold be set to (1.75 in the figure above)?
  3. Should samples below the threshold be labeled 1 (yes), or samples above it? It is usually set to "below". (A minimal sketch of these three choices follows this list.)
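
To make the three choices concrete, here is a minimal sketch (not code from the original post) of a stump described by a feature index, a threshold, and an inequality direction. The field names and the meaning of 'l' mirror the decision helper in the MATLAB code below, where samples at or below the threshold get label 0.

% A stump is fully specified by three choices: which feature, what threshold,
% and which side of the threshold is labeled 0.
stump.dim = 2;            % test the 2nd feature
stump.thresh_val = 1.75;  % the threshold (the 1.75 mentioned above)
stump.thresh_ineq = 'l';  % 'l': values <= threshold get label 0, values above get label 1

x = [1.5, 1.6];           % one sample with two features
if stump.thresh_ineq == 'l'
    y_hat = double(x(stump.dim) > stump.thresh_val);   % label 1 only above the threshold
else
    y_hat = double(x(stump.dim) <= stump.thresh_val);  % label 1 only at or below the threshold
end
fprintf("predicted label: %d\n", y_hat);  % here x(2) = 1.6 <= 1.75, so the label is 0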

3. MATLAB implementation

The main test script loads the data, calls the decision stump routine, and reports the best decision stump along with its misclassification rate.

First, let's look at how to find, over the whole data set, the decision stump that minimizes the probability of misclassification.

Key line: err = sum(weight .* (predi_labels ~= label));
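
This line computes the weighted misclassification rate: (predi_labels ~= label) is a logical 0/1 vector marking the mistakes, so multiplying it element-wise by weight and summing adds up exactly the weights of the misclassified samples. A minimal stand-alone sketch (using the same numbers as the test run below):

% Weighted error with made-up predictions: the first sample is wrong, weights are uniform
label        = [1; 1; 0; 0; 1];           % true labels
predi_labels = [0; 1; 0; 0; 1];           % stump predictions (first sample misclassified)
weight       = repmat(1/5, 5, 1);         % uniform weights, 0.2 each
mistakes = (predi_labels ~= label);       % logical vector, 1 where the prediction is wrong
err = sum(weight .* mistakes);            % total weight of the misclassified samples
fprintf("weighted error = %.3f\n", err);  % prints 0.200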

function [classifier, min_error, best_labels] = decision_stump(data, weight, label)
% decision_stump  Find the best decision stump and return it as a classifier
% data: num_row*num_col; weight: num_row*1 (initially 1/num_row each); label: num_row*1 (values 1 or 0)
% classifier has fields dim (which feature classifies best), thresh_val (the threshold value at which
% that feature classifies best) and thresh_ineq (whether the 'greater' or 'less' side classifies best)
num_row = size(data, 1);
num_col = size(data, 2);
% number of threshold steps to try per feature
max_iter = 10;
% initialize the search state
min_error = Inf;
best_labels = ones(num_row, 1);
classifier.dim = 0;
classifier.thresh_val = 0;
classifier.thresh_ineq = '';
for i = 1:num_col
    cur_thresh_val = min(data(:, i));
    step_size = (max(data(:, i)) - min(data(:, i))) / max_iter;
    for j = 1:max_iter
        for k = ['l', 'g']
            thresh_val = cur_thresh_val + (j - 1) * step_size;
            predi_labels = decision(data, i, thresh_val, k);
            err = sum(weight .* (predi_labels ~= label));
            fprintf("iter %d dim %d, threshVal %.2f, thresh ineqal: %s, the weighted error is %.3f\n", ...
                j, i, thresh_val, k, err);
            if err < min_error
                % record the best stump found so far
                min_error = err;
                best_labels = predi_labels;
                classifier.dim = i;
                classifier.thresh_val = thresh_val;
                classifier.thresh_ineq = k;
            end
        end
    end
end
end

% Classify samples (1 or 0) according to the stump's threshold
function predi_labels = decision(data, i_col, thresh_val, thresh_ineq)
% data: num_row*num_col, i_col: the feature column to test, thresh_val: the stump's threshold,
% thresh_ineq: whether the 'less' or 'greater' side of the threshold is labeled 0
% predi_labels: num_row*1 labels obtained by thresholding the chosen feature
num_row = size(data, 1);
% initialize predi_labels to all ones
predi_labels = ones(num_row, 1);
if thresh_ineq == 'l'
    % samples at or below the threshold are labeled 0
    predi_labels(data(:, i_col) <= thresh_val) = 0;
elseif thresh_ineq == 'g'
    % samples above the threshold are labeled 0
    predi_labels(data(:, i_col) > thresh_val) = 0;
end
end

Let's test it:

clc; clear; clearvars;
% load the data [data, label] ([X, Y])
[data, label] = loadData();
% initialize the relative weight of each data point: a num_row*1 array filled with 1/num_row
num_row = size(data, 1);
num_col = size(data, 2);
weight = repmat(1 / num_row, num_row, 1);
[classifier, min_error, best_labels] = decision_stump(data, weight, label);
% print the parameters of the best decision stump
fprintf("dim %d, threshVal %.2f, thresh ineqal: %s, the weighted error is %.3f\n", ...
    classifier.dim, classifier.thresh_val, classifier.thresh_ineq, min_error);
disp(best_labels);

function [data, label] = loadData()
% data: 5*2, label: 5*1, labels are 0 or 1
data = [1, 2.1; 1.5, 1.6; 1.3, 1; 1, 1; 2, 1];
label = [1; 1; 0; 0; 1];
end

iter 1 dim 1, threshVal 1.00, thresh ineqal: l, the weighted error is 0.400
iter 1 dim 1, threshVal 1.00, thresh ineqal: g, the weighted error is 0.600
iter 2 dim 1, threshVal 1.10, thresh ineqal: l, the weighted error is 0.400
iter 2 dim 1, threshVal 1.10, thresh ineqal: g, the weighted error is 0.600
iter 3 dim 1, threshVal 1.20, thresh ineqal: l, the weighted error is 0.400
iter 3 dim 1, threshVal 1.20, thresh ineqal: g, the weighted error is 0.600
iter 4 dim 1, threshVal 1.30, thresh ineqal: l, the weighted error is 0.200
iter 4 dim 1, threshVal 1.30, thresh ineqal: g, the weighted error is 0.800
iter 5 dim 1, threshVal 1.40, thresh ineqal: l, the weighted error is 0.200
iter 5 dim 1, threshVal 1.40, thresh ineqal: g, the weighted error is 0.800
iter 6 dim 1, threshVal 1.50, thresh ineqal: l, the weighted error is 0.400
iter 6 dim 1, threshVal 1.50, thresh ineqal: g, the weighted error is 0.600
iter 7 dim 1, threshVal 1.60, thresh ineqal: l, the weighted error is 0.400
iter 7 dim 1, threshVal 1.60, thresh ineqal: g, the weighted error is 0.600
iter 8 dim 1, threshVal 1.70, thresh ineqal: l, the weighted error is 0.400
iter 8 dim 1, threshVal 1.70, thresh ineqal: g, the weighted error is 0.600
iter 9 dim 1, threshVal 1.80, thresh ineqal: l, the weighted error is 0.400
iter 9 dim 1, threshVal 1.80, thresh ineqal: g, the weighted error is 0.600
iter 10 dim 1, threshVal 1.90, thresh ineqal: l, the weighted error is 0.400
iter 10 dim 1, threshVal 1.90, thresh ineqal: g, the weighted error is 0.600
iter 1 dim 2, threshVal 1.00, thresh ineqal: l, the weighted error is 0.200
iter 1 dim 2, threshVal 1.00, thresh ineqal: g, the weighted error is 0.800
iter 2 dim 2, threshVal 1.11, thresh ineqal: l, the weighted error is 0.200
iter 2 dim 2, threshVal 1.11, thresh ineqal: g, the weighted error is 0.800
iter 3 dim 2, threshVal 1.22, thresh ineqal: l, the weighted error is 0.200
iter 3 dim 2, threshVal 1.22, thresh ineqal: g, the weighted error is 0.800
iter 4 dim 2, threshVal 1.33, thresh ineqal: l, the weighted error is 0.200
iter 4 dim 2, threshVal 1.33, thresh ineqal: g, the weighted error is 0.800
iter 5 dim 2, threshVal 1.44, thresh ineqal: l, the weighted error is 0.200
iter 5 dim 2, threshVal 1.44, thresh ineqal: g, the weighted error is 0.800
iter 6 dim 2, threshVal 1.55, thresh ineqal: l, the weighted error is 0.200
iter 6 dim 2, threshVal 1.55, thresh ineqal: g, the weighted error is 0.800
iter 7 dim 2, threshVal 1.66, thresh ineqal: l, the weighted error is 0.400
iter 7 dim 2, threshVal 1.66, thresh ineqal: g, the weighted error is 0.600
iter 8 dim 2, threshVal 1.77, thresh ineqal: l, the weighted error is 0.400
iter 8 dim 2, threshVal 1.77, thresh ineqal: g, the weighted error is 0.600
iter 9 dim 2, threshVal 1.88, thresh ineqal: l, the weighted error is 0.400
iter 9 dim 2, threshVal 1.88, thresh ineqal: g, the weighted error is 0.600
iter 10 dim 2, threshVal 1.99, thresh ineqal: l, the weighted error is 0.400
iter 10 dim 2, threshVal 1.99, thresh ineqal: g, the weighted error is 0.600
dim 1, threshVal 1.30, thresh ineqal: l, the weighted error is 0.200
     0
     1
     0
     0
     1

As the output shows, the best decision stump uses the first feature with threshold 1.30; the misclassification rate is 0.2, with the first sample being the one misclassified.
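
Since section 1 motivates the stump as a weak learner for ensemble learning, here is a hedged sketch (not part of the original code) of how the returned classifier, min_error, and best_labels could feed one round of standard AdaBoost; the alpha formula and the exponential weight update are the textbook AdaBoost rules, with the 0/1 labels mapped to -1/+1 for the update.

% One AdaBoost-style round on top of decision_stump (standard AdaBoost formulas; a sketch only)
[classifier, err, best_labels] = decision_stump(data, weight, label);
alpha = 0.5 * log((1 - err) / max(err, 1e-16));  % the stump's vote; guard against err = 0
y = 2 * label - 1;                               % true labels mapped to {-1, +1}
h = 2 * best_labels - 1;                         % stump predictions mapped to {-1, +1}
weight = weight .* exp(-alpha * (y .* h));       % raise the weights of misclassified samples
weight = weight / sum(weight);                   % renormalize to a probability distribution

With the weights updated this way, the next call to decision_stump concentrates on the samples the current stump got wrong, which is exactly what makes the stump useful as a weak learner.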
