Face Detection (4): CART Principles and Implementation

Original article:

http://blog.csdn.net/acdreamers/article/details/44664481

An earlier post covered the ID3 decision tree algorithm. This post introduces another decision tree algorithm, CART.

Contents

1. Understanding the CART algorithm

2. How the CART algorithm works

3. Implementing the CART algorithm

1. Understanding the CART algorithm

CART stands for Classification And Regression Tree. It is one of the three commonly used decision tree algorithms, the other two being ID3 and C4.5.

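CART is a recursive binary partitioning technique: it splits the current sample set into two subsets, so every non-leaf node has exactly two branches and the resulting decision tree is a compact binary tree. Because the tree is binary, each decision is a yes/no question; even when a feature takes several values, the data is still split into two parts. The algorithm has two main steps:

(1) Build the tree by recursively partitioning the samples.

(2) Prune the tree using validation data.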

2. How the CART algorithm works

As noted above, CART has two phases, the first of which recursively builds a binary tree. How does it choose the splits?

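Let $x_1, x_2, \dots, x_n$ denote the $n$ attributes of a single sample and $y$ its class. CART recursively partitions the $n$-dimensional space into non-overlapping rectangles. The partitioning proceeds roughly as follows:

(1) Pick an independent variable $x_i$ and a value $v_i$ of $x_i$, and split the $n$-dimensional space into two parts: every point in one part satisfies $x_i \le v_i$, and every point in the other satisfies $x_i > v_i$. For a discrete variable there are only two outcomes: the attribute equals the chosen value or it does not.

(2) Recurse: for each of the two parts, pick an attribute again as in step (1) and keep splitting until the whole $n$-dimensional space has been partitioned.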
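Step (1) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `split_rows`, the tuple layout of a row, and the toy data are all made up here and do not come from the original post.

```python
from typing import Any, List, Sequence, Tuple

Row = Sequence[Any]


def split_rows(rows: List[Row], attr: int, value: Any) -> Tuple[List[Row], List[Row]]:
    """Split rows into two groups on the attribute at index `attr`.

    Numeric value:  left gets rows with row[attr] <= value, right gets the rest.
    Discrete value: left gets rows with row[attr] == value, right gets the rest.
    """
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        left = [r for r in rows if r[attr] <= value]
        right = [r for r in rows if r[attr] > value]
    else:
        left = [r for r in rows if r[attr] == value]
        right = [r for r in rows if r[attr] != value]
    return left, right


# Hypothetical toy rows: (age, colour, label)
rows = [(25, "red", "a"), (40, "blue", "b"), (31, "red", "a"), (58, "green", "b")]

young, old = split_rows(rows, 0, 35)        # continuous: age <= 35 vs age > 35
red, not_red = split_rows(rows, 1, "red")   # discrete: colour == "red" vs colour != "red"
print(young, old)
print(red, not_red)
```

Either way the result is exactly two groups, which is what keeps the tree binary.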

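By what criterion is a split chosen? For a continuous attribute, each candidate split point is the midpoint of a pair of adjacent attribute values. If a set of $m$ samples has $m$ distinct values for an attribute, there are $m-1$ candidate split points, each the mean of two adjacent values in sorted order. Candidate splits are ranked by how much impurity they remove, where the impurity reduction is the impurity before the split minus the sum of the impurities of the resulting nodes, each weighted by the fraction of samples it receives. Impurity is commonly measured with the Gini index. If the samples fall into $C$ classes, the Gini impurity of a node $A$ is

$$\mathrm{Gini}(A) = 1 - \sum_{i=1}^{C} p_i^2$$

where $p_i$ is the probability that a sample in the node belongs to class $i$. $\mathrm{Gini}(A) = 0$ when all samples belong to the same class, and $\mathrm{Gini}(A)$ is maximized when all classes are equally likely in the node, in which case $p_i = 1/C$ and $\mathrm{Gini}(A) = 1 - 1/C$.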
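The three quantities just defined (Gini impurity, impurity reduction, and midpoint split candidates) are small enough to write out directly. The following is a minimal Python sketch; the function names `gini`, `impurity_reduction`, and `midpoints` are chosen here for illustration and are not taken from the original post or its MATLAB code.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum(p_i^2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_reduction(parent: Sequence, left: Sequence, right: Sequence) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted


def midpoints(values: Sequence[float]) -> List[float]:
    """Candidate split points: means of adjacent distinct values in sorted order."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


print(gini(["yes", "yes", "yes"]))          # pure node: 0.0
print(gini(["yes", "yes", "no", "no"]))     # two equally likely classes: 0.5
print(midpoints([3, 1, 2]))                 # [1.5, 2.5]
```

With two equally likely classes the impurity is $1 - 1/2 = 0.5$, matching the maximum stated above.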

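With this in place, the recursive partitioning works as follows. If the samples at the current node all belong to the same class, or only one sample remains, the node is a leaf. Otherwise it is a non-leaf node: the algorithm tries every attribute and every candidate split point of that attribute, looking for the split with the largest impurity reduction. The subtrees produced by that split form the best branch.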
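The recursion described above can be sketched as follows. This is a simplified illustration, not the MATLAB implementation shown later: it assumes all attributes are numeric, stores nodes as plain dictionaries, and uses made-up names (`best_split`, `build_tree`, `predict`) and toy data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (attr, threshold, reduction) with the largest impurity reduction, or None."""
    best = None
    parent = gini(y)
    for attr in range(len(X[0])):
        values = sorted(set(row[attr] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[attr] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[attr] > threshold]
            if not left or not right:
                continue  # guard against a degenerate split
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            reduction = parent - weighted
            if best is None or reduction > best[2]:
                best = (attr, threshold, reduction)
    return best


def build_tree(X, y):
    """Recursively build a binary tree; leaves carry the majority label."""
    # Leaf: pure node, a single sample, or no attribute left to split on.
    split = None if len(set(y)) == 1 or len(y) == 1 else best_split(X, y)
    if split is None:
        return {"label": Counter(y).most_common(1)[0][0]}
    attr, threshold, _ = split
    left = [i for i, row in enumerate(X) if row[attr] <= threshold]
    right = [i for i, row in enumerate(X) if row[attr] > threshold]
    return {
        "attr": attr,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left]),
        "right": build_tree([X[i] for i in right], [y[i] for i in right]),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while "label" not in tree:
        tree = tree["left"] if row[tree["attr"]] <= tree["threshold"] else tree["right"]
    return tree["label"]


# Hypothetical toy data: two numeric attributes, two classes.
X = [[1.0, 5.0], [2.0, 4.0], [3.0, 7.0], [4.0, 6.0]]
y = ["no", "no", "yes", "yes"]
tree = build_tree(X, y)
print(tree)
print(predict(tree, [1.5, 0.0]), predict(tree, [3.2, 0.0]))
```

On this toy data a single split on the first attribute at 2.5 separates the classes, so the tree has one internal node and two leaves.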

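Here is a simple example.

[Figure: table of training samples with the columns home ownership, marital status, annual income, and loan default]

The example has three attributes: home ownership, marital status, and annual income. Home ownership and marital status take discrete values, while annual income is continuous. Whether the borrower defaulted on the loan is the class label.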

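Consider the home ownership attribute first. The Gini index after splitting on it is computed as follows:

[Figure: Gini index for the split on home ownership]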

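Marital status takes three values. The Gini index after splitting on each of its values is computed as follows:

[Figure: Gini index for each split on marital status]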
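For a discrete attribute with several values, each value gives one binary split ("equals this value" versus "does not"). The Python sketch below scores every such split by its weighted Gini index. The data is a hypothetical toy set, not the table from the original figure, and the function name `gini_per_value` is made up for this illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_per_value(values, labels):
    """Weighted Gini of the split 'attribute == v' vs 'attribute != v', for each v."""
    n = len(labels)
    result = {}
    for v in sorted(set(values)):
        inside = [lab for val, lab in zip(values, labels) if val == v]
        outside = [lab for val, lab in zip(values, labels) if val != v]
        result[v] = len(inside) / n * gini(inside) + len(outside) / n * gini(outside)
    return result


# Hypothetical toy data, NOT the values from the original figure.
status = ["single", "married", "single", "married", "divorced", "married"]
default = ["yes", "no", "yes", "no", "yes", "no"]

scores = gini_per_value(status, default)
best_value = min(scores, key=scores.get)   # lowest weighted Gini wins
print(scores)
print(best_value)
```

The split with the lowest weighted Gini is the one with the largest impurity reduction, since the parent impurity is the same for every candidate.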

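The last attribute, annual income, is continuous, so it is split at candidate split points:

[Figure: Gini index for each candidate split point on annual income]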
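For a continuous attribute, the candidates are the midpoints between adjacent sorted values, and each is scored the same way. A minimal Python sketch, again with hypothetical numbers rather than the ones in the original figure, and with `scan_thresholds` as a made-up name:

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def scan_thresholds(values, labels):
    """Return [(threshold, weighted_gini), ...] for every midpoint candidate."""
    n = len(labels)
    distinct = sorted(set(values))
    scored = []
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for val, lab in zip(values, labels) if val <= t]
        right = [lab for val, lab in zip(values, labels) if val > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        scored.append((t, weighted))
    return scored


# Hypothetical incomes (in thousands) and labels, NOT the original figure's data.
income = [60, 75, 85, 95, 120]
default = ["no", "no", "yes", "yes", "no"]

scored = scan_thresholds(income, default)
best_threshold, best_gini = min(scored, key=lambda pair: pair[1])
for t, g in scored:
    print(t, round(g, 4))
print("best:", best_threshold)
```

Five distinct values give four candidate thresholds, in line with the $m-1$ count mentioned earlier.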

With these splitting rules, CART can build the whole tree.

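Once the tree is built, the second step prunes it using validation data. Tree construction can overfit: many branches reflect noise or outliers in the training data, which lowers classification accuracy on new data, so these unreliable branches need to be detected and removed. The two common approaches are pre-pruning and post-pruning. CART uses post-pruning, specifically cost-complexity pruning. See the following link for details.

Pruning reference: http://www.cnblogs.com/zhangchaoyang/articles/2709922.html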
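The post does not spell out cost-complexity pruning, so here is a rough sketch of its core step as it is usually formulated: for every internal node $t$ with subtree $T_t$, compute $g(t) = \dfrac{R(t) - R(T_t)}{|\mathrm{leaves}(T_t)| - 1}$, where $R$ is the misclassification rate on the training set, and prune the node with the smallest $g(t)$ (the "weakest link") first. Repeating this yields a sequence of ever smaller trees, from which validation data selects one. The dictionary layout, the `errors` field, and the function names below are assumptions made for this illustration.

```python
def leaf_stats(node):
    """Return (number of leaves, total leaf errors) for the subtree rooted at node."""
    if "left" not in node:
        return 1, node["errors"]
    left_n, left_err = leaf_stats(node["left"])
    right_n, right_err = leaf_stats(node["right"])
    return left_n + right_n, left_err + right_err


def weakest_link(node, n_samples, path="root"):
    """Return (g, path) of the internal node with the smallest g(t), or None for a leaf.

    node["errors"] is the number of training samples misclassified
    if this node were turned into a leaf.
    """
    if "left" not in node:
        return None
    n_leaves, leaf_errors = leaf_stats(node)
    g = (node["errors"] - leaf_errors) / n_samples / (n_leaves - 1)
    best = (g, path)
    for side in ("left", "right"):
        candidate = weakest_link(node[side], n_samples, path + "." + side)
        if candidate is not None and candidate[0] < best[0]:
            best = candidate
    return best


# Hypothetical tree over 100 training samples.
toy_tree = {
    "errors": 40,
    "left": {"errors": 10, "left": {"errors": 4}, "right": {"errors": 5}},
    "right": {"errors": 10},
}

g, path = weakest_link(toy_tree, n_samples=100)
print(g, path)
```

Here collapsing `root.left` costs only one extra training error for one leaf saved, so it is pruned before the root split.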

3. Implementing the CART algorithm

The following MATLAB implementation of CART was found online.

function D = CART(train_features, train_targets, params, region)
% Classify using classification and regression trees
% Inputs:
%   train_features - Train features (one column per sample)
%   train_targets  - Train targets
%   params         - [Impurity type, Percentage of incorrectly assigned samples at a node]
%                    Impurity can be: Entropy, Variance (or Gini), or Misclassification
%   region         - Decision region vector: [-x x -y y number_of_points]
%
% Outputs:
%   D - Decision surface
%
% Note: process_params, PCA and CARTfunctions are external helpers that are
% not part of this listing.

[Ni, M] = size(train_features);

% Get parameters
[split_type, inc_node] = process_params(params);

% For the decision region
N      = region(5);
mx     = ones(N,1) * linspace(region(1), region(2), N);
my     = linspace(region(3), region(4), N)' * ones(1,N);
flatxy = [mx(:), my(:)]';

% Preprocessing
[f, t, UW, m]  = PCA(train_features, train_targets, Ni, region);
train_features = UW * (train_features - m*ones(1,M));
flatxy         = UW * (flatxy - m*ones(1,N^2));

% Build the tree recursively
disp('Building tree')
tree = make_tree(train_features, train_targets, M, split_type, inc_node, region);

% Make the decision region according to the tree
disp('Building decision surface using the tree')
targets = use_tree(flatxy, 1:N^2, tree);
D = reshape(targets, N, N);
% END CART

function targets = use_tree(features, indices, tree)
% Classify recursively using a tree
if isnumeric(tree.Raction)
    % Reached an end node
    targets = zeros(1, size(features,2));
    targets(indices) = tree.Raction(1);
else
    % Reached a branching, so find who goes where.
    % The action strings refer to the variables 'features' and 'indices'.
    in_right = indices(find(eval(tree.Raction)));
    in_left  = indices(find(eval(tree.Laction)));
    Ltargets = use_tree(features, in_left, tree.left);
    Rtargets = use_tree(features, in_right, tree.right);
    targets  = Ltargets + Rtargets;
end
% END use_tree

function tree = make_tree(features, targets, Dlength, split_type, inc_node, region)
% Build a tree recursively
if (length(unique(targets)) == 1)
    % There is only one type of targets, and this generates a warning, so deal with it separately
    tree.right   = [];
    tree.left    = [];
    tree.Raction = targets(1);
    tree.Laction = targets(1);
    return  % the original used 'break', which is only valid inside a loop
end

[Ni, M] = size(features);
Nt = unique(targets);
N  = hist(targets, Nt);

if ((sum(N < Dlength*inc_node) == length(Nt) - 1) || (M == 1))
    % No further splitting is necessary: make a leaf with the majority label
    tree.right = [];
    tree.left  = [];
    if (length(Nt) ~= 1)
        MLlabel = find(N == max(N), 1);  % first label on ties
    else
        MLlabel = 1;
    end
    tree.Raction = Nt(MLlabel);
    tree.Laction = Nt(MLlabel);
else
    % Split the node according to the splitting criterion
    I           = zeros(1,Ni);  % impurity at the best split point of each dimension
    split_point = zeros(1,Ni);
    op = optimset('Display', 'off');
    for i = 1:Ni
        % Search region(2i-1)..region(2i) for the split point minimizing the impurity
        split_point(i) = fminbnd('CARTfunctions', region(i*2-1), region(i*2), op, features, targets, i, split_type);
        I(i) = feval('CARTfunctions', split_point(i), features, targets, i, split_type);
    end
    [m, dim] = min(I);
    loc = split_point(dim);

    % So, the split is to be on dimension 'dim' at location 'loc'
    indices = 1:M;
    tree.Raction = ['features(' num2str(dim) ',indices) > '  num2str(loc)];
    tree.Laction = ['features(' num2str(dim) ',indices) <= ' num2str(loc)];
    in_right = find(eval(tree.Raction));
    in_left  = find(eval(tree.Laction));

    if isempty(in_right) || isempty(in_left)
        % No possible split found: make a leaf with the majority label
        tree.right = [];
        tree.left  = [];
        if (length(Nt) ~= 1)
            MLlabel = find(N == max(N), 1);  % first label on ties
        else
            MLlabel = 1;
        end
        tree.Raction = Nt(MLlabel);
        tree.Laction = Nt(MLlabel);
    else
        % ...It's possible to build new nodes
        tree.right = make_tree(features(:,in_right), targets(in_right), Dlength, split_type, inc_node, region);
        tree.left  = make_tree(features(:,in_left), targets(in_left), Dlength, split_type, inc_node, region);
    end
end
% END make_tree


A decision tree package for Julia: https://github.com/bensadeghi/DecisionTree.jl/blob/master/README.md
