Classification Decision Trees and Regression Decision Trees

Maths behind Decision Tree Classifier

Before we look at the Python implementation of the decision tree, let's first understand the math behind decision tree classification. We will see how the terms introduced above are used for splitting.

We will use a simple dataset that contains information about students of different classes and genders, and records whether or not they stay in the school's hostel.

This is what our data set looks like:

Let's try to understand how the root node is selected by calculating Gini impurity. We will use the data mentioned above.

We have two features that we can use for nodes: “Class” and “Gender”. We will calculate the Gini impurity for each feature and then select the feature with the lowest Gini impurity.

Let's review the formula for calculating Gini impurity:
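For reference, a standard way to write the Gini impurity is sketched below. The notation is an assumption made here, not the article's own: p_i is the fraction of samples at a node that belong to class i, C is the number of classes, and S_v is the subset of the samples S for which the feature takes the value v.

```latex
% Gini impurity of a node with class proportions p_1, ..., p_C
G(S) = 1 - \sum_{i=1}^{C} p_i^{2}

% Weighted Gini impurity of splitting S on a feature A
G_{\text{split}}(S, A) = \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, G(S_v)
```

A pure node (all samples in one class) has G = 0, and the feature with the lowest weighted value is the one chosen for the split.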

Let's start with “Class”: we will calculate the Gini impurity for each distinct value of “Class”.
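The calculation can be sketched in a few lines of Python. The rows below are hypothetical toy data invented for this illustration, not the article's dataset, and the function names `gini_impurity` and `weighted_gini` are likewise just names chosen here.

```python
from collections import Counter

# Hypothetical toy rows: (class, gender, stays_in_hostel).
# These values are made up for illustration; they are NOT the article's data.
ROWS = [
    ("IX", "M", True),
    ("IX", "F", True),
    ("IX", "M", False),
    ("IX", "F", True),
    ("X", "M", False),
    ("X", "F", False),
    ("X", "M", True),
    ("X", "F", False),
]


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, feature_index, label_index=2):
    """Size-weighted average Gini impurity of the groups formed by one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[label_index])
    total = len(rows)
    return sum(len(g) / total * gini_impurity(g) for g in groups.values())


gini_class = weighted_gini(ROWS, 0)
gini_gender = weighted_gini(ROWS, 1)
print(f"Weighted Gini for Class:  {gini_class:.4f}")
print(f"Weighted Gini for Gender: {gini_gender:.4f}")
```

On this toy data, “Class” gives a weighted Gini impurity of 0.375 and “Gender” gives 0.5, so “Class” would be picked as the root node.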

This is how a decision tree node is selected: by calculating the Gini impurity of each candidate split individually. If there are more features, we simply repeat the same steps on each branch after the root node has been selected.
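Repeating the same steps after the root node is chosen amounts to a small recursion. The sketch below is one possible way to write it under simplifying assumptions: all features are categorical, each feature is used at most once per branch, and a branch stops when it is pure or runs out of features. The rows and the names `build_tree`, `split_by`, and `weighted_gini` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by(rows, feature):
    """Group rows (dicts) by the value they have for one feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def weighted_gini(rows, feature, target):
    """Size-weighted Gini impurity after splitting rows on a feature."""
    groups = split_by(rows, feature)
    return sum(
        len(g) / len(rows) * gini([r[target] for r in g]) for g in groups.values()
    )


def build_tree(rows, features, target):
    """Greedily pick the lowest-Gini feature, then recurse into each branch."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure or no features remain: return the majority label.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = min(features, key=lambda f: weighted_gini(rows, f, target))
    remaining = [f for f in features if f != best]
    return {
        best: {
            value: build_tree(group, remaining, target)
            for value, group in split_by(rows, best).items()
        }
    }


# Hypothetical toy rows, invented for illustration (not the article's data).
ROWS = [
    {"class": "IX", "gender": "M", "hostel": "yes"},
    {"class": "IX", "gender": "F", "hostel": "yes"},
    {"class": "IX", "gender": "M", "hostel": "yes"},
    {"class": "X", "gender": "M", "hostel": "no"},
    {"class": "X", "gender": "F", "hostel": "yes"},
    {"class": "X", "gender": "M", "hostel": "no"},
]

tree = build_tree(ROWS, ["class", "gender"], "hostel")
print(tree)
```

The result is a nested dictionary: the outer key is the root feature, and each value is either a predicted label or another dictionary for the next split.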

We will now try to find the root node for the same dataset by calculating entropy and information gain.

Dataset:

We have two features, and we will choose the root node by calculating the information gain obtained by splitting on each feature.

Let's review the formulas for entropy and information gain:
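For reference, the usual definitions can be written as follows. The notation is an assumption made here: p_i is the fraction of samples in class i, C is the number of classes, and S_v is the subset of S where feature A takes the value v.

```latex
% Entropy of a node with class proportions p_1, ..., p_C
H(S) = - \sum_{i=1}^{C} p_i \log_2 p_i

% Information gain of splitting S on a feature A
IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```

Here the feature with the highest information gain is chosen, which is the mirror image of choosing the lowest weighted Gini impurity.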

Let's start with the feature “Class”:

Let's see the information gain from the feature “Gender”:
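The same comparison can be sketched in Python. As before, the rows are hypothetical toy data made up for this illustration rather than the article's dataset, and `entropy` and `information_gain` are names chosen here.

```python
import math
from collections import Counter

# Hypothetical toy rows: (class, gender, stays_in_hostel).
# These values are made up for illustration; they are NOT the article's data.
ROWS = [
    ("IX", "M", True),
    ("IX", "F", True),
    ("IX", "M", False),
    ("IX", "F", True),
    ("X", "M", False),
    ("X", "F", False),
    ("X", "M", True),
    ("X", "F", False),
]


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature_index, label_index=2):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    parent = entropy([row[label_index] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[label_index])
    children = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - children


ig_class = information_gain(ROWS, 0)
ig_gender = information_gain(ROWS, 1)
print(f"Information gain for Class:  {ig_class:.4f}")
print(f"Information gain for Gender: {ig_gender:.4f}")
```

On this toy data, “Gender” splits the students into two groups that are each half and half, so its information gain is zero, while “Class” has a positive gain and would be chosen as the root node.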

Different Algorithms for Decision Tree

ID3 (Iterative Dichotomiser 3): It is one of the algorithms used to construct decision trees for classification. It uses information gain as the criterion for finding the root node and for splitting. It only accepts categorical attributes.
C4.5: It is an extension of the ID3 algorithm and improves on it by handling both continuous and discrete values. It is also used for classification.
Classification and Regression Trees (CART): It is the most popular algorithm used for constructing decision trees. It uses Gini impurity as the default criterion for selecting splits; however, one can use “entropy” as the criterion as well. This algorithm works on both regression and classification problems. We will use this algorithm in our Python implementation.

Entropy and Gini impurity can be used interchangeably; the choice doesn't affect the result much. Gini is, however, cheaper to compute than entropy, since entropy involves a logarithm. That's why the CART algorithm uses Gini as its default criterion.

If we plot Gini impurity and entropy together, we can see that there is not much difference between them:
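The two curves can also be tabulated for the two-class case, where p is the fraction of samples in one class. This is a minimal sketch; the function names `binary_gini` and `binary_entropy` are chosen here for illustration.

```python
import math


def binary_gini(p):
    """Gini impurity for two classes with proportions p and 1 - p."""
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def binary_entropy(p):
    """Entropy (base 2) for two classes with proportions p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure node carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Evaluate both measures at p = 0.0, 0.1, ..., 1.0.
table = [(i / 10, binary_gini(i / 10), binary_entropy(i / 10)) for i in range(11)]

print(" p     gini   entropy")
for p, g, h in table:
    print(f"{p:.1f}  {g:.4f}  {h:.4f}")
```

Both measures are zero for a pure node and reach their maximum at p = 0.5 (0.5 for Gini, 1.0 for entropy), so they rank candidate splits in a very similar way.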

Advantages of Decision Tree:

It can be used for both regression and classification problems.
Decision trees are very easy to grasp, as the splitting rules are stated explicitly.
Even complex decision tree models are simple once visualized; they can be understood just by looking at the tree.
Scaling and normalization are not needed.
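The last point can be illustrated with a small sketch. It assumes a numeric feature split by a threshold, which goes slightly beyond the categorical example used in this article; the feature values, the labels, and the function name `best_partition` are all hypothetical. Because a threshold split depends only on the order of the values, min-max scaling the feature leaves the chosen partition unchanged.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_partition(values, labels):
    """Return (left row indices, score) of the lowest-Gini threshold split."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_score, best_left = float("inf"), None
    for cut in range(1, len(order)):
        # A threshold cannot separate two equal values, so skip such cuts.
        if values[order[cut - 1]] == values[order[cut]]:
            continue
        left, right = order[:cut], order[cut:]
        score = (
            len(left) * gini([labels[i] for i in left])
            + len(right) * gini([labels[i] for i in right])
        ) / len(order)
        if score < best_score:
            best_score, best_left = score, frozenset(left)
    return best_left, best_score


# Hypothetical numeric feature (for example, age) and labels, made up for illustration.
ages = [17, 12, 15, 13, 16, 14]
stays = ["yes", "no", "yes", "no", "yes", "no"]

# Min-max scale the feature to the range [0, 1].
low, high = min(ages), max(ages)
scaled = [(a - low) / (high - low) for a in ages]

raw_split = best_partition(ages, stays)
scaled_split = best_partition(scaled, stays)
print(raw_split == scaled_split)
```

The rows that fall on each side of the best split are the same before and after scaling; only the numeric value of the threshold changes.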

Disadvantages of Decision Tree:

A small change in the data can cause instability in the model because of the greedy approach.
The probability of overfitting is very high for decision trees.
Training a decision tree model can take more time than some other classification algorithms.

Translated from: https://medium.com/@er.amansingh2019/maths-behind-decision-tree-classifier-e3bfd5445540
