A simple decision tree implemented in code, from Andrew Ng's Machine Learning course (2022 edition).

The example task is deciding whether a mushroom is edible:

Cap Color	Stalk Shape	Solitary	Edible
Brown	Tapering	Yes	1
Brown	Enlarging	Yes	1
Brown	Enlarging	No	0
Brown	Enlarging	No	0
Brown	Tapering	Yes	1
Red	Tapering	Yes	0
Red	Enlarging	No	0
Brown	Enlarging	Yes	1
Red	Tapering	No	1
Brown	Enlarging	No	0

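Convert the features to a one-hot encoding. Each feature takes only two values, so a single 0/1 column per feature is enough: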

Brown Cap	Tapering Stalk Shape	Solitary	Edible
1	1	1	1
1	0	1	1
1	0	0	0
1	0	0	0
1	1	1	1
0	1	1	0
0	0	0	0
1	0	1	1
0	1	0	1
1	0	0	0
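The encoding step can be sketched in a few lines. This is only an illustration, not code from the course: the names `rows` and `positive_values` are hypothetical, and the choice of which value maps to 1 is an assumption read off the headers of the encoded table above.

```python
import numpy as np

# Raw rows copied from the first table: (Cap Color, Stalk Shape, Solitary, Edible)
rows = [
    ("Brown", "Tapering", "Yes", 1),
    ("Brown", "Enlarging", "Yes", 1),
    ("Brown", "Enlarging", "No", 0),
    ("Brown", "Enlarging", "No", 0),
    ("Brown", "Tapering", "Yes", 1),
    ("Red", "Tapering", "Yes", 0),
    ("Red", "Enlarging", "No", 0),
    ("Brown", "Enlarging", "Yes", 1),
    ("Red", "Tapering", "No", 1),
    ("Brown", "Enlarging", "No", 0),
]

# Assumption: the value that becomes 1 in each column, matching the headers
# "Brown Cap", "Tapering Stalk Shape" and "Solitary".
positive_values = ("Brown", "Tapering", "Yes")

# One 0/1 column per feature: 1 if the raw value equals the positive value.
X = np.array([
    [int(value == positive) for value, positive in zip(row[:3], positive_values)]
    for row in rows
])
y = np.array([row[3] for row in rows])

print(X.tolist())
print(y.tolist())
```

The resulting `X` and `y` should match the encoded table row by row.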

Code

import numpy as np

# Load the data (columns: Brown Cap, Tapering Stalk Shape, Solitary)
X_train = np.array([[1,1,1],[1,0,1],[1,0,0],[1,0,0],[1,1,1],[0,1,1],[0,0,0],[1,0,1],[0,1,0],[1,0,0]])
y_train = np.array([1,1,0,0,1,0,0,1,1,0])

# Compute the entropy of a node from its labels
def compute_entropy(y):
    entropy = 0.
    if len(y) != 0:
        # Fraction of edible examples (label 1) in the node
        p1 = len(y[y == 1]) / len(y)
        if p1 != 0 and p1 != 1:
            entropy = -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)
        else:
            entropy = 0.
    return entropy

# Split the node on a feature: value 1 goes left, value 0 goes right
def split_dataset(X, node_indices, feature):
    left_indices = []
    right_indices = []
    for i in node_indices:
        if X[i][feature] == 1:
            left_indices.append(i)
        else:
            right_indices.append(i)
    return left_indices, right_indices

# Compute the information gain of splitting the node on a feature
def compute_information_gain(X, y, node_indices, feature):
    left_indices, right_indices = split_dataset(X, node_indices, feature)
    X_node, y_node = X[node_indices], y[node_indices]
    X_left, y_left = X[left_indices], y[left_indices]
    X_right, y_right = X[right_indices], y[right_indices]
    information_gain = 0
    node_entropy = compute_entropy(y_node)
    left_entropy = compute_entropy(y_left)
    right_entropy = compute_entropy(y_right)
    # Weight each branch by the fraction of the node's examples it receives
    w_left = len(X_left) / len(X_node)
    w_right = len(X_right) / len(X_node)
    weighted_entropy = w_left * left_entropy + w_right * right_entropy
    information_gain = node_entropy - weighted_entropy
    return information_gain

# Find the feature with the highest information gain (-1 if no split helps)
def get_best_split(X, y, node_indices):
    num_features = X.shape[1]
    best_feature = -1
    max_info_gain = 0
    for feature in range(num_features):
        info_gain = compute_information_gain(X, y, node_indices, feature)
        if info_gain > max_info_gain:
            max_info_gain = info_gain
            best_feature = feature
    return best_feature

# Build the decision tree recursively
tree = []

def build_tree_recursive(X, y, node_indices, branch_name, max_depth, current_depth):
    # Stop at the maximum depth and report the node as a leaf
    if current_depth == max_depth:
        formatting = " "*current_depth + "-"*current_depth
        print(formatting, "%s leaf node with indices" % branch_name, node_indices)
        return
    best_feature = get_best_split(X, y, node_indices)
    tree.append((current_depth, branch_name, best_feature, node_indices))
    formatting = "-"*current_depth
    print("%s Depth %d, %s: Split on feature: %d" % (formatting, current_depth, branch_name, best_feature))
    left_indices, right_indices = split_dataset(X, node_indices, best_feature)
    build_tree_recursive(X, y, left_indices, "Left", max_depth, current_depth+1)
    build_tree_recursive(X, y, right_indices, "Right", max_depth, current_depth+1)

# Check the result
root_indices = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
build_tree_recursive(X_train, y_train, root_indices, "Root", max_depth=2, current_depth=0)
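To see what get_best_split compares at the root node, the standalone sketch below recomputes the information gain of each of the three features on the full training set. It is a compact, vectorized restatement under the same definitions as the code above (entropy H(p1) = -p1 log2(p1) - (1 - p1) log2(1 - p1), gain = node entropy minus the weighted entropy of the two branches). The helper names `entropy` and `information_gain` are hypothetical and not part of the course code.

```python
import numpy as np

X = np.array([[1,1,1],[1,0,1],[1,0,0],[1,0,0],[1,1,1],
              [0,1,1],[0,0,0],[1,0,1],[0,1,0],[1,0,0]])
y = np.array([1,1,0,0,1,0,0,1,1,0])

def entropy(labels):
    """Binary entropy in bits; an empty or pure node has entropy 0."""
    if len(labels) == 0:
        return 0.0
    p1 = np.mean(labels == 1)
    if p1 == 0 or p1 == 1:
        return 0.0
    return float(-p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1))

def information_gain(X, y, feature):
    """Gain from splitting all rows on one 0/1 feature (value 1 goes left)."""
    mask = X[:, feature] == 1
    w_left = mask.mean()  # fraction of examples sent to the left branch
    weighted = w_left * entropy(y[mask]) + (1 - w_left) * entropy(y[~mask])
    return entropy(y) - weighted

feature_names = ("Brown Cap", "Tapering Stalk Shape", "Solitary")
gains = [information_gain(X, y, f) for f in range(X.shape[1])]
for name, gain in zip(feature_names, gains):
    print(f"{name}: {gain:.4f}")
```

Worked by hand, the root entropy is 1 (five edible, five not), and the gains come out at roughly 0.035 for Brown Cap, 0.125 for Tapering Stalk Shape and 0.278 for Solitary, so feature 2 (Solitary) is the one chosen at the root.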

Output of the build_tree_recursive call

 Depth 0, Root: Split on feature: 2
- Depth 1, Left: Split on feature: 0
  -- Left leaf node with indices [0, 1, 4, 7]
  -- Right leaf node with indices [5]
- Depth 1, Right: Split on feature: 1
  -- Left leaf node with indices [8]
  -- Right leaf node with indices [2, 3, 6, 9]
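The printed tree lists only which training examples reach each leaf. One way to turn it into predictions is to label each leaf with the majority class of its examples. The sketch below does this for the four leaves above. The majority-vote rule, the tie-breaking toward class 1, and the names `leaves` and `leaf_labels` are assumptions for illustration, since the original code stops at printing the indices.

```python
import numpy as np

y_train = np.array([1, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# Leaf index lists copied from the printed output above; the keys describe
# the path from the root (feature 2 = Solitary, 0 = Brown Cap, 1 = Tapering).
leaves = {
    "Solitary=1, Brown Cap=1": [0, 1, 4, 7],
    "Solitary=1, Brown Cap=0": [5],
    "Solitary=0, Tapering=1": [8],
    "Solitary=0, Tapering=0": [2, 3, 6, 9],
}

leaf_labels = {}
for path, indices in leaves.items():
    labels = y_train[indices]
    # Majority vote; a tie would go to class 1 under this assumed rule.
    leaf_labels[path] = int(labels.mean() >= 0.5)
    print(f"{path}: predict {leaf_labels[path]} from labels {labels.tolist()}")
```

On this training set every leaf turns out to be pure, so the depth-2 tree classifies all ten examples correctly.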
