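Tang Yudi Study Notes 12: Building Decision Trees with sklearn

I. Visualizing the tree model

1. Imports

2. Tree model visualization

3. Saving as a .dot file

4. Generating a .png file in the .dot file's directory

II. Decision boundary analysis

1. Displaying the .png file

2. Plotting the decision boundary

3. Probability estimation

III. Effect of pre-pruning parameters

1. Regularization in decision trees

2. Example

3. Sensitivity of tree models to the data

IV. Regression trees

Regression task

1. Building the data

2. Imports

3. Tree model visualization

4. Generating a .png file in the .dot file's directory

5. Displaying the .png file

Effect of tree depth on the results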

I. Visualizing the tree model

1. Imports

import numpy as np
import os
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
warnings.filterwarnings('ignore')

2. Tree model visualization

Download the Graphviz installer: https://graphviz.org/download/
Configure the environment variables: https://jingyan.baidu.com/article/020278115032461bcc9ce598.html

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

# A shallow tree (depth 2) keeps the rendered diagram small
tree_clf = DecisionTreeClassifier(max_depth=2)
tree_clf.fit(X, y)

3. Saving as a .dot file

from sklearn.tree import export_graphviz

# Write the fitted tree in Graphviz DOT format to iris_tree.dot
export_graphviz(
    tree_clf,
    out_file="iris_tree.dot",
    feature_names=iris.feature_names[2:],
    class_names=iris.target_names,
    rounded=True,
    filled=True,
)

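4. Generating a .png file in the .dot file's directory

Use the dot command-line tool from the Graphviz package to convert this .dot file to other formats such as PDF or PNG. The command below generates iris_tree.png in the directory of the .dot file.

Open a terminal, change to the directory containing the .dot file, and run: dot iris_tree.dot -T png -o iris_tree.png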
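If installing Graphviz is inconvenient, scikit-learn can also print the same tree as plain text with `sklearn.tree.export_text` (available since scikit-learn 0.21). The sketch below retrains the iris tree so it runs on its own; `random_state=42` is an addition for reproducibility and is not in the original code.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Plain-text rendering of the same tree; no Graphviz installation required
report = export_text(tree_clf, feature_names=iris.feature_names[2:])
print(report)
```

Each `|---` line is one test of the form feature <= threshold, and the `class:` lines are the leaves, so the output carries the same structure as iris_tree.png without the colors.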

II. Decision boundary analysis

1. Displaying the .png file

from IPython.display import Image
Image(filename='iris_tree.png',width=400,height=400)

2. Plotting the decision boundary

from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, X, y, axes=[0, 7.5, 0, 3], iris=True, legend=False, plot_training=True):
    # Evaluate the classifier on a 100x100 grid covering the plot area
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    # Shade each predicted region
    custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if not iris:
        # Outline the regions for the non-iris datasets
        custom_cmap2 = ListedColormap(['#7d7d58', '#4c4c7f', '#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    if plot_training:
        # Overlay the training samples, one marker style per class
        plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "yo", label="Iris-Setosa")
        plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "bs", label="Iris-Versicolor")
        plt.plot(X[:, 0][y == 2], X[:, 1][y == 2], "g^", label="Iris-Virginica")
    plt.axis(axes)
    if iris:
        plt.xlabel("Petal length", fontsize=14)
        plt.ylabel("Petal width", fontsize=14)
    else:
        plt.xlabel(r"$x_1$", fontsize=18)
        plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
    if legend:
        plt.legend(loc="lower right", fontsize=14)

plt.figure(figsize=(8, 4))
plot_decision_boundary(tree_clf, X, y)
# Actual split positions (hard-coded)
plt.plot([2.45, 2.45], [0, 3], "k-", linewidth=2)         # depth 0: root split
plt.plot([2.45, 7.5], [1.75, 1.75], "k--", linewidth=2)   # depth 1 split
plt.plot([4.95, 4.95], [0, 1.75], "k:", linewidth=2)      # depth 2 splits: only a max_depth=3 tree would add these
plt.plot([4.85, 4.85], [1.75, 3], "k:", linewidth=2)
plt.text(1.40, 1.0, "Depth=0", fontsize=15)
plt.text(3.2, 1.80, "Depth=1", fontsize=13)
plt.text(4.05, 0.5, "(Depth=2)", fontsize=11)
plt.title('Decision Tree decision boundaries')
plt.show()
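The split positions drawn in the plot above (2.45, 1.75, 4.95, 4.85) are hard-coded. A minimal sketch for reading the actual tests of a fitted tree uses its low-level `tree_` attribute: `children_left == -1` marks a leaf, and `feature` / `threshold` hold each internal node's test. It retrains with `random_state=42` (an addition): petal length <= 2.45 and petal width <= 0.8 both isolate Iris-Setosa perfectly, so the root split is a tie and which one wins can depend on the random state.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width
y = iris.target
names = iris.feature_names[2:]

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

t = tree_clf.tree_
splits = []  # (node id, feature name, threshold) for every internal node
for node in range(t.node_count):
    if t.children_left[node] == -1:  # leaf: no test to report
        continue
    name = names[t.feature[node]]
    splits.append((node, name, round(float(t.threshold[node]), 2)))
    print(f"node {node}: {name} <= {t.threshold[node]:.2f}")
```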

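3. Probability estimation

Estimating class probabilities: take a flower whose petals are 5 cm long and 1.5 cm wide. The corresponding leaf is the left node at depth 2, so the decision tree should output the following probabilities:

Iris-Setosa: 0% (0/54)
Iris-Versicolor: 90.7% (49/54)
Iris-Virginica: 9.3% (5/54)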

tree_clf.predict_proba([[5,1.5]])

tree_clf.predict([[5,1.5]])
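`predict_proba` simply returns the class proportions among the training samples in the leaf, and `predict` returns the class with the largest proportion. A plain-Python sketch using the leaf counts quoted above (0, 49, 5):

```python
# Class counts of the training samples in the depth-2 left leaf (from the text above)
leaf_counts = {"Iris-Setosa": 0, "Iris-Versicolor": 49, "Iris-Virginica": 5}

total = sum(leaf_counts.values())  # 54 samples reach this leaf
probas = {name: count / total for name, count in leaf_counts.items()}
predicted = max(probas, key=probas.get)  # class with the highest proportion

for name, p in probas.items():
    print(f"{name}: {p:.1%}")
print("prediction:", predicted)
```

Every input that falls in this leaf gets exactly these probabilities, whether its petal is 5 cm or 6 cm long, which is why tree probability estimates look like flat steps.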

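III. Effect of pre-pruning parameters

1. Regularization in decision trees

The DecisionTreeClassifier class has several parameters that restrict the shape of the decision tree in a similar way:

min_samples_split (the minimum number of samples a node must have before it can be split)
min_samples_leaf (the minimum number of samples a leaf node must have)
max_leaf_nodes (the maximum number of leaf nodes)
max_features (the maximum number of features evaluated when splitting each node)
max_depth (the maximum depth of the tree)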
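To see how these parameters limit the tree's shape, the sketch below fits one tree per setting on the moons data used in the next example and reports depth and leaf count via `get_depth()` and `get_n_leaves()` (scikit-learn 0.21+). The specific values (max_depth=3, max_leaf_nodes=8, and so on) are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X_moons, y_moons = make_moons(n_samples=100, noise=0.25, random_state=53)

# Illustrative settings; each constrains the tree in a different way
settings = {
    "no restrictions": {},
    "max_depth=3": {"max_depth": 3},
    "min_samples_split=10": {"min_samples_split": 10},
    "min_samples_leaf=4": {"min_samples_leaf": 4},
    "max_leaf_nodes=8": {"max_leaf_nodes": 8},
}

shapes = {}  # setting name -> (depth, number of leaves)
for name, params in settings.items():
    clf = DecisionTreeClassifier(random_state=42, **params).fit(X_moons, y_moons)
    shapes[name] = (clf.get_depth(), clf.get_n_leaves())
    print(f"{name:22s} depth={shapes[name][0]:2d} leaves={shapes[name][1]:3d}")
```

With 100 samples and min_samples_leaf=4, the tree can never have more than 25 leaves, which is one way to read this parameter as a hard cap on model complexity.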

2. Example

from sklearn.datasets import make_moons

X, y = make_moons(n_samples=100, noise=0.25, random_state=53)
tree_clf1 = DecisionTreeClassifier(random_state=42)
tree_clf2 = DecisionTreeClassifier(min_samples_leaf=4, random_state=42)  # each leaf must keep at least 4 samples
tree_clf1.fit(X, y)
tree_clf2.fit(X, y)

plt.figure(figsize=(12, 4))
plt.subplot(121)
plot_decision_boundary(tree_clf1, X, y, axes=[-1.5, 2.5, -1, 1.5], iris=False)
plt.title('No restrictions')
plt.subplot(122)
plot_decision_boundary(tree_clf2, X, y, axes=[-1.5, 2.5, -1, 1.5], iris=False)
plt.title('min_samples_leaf=4')
plt.show()

Without any restrictions, the tree latches onto the noisy, problematic points and easily overfits.
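One way to make the overfitting claim concrete is to compare training accuracy with accuracy on fresh data. This sketch assumes a second, larger make_moons draw (`random_state=7`, an arbitrary choice) can stand in for unseen data; the exact test scores depend on that draw, so only the pattern matters.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X_train, y_train = make_moons(n_samples=100, noise=0.25, random_state=53)
# Hypothetical held-out set: a separate draw from the same distribution
X_test, y_test = make_moons(n_samples=1000, noise=0.25, random_state=7)

models = {
    "No restrictions": DecisionTreeClassifier(random_state=42),
    "min_samples_leaf=4": DecisionTreeClassifier(min_samples_leaf=4, random_state=42),
}

scores = {}  # name -> (training accuracy, held-out accuracy)
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"{name:20s} train={scores[name][0]:.3f} test={scores[name][1]:.3f}")
```

The unrestricted tree memorizes every training point (training accuracy 1.0) because it keeps splitting until each leaf is pure; the gap between its training and held-out accuracy is the overfitting visible in the left-hand plot.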

3. Sensitivity of tree models to the data

np.random.seed(6)
Xs = np.random.rand(100, 2) - 0.5
ys = (Xs[:, 0] > 0).astype(np.float32) * 2  # class depends only on the sign of the first feature

# Rotate the same points by 45 degrees
angle = np.pi / 4
rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]])
Xsr = Xs.dot(rotation_matrix)

tree_clf_s = DecisionTreeClassifier(random_state=42)
tree_clf_s.fit(Xs, ys)
tree_clf_sr = DecisionTreeClassifier(random_state=42)
tree_clf_sr.fit(Xsr, ys)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_decision_boundary(tree_clf_s, Xs, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)
plt.title('Sensitivity to training set rotation')
plt.subplot(122)
plot_decision_boundary(tree_clf_sr, Xsr, ys, axes=[-0.7, 0.7, -0.7, 0.7], iris=False)
plt.title('Sensitivity to training set rotation')
plt.show()
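Tree splits are always perpendicular to a feature axis (every test has the form feature <= threshold), which is why the rotated data needs a staircase-shaped boundary. Counting leaves makes this measurable; the sketch regenerates the same data as the cell above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

np.random.seed(6)
Xs = np.random.rand(100, 2) - 0.5
ys = (Xs[:, 0] > 0).astype(np.float32) * 2  # label depends only on the first feature

angle = np.pi / 4
rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)],
                            [np.sin(angle), np.cos(angle)]])
Xsr = Xs.dot(rotation_matrix)  # same points, rotated 45 degrees

tree_s = DecisionTreeClassifier(random_state=42).fit(Xs, ys)
tree_sr = DecisionTreeClassifier(random_state=42).fit(Xsr, ys)

# One axis-aligned cut separates the original data; the rotated data needs several
print("original leaves:", tree_s.get_n_leaves())
print("rotated leaves: ", tree_sr.get_n_leaves())
```

Both datasets are equally easy in principle, yet the rotated one produces a larger, more jagged tree that is likely to generalize worse.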

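IV. Regression trees

Regression task

The evaluation criterion changes: instead of Gini impurity, splits are chosen to minimize the mean squared error (MSE).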
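For a regression tree, CART scores a candidate split by the size-weighted MSE of the two children, each measured around that child's mean target: J = (m_left / m) * MSE_left + (m_right / m) * MSE_right, and it keeps the split with the lowest J. The NumPy sketch below searches a single feature exhaustively; `split_cost` and `best_split` are hypothetical helper names, and using midpoints between consecutive sorted values as candidate thresholds mirrors how scikit-learn places its thresholds.

```python
import numpy as np


def split_cost(x, y, threshold):
    """Size-weighted MSE of the two children produced by x <= threshold.

    Each child's MSE is measured around that child's own mean, which is
    the value a regression-tree leaf would predict.
    """
    m = len(y)
    cost = 0.0
    for mask in (x <= threshold, x > threshold):
        if mask.any():
            child = y[mask]
            cost += mask.sum() / m * np.mean((child - child.mean()) ** 2)
    return cost


def best_split(x, y):
    """Try every midpoint between consecutive distinct x values."""
    values = np.unique(x)  # sorted distinct values
    candidates = (values[:-1] + values[1:]) / 2
    costs = [split_cost(x, y, t) for t in candidates]
    best = int(np.argmin(costs))
    return candidates[best], costs[best]


# Toy data: two flat groups, so the ideal cut lies between 0.3 and 0.7
x_toy = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y_toy = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
threshold, cost = best_split(x_toy, y_toy)
print(f"best threshold={threshold:.2f}, weighted MSE={cost:.3f}")
```

A full tree repeats this search inside each child until a stopping rule such as max_depth or min_samples_leaf is hit.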

1. Building the data

np.random.seed(42)
m=200
X=np.random.rand(m,1)
y = 4*(X-0.5)**2
y = y + np.random.randn(m,1)/10

2. Imports

from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth=2)
tree_reg.fit(X, y)
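Each leaf of a regression tree predicts the average target value of the training samples that land in it. A sketch that checks this with `apply()`, which returns the leaf index of every sample; it rebuilds the data from step 1 and flattens `y` with `ravel()` (an addition) so the targets are one-dimensional.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = 4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10
y = y.ravel()  # 1-D targets

tree_reg = DecisionTreeRegressor(max_depth=2).fit(X, y)

leaf_ids = tree_reg.apply(X)  # leaf index for each training sample
leaf_means = {}
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    leaf_means[leaf] = y[in_leaf].mean()
    # All samples in a leaf share one prediction: the leaf's mean target
    print(f"leaf {leaf}: {in_leaf.sum():3d} samples, "
          f"mean={leaf_means[leaf]:.4f}, "
          f"prediction={tree_reg.predict(X[in_leaf])[0]:.4f}")
```

Predicting the mean is what makes MSE the natural criterion here: for a fixed group of samples, the mean is the constant that minimizes squared error.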

3. Tree model visualization

export_graphviz(tree_reg,out_file=("regression_tree.dot"),feature_names=["x1"],rounded=True,filled=True)

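4. Generating a .png file in the .dot file's directory

In a terminal, from the directory containing regression_tree.dot, run:

dot regression_tree.dot -T png -o regression_tree.png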

5. Displaying the .png file

# Your second decision tree looks like this
from IPython.display import Image
Image(filename="regression_tree.png", width=400, height=400)

Effect of tree depth on the results

from sklearn.tree import DecisionTreeRegressor

tree_reg1 = DecisionTreeRegressor(random_state=42, max_depth=2)
tree_reg2 = DecisionTreeRegressor(random_state=42, max_depth=3)
tree_reg1.fit(X, y)
tree_reg2.fit(X, y)

def plot_regression_predictions(tree_reg, X, y, axes=[0, 1, -0.2, 1], ylabel="$y$"):
    # Predict on 500 evenly spaced points to draw the piecewise-constant fit
    x1 = np.linspace(axes[0], axes[1], 500).reshape(-1, 1)
    y_pred = tree_reg.predict(x1)
    plt.axis(axes)
    plt.xlabel("$x_1$", fontsize=18)
    if ylabel:
        plt.ylabel(ylabel, fontsize=18, rotation=0)
    plt.plot(X, y, "b.")
    plt.plot(x1, y_pred, "r.-", linewidth=2, label=r"$\hat{y}$")

plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_regression_predictions(tree_reg1, X, y)
# Hard-coded split positions: solid = depth 0, dashed = depth 1
for split, style in ((0.1973, "k-"), (0.0917, "k--"), (0.7718, "k--")):
    plt.plot([split, split], [-0.2, 1], style, linewidth=2)
plt.text(0.21, 0.65, "Depth=0", fontsize=15)
plt.text(0.01, 0.2, "Depth=1", fontsize=13)
plt.text(0.65, 0.8, "Depth=1", fontsize=13)
plt.legend(loc="upper center", fontsize=18)
plt.title("max_depth=2", fontsize=14)

plt.subplot(122)
plot_regression_predictions(tree_reg2, X, y, ylabel=None)
for split, style in ((0.1973, "k-"), (0.0917, "k--"), (0.7718, "k--")):
    plt.plot([split, split], [-0.2, 1], style, linewidth=2)
# Dotted = the extra depth-2 splits of the max_depth=3 tree
for split in (0.0458, 0.1298, 0.2873, 0.9040):
    plt.plot([split, split], [-0.2, 1], "k:", linewidth=1)
plt.text(0.3, 0.5, "Depth=2", fontsize=13)
plt.title("max_depth=3", fontsize=14)
plt.show()
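The deeper tree's step function follows the training points more closely. A quick sketch measures training MSE for max_depth from 1 to 6 (an arbitrary range) with `sklearn.metrics.mean_squared_error`. On this single-feature data with a fixed `random_state`, a deeper tree only subdivides the leaves of a shallower one, so training MSE can only stay the same or fall; lower training error by itself says nothing about error on new data.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = (4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10).ravel()

train_mse = []
for depth in range(1, 7):
    reg = DecisionTreeRegressor(random_state=42, max_depth=depth).fit(X, y)
    train_mse.append(mean_squared_error(y, reg.predict(X)))
    print(f"max_depth={depth}: training MSE={train_mse[-1]:.5f}")
```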

# Regression trees without restrictions vs. with min_samples_leaf=10
tree_reg1 = DecisionTreeRegressor(random_state=42)
tree_reg2 = DecisionTreeRegressor(random_state=42, min_samples_leaf=10)
tree_reg1.fit(X, y)
tree_reg2.fit(X, y)

# Predictions on a dense grid for plotting
x1 = np.linspace(0, 1, 500).reshape(-1, 1)
y_pred1 = tree_reg1.predict(x1)
y_pred2 = tree_reg2.predict(x1)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plt.plot(X, y, "b.")
plt.plot(x1, y_pred1, "r.-", linewidth=2, label=r"$\hat{y}$")
plt.axis([0, 1, -0.2, 1.1])
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", fontsize=18, rotation=0)
plt.legend(loc="upper center", fontsize=18)
plt.title("No restrictions", fontsize=14)

plt.subplot(122)
plt.plot(X, y, "b.")
plt.plot(x1, y_pred2, "r.-", linewidth=2, label=r"$\hat{y}$")
plt.axis([0, 1, -0.2, 1.1])
plt.xlabel("$x_1$", fontsize=18)
plt.title("min_samples_leaf={}".format(tree_reg2.min_samples_leaf), fontsize=14)
plt.show()
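min_samples_leaf=10 guarantees that every leaf keeps at least 10 training samples, which is what smooths the right-hand curve. The sketch checks this through the low-level `tree_` arrays (`children_left == -1` marks a leaf, `n_node_samples` counts the training samples reaching each node) and rebuilds the data from step 1 so it runs standalone; `leaf_sizes` is a hypothetical helper name.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
m = 200
X = np.random.rand(m, 1)
y = (4 * (X - 0.5) ** 2 + np.random.randn(m, 1) / 10).ravel()


def leaf_sizes(reg):
    """Number of training samples in each leaf of a fitted tree."""
    t = reg.tree_
    is_leaf = t.children_left == -1
    return t.n_node_samples[is_leaf]


tree_reg1 = DecisionTreeRegressor(random_state=42).fit(X, y)
tree_reg2 = DecisionTreeRegressor(random_state=42, min_samples_leaf=10).fit(X, y)

for name, reg in (("No restrictions", tree_reg1), ("min_samples_leaf=10", tree_reg2)):
    sizes = leaf_sizes(reg)
    print(f"{name:20s} leaves={len(sizes):3d} smallest leaf={sizes.min()}")
```

The unrestricted tree ends up with single-sample leaves, so every noisy point gets its own prediction; that is the jagged curve on the left.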
