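• This post introduces yellowbrick, a visualization extension for the machine learning library Scikit-Learn.

  • With a few lines of code it visualizes features, models, and model evaluation results, which makes model selection and hyperparameter tuning easier. It depends on Matplotlib and Scikit-Learn.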


Installing yellowbrick

# Install from the Tsinghua mirror for faster downloads
pip install yellowbrick -i https://pypi.tuna.tsinghua.edu.cn/simple

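The core of yellowbrick: Visualizers

A Visualizer can be thought of as a scikit-learn estimator object with added visualization capabilities. Using one is much like using a scikit-learn model:

  • import a specific visualizer;

  • instantiate the visualizer;

  • fit the visualizer;

  • render the visualization.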


Quick start with yellowbrick examples

  • Plot ROC curves to evaluate how well a model performs

import matplotlib.pyplot as plt
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder
from yellowbrick.classifier import ROCAUC
from yellowbrick.datasets import load_game

plt.figure(dpi=120)

# Load the data
X, y = load_game()

# Encode the categorical features and the target labels
X = OrdinalEncoder().fit_transform(X)
encoder = LabelEncoder()
y = encoder.fit_transform(y)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Instantiate the classifier and the visualizer.
# encoder.classes_ holds the label names in the same order as the encoded integers,
# so the legend entries match the curves.
model = RidgeClassifier()
visualizer = ROCAUC(model, classes=list(encoder.classes_))
visualizer.fit(X_train, y_train)  # fit the visualizer on the training set
visualizer.score(X_test, y_test)  # evaluate the model on the test set
visualizer.show()

  • In feature engineering, show the effect of PCA dimensionality reduction

import matplotlib.pyplot as plt
from yellowbrick.datasets import load_credit
from yellowbrick.features import PCA

plt.figure(dpi=120)

X, y = load_credit()
classes = ['account in default', 'current with bills']

# scale=True standardizes the features; projection=3 draws a 3D projection
visualizer = PCA(scale=True, projection=3, classes=classes)
visualizer.fit_transform(X, y)
visualizer.show()

  • For a regression model, plot the residuals between predicted and actual values, with a Q-Q plot to assess the fit.

from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from yellowbrick.datasets import load_concrete
from yellowbrick.regressor import ResidualsPlot

# Load the data
X, y = load_concrete()

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate the model and the visualizer (Q-Q plot instead of the histogram)
model = Ridge()
visualizer = ResidualsPlot(model, hist=False, qqplot=True)
visualizer.fit(X_train, y_train)  # fit on the training set
visualizer.score(X_test, y_test)  # evaluate on the test set
visualizer.show()

Figure: Residuals Plot on the Concrete dataset with a Q-Q plot

  • Show how well a Lasso regression model performs

import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from yellowbrick.datasets import load_bikeshare
from yellowbrick.regressor import prediction_error

plt.figure(dpi=120)

X, y = load_bikeshare()
# The quick method fits the model and draws the plot in a single call
visualizer = prediction_error(Lasso(), X, y)

More examples are listed in the next section.


Commonly used yellowbrick Visualizers

Feature Visualization

  • Rank Features: pairwise ranking of features to detect relationships

  • Parallel Coordinates: horizontal visualization of instances

  • Radial Visualization: separation of instances around a circular plot

  • PCA Projection: projection of instances based on principal components

  • Manifold Visualization: high dimensional visualization with manifold learning

  • Joint Plots: direct data visualization with feature selection

Classification Visualization

  • Class Prediction Error: shows error and support in classification

  • Classification Report: visual representation of precision, recall, and F1

  • ROC/AUC Curves: receiver operator characteristics and area under the curve

  • Precision-Recall Curves: precision vs recall for different probability thresholds

  • Confusion Matrices: visual description of class decision making

  • Discrimination Threshold: find a threshold that best separates binary classes

Regression Visualization

  • Prediction Error Plot: find model breakdowns along the domain of the target

  • Residuals Plot: show the difference in residuals of training and test data

  • Alpha Selection: show how the choice of alpha influences regularization

  • Cook’s Distance: show the influence of instances on linear regression

Clustering Visualization

  • K-Elbow Plot: select k using the elbow method and various metrics

  • Silhouette Plot: select k by visualizing silhouette coefficient values

  • Intercluster Distance Maps: show relative distance and size/importance of clusters

Model Selection Visualization

  • Validation Curve: tune a model with respect to a single hyperparameter

  • Learning Curve: show if a model might benefit from more data or less complexity

  • Feature Importances: rank features by importance or linear coefficients for a specific model

  • Recursive Feature Elimination: find the best subset of features based on importance

Target Visualization

  • Balanced Binning Reference: generate a histogram with vertical lines showing the recommended value point to bin the data into evenly distributed bins

  • Class Balance: see how the distribution of classes affects the model

  • Feature Correlation: display the correlation between features and dependent variables

Text Visualization

  • Term Frequency: visualize the frequency distribution of terms in the corpus

  • t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents

  • Dispersion Plot: visualize how key terms are dispersed throughout a corpus

  • UMAP Corpus Visualization: plot similar documents closer together to discover clusters

  • PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus


Customizing yellowbrick figures

See the official documentation for the available customization options:

https://www.scikit-yb.org/en/latest/index.html

