The Ultimate Out-of-the-Box Automated Python Model Selection Methods

Choosing the best model is a key step after feature selection in any data science project. This step consists of finding the algorithms (supervised or unsupervised) that give the best predictions. Automated model selection methods for high-dimensional datasets include Libra and PyCaret. A unicorn data scientist needs to master the most advanced automated model selection methods. In this article, we will review two of the best automated model selection methods used by Kaggle winners, both of which can be implemented in a few lines of Python.

For this article, we will analyze the sample chocolate bar rating dataset (linked from the original article).

It is a challenging dataset: after feature selection, 20 of its 3,400 features remain, namely those correlated with the target feature 'review date'.

1. Libra

The challenge is to find the best-performing combination of techniques so that you minimize the error in your predictions. Libra provides out-of-the-box automated supervised machine learning that optimizes machine (or deep) learning pipelines, automatically searching for the best learning algorithms (neural network, SVM, decision tree, KNN, etc.) and the best hyperparameters in seconds. The original article links to a complete list of the estimators/models available in Libra.

Here is an example that predicts the review_date feature of the chocolate rating dataset, a complex multiclass classification problem (labels: 2006–2020).

# import libraries (the "!" prefix runs pip from a notebook cell)
!pip install libra
from libra import client

# open the dataset
a_client = client('../input/preprocess-choc/dfn.csv')
print(a_client)

# choose the model
a_client.neural_network_query('review_date', epochs=20)
a_client.analyze()

Libra produces a neural network whose accuracy rises from 0.796 before optimization to 0.860 after it. Overfitting shrinks as well: the train/test gap drops from 0.796 vs. 0.764 (0.032) to 0.860 vs. 0.851 (0.009), with the best-performing number of network layers moving from 3 to 6.
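The overfitting figures quoted for Libra are simply the difference between training accuracy and test accuracy. A minimal sketch of that arithmetic, using the accuracies reported in this article; the helper name `overfit_gap` is hypothetical and not part of Libra:

```python
def overfit_gap(train_acc: float, test_acc: float) -> float:
    """Return the train/test accuracy gap, rounded to 3 decimals."""
    return round(train_acc - test_acc, 3)

# Accuracies reported in the article for the Libra neural network.
gap_before = overfit_gap(0.796, 0.764)
gap_after = overfit_gap(0.860, 0.851)

print(f"gap before optimization: {gap_before}")
print(f"gap after optimization:  {gap_after}")
```

A smaller gap means the model behaves more similarly on training data and unseen data, which is what reduced overfitting refers to here.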

2. PyCaret

PyCaret is a simple, easy-to-use sequential pipeline that integrates preprocessing functions with hyperparameter tuning and ensembling of trained models.

# import libraries (the "!" prefix runs pip from a notebook cell)
!pip install pycaret
import pandas as pd
from pycaret.classification import *

# open the dataset
dfn = pd.read_csv('../input/preprocess-choc/dfn.csv')

# define target label and parameters
exp1 = setup(dfn, target = 'review_date', feature_selection = True)

All the preprocessing steps are applied within setup(). It offers more than 40 features for preparing data for machine learning, including missing value imputation, categorical variable encoding, label encoding (converting yes or no into 1 or 0), and the train-test split, all performed automatically when setup() is initialized. The original article links to more details about PyCaret's preprocessing abilities.
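To make three of those preprocessing steps concrete, here is a hand-rolled sketch in plain Python. It is not PyCaret's implementation: the rows, the column names, the mean-imputation strategy, and the 40% hold-out size are all assumptions chosen for illustration.

```python
import random
from statistics import mean

# Toy rows; None marks a missing value. Column names are made up.
rows = [
    {"cocoa_percent": 70.0, "organic": "yes"},
    {"cocoa_percent": None, "organic": "no"},
    {"cocoa_percent": 80.0, "organic": "yes"},
    {"cocoa_percent": 75.0, "organic": "no"},
    {"cocoa_percent": None, "organic": "yes"},
]

# 1. Missing value imputation: replace None with the column mean.
known = [r["cocoa_percent"] for r in rows if r["cocoa_percent"] is not None]
fill_value = mean(known)
for r in rows:
    if r["cocoa_percent"] is None:
        r["cocoa_percent"] = fill_value

# 2. Label encoding: map yes/no to 1/0.
encoding = {"yes": 1, "no": 0}
for r in rows:
    r["organic"] = encoding[r["organic"]]

# 3. Train-test split: shuffle with a fixed seed, hold out 40% of the rows.
rng = random.Random(0)
shuffled = rows[:]
rng.shuffle(shuffled)
n_test = int(len(shuffled) * 0.4)
test_rows, train_rows = shuffled[:n_test], shuffled[n_test:]

print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

The value of setup() is that these steps, and many more, run in a consistent order without being written by hand for every experiment.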

PyCaret compares models in one line, returning a table of k-fold cross-validated metric scores for each algorithm.

compare_models(fold = 5, turbo = True)
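The idea behind a k-fold comparison table can be sketched in plain Python: split the data into k folds, score every candidate model on each held-out fold, and rank the models by mean score. This is an illustration under stated assumptions, not PyCaret's code; the two toy models, the 1-D dataset, and every function name below are hypothetical.

```python
from collections import Counter
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

def majority_model(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_model(train):
    """1-NN: predict the label of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def cv_accuracy(make_model, data, k=5):
    """Mean accuracy of a model over k cross-validation folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        train = [data[i] for i in train_idx]
        predict = make_model(train)
        hits = sum(predict(data[i][0]) == data[i][1] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)

# Toy 1-D dataset: the label is 1 when the feature exceeds 10.
data = [(x, int(x > 10)) for x in range(20)]

results = {
    "majority": cv_accuracy(majority_model, data),
    "1-NN": cv_accuracy(nearest_neighbour_model, data),
}

# Rank models by cross-validated accuracy, best first.
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {score:.3f}")
```

Averaging over folds keeps one lucky or unlucky split from deciding the ranking, which is why the comparison table reports cross-validated scores rather than a single test score.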

PyCaret has over 60 open-source, ready-to-use algorithms. The original article links to a complete list of the estimators/models available in PyCaret.

The tune_model function automatically tunes the hyperparameters of a machine learning model. PyCaret uses random grid search over a predefined search space. The function returns a table with k-fold cross-validated scores.
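Random search over a predefined space can be sketched as follows. This is a simplified illustration rather than PyCaret's implementation: the search space, the `toy_score` function (a stand-in for a real cross-validated score), and the `random_search` helper are all hypothetical.

```python
import random

# Predefined search space for a hypothetical decision tree.
search_space = {
    "max_depth": [2, 4, 6, 8, 10],
    "min_samples_leaf": [1, 2, 5, 10],
    "criterion": ["gini", "entropy"],
}

def toy_score(params):
    """Stand-in for a cross-validated score; peaks at depth 6, leaf 2."""
    penalty = abs(params["max_depth"] - 6) * 0.02
    penalty += abs(params["min_samples_leaf"] - 2) * 0.01
    bonus = 0.01 if params["criterion"] == "entropy" else 0.0
    return 0.85 - penalty + bonus

def random_search(space, score_fn, n_iter, seed=0):
    """Sample n_iter random combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search(search_space, toy_score, n_iter=10)
print(best_params, round(best_score, 3))
```

Unlike an exhaustive grid search, the cost is set by n_iter rather than by the size of the grid, so a larger n_iter trades compute time for a better chance of hitting the best combination.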

The ensemble_model function is used for ensembling trained models. It takes only a trained model object and returns a table with k-fold cross-validated scores.

# create a decision tree model ('dt' is the estimator ID, passed as a string)
dt = create_model('dt')

# ensemble the trained dt model
dt_bagged = ensemble_model(dt)

# plot the learning curve of dt
plot_model(estimator = dt, plot = 'learning')

# plot the learning curve of dt_bagged
plot_model(estimator = dt_bagged, plot = 'learning')

Figure: Evaluation metrics for the simple and bagged decision trees
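Bagging, the technique behind the bagged decision tree, fits one model per bootstrap sample and combines the predictions by majority vote. The sketch below shows the idea with 1-D decision stumps in plain Python; it is an illustration under assumptions, not PyCaret's code, and the function names and toy data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a 1-D decision stump: pick the threshold with the fewest errors."""
    best_threshold, best_errors = None, len(sample) + 1
    for threshold, _ in sample:
        errors = sum(int(x > threshold) != y for x, y in sample)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return lambda x: int(x > best_threshold)

def fit_bagged(data, n_estimators=11, seed=0):
    """Bagging: fit one stump per bootstrap sample, predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw len(data) points with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        stumps.append(fit_stump(bootstrap))

    def predict(x):
        votes = Counter(stump(x) for stump in stumps)
        return votes.most_common(1)[0][0]

    return predict

# Toy data: the label is 1 when x > 10, with one noisy point at x = 3.
data = [(x, int(x > 10)) for x in range(20)]
data[3] = (3, 1)

single = fit_stump(data)
bagged = fit_bagged(data)
print(single(2), single(15), bagged(2), bagged(15))
```

Because every stump sees a slightly different sample, the vote tends to smooth out the effect of individual noisy points, which is the variance reduction bagging is used for.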

The plot_model function handles performance evaluation and diagnostics of a trained machine learning model.

# hyperparameter tuning
tuned_dt = tune_model(dt, optimize = "Accuracy", n_iter = 500)

# evaluate the model
evaluate_model(estimator = tuned_dt)

# plot the confusion matrix of the tuned dt
plot_model(tuned_dt, plot = 'confusion_matrix')

Figure: Decision tree classifier evaluation methods using PyCaret
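A confusion matrix like the one plotted for the tuned decision tree counts how often each actual label was predicted as each possible label. A minimal sketch in plain Python; the review years and predictions below are made up for illustration and are not results from the article.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) pairs; rows are actual, columns predicted."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(actual, pred)] for pred in labels] for actual in labels]
    return labels, matrix

# Made-up review years and predictions, for illustration only.
y_true = [2006, 2006, 2007, 2007, 2008, 2008]
y_pred = [2006, 2007, 2007, 2007, 2008, 2006]

labels, matrix = confusion_matrix(y_true, y_pred)

# Accuracy is the share of samples on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

for label, row in zip(labels, matrix):
    print(label, row)
print(f"accuracy: {accuracy:.3f}")
```

In a multiclass problem such as review_date, the off-diagonal cells show which years get confused with each other, which a single accuracy number hides.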

Finally, the predict_model function can be used to predict on an unseen dataset. When called without a data argument, it scores the hold-out set that setup() created.

# predict labels (pass data=... to score a new dataset instead of the hold-out set)
predictions = predict_model(dt)

Figure: review_date predictions using the decision tree

Summary

For the complete algorithm selection for the chocolate bar review date estimation using these two methods, refer to these links:

https://jovian.ml/yeonathan/libra

https://jovian.ml/yeonathan/pycaret

This brief overview is a reminder of the importance of using the right algorithm selection methods in data science. This post aims to cover the two best automated algorithm selection methods in Python for high-dimensional datasets and to share useful documentation.

I hope you enjoyed it. Keep exploring!

Translated from: https://towardsdatascience.com/the-ultimate-out-of-the-box-automated-python-model-selection-methods-f2188472d2a
