PyCaret is an open-source, low-code machine learning library for Python that aims to shorten the experiment cycle: it lets you go from preparing your data to deploying your model within seconds, in the notebook environment of your choice.

This article is aimed at readers who are familiar with machine learning concepts and know how to implement machine learning algorithms with libraries such as scikit-learn. The ideal reader sees the need for automation and does not want to spend a lot of time searching for the best algorithm and its hyperparameters.

As machine learning practitioners, we know that a complete data science project involves several steps before model building, evaluation, and prediction can start: data preprocessing (missing or null value treatment, data type conversion, encoding of categorical features), data transformation (log and Box-Cox transformations), feature engineering, Exploratory Data Analysis (EDA), and so on. We normally use Python libraries such as NumPy, pandas, Matplotlib, and scikit-learn for these tasks. PyCaret is a powerful library that automates much of this process.
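To make the manual work concrete, here is a minimal sketch of two of the preprocessing steps listed above, median imputation of missing values and a log transformation, written in plain Python. The column names and values are hypothetical and are not taken from the laptop dataset; this only illustrates the kind of code that a low-code library saves you from writing, not how PyCaret implements it.

```python
import math
from statistics import median

# Hypothetical toy rows; None marks a missing value.
rows = [
    {"ram_gb": 8, "price": 45000.0},
    {"ram_gb": None, "price": 62000.0},
    {"ram_gb": 16, "price": 120000.0},
    {"ram_gb": 4, "price": 30000.0},
]

def impute_median(rows, column):
    """Replace missing values in `column` with the median of the observed ones."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def add_log_column(rows, column):
    """Add a log-transformed copy of a skewed, strictly positive column."""
    return [{**r, f"log_{column}": math.log(r[column])} for r in rows]

clean = add_log_column(impute_median(rows, "ram_gb"), "price")
for row in clean:
    print(row)
```

Each step returns new rows instead of modifying the input, which keeps the raw data available for comparison.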

Installing PyCaret

!pip install pycaret==2.0

Once PyCaret is installed, we are ready to go! I will work through a regression problem here, but PyCaret also supports classification, anomaly detection, clustering, and Natural Language Processing.

I am going to use a Laptop Prices dataset that I obtained by scraping the Flipkart website.

import pandas as pd
df = pd.read_csv('changed.csv')  # Reading the dataset
df.head()

from pycaret.regression import *
reg = setup(data = df, target = 'Price')

PyCaret's setup() function handles most of the preprocessing that normally takes many lines of code, and it does so in a single line. That's the beauty of this amazing library!

We call setup() and pass the name of the dependent variable as the target. Here we want to predict the price of the laptop, so Price is the dependent variable.

X = df.drop('Price', axis=1)  # Independent features
Y = df['Price']  # Dependent variable
Y = pd.DataFrame(Y)

Comparing all the regression models

compare_models()

compare_models() trains all the available regression models and ranks them. After this, we can create any single model, for example a CatBoost or an XGBoost regressor, and then perform hyperparameter tuning.
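The idea of training several candidates and ranking them by error can be sketched in a few lines. This is only an illustration of the ranking idea with two trivial baseline "models" and made-up numbers; PyCaret's real compare_models() evaluates many actual estimators, and nothing below reflects its internals.

```python
from statistics import mean, median

# Hypothetical toy target values; the last training value is an outlier.
train_y = [30.0, 32.0, 35.0, 31.0, 200.0]
test_y = [33.0, 29.0, 34.0]

# Each "model" fits on the training targets and returns one constant prediction.
candidates = {
    "mean_baseline": lambda y: mean(y),
    "median_baseline": lambda y: median(y),
}

def mae(actual, predicted):
    """Mean Absolute Error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def compare(candidates, train_y, test_y):
    """Fit every candidate, score it on the held-out targets, and rank by MAE."""
    scores = []
    for name, fit in candidates.items():
        prediction = fit(train_y)
        scores.append((name, mae(test_y, [prediction] * len(test_y))))
    return sorted(scores, key=lambda item: item[1])

leaderboard = compare(candidates, train_y, test_y)
for name, score in leaderboard:
    print(f"{name}: MAE = {score:.2f}")
```

The median baseline ranks first here because it is not pulled upward by the outlier in the training targets.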

We can see that the Gradient Boosting Regressor (GBR) performed best among the compared models. However, I also ran the analysis with the XGBoost model, and it performed better than the GBR model.

Error using Gradient Boosting Regressor model

Since we have identified XGBoost as the best model, we create an XGBoost model with the create_model function and set max_depth, the maximum depth of each tree.

Creating the model

xgboost = create_model('xgboost', max_depth = 10)

create_model trains the model with max_depth = 10 and evaluates it with 10-fold cross-validation, the default in this PyCaret version. The 10 iterations are therefore the 10 folds, not a consequence of max_depth. For every fold it calculates the MAE (Mean Absolute Error), MSE (Mean Squared Error), RMSE (Root Mean Squared Error), R2 (R-squared), and MAPE (Mean Absolute Percentage Error), and finally it displays the mean and standard deviation of each metric across the 10 folds. The lower the error, the better the model, so to reduce the error we look for the hyperparameters that minimize it.
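For reference, the five metrics named above can be computed by hand as follows. This is a minimal sketch using the standard textbook definitions on made-up numbers; the function name regression_metrics is hypothetical, and MAPE is returned as a fraction rather than a percentage.

```python
import math

def regression_metrics(actual, predicted):
    """Compute MAE, MSE, RMSE, R2, and MAPE for two equally long sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n  # needs non-zero actuals
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "MAPE": mape}

# Hypothetical actual and predicted prices.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 360.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE is in the same unit as the target (here, the price), which is one reason it is convenient as the main score for a price prediction problem.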

For this purpose, we use the tune_model function, which applies K-fold cross-validation to find the best hyperparameters.
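The combination of K-fold cross-validation and a hyperparameter search can be sketched without any library. The example below is an assumption-laden toy: it tunes k for a hand-written nearest-neighbour regressor on synthetic data by sampling a few candidate values at random and keeping the one with the lowest mean MAE across folds. It shows the general technique only and does not reproduce how tune_model works internally.

```python
import random
from statistics import mean, pstdev

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs that partition range(n_samples) into n_folds."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        valid = indices[fold::n_folds]
        valid_set = set(valid)
        train = [i for i in indices if i not in valid_set]
        yield train, valid

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)

def cv_mae(xs, ys, k, n_folds=5):
    """Return the per-fold MAE of a k-nearest-neighbour regressor."""
    fold_scores = []
    for train, valid in k_fold_indices(len(xs), n_folds):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        errors = [abs(ys[i] - knn_predict(tx, ty, xs[i], k)) for i in valid]
        fold_scores.append(mean(errors))
    return fold_scores

# Hypothetical toy data: a noisy linear relationship.
rng = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2.0 * x + rng.gauss(0, 3) for x in xs]

# Random search: sample a few candidate values instead of trying all of them.
candidates = rng.sample(range(1, 11), 4)
results = {k: cv_mae(xs, ys, k) for k in candidates}
best_k = min(results, key=lambda k: mean(results[k]))

for k, scores in results.items():
    print(f"k={k}: mean MAE={mean(scores):.2f}, std={pstdev(scores):.2f}")
print("best k:", best_k)
```

Every row is used for validation exactly once, so the mean across folds is a steadier estimate than a single train/test split, which matters for a small dataset.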

Hyperparameter tuning of the model

xgboost = tune_model(xgboost, fold=5)

With fold=5, the model is evaluated on 5 folds, and we get the mean and standard deviation of every metric. The mean MAE over the 5 folds was almost the same for the GBR and XGBoost models, but after hyperparameter tuning and making predictions, the XGBoost model had a lower error and performed better than the GBR model.

Making predictions using the best model

predict_model(xgboost)

Checking the scores after applying cross-validation (we mainly care about the Mean Absolute Error). Here we can see that the MAE for the best model has come down to 10847.2257, so the Mean Absolute Error is roughly 10,800.

Checking all the parameters of the xgboost model

print(xgboost)

XGBoost model hyperparameters

plot_model(xgboost, plot='parameter')

Residuals Plot

The distances (errors) between the actual and predicted values

plot_model(xgboost, plot='residuals')

We can clearly see that the model is overfitting: R-squared is 0.999 on the training set but 0.843 on the test set. This is not surprising, because the dataset contains only 168 rows. The main point here, though, is to highlight how PyCaret lets you create plots and curves with a single line of code!
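The train-versus-test gap in R-squared is the signal for overfitting. A minimal sketch of that check, using a deliberately overfit model (a 1-nearest-neighbour regressor that memorizes its training data) on made-up numbers, might look like this. The data and the model are assumptions chosen for illustration and have nothing to do with the XGBoost model above.

```python
def r2_score(actual, predicted):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def one_nn_predict(train_x, train_y, x):
    """Return the target of the single closest training point (pure memorization)."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

# Hypothetical toy data: y is roughly 10 * x, with heavy noise in the training targets.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [5.0, 25.0, 22.0, 48.0, 41.0, 66.0]
test_x = [1.4, 3.6, 5.4]
test_y = [14.0, 36.0, 54.0]

train_pred = [one_nn_predict(train_x, train_y, x) for x in train_x]
test_pred = [one_nn_predict(train_x, train_y, x) for x in test_x]

train_r2 = r2_score(train_y, train_pred)
test_r2 = r2_score(test_y, test_pred)
print(f"train R2 = {train_r2:.3f}, test R2 = {test_r2:.3f}")
```

The memorizing model reproduces its training targets exactly, so its training score is perfect while its test score is much lower, the same pattern as in the residuals plot, only more extreme.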

Plotting the Prediction Error

plot_model(xgboost, plot='error')

The value of R squared for the model is 0.843.

Cook's Distance Plot

plot_model(xgboost, plot='cooks')

Learning Curve

plot_model(xgboost, plot='learning')

Validation Curve

plot_model(xgboost, plot='vc')

These two plots also show that the model is clearly overfitting!

Plot of Feature Importance

plot_model(xgboost, plot='feature')

From this plot, we can see that Processor_Type_i9 (i9 CPU) is a very important feature for determining the price of a laptop.
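A feature name such as Processor_Type_i9 is what one-hot encoding of a categorical column typically produces: one 0/1 column per category, named after the original column and the category value. The sketch below shows that mechanism in plain Python. Only the column name Processor_Type is inferred from the plot; the rows, the prices, and the one_hot_encode helper are hypothetical, and the exact naming scheme PyCaret uses is assumed from the feature name above.

```python
def one_hot_encode(rows, column):
    """Expand a categorical column into one 0/1 column per category, named column_value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded

# Hypothetical rows for illustration only.
laptops = [
    {"Processor_Type": "i5", "Price": 55000},
    {"Processor_Type": "i9", "Price": 180000},
    {"Processor_Type": "i7", "Price": 90000},
]

encoded = one_hot_encode(laptops, "Processor_Type")
print(list(encoded[1].keys()))
print(encoded[1])
```

Read this way, a high importance for Processor_Type_i9 means that knowing whether a laptop has an i9 processor helps the model considerably when estimating the price.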

Splitting the dataset into training and testing sets

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
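What a shuffled 80/20 split does can be sketched in plain Python. The helper split_train_test and its seed parameter are hypothetical and only mimic the general behaviour of a random split; scikit-learn's train_test_split has more options, including a random_state argument that makes the split reproducible.

```python
import random

def split_train_test(features, targets, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out a test_size fraction."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(targets, train_idx), pick(targets, test_idx))

# Hypothetical toy data: ten rows where the target is ten times the feature.
features = list(range(10))
targets = [x * 10 for x in features]

X_train, X_test, Y_train, Y_test = split_train_test(features, targets)
print(len(X_train), len(X_test))
```

Note that X and Y in this article come from the same DataFrame that was passed to setup(), so rows selected by a split made at this point may already have been used for training and are not guaranteed to be unseen by the model.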

Final XGBoost parameters for deployment

final_xgboost = finalize_model(xgboost)

Making predictions on the unseen data (test set)

new_predictions = predict_model(xgboost, data=X_test)
new_predictions.head()

Saving the transformation pipeline and model

save_model(xgboost, model_name = 'deployment_08082020')
# Output: Transformation Pipeline and Model Succesfully Saved
deployment_08082020 = load_model('deployment_08082020')
# Output: Transformation Pipeline and Model Sucessfully Loaded
deployment_08082020

So this is the final Machine Learning model that can be used for deployment.

The model is saved in the pickle format!
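For readers who have not used pickle before, the round trip of saving and loading a Python object looks like this. The dictionary below is a hypothetical stand-in for a fitted pipeline, and the file name is made up; this sketch shows the standard-library mechanism in general, not the exact calls that save_model and load_model make.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted pipeline: any picklable Python object works.
pipeline = {"steps": ["impute", "encode", "xgboost"], "max_depth": 10}

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "deployment_example.pkl")

with open(path, "wb") as handle:
    pickle.dump(pipeline, handle)

with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == pipeline)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code.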

For more information, check the PyCaret documentation.

I have not discussed everything in detail in this article, but you can always refer to my GitHub repository for the complete code. My conclusion is this: don't expect a perfect model, but expect something you can use in your own company or project today!

Shout out to Moez Ali for this absolutely brilliant library!

Connect with me on LinkedIn here

The bottom line is that the automation lowers the risk of human error and adds some intelligence to the enterprise system. — Stephen Elliot

I hope you found the article insightful. I would love to hear your feedback so that I can improve it and come back with better content.

Thank you so much for reading!

Translated from: https://towardsdatascience.com/leverage-the-power-of-pycaret-d5c3da3adb9b
