DecisionTreeRegressor: Stop Using for Future Projections!

Inside AI

Scikit-learn is one of the best-known machine learning libraries in Python. It offers a range of classification, regression, and clustering algorithms, and its key strength, in my opinion, is seamless integration with NumPy, pandas, and SciPy.

Scikit-learn is so well designed that a couple of lines of code are enough to compare the predictions of many different algorithms. I sometimes feel that this strength inadvertently works to its disadvantage: machine learning developers, especially those with relatively little experience, may apply an inappropriate algorithm without grasping its salient features and limitations.

In this article, I will discuss why we should not use the decision tree regression algorithm for predictions that involve extrapolating beyond the training data.

Objective

We have the iron, calcium, and protein content of peas, measured from the day they were picked at the farm through day 1142. Let us assume that iron and calcium content are easier and cheaper to measure than protein content.

We will use this data to train a DecisionTreeRegressor and then predict the protein content for new data points from their iron content, calcium content, and days passed.

Sample Data File

The data file is largely self-explanatory. Each row shows the iron, calcium, and protein content of the peas for a given number of days since harvesting.

Step 1- We will import pandas, Matplotlib, NumPy, and DecisionTreeRegressor, which we are going to use for our analysis.

from sklearn.tree import DecisionTreeRegressor
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Step 2- Read the full sample data Excel file into a pandas DataFrame called “data”.

data= pd.read_excel("Peas Nutrient.xlsx")

I will not cover preliminary data quality checks such as blank values and outliers, or how to correct them, in this article. I assume that the data series contain no such discrepancies.
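For readers who do want a quick look at blank values and outliers before modelling, the following is a minimal sketch. It runs on a small hypothetical frame rather than the real spreadsheet, the column names and values are invented for illustration, and the 1.5 × IQR rule is just one common convention rather than a method from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet; real column names and values may differ.
sample = pd.DataFrame({
    "Days Passed": [1, 2, 3, 4, 5, 6],
    "Iron Content": [5.1, 5.0, np.nan, 4.9, 4.8, 50.0],  # one blank, one suspicious spike
    "Protein Content": [60.2, 60.1, 60.0, 59.8, 59.7, 59.6],
})

# Count blank (NaN) values per column.
missing_per_column = sample.isna().sum()

# Flag values outside 1.5 * IQR of each column as potential outliers.
q1 = sample.quantile(0.25)
q3 = sample.quantile(0.75)
iqr = q3 - q1
outlier_mask = (sample < q1 - 1.5 * iqr) | (sample > q3 + 1.5 * iqr)
outliers_per_column = outlier_mask.sum()

print(missing_per_column)
print(outliers_per_column)
```

Whether a flagged value is a genuine error or a real measurement is a judgment call that depends on the data source.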

Step 3- We will split the full data set into two parts, a training set and a testing set. As the names suggest, we will use the training set to train the decision tree regressor and then compare its protein predictions with the actual protein content in the testing set.

In the code below, data records from day 1 to day 900 are sliced as training data, and data records from day 901 to day 1142 as testing data.

# Assumes one row per day on the default 0-based index, so day 901 sits at position 900.
Training_data = data.iloc[:900]
Test_data = data.iloc[900:1142]
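How a day-based split maps onto rows depends on the DataFrame index. The sketch below assumes one row per day and the default 0-based index that pandas assigns when reading a file; the frame itself is a hypothetical stand-in, not the real data. It shows that positional slicing and label slicing pick slightly different rows.

```python
import pandas as pd

# Hypothetical frame: one row per day, days 1..1142, default 0-based index.
frame = pd.DataFrame({"Days Passed": range(1, 1143)})

train = frame[:900]                   # positions 0..899  -> days 1..900
test_by_label = frame.loc[901:1142]   # labels 901..1141  -> days 902..1142
test_by_position = frame.iloc[900:]   # positions 900..1141 -> days 901..1142

print("last training day:", train["Days Passed"].iloc[-1])
print("first test day (label slice):", test_by_label["Days Passed"].iloc[0])
print("first test day (position slice):", test_by_position["Days Passed"].iloc[0])
```

Under these assumptions, the label-based slice silently skips day 901, so checking the first and last rows of each split is a cheap safeguard.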

Step 4- “Days passed”, “iron content”, and “calcium content” are the independent variables used for prediction. The predicted “protein content” is the dependent variable. By convention, the independent variables are denoted by “X” and the dependent variable by “y”.

In the code below, the “Protein Content” column is dropped from the DataFrame, and the remaining data, i.e. the independent variables, is assigned to X_train. Similarly, every column except “Protein Content” is dropped, and the result is assigned to y_train.

X_train = Training_data.drop(["Protein Content "], axis=1)
y_train = Training_data.drop(["Days Passed", "Iron Content ", "Calcium Content "], axis=1)

The same process is repeated below for the testing data set, i.e. the values from day 901 to day 1142.

X_test = Test_data.drop(["Protein Content "], axis=1)
y_test = Test_data.drop(["Days Passed", "Iron Content ", "Calcium Content "], axis=1)

Step 5- The DecisionTreeRegressor model is trained on the training dataset. The score (the coefficient of determination, R², returned by the score method) is then checked to see how well the model fits this data.

tree_reg = DecisionTreeRegressor().fit(X_train, y_train)
print("The model training score is", tree_reg.score(X_train, y_train))

A perfect training score of 1.0 is itself a sign of overfitting: with no limit on depth, the tree can keep splitting until it reproduces the training data exactly.
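The effect is easy to reproduce on synthetic data. The sketch below uses invented noisy data (a sine curve plus noise, not the peas data set) and a held-out slice to show that an unconstrained tree scores perfectly on the data it was fitted on while doing worse on data it has not seen.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic, hypothetical data: a smooth signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Fit on the first 150 points, hold out the remaining 50.
X_fit, y_fit = X[:150], y[:150]
X_held, y_held = X[150:], y[150:]

# No max_depth: the tree grows until every training point is reproduced.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_fit, y_fit)
train_r2 = full_tree.score(X_fit, y_fit)
held_r2 = full_tree.score(X_held, y_held)

print("training R^2:", train_r2)
print("held-out R^2:", held_r2)
```

The gap between the two scores, rather than the training score alone, is what signals overfitting.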

Step 5 (continued)- To address the overfitting caused by the unconstrained tree depth during training, we will constrain the maximum depth to 6.

tree_reg = DecisionTreeRegressor(max_depth=6).fit(X_train, y_train)
print("The model training score is", tree_reg.score(X_train, y_train))

This reduces the overfitting on the training data, and the model is ready to predict the protein content for the test data points.
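Limiting the depth also caps how many different values the model can ever output: a tree of depth 6 has at most 2^6 = 64 leaves, and each leaf predicts a single constant. The sketch below checks this on synthetic data (invented for illustration, with max_depth=6 as in the code above).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic, hypothetical data: a smooth signal plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

shallow_tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)
predictions = shallow_tree.predict(X)

depth = shallow_tree.get_depth()            # actual depth reached
n_leaves = shallow_tree.get_n_leaves()      # number of terminal nodes
n_distinct = np.unique(predictions).size    # distinct predicted values

print("depth:", depth, "leaves:", n_leaves, "distinct predictions:", n_distinct)
print("training R^2:", shallow_tree.score(X, y))
```

Because the output is piecewise constant, the training score drops below 1.0, which is the intended trade-off.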

Step 6- In the code below, the “protein content” of the test data set, i.e. days 901 to 1142, is predicted from the corresponding “days passed”, “iron content”, and “calcium content” values.

y_pred_tree = tree_reg.predict(X_test)

Step 7- We will plot the protein content predicted by the decision tree regression model and compare it with the actual protein content in the test dataset from day 901 to day 1142.

plt.plot(X_test["Days Passed"], y_test, label="Actual Data")
plt.plot(X_test["Days Passed"], np.rint(y_pred_tree), label="Predicted Data")
# Days are on the x-axis and protein content on the y-axis.
plt.xlabel("Days Passed")
plt.ylabel("Protein Content (in Grams)")
plt.legend(loc="best")
plt.show()

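We can see that the decision tree regressor, which fits the training dataset quite well with a score of 0.93, fails badly at predicting the protein content on the test data. The model predicts the same protein content of about 51.34 for every day. This is expected: a decision tree predicts the mean target value of the training samples in the leaf a point falls into, so it can never output a value outside the range seen in training, and points beyond the training range tend to land in the same outermost leaves.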
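The flat prediction can be reproduced without the original spreadsheet. The sketch below uses synthetic data with an assumed linear decline over time (the slope, noise level, and single feature are invented for illustration) and compares the tree with a plain linear regression. The linear model is my own addition as a point of comparison and is not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic, hypothetical data: a quantity that declines steadily with time.
days = np.arange(1, 1143).reshape(-1, 1)
rng = np.random.default_rng(42)
protein = 60.0 - 0.01 * days[:, 0] + rng.normal(scale=0.05, size=len(days))

# Train on days 1..900, predict days 901..1142 (pure extrapolation).
X_fit, y_fit = days[:900], protein[:900]
X_future, y_future = days[900:], protein[900:]

tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_fit, y_fit)
line = LinearRegression().fit(X_fit, y_fit)

tree_future = tree.predict(X_future)
line_future = line.predict(X_future)

# Every future day exceeds all split thresholds, so it lands in the same leaf.
n_distinct_tree = np.unique(tree_future).size
tree_mae = np.mean(np.abs(tree_future - y_future))
line_mae = np.mean(np.abs(line_future - y_future))

print("distinct tree predictions beyond day 900:", n_distinct_tree)
print("tree MAE:", tree_mae)
print("linear MAE:", line_mae)
```

The linear model only does better here because the synthetic trend is linear by construction; the point is that the tree cannot follow any trend past the last training day.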

We should not use the Decision Tree Regression model for predictions that involve extrapolating the data. This is just one example; the main takeaway for us machine learning practitioners is to consider the data, the prediction objective, and each algorithm's strengths and limitations before starting to model.
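One practical way to act on this is to check, before predicting, whether the new data points lie outside the range the model was trained on. The helper below is a hypothetical sketch (its name, the example frames, and the column names are all invented for illustration); it reports the share of new rows that fall outside the training minimum and maximum for each column.

```python
import pandas as pd


def columns_outside_training_range(train: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Return, per column, the fraction of new rows outside the training min/max."""
    report = {}
    for column in train.columns:
        low, high = train[column].min(), train[column].max()
        outside = (new[column] < low) | (new[column] > high)
        if outside.any():
            report[column] = float(outside.mean())
    return report


# Hypothetical example: days extend past the training range, iron does not.
train_frame = pd.DataFrame({
    "Days Passed": range(1, 901),
    "Iron Content": [5.0 - 0.001 * d for d in range(1, 901)],
})
new_frame = pd.DataFrame({
    "Days Passed": range(901, 1143),
    "Iron Content": [4.5] * 242,
})

report = columns_outside_training_range(train_frame, new_frame)
print(report)
```

A non-empty report means the prediction involves extrapolation on at least one feature, which is the situation where a tree-based regressor is a poor fit.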

We can make similar mistakes when selecting the independent variables for supervised machine learning algorithms. In the article “How to identify the right independent variables for Machine Learning Supervised Algorithms?”, I discuss a structured approach to identifying the appropriate independent variables for accurate predictions.

Translated from: https://towardsdatascience.com/decisiontreeregressor-stop-using-for-future-projections-e27104537f6a
