Used Car Price Prediction Using Machine Learning

All Python scripts related to this project are available on my GitHub page. If you are interested, the same repository also contains the scripts used for data cleaning and data visualization in this study. The project is also deployed on Heroku using Django.

Content

  1. Data Cleaning (identifying null values, filling missing values, and removing outliers)
  2. Data Preprocessing (standardization or normalization)
  3. ML Models: Linear Regression, Ridge Regression, Lasso, KNN, Random Forest Regressor, Bagging Regressor, AdaBoost Regressor, and XGBoost
  4. Comparison of the performance of the models
  5. Some insights from the data

Why is the price feature scaled by log transformation?

In a regression model, for any fixed value of X, Y is assumed to be normally distributed. In this problem, the target value (price) is not normally distributed; it is right-skewed.

To address this, a log transformation is applied to the target variable when its distribution is skewed, and an inverse function must then be applied to the predicted values to obtain the actual predicted target value.

Because of this, the RMSLE is calculated to measure the model's error, and the R² score is also calculated to evaluate the accuracy of the model.
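The log transform, its inverse, and the two metrics can be sketched as follows. This is a minimal illustration rather than the author's script: the prices are made up, the helper names `rmsle` and `r2` are hypothetical, and `log1p`/`expm1` are assumed as the transform pair (the article does not say which logarithm variant was used).

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up, right-skewed prices (assumption, not taken from the dataset).
prices = np.array([1500.0, 4200.0, 9800.0, 25000.0, 61000.0])

# Train on the log scale ...
log_prices = np.log1p(prices)

# ... pretend a model predicted the log price with small errors ...
pred_log = log_prices + np.array([0.1, -0.05, 0.0, 0.08, -0.1])

# ... and apply the inverse function to get back to actual prices.
pred_prices = np.expm1(pred_log)

print("RMSLE:", rmsle(prices, pred_prices))
print("R2 on the log scale:", r2(log_prices, pred_log))
```

Because RMSLE compares logarithms, an error of a few hundred dollars on a cheap car weighs more than the same error on an expensive one, which suits a skewed price target.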

Some Key Concepts:

  • Learning Rate: The learning rate is a hyperparameter that controls how much the model's weights are adjusted with respect to the loss gradient. The lower the value, the slower we travel along the downward slope. A low learning rate helps ensure that we do not miss any local minima, but it can also mean taking a long time to converge, especially if we get stuck on a plateau region.

  • n_estimators: The number of trees to build before taking the majority vote or the average of the predictions. A higher number of trees generally gives better performance but makes the code slower.

  • R² Score: A statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. A score of 0% indicates that the model explains none of the variability of the response data around its mean, while 100% indicates that it explains all of it.

1. The Data:

The dataset used in this project was downloaded from Kaggle.


2. Data Cleaning:

The first step is to remove irrelevant features such as 'URL', 'region_url', 'vin', 'image_url', 'description', 'county', and 'state' from the dataset.

As a next step, check the missing values for each feature.

Showing missing values (Image by Panwar Abhash Anil)

Next, the missing values were filled with appropriate values.

To fill the missing values, the IterativeImputer method is used with several different estimators, and the MSE of each estimator is then calculated using cross_val_score:

  1. Mean and Median
  2. BayesianRidge Estimator
  3. DecisionTreeRegressor Estimator
  4. ExtraTreesRegressor Estimator
  5. KNeighborsRegressor Estimator

MSE with Different Imputation Methods (Image by Panwar Abhash Anil)

From the above figure, we can conclude that the ExtraTreesRegressor estimator is the best choice for imputing the missing values.
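The comparison of imputation strategies can be sketched along the following lines. This is a hedged illustration on synthetic data, not the author's code: the data, the 20% missing rate, the downstream `BayesianRidge` scoring model, and the estimator settings are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with roughly 20% of the entries removed (assumption).
rng = np.random.RandomState(0)
X_full = rng.normal(size=(200, 4))
y = X_full @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)
X_missing = X_full.copy()
X_missing[rng.rand(*X_full.shape) < 0.2] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "BayesianRidge": IterativeImputer(estimator=BayesianRidge(), max_iter=5, random_state=0),
    "DecisionTreeRegressor": IterativeImputer(
        estimator=DecisionTreeRegressor(random_state=0), max_iter=5, random_state=0),
    "ExtraTreesRegressor": IterativeImputer(
        estimator=ExtraTreesRegressor(n_estimators=10, random_state=0), max_iter=5, random_state=0),
    "KNeighborsRegressor": IterativeImputer(
        estimator=KNeighborsRegressor(n_neighbors=5), max_iter=5, random_state=0),
}

# Score each imputer by the cross-validated MSE of a model trained on the imputed data.
scores = {}
for name, imputer in imputers.items():
    pipeline = make_pipeline(imputer, BayesianRidge())
    neg_mse = cross_val_score(pipeline, X_missing, y,
                              scoring="neg_mean_squared_error", cv=3)
    scores[name] = -neg_mse.mean()

for name, mse in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MSE = {mse:.4f}")
```

Which strategy wins depends on the data, so the ranking printed here says nothing about the ranking on the car dataset.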

(Image by Panwar Abhash Anil)

Finally, after dealing with the missing values, there are zero null values.

Outliers: The interquartile range (IQR) method is used to remove the outliers from the data.

Figures 1 to 3 (Images by Panwar Abhash Anil)

  • From figure 1, the prices whose log is below 6.55 or above 11.55 are the outliers.
  • From figure 2, nothing can be concluded visually, so the IQR is calculated to find the outliers, i.e. odometer values below 6.55 and above 11.55 are the outliers.
  • From figure 3, years below 1995 and above 2020 are the outliers.
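A minimal sketch of IQR-based outlier removal is shown below. The helper names `iqr_bounds` and `remove_outliers`, the toy `year` values, and the conventional multiplier `k=1.5` are assumptions; the article does not state which multiplier was used.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(df, column, k=1.5):
    """Keep only the rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds(df[column], k)
    return df[(df[column] >= low) & (df[column] <= high)]

# Toy data (assumption): one very old car among recent ones.
df = pd.DataFrame({"year": [1960, 2005, 2008, 2010, 2012, 2014, 2015, 2017, 2019]})

print(iqr_bounds(df["year"]))
cleaned = remove_outliers(df, "year")
print(cleaned["year"].tolist())
```

The same function can be applied column by column, for example to the log price and the odometer reading.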

Finally, the shape of the dataset before processing was (435849, 25) and after processing was (374136, 18). In total, 61713 rows and 7 columns were removed.

3. Data Preprocessing:

Label Encoder: In our dataset, 12 features are categorical variables and 4 are numerical variables (price column excluded). To apply the ML models, we need to transform these categorical variables into numerical variables. The LabelEncoder from the sklearn library is used for this.
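As a small sketch of this step, the snippet below label-encodes two categorical columns of a toy frame. The column names and values are illustrative assumptions, not rows from the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame (assumption): two categorical columns and one numerical column.
df = pd.DataFrame({
    "fuel": ["gas", "diesel", "gas", "electric"],
    "transmission": ["automatic", "manual", "automatic", "other"],
    "odometer": [120000, 80000, 45000, 30000],
})

categorical_cols = ["fuel", "transmission"]

# Keep one fitted encoder per column so the codes can be mapped back later.
encoders = {}
for col in categorical_cols:
    encoder = LabelEncoder()
    df[col] = encoder.fit_transform(df[col])
    encoders[col] = encoder

print(df)
print(list(encoders["fuel"].classes_))
```

LabelEncoder assigns integer codes in sorted order of the class labels, so the codes carry no meaningful ranking; tree-based models tolerate this better than linear ones.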

Normalization: The dataset is not normally distributed, and the features all have different ranges. Without normalization, the ML model tends to disregard the coefficients of features with small values, because their impact is small compared with that of features with large values. To normalize the data, MinMaxScaler from the sklearn library is used.

Train/test split: 90% of the data was used as training data and 10% was held out as test data.
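The scaling and the 90/10 split can be sketched as below. The synthetic `year` and `odometer` columns are assumptions, and fitting the scaler on the training portion only is a choice made for this sketch; the article does not say whether scaling happened before or after the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on very different ranges (assumption).
rng = np.random.RandomState(42)
X = np.column_stack([
    rng.randint(1995, 2021, size=100),   # year
    rng.randint(0, 300000, size=100),    # odometer
])
y = rng.normal(loc=9.0, scale=1.0, size=100)  # stand-in for the log price

# 90% train, 10% test, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42)

# Fit the scaler on the training data, then reuse it for the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

After scaling, every training feature lies in [0, 1], so no feature dominates purely because of its units.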

4. ML Models:

In this section, different machine learning algorithms are used to predict the price (the target variable).

This is a supervised learning problem, and the models are applied in the following order:

  1. Linear Regression
  2. Ridge Regression
  3. Lasso Regression
  4. K-Neighbors Regressor
  5. Random Forest Regressor
  6. Bagging Regressor
  7. AdaBoost Regressor
  8. XGBoost

1) Linear Regression:

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). In linear regression, the relationships are modelled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

Coefficients: The sign of each coefficient indicates the direction of the relationship between a predictor variable and the response variable.


  • A positive sign indicates that as the predictor variable increases, the response variable also increases.
  • A negative sign indicates that as the predictor variable increases, the response variable decreases.

(Images by Panwar Abhash Anil)

Considering this figure, linear regression suggests that these five variables are the most important: year, cylinder, transmission, fuel, and odometer.
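The reading of coefficient signs can be illustrated with a small synthetic example. The two features, their true effects (+2.0 for year, -1.5 for odometer), and the noise level are assumptions made for this sketch, not values from the car dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, already-scaled features (assumption).
rng = np.random.RandomState(0)
year = rng.uniform(0, 1, 300)
odometer = rng.uniform(0, 1, 300)
X = np.column_stack([year, odometer])

# Log price rises with year and falls with odometer, plus a little noise.
log_price = 8.0 + 2.0 * year - 1.5 * odometer + rng.normal(scale=0.05, size=300)

model = LinearRegression().fit(X, log_price)
coefs = dict(zip(["year", "odometer"], model.coef_))

for name, value in coefs.items():
    direction = "increases" if value > 0 else "decreases"
    print(f"{name}: {value:+.3f} -> price {direction} as {name} increases")
```

When the features share a common scale, the absolute size of each coefficient also gives a rough ranking of importance.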

(Image by Panwar Abhash Anil)

2) Ridge Regression:

Ridge Regression is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so they may be far from the true value.


To find the best alpha value for ridge regression, AlphaSelection from the yellowbrick library was applied.

Graph showing the best value of alpha

From the figure, the best value of alpha to fit the dataset is 20.336.

Note: The value of alpha is not constant; it varies from run to run.

Using this value of alpha, the Ridge regressor is fitted.
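A comparable alpha search can be sketched without the plotting dependency by using scikit-learn's `RidgeCV` directly. This is a substitute for the yellowbrick visualizer, not the author's code; the synthetic data and the alpha grid are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

# Synthetic data with two nearly collinear columns (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + rng.normal(scale=0.01, size=200)
y = X @ np.array([1.0, 0.5, -2.0, 1.5, 1.5]) + rng.normal(scale=0.5, size=200)

# Cross-validated search over a log-spaced grid of candidate alphas.
alphas = np.logspace(-3, 3, 50)
search = RidgeCV(alphas=alphas).fit(X, y)
best_alpha = search.alpha_
print("best alpha:", best_alpha)

# Refit a plain Ridge model with the selected alpha.
ridge = Ridge(alpha=best_alpha).fit(X, y)
print("coefficients:", ridge.coef_)
```

The selected alpha depends on the data and on the grid, which is consistent with the note that the value changes between runs.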

Graph showing important features

Considering this figure, ridge regression suggests that these five variables are the most important: year, cylinder, transmission, fuel, and odometer.

(Image by Panwar Abhash Anil)

The performance of ridge regression is almost the same as that of linear regression.

3) Lasso Regression:

Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, such as the mean. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters).

Why is Lasso regression used?

The goal of lasso regression is to obtain the subset of predictors that minimizes the prediction error for a quantitative response variable. The lasso does this by imposing a constraint on the model parameters that causes the regression coefficients for some variables to shrink toward zero.
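The shrinkage effect can be seen on a small synthetic problem in which only two of six features matter. The data and the penalty strength `alpha=0.1` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Six features, but only the first two influence the target (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))

# Count how many coefficients the lasso set exactly to zero.
n_zero = int(np.sum(lasso.coef_ == 0.0))
print("coefficients set to zero by the lasso:", n_zero)
```

Ordinary least squares gives every feature a small nonzero weight, while the lasso drops the irrelevant ones and slightly shrinks the rest.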

(Image by Panwar Abhash Anil)

But for this dataset, there is no need for lasso regression, as there is not much difference in error.

4) KNeighbors Regressor: Regression based on k-nearest neighbors.

The target is predicted by local interpolation of the targets associated with the nearest neighbours in the training set.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until function evaluation.

(Image by Panwar Abhash Anil)

From the above figure, KNN gives the least error for k=5. So the model is trained using n_neighbors=5 and metric='euclidean'.
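Choosing k by comparing the test error over a range of values can be sketched as follows. The synthetic data and the range 1 to 10 are assumptions, so the best k found here need not be 5.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic non-linear regression problem (assumption).
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(400, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1] + rng.normal(scale=0.1, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0)

# Fit one model per candidate k and record its test MSE.
errors = {}
for k in range(1, 11):
    knn = KNeighborsRegressor(n_neighbors=k, metric="euclidean")
    knn.fit(X_train, y_train)
    errors[k] = mean_squared_error(y_test, knn.predict(X_test))

best_k = min(errors, key=errors.get)
print("test MSE per k:", {k: round(v, 4) for k, v in errors.items()})
print("best k:", best_k)
```

Very small k tends to follow the noise and very large k tends to over-smooth, so the error curve usually has a minimum somewhere in between.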

(Image by Panwar Abhash Anil)

The performance of KNN is better: the error decreases and the accuracy increases.

5) Random Forest:

The random forest is an ensemble algorithm consisting of many decision trees (used here for regression). It uses bagging and feature randomness when building each tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

In our model, 180 decision trees are created with max_features=0.5.
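A forest with these two settings can be sketched as below. Only `n_estimators=180` and `max_features=0.5` come from the text; the synthetic data, the feature names, and `random_state` are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data in which the first feature matters most (assumption).
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.05, size=300)
feature_names = ["year", "odometer", "fuel", "transmission"]  # hypothetical labels

# 180 trees, each split considering half of the features.
forest = RandomForestRegressor(n_estimators=180, max_features=0.5, random_state=0)
forest.fit(X, y)

# Rank the features by impurity-based importance.
importances = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importances, key=importances.get, reverse=True)
for name in ranked:
    print(f"{name}: {importances[name]:.3f}")
```

The `feature_importances_` attribute is the usual source for a bar plot of feature importance.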

Performance of Random Forest (true value vs predicted value)

This simple bar plot illustrates that year is the most important feature of a car, followed by the odometer variable and then the others.

(Image by Panwar Abhash Anil)

The performance of the random forest is better, and accuracy is increased by approx. 10%, which is good. Since the random forest uses bagging when building each tree, the Bagging Regressor is tried next.

6) Bagging Regressor:

A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregates their predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

In our model, DecisionTreeRegressor with max_depth=20 is used as the estimator, and 50 decision trees are created. The results are shown below.
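A bagging ensemble with this configuration can be sketched as follows. The base estimator is passed positionally because the keyword name for it differs between scikit-learn versions; the synthetic data and `random_state` are assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption).
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.05, size=300)

# 50 decision trees of depth at most 20, each fitted on a bootstrap sample.
bagging = BaggingRegressor(
    DecisionTreeRegressor(max_depth=20),
    n_estimators=50,
    random_state=0,
)
bagging.fit(X, y)

print("number of fitted trees:", len(bagging.estimators_))
print("R2 on the training data:", bagging.score(X, y))
```

By default every tree sees all features and only the rows are resampled.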

(Image by Panwar Abhash Anil)

The performance of Random Forest is much better than that of the Bagging regressor.

The key difference between Random Forest and Bagging: in a random forest, only a random subset of the features is considered at each node, and the best split feature from that subset is used to split the node, whereas in bagging all features are considered when splitting a node.

7) AdaBoost Regressor:

AdaBoost can be used to boost the performance of any machine learning algorithm. AdaBoost helps you combine multiple "weak" learners into a single "strong" learner. Library used: AdaBoostRegressor

This simple bar plot illustrates that year is the most important feature of a car, followed by the odometer variable, then model, and so on.

In our model, DecisionTreeRegressor with max_depth=24 is used as the estimator, 200 trees are created, and the model is trained with learning_rate=0.6. The result is shown below.
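The configuration described above can be sketched like this. The hyperparameters come from the text; the synthetic data and `random_state` are assumptions, and the base estimator is passed positionally because its keyword name differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption).
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.05, size=300)

# Up to 200 boosted trees of depth at most 24, with learning rate 0.6.
ada = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=24),
    n_estimators=200,
    learning_rate=0.6,
    random_state=0,
)
ada.fit(X, y)

# Boosting can stop early, so the fitted count may be below n_estimators.
print("trees actually fitted:", len(ada.estimators_))
print("R2 on the training data:", ada.score(X, y))
```

A lower learning rate shrinks the contribution of each tree, which usually calls for more trees to reach the same fit.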

(Image by Panwar Abhash Anil)

8) XGBoost: XGBoost stands for eXtreme Gradient Boosting

XGBoost is an ensemble learning method. It is an implementation of gradient boosted decision trees designed for speed and performance. The beauty of this powerful algorithm lies in its scalability, which drives fast learning through parallel and distributed computing and offers efficient memory usage.

This simple bar plot, in descending order of importance, illustrates which features of a car are more important.

According to XGBoost, odometer is an important feature, whereas the previous models ranked year as an important feature.

In this model, 200 decision trees are created with a maximum depth of 24, and the model learns with a learning rate of 0.4.

(Image by Panwar Abhash Anil)

4) Comparison of the performance of the models:

(Images by Panwar Abhash Anil)

From the above figures, we can conclude that the XGBoost regressor, with 89.662% accuracy, performs better than the other models.

5) Some insights from the dataset:

1. From the pair plot, we can't conclude anything. There is no apparent correlation between the variables.

Pair Plot to Find Correlation

2. From the distplot, we can conclude that the price initially increases rapidly, but after a particular point it starts decreasing.

(Image by Panwar Abhash Anil)

3. From figure 1, we see that diesel cars have the highest price, followed by electric cars. Hybrid cars have the lowest price.

Bar Plot showing the price of each fuel type

4. From figure 2, we see that the car price for each fuel type also depends on the condition of the car.

Bar Plot between fuel and price, with condition as hue

5. From figure 3, we see that car prices increase each year after 1995, and from figure 4, the number of cars also increases each year; at some point, i.e. in 2012, the number of cars is nearly the same.

Graph showing how the price varies per year

6. From figure 5, we can see that the price of a car also depends on its condition, and from figure 6, the price varies with the condition of the car and with its size as well.

Bar Plot showing the price with respect to the condition of the car

7. From figures 7 and 8, we see that the price of a car also varies with its transmission. People are ready to buy cars with "other" transmission, and the price of cars with "manual" transmission is low.

(Image by Panwar Abhash Anil)

8. Below are similar graphs with the same insight but different features.

Conclusion:

By applying different ML models, we aimed to get the lowest error and the highest accuracy. Our purpose was to predict the price of used cars from a dataset with 25 predictors and 509577 data entries.

Initially, data cleaning is performed to remove the null values and outliers from the dataset; then ML models are implemented to predict the price of cars.

Next, the features were explored in depth with the help of data visualization, and the relations between the features were examined.

From the table below, it can be concluded that XGBoost is the best model for predicting used car prices. XGBoost as a regression model gave the best MSLE and RMSLE values.

(Image by Panwar Abhash Anil)

Translated from: https://towardsdatascience.com/used-car-price-prediction-using-machine-learning-e3be02d977b2
