Applying Machine Learning Algorithms to NBA MVP Data

A step-by-step tutorial in R

1 Introduction

This blog makes up the Machine Learning section of another blog. You can find the main blog here:

https://medium.com/@deeganrobbie/nba-most-valuable-player-mvp-award-15c6cfe727ee

This is a tutorial-based blog, primarily targeted at entry-level to intermediate machine learning students.

Table of Contents

The two main focus points of this tutorial are model selection and improving model performance. For each model, there is a "basic version" and an "upgraded version". The value of the tutorial comes from the justification for each model improvement.

The code for the basic version of each algorithm is shown first, followed by the code for the upgraded version. The basic versions use a variety of R packages, while all the upgraded versions use the caret package.

The caret package allows for modifications such as cross-validation, data preprocessing and parameter tuning (as well as many other useful features), and it allows for a streamlined workflow.

The reason all the algorithms aren't implemented through the caret package is that caret comes with predefined features that automatically improve model output, such as resampling to estimate a realistic R-squared, and parameter tuning.
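To make this concrete, here is a minimal sketch of what caret does even when you ask for nothing extra. The data set name Train_C2 is borrowed from the binary-classification section later in this tutorial:

library(caret)

# Even with no trainControl or tuneGrid supplied, train() resamples the data
# (25 bootstrap iterations by default) and tries a small default grid of
# candidate values for each tuning parameter before fitting the final model.
fit <- train(Results ~ ., data = Train_C2[, 3:15], method = "glmboost")

fit$control$method   # "boot" - the default resampling scheme
fit$results          # resampled performance for each candidate parameter value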

1.1 Hypothesis

There is no single statistic that determines who will win the MVP; rather, some combination of statistics decides it. Machine learning will help find that combination.

In my opinion, there are two statistics that, when combined, determine who wins the MVP: the highest PER combined with being on a team with the 1 seed. That said, I can think of a few instances off the top of my head where that isn't the case.

2 Data Preparation

2.1 Web Scraping

The rvest package was used to perform web scraping on https://www.basketball-reference.com/

As the code is quite long, I provide it via a link. In the link, you can find the code along with the CSVs created from web scraping. Using the CSVs "big_5.csv" and "Test2020.csv" will give you all the data needed for this tutorial.

However, if you want just the code for web scraping and data preparation: WebScraping_DataPrep.R

  • This R code creates the CSVs — all.csv, big_5.csv, bigger_5.csv and king.csv

  • Two additional CSVs — names.csv (needed for joining data) and test.csv, which is what we will use to predict who will win this year

This tutorial will only use the "Updated_Final.R" code.

Check out my "Sources" document in the same GitHub repo. This Word doc includes high-quality links to every topic covered in the tutorial.

2.2 Data Preprocessing

We used data preprocessing in two ways during this project:

  1. Cleaning the data
  2. Using the caret package to preprocess the cleaned dataset so that certain machine learning algorithms can be properly implemented.

After we collected the data from web scraping, we used a few techniques to clean it, including text preprocessing and joining different tables. Please find the code here: WebScraping_DataPrep.R

Below is an upgraded model that uses preprocessing.

Example of caret package preprocessing in action

As you can see from the highlighted code, incorporating preprocessing is very easy. preProcess() can be used for many operations on predictors; for this example, we used centering and scaling (a sketch follows the list below).

  • "center": subtract mean from values.

  • "scale": divide values by standard deviation.
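Here is a minimal sketch of the call shown in the screenshot, assuming the SMOTE'd training set over_2 and the fitControl object defined later in this tutorial:

set.seed(4321)
upgraded_model <- train(Results ~ ., data = over_2[, 3:15],
                        method = "glmboost",
                        trControl = fitControl,
                        # center and scale every predictor before fitting
                        preProcess = c("center", "scale"))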

2.3 Feature Engineering

From a subjective point of view, there are typically 5 main MVP candidates. Within this group of 5 candidates (again, a subjective choice of number), we usually compare players by saying "I believe player X is the #1 MVP choice because they lead the other candidates in PPG and play on the team with the most overall wins."

This is just one way we NBA fans justify why a certain player should win MVP over the other candidates. The below code mimics that thought process. It does this by first filtering our data set to only include the top 5 candidates. Then it ranks each statistic within a year. The final data set is named "big_5" and it includes all the normal statistics plus a set of ranked columns. We will first go through the code, then we will look at the first 3 years (15 rows) of the big_5 dataset.

# Create a df that ranks where the MVPs ended up each year. The rankings
# range from 1 - 5 because it's ranking the top 5 MVP candidates of each year
big_5 <- all %>%
  group_by(Year) %>%
  filter(Rank < 6) %>%
  mutate(Points_Rank   = order(order(PTS, decreasing = T))) %>%
  mutate(Rebounds_Rank = order(order(TRB, decreasing = T))) %>%
  mutate(Assists_Rank  = order(order(AST, decreasing = T))) %>%
  mutate(Steals_Rank   = order(order(STL, decreasing = T))) %>%
  mutate(Blocks_Rank   = order(order(BLK, decreasing = T))) %>%
  mutate(Per_Rank      = order(order(PER, decreasing = T))) %>%
  mutate(TS_Rank       = order(order(TS., decreasing = T))) %>%
  mutate(WS_Rank       = order(order(WS, decreasing = T))) %>%
  mutate(Wins_Rank     = order(order(W, decreasing = T)))

# Turn into df
big_5 <- as.data.frame(big_5)

# If a column is an integer, mutate it into a numeric
big_5 <- mutate_if(big_5, is.integer, as.numeric)

# Sum the ranked columns into a single "Sum" column
big_5$Sum <- rowSums(big_5[,27:34], na.rm = TRUE)

# Display first 15 rows of the big_5 data set created in the previous step
head(big_5, 15)

The below image shows the first 15 rows of the new big_5 data set. I used Paint to highlight that there are 3 years' worth of data here. To make the graphic clearer, I marked 3 statistics as an example — Points_Rank, Rebounds_Rank and Assists_Rank.

Explanation of Feature Engineering for this data set
  • Brown = Top 5 candidates in 2000

  • Blue = Top 5 candidates in 2001

  • Pink = Top 5 candidates in 2002

Let's look at the very last column, named "Sum". This column is the sum of the "_Rank" columns; the lower the Sum, the better. As you can see from the red text, this is the only one of these columns we use during the modelling process. (However, feel free to use any of the other ones if you try this out.)

2.4 Training Data and Testing Data

Our basic models will use a simple training/testing split, while our upgraded models will incorporate 10-fold cross-validation.

Basic Approach:

Image Source

The image above illustrates the training/testing set split for our basic models. Our training set will use 75% of our data, while our test set will include 25% of our data.
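Here is a minimal sketch of that basic 75/25 split, assuming the binary target column Results has already been added to the big_5 data built in section 2.3. Using caret's createDataPartition() (rather than a plain sample()) keeps the class proportions of the target the same in both halves:

library(caret)

set.seed(4321)
# 75% of the rows go to training, sampled with stratification on the target
train_idx <- createDataPartition(big_5$Results, p = 0.75, list = FALSE)
Train_C2  <- big_5[train_idx, ]
Test_C2   <- big_5[-train_idx, ]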

Upgraded Approach: We will split our data into two distinct sets, the training data set and the hold-out sample set, then perform 10-fold cross-validation on the training data set. Finally, we will make a prediction on the unseen testing data set.

Image Source
# This is how we implement 10-fold cross-validation on the training data set IN CARET!

# fit control
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

The caret package lets us define a control object that specifies the cross-validation scheme. trainControl() is quite flexible; you can modify the parameters to suit your needs.

3 Model Selection

Our benchmark models will be logistic regression, multiclass logistic regression and multi-linear regression, as they are commonly used and relatively basic models. Each model will be modelled twice: first a "basic" version of the algorithm, then an "upgraded" version. The point of the "upgraded" algorithm is to show how models can be improved, and each "upgraded" model comes with a justification/explanation of how it was improved.

We want to start off with models that produce outputs which are easy to understand. From there we move on to models whose output might be less easy to understand but which, because of the added complexity, are expected to perform better.

Logistic Regression

A few reasons why logistic regression is commonly used: it's highly interpretable*, doesn't require much computational power and doesn't need its input features to be scaled. Another advantage is that logistic regression is a classification model which outputs probabilities. This is an advantage because you can compare the outputs of two instances and determine which one is predicted to be closer to your target class.

To improve on logistic regression models you can use regularization techniques, which can help avoid overfitting. Logistic regression can be used as a good benchmark model when comparing other, more complex machine learning algorithms. One thing to look out for when implementing logistic regression is that it cannot solve non-linear problems.

*When comparing the logistic regression model to the linear regression model, interpretation is more difficult because the interpretation of the weights is multiplicative and not additive.

Multinomial Logistic Regression

The multinomial logistic regression model is an extension of the binomial logistic regression model. The log odds of the outcomes are modelled as a linear combination of the predictor variables. The advantages and disadvantages that apply to binary logistic regression can also be applied to multinomial logistic regression.

***Multinomial is used when the dependent variable has more than two nominal (unordered) categories. As Rank is an ordered variable, the more appropriate machine learning algorithm to implement is ordinal logistic regression.

Multiple Linear Regression (MLR)

Multiple linear regression is a very popular machine learning algorithm for regression tasks. It is simple to implement, and it is easy to interpret what the output coefficients mean. If there is a linear relationship between the variables, it is a preferred choice over more complex models. However, MLR cannot handle non-linear data. MLR allows for the implementation of regularization or cross-validation to overcome over-fitting.

Decision Tree

Decision trees are also a very popular machine learning algorithm. They are able to model both classification and regression tasks. Some advantages of decision tree models include that they're intuitive and easy to explain, can model non-linear relationships and are fairly robust to outliers. Decision trees require less effort during the data preparation phase as they do not require normalization or scaling of data. With decision trees, feature selection occurs automatically. A disadvantage of decision trees is that the model tends to overfit. Since decision trees are a greedy algorithm, the final model may not be the optimal solution.

Artificial Neural Networks

"ANNs are best applied to problems where the input data and output data are well-understood or at least fairly simple, yet the process that relates the input to output is extremely complex" - Brett Lantz

ANNs can model both regression and classification tasks. They can be implemented in supervised, unsupervised and reinforcement learning. They improve over time by iteratively updating the weights in their network, which allows ANNs to automatically learn from examples. ANNs have the ability to model non-linear and complex relationships.

A notable disadvantage of ANNs is that they are a "black box" algorithm: they do not give an explanation of why and how they arrived at the final selected model. An appropriate network structure can be found through a trial-and-error approach.

4. Model Accuracy Metrics

There are plenty of different accuracy metrics one can use to compare models. We will briefly go over the metrics we are using.

4.1 Classification:

We are going to look at 3 different accuracy metrics for classification. For our binary task, we will highlight all 3, as each one is important; sensitivity is the most important of the 3. For our multi-value task, we will focus on the overall accuracy.

Image Source
  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)

  2. Sensitivity aka Recall (true positives / all actual positives) = TP / (TP + FN)

  3. Specificity (true negatives / all actual negatives) = TN / (TN + FP)

  • Sensitivity tells us what percentage of NBA players who won MVP were correctly identified.

  • Specificity tells us what percentage of NBA players who did not end up winning the MVP award were correctly identified.

Our testing data set will have 20% MVPs and 80% non-MVPs. We prefer a model that has high sensitivity, as we are concerned with predicting who will win MVP.
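To make the three formulas concrete, here is a small sketch that computes them by hand on made-up predictions; caret's confusionMatrix() reports the same numbers in one call:

# Hypothetical truth vs. predictions for 10 players; "MVP" is the positive class
truth <- factor(c("MVP","MVP","Not_MVP","Not_MVP","Not_MVP",
                  "MVP","Not_MVP","Not_MVP","Not_MVP","Not_MVP"),
                levels = c("MVP", "Not_MVP"))
pred  <- factor(c("MVP","Not_MVP","Not_MVP","Not_MVP","MVP",
                  "MVP","Not_MVP","Not_MVP","Not_MVP","Not_MVP"),
                levels = c("MVP", "Not_MVP"))

tp <- sum(pred == "MVP"     & truth == "MVP")      # 2
tn <- sum(pred == "Not_MVP" & truth == "Not_MVP")  # 6
fp <- sum(pred == "MVP"     & truth == "Not_MVP")  # 1
fn <- sum(pred == "Not_MVP" & truth == "MVP")      # 1

accuracy    <- (tp + tn) / (tp + tn + fp + fn)     # 0.8
sensitivity <- tp / (tp + fn)                      # 0.667
specificity <- tn / (tn + fp)                      # 0.857

# The same numbers in one call:
# library(caret); confusionMatrix(pred, truth, positive = "MVP")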

4.2 Regression:

For our regression task, we will use 3 accuracy metrics when inferring the results from the different models.

  1. RMSE (Root Mean Squared Error) is the square root of the averaged squared difference between the target value and the value predicted by the model. The lower the value, the better the model. RMSE gives a relatively high weight to large errors, which means RMSE is most useful when large errors are particularly undesirable.

The formula for RMSE: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$
  2. R-squared (coefficient of determination) shows the proportion of variance in the outcome variable that is explained by the predictions. Typically, the higher the value, the better the model.

Image Source
  3. MAE (Mean Absolute Error) represents the difference between the original and predicted values, averaged in absolute terms over the data set. The lower the value, the better the model.

Image Source
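All three regression metrics are a few lines of base R, assuming vectors of actual and predicted values; caret's postResample() reports the same trio (note its R-squared is the squared correlation between the two vectors, so it can differ slightly from the formula version below):

actual    <- c(100, 80, 60, 45, 30)   # hypothetical Points_Won values
predicted <- c(90, 85, 55, 50, 20)

rmse <- sqrt(mean((actual - predicted)^2))                               # 7.42
mae  <- mean(abs(actual - predicted))                                    # 7
r2   <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2) # 0.91

# library(caret); postResample(pred = predicted, obs = actual)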

5. Binary Classification

Barplot of MVP data distribution

In our binary data, there are 80 non-MVPs and only 20 MVPs. If our machine learning models simply predicted "No" every time, they would still have high prediction accuracy (80%). This class imbalance will be addressed after we examine our target variable in more depth.

Boxplot for binary data

Ideally, the boxplots would be distinguishable between classes. From an eye test, the variables below look like ones that might show a clear distinction among classes.

  • Seed, Sum, W, WS, WS48

Synthetic Minority Oversampling Technique (SMOTE) creates new synthetic observations. The newly generated instances are relatively close in feature space to existing examples in the minority class (i.e., they're made up, but similar to the real ones). The increase in observations allows the machine learning algorithm to learn more, which leads to a better understanding of the data. A standard approach for implementing SMOTE is to only increase the minority class, but since our data set has so few observations we increased both the minority and majority classes to 75 observations each. This number was selected after experimenting with several iterations; we will just be showing the 75/75 split (150 observations in total).

# SMOTE
over_2 <- SMOTE(Results ~ ., Train_C2, perc.over = 400, perc.under = 125, k = 5)
over_2$Results <- factor(over_2$Results, levels = c("MVP", "Not_MVP"))

# Check target variable's distribution
table(over_2$Results)
#     MVP Not_MVP
#      75      75

SMOTE for the binary TRAINING data set

Now our over_2 training dataset contains 150 target observations: 75 MVP observations and 75 Not_MVP observations.

  • It is important to emphasize that we are only applying the SMOTE algorithm to the training data set.

5.1 Logistic Regression

Logistic regression is a binary classification algorithm that belongs to the family of generalized linear models (GLM). The model makes a prediction by returning a probability of the target class. It uses a threshold value to determine which group of the target class the prediction belongs to.

c_logistic_model2 <- glm(Results ~ ., data = Train_C2[,3:15],
                         family = binomial)

Our basic model uses glm(), which is part of base R's stats package. We use all variables to predict our target variable (Results).

upgraded_c_logistic_model2 <- train(Results ~ ., data = over_2[,3:15],
                                    method = "glmboost",
                                    trControl = fitControl)

This fits a glm using a boosting algorithm (as opposed to maximum likelihood estimation). Unlike the glm function, glmboost will perform variable selection.

Improvements include:

  • SMOTE data
  • 10-fold cross-validation
  • Boosting

Results:

Our "basic" vs "upgraded" logistic regression results

While our upgraded model had worse specificity, it is important to note that the overall accuracy and sensitivity both increased.
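The confusion matrices behind these numbers come from predict() plus caret's confusionMatrix(); here is a sketch, where the hold-out set name Test_C2 is an assumption:

# glm() models the probability of the second factor level ("Not_MVP" here),
# so a probability below 0.5 means the model leans towards "MVP"
basic_probs <- predict(c_logistic_model2, newdata = Test_C2, type = "response")
basic_pred  <- factor(ifelse(basic_probs < 0.5, "MVP", "Not_MVP"),
                      levels = c("MVP", "Not_MVP"))

# A caret model returns class labels directly
upgraded_pred <- predict(upgraded_c_logistic_model2, newdata = Test_C2)

confusionMatrix(basic_pred,    Test_C2$Results, positive = "MVP")
confusionMatrix(upgraded_pred, Test_C2$Results, positive = "MVP")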

Below is another way to visualize the confusion matrix for the logistic regression models. The visualizations below are valuable to stakeholders (NBA fans) because they can intuitively understand what this machine learning algorithm considers an "MVP".

Accuracy of our binary logistic regression models. These are the same results as our confusion matrices above!

Our basic model incorrectly predicted that Jason Kidd in 2002 and Lebron in 2011 and 2017 were MVPs. It also predicted that Allen Iverson in 2001 and Tim Duncan in 2002 were not MVPs when they actually were.

Our upgraded logistic model was more generous in predicting that an observation would be an MVP. It incorrectly predicted Nash '07, Lebron '11, KD '13 and Lebron '17 as MVPs when they were not. It also predicted that A.I. was not an MVP in 2001 when he was, in fact, the MVP.

5.2 Decision Tree

c_tree_model2 <- rpart(Results ~ ., data = Train_C2[,3:15])

Our basic decision tree uses the rpart library to predict Results. The decision tree automatically does variable selection. Below is our final tree:
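The tree itself was shown as an image; if you are following along, one way to draw it is with the rpart.plot package (an assumption on my part; any rpart plotting helper works):

library(rpart.plot)

# Draws the fitted tree with the split rules and class proportions at each node
rpart.plot(c_tree_model2)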

upgraded_c_tree_model2 <- train(Results ~ ., data = over_2[,3:15],
                                method = "rpart",
                                tuneLength = 10,
                                trControl = fitControl)

Our upgraded model is implemented with caret, but behind the scenes it uses the rpart library. For our upgraded model we incorporated tuneLength. The tuneLength parameter tells the algorithm how many different default values to try for the main tuning parameter (cp, the complexity parameter, in rpart's case).

Improvements include:

  • SMOTE data
  • 10-fold cross-validation
  • tuneLength

Results:

Our "basic" vs "upgraded" decision tree results

Our upgraded model improved on overall accuracy and sensitivity while specificity remained the same.

5.3 Artificial Neural Networks

From the neuralnet package, we implement a simple neural network using all the available explanatory variables. With the complexity of neural networks, there are lots of options to explore in the neuralnet package.

c_ann_model2 <- neuralnet(Results ~ ., data = Train_C2[,3:15],
                          hidden = c(2,1))

While the ANN model is expected to perform better, as you can see from the plot above it is much less interpretable.

When you train a neural network (nnet) using caret you need to specify two hyper-parameters: size and decay. Decay is the weight decay, the regularization parameter used to avoid over-fitting. Size is the number of units in the hidden layer.

*For our upgraded model we are using the nnet package instead of the neuralnet package, which means we don't have access to the same plots. We will not be plotting the nnet ANN as it is not as user-friendly. This is also the case for the section 6 and 7 ANN models.

nnetGrid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                        decay = seq(from = 0.1, to = 0.5, by = 0.1))

set.seed(4321)

## ANN model
upgraded_c_ann_model2 <- train(Results ~ ., data = over_2[,3:15],
                               method = "nnet",
                               trControl = fitControl,
                               preProcess = c('center', 'scale'),
                               tuneGrid = nnetGrid)

Improvements include:

  • SMOTE data
  • 10-fold cross-validation
  • preProcess
  • nnetGrid

Results:

Our "basic" vs "upgraded" ANN results

Our upgraded model improved on overall accuracy and sensitivity while specificity remained the same.

6. Multi-value Classification

In our multi-value data set, there are 5 groups: 1st place, 2nd place, 3rd place, 4th place and 5th place. Each group has 20 instances. Since the target data is distributed evenly, we will focus more on accuracy as a metric; accuracy is a good measure when the target variable classes in the data are nearly balanced.

From an eye test, the variables below look like ones that might show a clear distinction.

  • Seed, Sum, WS, WS48

For our "upgraded" models we implemented the SMOTE technique again. The justification is that, due to the small number of observations, we felt that creating more observations would help the machine learning algorithms pick up on the underlying relationships.

# SMOTE
set.seed(4321)
over_5 <- SMOTE(Rank ~ ., Train_C5, perc.over = 200, perc.under = 420, k = 5)

The original training data is 75 obs, while over_5 is 171

SMOTE randomly generates synthetic observations, which means we didn't intentionally make the distribution look like this. If you want to modify the distribution, change "perc.under" or "perc.over" in the SMOTE() call (or change the set.seed(), as the sampling is random).

This is a rather large increase in training data, but it came after experimenting with different settings. Our more conservative datasets produced ~100 and ~140 observations, but their accuracy was significantly worse.

6.1 Multinomial Logistic Regression

From the nnet package, we use the multinom function. This fits multinomial log-linear models via neural networks.

c_log_model5 <- multinom(Rank ~ ., data = Train_C5[,3:15])

For our upgraded model we are using the same technique as our basic model.

upgraded_c_log_model5 <- train(Rank ~ ., data = over_5[,3:15],
                               method = 'multinom',
                               trControl = fitControl)

Improvements include:

  • SMOTE data
  • 10-fold cross-validation

Results:

Our upgraded model resulted in a major increase in overall accuracy.

6.2 Decision Trees

c_tree_model5 <- rpart(Rank ~ ., data = Train_C5[,3:15])

Our basic decision tree uses the rpart library to predict Rank. The decision tree automatically does variable selection. Below is our final tree:

upgraded_c_tree_model5 <- train(Rank ~ ., data = over_5[,3:15],
                                method = "ctree",
                                trControl = fitControl)

Our upgraded model is implemented with caret, but behind the scenes it uses the party library. Since it's not using the rpart package, we won't be able to plot the decision tree. ctree does have a plot function, but it's not as user-friendly.

Improvements include:

  • ctree instead of rpart (party library instead of the rpart library)
  • SMOTE data
  • 10-fold cross-validation

Results:

Our upgraded model resulted in a major increase in overall accuracy.

6.3 Artificial Neural Networks

c_ann_model5 <- nnet(Rank ~  ., data = Train_C5[,3:15], size = 1)

Our basic ANN model uses the nnet package to model our data.

nnetGrid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                        decay = seq(from = 0.1, to = 0.5, by = 0.1))

set.seed(4321)

# ANN model
upgraded_c_ann_model5 <- train(Rank ~ ., data = over_5[,3:15],
                               method = "nnet",
                               preProcess = c('center', 'scale'),
                               tuneGrid = nnetGrid,
                               trControl = fitControl)

Improvements include:

  • SMOTE data
  • 10-fold cross-validation
  • Preprocessing
  • nnetGrid

Results:

Our upgraded model resulted in a major increase in overall accuracy.

7 Regression

Our target variable for the regression task is Points_Won. In the graphic below, the Y-axis is how many MVP points a candidate won, while the X-axis shows a different statistic in each sub-plot. You can find out what the X-axis is by looking at the title (top-middle) of each sub-plot.

Question: If you inspect each sub-plot, do you find any with a linear relationship?

To be sure about which variables have a linear relationship with the target, we will also use a correlation matrix (on top of the scatter plots above). The results will help us decide which variables to select for our MLR model.

# Correlation Matrix - Visual first
ggcorr(Train_r[,3:15], label = TRUE, label_size = 2.9, hjust = 1, layout.exp = 2)

Correlation Matrix on our Training Data

The way to read this visual: the closer a value is to 1 or -1, the more correlated the two variables are. A correlation close to zero implies the absence of a linear relationship. Our focus is on finding variables that are correlated with "Points_Won". We do that by going down the last row and looking for boxes with numbers closest to 1 or -1.

To confirm which variables are the most correlated with the "Points_Won" column, we computed the correlation matrix on randomly sampled training data. As the sample is random, we thought it best to run it 5 times and select the two variables with the highest correlation to Points_Won.

# Correlation Matrix - Ran 5 times randomly
cor(Train_r[,2:14])

Correlation matrix run five times

Each column is a newly run correlation matrix. The two highest values are highlighted in each column. As the data is randomly sampled, the correlations vary from column to column. After 5 runs, it seems that win shares (WS) and wins (W) are the two variables we will build our MLR model on.
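One way to script those five runs rather than re-running the line by hand is sketched below. Exactly how the training rows were re-drawn each time is an assumption, as is the data frame name big_5; the point is just to collect the Points_Won column of each run side by side:

set.seed(4321)
cor_runs <- replicate(5, {
  idx <- sample(nrow(big_5), size = 75)  # re-draw a 75-row training sample
  cor(big_5[idx, 2:14])[, "Points_Won"]  # correlations against the target
})
round(cor_runs, 2)  # one column per run; scan for the largest absolute values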

7.1 Multi-linear Regression

Our basic MLR model uses the lm() function from base R.

r_mlr_model <- lm(Points_Won ~ WS + W, data = Train_r)

Our upgraded model uses the glmnet package. Elastic net combines the penalties of ridge regression and lasso to get the best of both worlds, aiming to minimize the following loss function:
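For a Gaussian response, glmnet's form of that objective is:

$$\min_{\beta_0,\,\beta}\;\frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2+\lambda\left(\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right)$$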

where α is the mixing parameter between ridge (α = 0) and lasso (α = 1).

# Make a custom tuning grid
tuneGrid <- expand.grid(alpha = 0:1,
                        lambda = seq(0.0001, 1, length = 10))

upgraded_r_mlr_model <- train(Points_Won ~ ., data = Train_r[,3:15],
                              method = "glmnet",
                              trControl = fitControl,
                              tuneGrid = tuneGrid)

Improvements include:

  • 10-fold cross-validation
  • tuneGrid
  • Elastic net

Results:

Our upgraded model resulted in a much better RMSE, R-squared and MAE.

7.2 Decision Tree

Our basic decision tree model uses the rpart package.

r_tree_model <- rpart(Points_Won ~., data = Train_r[,3:15])

Our upgraded model uses the same rpart package but adds 10-fold cross-validation through the caret package. Both models use the same explanatory variables.

upgraded_r_tree_model <- train(Points_Won ~ ., data = Train_r[,3:15],
                               method = "rpart",
                               trControl = fitControl)

Improvements include:

  • 10-fold cross-validation

Results:

Our upgraded model resulted in a much better RMSE, R-squared and MAE.

7.3 Artificial Neural Network

Our original ANN model uses the nnet package.

r_ann_model <- nnet(Points_Won ~ ., data = Train_r[,3:15], size = 1)

Our upgraded model uses the same nnet package but adds a few modifications through the caret package. Both models use the same explanatory variables.

# Make a custom tuning grid
nnetGrid <- expand.grid(size = seq(from = 1, to = 10, by = 1),
                        decay = seq(from = 0.0001, to = 0.5, by = 0.1))

set.seed(4321)

# ANN model
# You need to add 'linout = 1' to make it a regression model, or else
# you'll only get 1 for an output
upgraded_r_ann_model <- train(Points_Won ~ ., data = Train_r[,3:15],
                              method = "nnet",
                              trControl = fitControl,
                              linout = 1,
                              tuneGrid = nnetGrid,
                              preProcess = c('center', 'scale'))

When you train a neural network (nnet) using caret you need to specify two hyper-parameters: size and decay. Size is the number of units in the hidden layer (nnet fits a single-hidden-layer neural network) and decay is the regularization parameter used to avoid over-fitting.

Improvements include:

  • 10-fold cross-validation
  • Preprocessing
  • nnetGrid

Results:

Our upgraded model resulted in a much better RMSE, R-squared and MAE.

8 Prediction on 2020 MVP Candidates

The holdout test data set we will make our final predictions on!

We will take the best model from section 5 (Binary), 6 (Multiclass) and 7 (Regression) to predict on the Final_Test data. These are considered the "best models" because they resulted in the highest accuracy metrics (which we discussed in their respective sections).
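A sketch of that final step. The model names are the ones fitted above; Final_Test standing in for the 2020 candidates loaded from Test2020.csv, and its Player column, are assumptions:

# Best binary model (ANN), best multiclass model (multinomial logistic)
# and best regression model (elastic net MLR), applied to the 2020 candidates
binary_pred <- predict(upgraded_c_ann_model2, newdata = Final_Test)
multi_pred  <- predict(upgraded_c_log_model5, newdata = Final_Test)
points_pred <- predict(upgraded_r_mlr_model,  newdata = Final_Test)

data.frame(Player     = Final_Test$Player,
           MVP        = binary_pred,
           Rank       = multi_pred,
           Points_Won = round(points_pred))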

Below are the results:

Binary (ANN), Multi (Logistic) and Regression (MLR) model final results.

As you can see, Giannis is the clear favourite, followed by Lebron. According to our models, Anthony Davis will land in 3rd place in MVP voting.

9 Executive Summary (TLDR)

This tutorial has given you an overview of how to implement several machine learning algorithms. It includes all the steps involved in achieving better results on unseen testing data. Additionally, this tutorial has highlighted a few methods related to preprocessing, SMOTE and accuracy metrics. One important takeaway is that just because a model is more complex doesn't automatically mean it will yield better results.

Here's a list of things I look forward to investigating after publishing this blog:

  1. Implementation of ordinal multinomial logistic regression
  2. My next machine learning project will have much more data, which will show the true value of tuning parameters.
  3. There are many sampling techniques for balancing data; SMOTE is just one of them. As there's no single best technique, I would like to experiment with others in the future.

Translated from: https://medium.com/swlh/applying-machine-learning-algorithms-to-nba-mvp-data-e4470a531338
