Machine Learning Regression Prediction

Introduction: The applications of machine learning range from games to autonomous vehicles; one very interesting application is education. With machine learning regression algorithms, we can use a student dataset to predict the grades students will get in their exams. This lets teachers predict students' grades well before the exams and find ways to assist the students who are not expected to perform well. This article gives a detailed explanation of how to use Python to carry out this prediction task.

Dataset: This study considers data collected during the 2005–2006 school year from two public schools in the Alentejo region of Portugal. The database was built from two sources: school reports and questionnaires. The questionnaires cover several demographic (e.g. mother's education, family income), social/emotional (e.g. alcohol consumption) and school-related (e.g. number of past class failures) variables that are expected to affect student performance.

The dataset used for this project is publicly available on Kaggle and can be downloaded from these URLs:

- https://www.kaggle.com/ozberkgunes/makineogrenmesiodev2-student-grande-prediction/data

- https://www.kaggle.com/imkrkannan/student-performance-data-set-y-uci

The columns of the dataset are explained and summarized in the table below.

Data Pre-processing: Before we can apply regression algorithms to the dataset, we first need to pre-process it so that both empty and categorical values are handled.

First, we check for empty values in the dataset using the isnull and sum functions, as shown in the code snippet below.

Code snippet for reading the dataset and checking for null values
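
The original snippet is not reproduced here, so the following is only a minimal sketch of the null check. It assumes pandas and uses a few made-up rows held in memory instead of the downloaded CSV; the column names G1, G2, G3, Medu and failures come from the article, everything else is hypothetical.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded CSV (the real file has many more
# rows and columns). Two cells are deliberately left empty.
csv_text = """G1,G2,G3,Medu,failures
10,11,11,4,0
12,,13,2,0
8,7,,1,2
15,14,15,3,0
"""

# In practice this would be something like pd.read_csv("<path to the CSV>").
df = pd.read_csv(io.StringIO(csv_text))

# isnull() flags every missing cell; sum() then counts the flags per column.
null_counts = df.isnull().sum()
print(null_counts)
```

A column with a non-zero count contains missing values that have to be handled before fitting a model.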

We find that each column has only 1 empty value. Since the number of rows with empty values is insignificant, we simply drop those rows using the dropna function.

Code snippet for dropping all rows with null values
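
As a rough illustration of this step, the sketch below calls dropna on a small made-up DataFrame. By default dropna removes every row that contains at least one missing value; I am assuming that is the behaviour intended here, and the data is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data with two incomplete rows.
df = pd.DataFrame({
    "G1": [10, 12, 8, 15],
    "G2": [11, np.nan, 7, 14],
    "G3": [11, 13, np.nan, 15],
})

# dropna() with default arguments drops rows (axis=0) that have any NaN.
clean = df.dropna().reset_index(drop=True)
print(len(df), "rows before,", len(clean), "rows after")
```

Passing axis=1 instead would drop columns, which is usually not what you want when only a handful of cells are missing.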

Selecting the Columns to Use for Regression Using Correlation: There are many ways to select the columns for regression, including p-values, correlation, or a feature selection method. In this case, our target column is G3 (the students' final exam result), and we decided to use a heat map showing the correlation between all columns.

Code snippet to create the correlation heat map
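
Here is one way the correlation step could look; it is a sketch, not the original code. The data is randomly generated stand-in data (not the real student records), and plot_corr_heatmap is a hypothetical helper that assumes seaborn and matplotlib are installed.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data; replace with the cleaned student DataFrame.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 21, n)
g2 = np.clip(g1 + rng.integers(-2, 3, n), 0, 20)
failures = rng.integers(0, 4, n)
medu = rng.integers(0, 5, n)
g3 = np.clip(g2 + rng.integers(-2, 3, n) - failures, 0, 20)
df = pd.DataFrame({"G1": g1, "G2": g2, "Medu": medu,
                   "failures": failures, "G3": g3})

# Pearson correlation between every pair of numeric columns.
corr = df.select_dtypes("number").corr()

# Rank candidate features by the absolute value of their correlation with G3.
ranked = corr["G3"].drop("G3").abs().sort_values(ascending=False)
print(ranked)


def plot_corr_heatmap(corr_matrix):
    """Draw the heat map (plotting libraries are imported only when called)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```

Calling plot_corr_heatmap(corr) draws the figure; the ranked series gives the same information as a sorted list, which is handy when there are many columns.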

From the heat map we find that the columns with the strongest correlation to G3 are G1, G2, Medu and failures, so these are the columns we will use for our regression. G1 and G2 represent the student's performance in previous assessments. This is not surprising, as we would expect students who performed well earlier to most likely perform well again in the final assessment. Medu represents the mother's level of education, while failures represents the number of classes previously failed by the student. Neither of these is surprising either: anyone would expect a student with very few previously failed classes to do well in an exam.

Applying Regression Algorithm: The selected columns are all non-categorical, so there is no need for a method such as one-hot encoding to handle categorical data. We can therefore move straight to applying our regression algorithms to the dataset with the selected columns (G1, G2, Medu and failures).

I created a function to run the regression models. It accepts the algorithm names and the regression objects as parameters and prints out each resulting model's accuracy and RMSE.
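
The exact signature of that function is not shown, so the version below is a guess at its shape rather than the original. I am assuming scikit-learn, and that "accuracy" means the R^2 value returned by a regressor's score method; the extra train/test arguments and the returned dictionary are my own additions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def run_reg_models(names, models, X_train, X_test, y_train, y_test):
    """Fit each model, print its R^2 score and RMSE, and return both."""
    results = {}
    for name, model in zip(names, models):
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        r2 = model.score(X_test, y_test)  # R^2 for scikit-learn regressors
        rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
        print(f"{name}: R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
        results[name] = (r2, rmse)
    return results


# Tiny demo on made-up, exactly linear data (y = 2 * x0 + x1).
X = np.array([[0, 1], [1, 0], [2, 1], [3, 3], [4, 2], [5, 5]], dtype=float)
y = 2 * X[:, 0] + X[:, 1]
demo = run_reg_models(["Linear"], [LinearRegression()],
                      X[:4], X[4:], y[:4], y[4:])
```

Because the demo data is exactly linear, the linear model should reproduce it almost perfectly; on real grades the numbers will of course be less flattering.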

For this project I selected 5 different regression algorithms to use on the dataset:

- Linear regression

- Ridge regression

- Lasso regression

- Elastic Net regression

- Orthogonal Matching Pursuit CV regression

Before training our regression models, we need to split the dataset into training and testing data. This is very important, as we don't want to train and test our model on the same set of data. We achieve this with the code snippet below:
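
A split along these lines can be done with scikit-learn's train_test_split. The sketch below is not the original snippet: the 80/20 ratio and the random_state are my assumptions, and the DataFrame is randomly generated stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data; replace with the cleaned student DataFrame.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 21, n)
g2 = np.clip(g1 + rng.integers(-2, 3, n), 0, 20)
failures = rng.integers(0, 4, n)
medu = rng.integers(0, 5, n)
g3 = np.clip(g2 + rng.integers(-2, 3, n) - failures, 0, 20)
df = pd.DataFrame({"G1": g1, "G2": g2, "Medu": medu,
                   "failures": failures, "G3": g3})

features = ["G1", "G2", "Medu", "failures"]  # columns chosen from the heat map
X = df[features]
y = df["G3"]  # target: final grade

# Hold out 20% of the rows for testing (assumed ratio); fixing random_state
# makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

The model only ever sees X_train and y_train during fitting, so the score on X_test is an estimate of how it behaves on unseen students.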

We simply create an array of these regression models and pass it to the run_reg_models function, as shown in the code snippet below:
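
To make this concrete, here is a self-contained sketch that builds the five models with scikit-learn's default settings and runs them through a compact run_reg_models. The function signature, the default hyperparameters and the randomly generated data are all assumptions, so the printed numbers will not match the results discussed in this article.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import (ElasticNet, Lasso, LinearRegression,
                                  OrthogonalMatchingPursuitCV, Ridge)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data; replace with the cleaned student DataFrame.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 21, n)
g2 = np.clip(g1 + rng.integers(-2, 3, n), 0, 20)
failures = rng.integers(0, 4, n)
medu = rng.integers(0, 5, n)
g3 = np.clip(g2 + rng.integers(-2, 3, n) - failures, 0, 20)
df = pd.DataFrame({"G1": g1, "G2": g2, "Medu": medu,
                   "failures": failures, "G3": g3})

X_train, X_test, y_train, y_test = train_test_split(
    df[["G1", "G2", "Medu", "failures"]], df["G3"],
    test_size=0.2, random_state=42,
)


def run_reg_models(names, models, X_train, X_test, y_train, y_test):
    """Fit each model and report its R^2 score and RMSE on the test set."""
    results = {}
    for name, model in zip(names, models):
        model.fit(X_train, y_train)
        r2 = model.score(X_test, y_test)
        rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
        print(f"{name}: R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
        results[name] = (r2, rmse)
    return results


names = ["Linear", "Ridge", "Lasso", "Elastic Net",
         "Orthogonal Matching Pursuit CV"]
models = [LinearRegression(), Ridge(), Lasso(), ElasticNet(),
          OrthogonalMatchingPursuitCV()]

results = run_reg_models(names, models, X_train, X_test, y_train, y_test)
```

Keeping names and models as parallel lists mirrors the description above; a dictionary mapping name to model would work just as well.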

Results and Future Work:

The table above shows the resulting accuracies for the different regression models. Linear, Ridge and Orthogonal Matching Pursuit CV have the highest accuracy at 82%, and the others are at 81%. There is little difference between the models in either accuracy or RMSE.

An accuracy of 82% is okay, but we would still need to fine-tune the hyperparameters, i.e. test different parameter values for the different regression algorithms. If we do this, we may get higher accuracy values. It's also important to note that this is not the final accuracy: we would still need to test our model(s) with external datasets to see how robust they are.
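
One possible way to do that tuning is a cross-validated grid search. The sketch below tunes only Ridge's alpha with scikit-learn's GridSearchCV; the choice of model, the alpha grid and the 5-fold setting are my assumptions, and the data is randomly generated stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Randomly generated stand-in data; replace with the training split.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 21, n)
g2 = np.clip(g1 + rng.integers(-2, 3, n), 0, 20)
failures = rng.integers(0, 4, n)
medu = rng.integers(0, 5, n)
g3 = np.clip(g2 + rng.integers(-2, 3, n) - failures, 0, 20)
X = pd.DataFrame({"G1": g1, "G2": g2, "Medu": medu, "failures": failures})
y = pd.Series(g3, name="G3")

# Candidate regularisation strengths (arbitrary example values).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validation; the scorer is negated RMSE, so higher is better.
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)

best_rmse = -search.best_score_
print("best alpha:", search.best_params_["alpha"])
print(f"cross-validated RMSE: {best_rmse:.3f}")
```

The same pattern applies to Lasso and Elastic Net by swapping the estimator and the parameter grid.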

There are many criteria for selecting a model, e.g. time to train, time to predict and many others, but in this case we will use the model with the lowest RMSE, as they all have similar accuracies and all take a similar time to run.
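
For reference, these are the usual definitions of the two metrics, where y_i is a student's actual final grade, \hat{y}_i the predicted grade, \bar{y} the mean actual grade and n the number of test rows. I am assuming here that "accuracy" refers to the R^2 score that scikit-learn regressors report; the article itself does not say.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
               {\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```

RMSE is expressed in grade points, so a lower value means predictions sit closer to the real grades, while R^2 is unitless and equals 1 for a perfect fit.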

Our selected model is Orthogonal Matching Pursuit CV, as it has the lowest RMSE. In the future, we will experiment with the hyperparameters to see how much of a difference they make to both the accuracy and the RMSE; if the difference is large, we will redo our model selection.

Another piece of future work will be to test our models with an external dataset (any similar dataset that our model has not been trained on) and see how well they perform. It is also important to note that the dataset used in this article comes from a study carried out 15 years ago, so a further piece of future work will be to find a newer dataset, test our model on it and see the result.

Extra: Using a Model for Prediction

It's interesting that most articles don't include a section where they showcase the prediction abilities of the model. I have added this section so that anyone who is a bit confused understands exactly how to use our models for predictions. Our selected model is Orthogonal Matching Pursuit CV, so it is what we use for the prediction, as shown in the code snippet below:

Code snippet for using our selected model to make predictions
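
The original snippet is not reproduced here; as a rough sketch, prediction with scikit-learn's OrthogonalMatchingPursuitCV could look like the following. The training data is randomly generated stand-in data and the new student's values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import OrthogonalMatchingPursuitCV

# Randomly generated stand-in data; replace with the real training split.
rng = np.random.default_rng(0)
n = 200
g1 = rng.integers(0, 21, n)
g2 = np.clip(g1 + rng.integers(-2, 3, n), 0, 20)
failures = rng.integers(0, 4, n)
medu = rng.integers(0, 5, n)
g3 = np.clip(g2 + rng.integers(-2, 3, n) - failures, 0, 20)
df = pd.DataFrame({"G1": g1, "G2": g2, "Medu": medu,
                   "failures": failures, "G3": g3})

features = ["G1", "G2", "Medu", "failures"]
model = OrthogonalMatchingPursuitCV().fit(df[features], df["G3"])

# A hypothetical new student; the columns must match the training features
# in both name and order.
new_student = pd.DataFrame(
    [{"G1": 14, "G2": 15, "Medu": 3, "failures": 0}]
)[features]

predicted = model.predict(new_student)
print(f"Predicted final grade (G3): {predicted[0]:.1f}")
```

predict returns one value per input row, so several students can be scored at once by adding more rows to the DataFrame.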

Our regressor isn't 100% accurate, but it's pretty close and hence could realistically be used to predict students' grades quite well.

It's also possible to use the other models to make predictions, which you might be interested in trying out as an extra exercise.

Thank you for reading my article. Please reach out to me if you have any questions.

Translated from: https://medium.com/@kole.audu/predicting-high-school-students-grades-with-machine-learning-regression-3479781c185c
