Machine Learning: Training, Validation, and Testing

In my previous article, we discussed the need to train and test our model, and we wrote code to split the given data into training and test sets.

Before moving on to validation, we need to understand why a validation step is needed before testing on a given data set. When we work with a large amount of data, there is a chance that the data the model learned from produces a biased result. If we then use the test set to check the accuracy of our model, one of the following two cases can arise:

  1. Underfitting: the model is too simple and scores poorly on the test data

  2. Overfitting: the model fits the training data too closely and generalizes poorly to the test data

Image source: https://docs.aws.amazon.com/machine-learning/latest/dg/images/mlconcepts_image5.png
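To make the two cases concrete, here is a minimal sketch that compares the error on the training data with the error on held-out data. It swaps in a tiny k-nearest-neighbour regressor on synthetic data, which is an assumption made purely for illustration; the names `knn_predict`, `mse`, and `sample` are hypothetical and do not come from the article's code.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, points, k):
    """Mean squared error of the k-NN predictions on the given points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)

rng = random.Random(0)

def sample(n):
    """Draw n noisy points from y = sin(x) (synthetic stand-in data)."""
    points = []
    for _ in range(n):
        x = rng.uniform(0, 6)
        points.append((x, math.sin(x) + rng.gauss(0, 0.3)))
    return points

train, validation = sample(120), sample(60)

# k=1 memorizes the training data (overfitting),
# k=120 always predicts the overall mean (underfitting).
for k in (1, 10, 120):
    print(k, round(mse(train, train, k), 3), round(mse(train, validation, k), 3))
```

With k=1 the training error is zero while the held-out error is not, which is the overfitting pattern. With k=120 both errors are high, which is the underfitting pattern.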

So how do we deal with such a problem? The answer is pretty simple: we use a third data set to validate the results obtained from the training set. We can then adjust hyperparameters such as the learning rate and batch size until we get a balanced result on the validation set, which should in turn improve the accuracy of our model when it estimates the target values of the test set.
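As a rough sketch of this idea, the snippet below tries a few learning rates, scores each one on the validation set only, and touches the test set once at the very end. The data is synthetic, and the helper names (`fit_line`, `mse`) and the candidate rates are assumptions for illustration, not part of the article's code.

```python
import random

def fit_line(points, learning_rate, epochs=200):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in points) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in points) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(points, w, b):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)

# Synthetic data around y = 2x + 1 (an assumption, not the article's data set)
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(0, 1)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)))

# Shuffle, then carve out the test, validation, and training sets
rng.shuffle(data)
test, validation, training = data[:50], data[50:90], data[90:]

# Score every candidate learning rate on the validation set only
candidate_rates = [0.001, 0.01, 0.1, 0.5]
val_errors = {}
for rate in candidate_rates:
    w, b = fit_line(training, rate)
    val_errors[rate] = mse(validation, w, b)

best_rate = min(val_errors, key=val_errors.get)

# The test set is used once, after the hyperparameter has been chosen
w, b = fit_line(training, best_rate)
test_error = mse(test, w, b)
print(best_rate, test_error)
```

Keeping the test set out of the tuning loop is the point of the design: the test error then remains an honest estimate, because no decision was made by looking at it.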

Image source: https://rpubs.com/charlydethibault/348566

Here, you can see that the validation set is nothing but a subset of the training data set that we create. Do remember that when we create a partition from a data set, the data is shuffled randomly first, which helps avoid biased results.
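The shuffle-then-partition step can be sketched in plain Python as follows. The helper `shuffled_split` is hypothetical and only illustrates the idea; it is not how any particular library implements its split, and its rounding may differ from a library's.

```python
import random

def shuffled_split(rows, holdout_fraction, seed=0):
    """Shuffle a copy of rows, then split off a holdout portion."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Stand-in records that arrive sorted, as real files often do
rows = list(range(100))

# First split off the test set, then split the rest again for validation
train, test = shuffled_split(rows, 0.25)
training, validation = shuffled_split(train, 0.25, seed=1)

print(len(training), len(validation), len(test))  # 57 18 25
```

Without the shuffle, the test set here would simply be the first 25 sorted records, which would not be representative of the rest.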

So, let us write some simple code to create a validation data set in Python:

File: headbrain.csv

Here is the code:

# -*- coding: utf-8 -*-
"""
Created on Wed Aug  1 22:18:11 2018
@author: Raunak Goswami
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#reading the data
"""here the directory of my code and the headbrain.csv
file is the same; make sure both files are stored in the same folder
or directory"""
data=pd.read_csv('headbrain.csv')
#this will show the first five records of the whole data
data.head()
#this will create a variable x which has the feature values i.e. head size
x=data.iloc[:,2:3].values
#this will create a variable y which has the target values i.e. brain weight
y=data.iloc[:,3:4].values
#splitting the data into training and test sets
"""
the statement below splits x and y into 2 parts:
1.training variables named x_train and y_train
2.test variables named x_test and y_test
The test set gets 1/4 of the rows and the training set gets the
remaining 3/4 (a 1:3 ratio), as we have set test_size to 1/4
of the total size
"""
#train_test_split lives in sklearn.model_selection; the older
#sklearn.cross_validation module has been removed from scikit-learn
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/4,random_state=0)
#Here we split the training data further
#into training and validation sets.
#Observe that the size of the validation set is
#1/4 of the training set and not of the whole dataset
x_training,x_validate,y_training,y_validate=train_test_split(x_train,y_train,test_size=1/4,random_state=0)

After running this Python code in the Spyder IDE that ships with the Anaconda distribution, check the variable explorer:

In the image above, you can see that we have split the train variables into training variables and validation variables.
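Because the second split takes 1/4 of the training set rather than 1/4 of the whole data set, the final shares are easy to misjudge. The short calculation below works them out; the row count of 160 is a hypothetical figure chosen so the numbers divide evenly, not the size of headbrain.csv.

```python
from fractions import Fraction

test_share = Fraction(1, 4)                           # first split: 1/4 of all rows
validation_share = (1 - test_share) * Fraction(1, 4)  # second split: 1/4 of the remaining 3/4
training_share = 1 - test_share - validation_share    # whatever is left for fitting

print(training_share, validation_share, test_share)   # 9/16 3/16 1/4

total_rows = 160  # hypothetical row count for illustration
print(total_rows * training_share,
      total_rows * validation_share,
      total_rows * test_share)                        # 90 30 40
```

So the validation set ends up as 3/16 of the whole data set, slightly smaller than the test set.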

So, that is it for today. I hope you liked this article. Have a great day ahead.

Translated from: https://www.includehelp.com/ml-ai/validation-before-testing.aspx
