How to take samples using sample() in R

sample() is one of the most frequently used functions in R. Taking a sample is one of the most common steps in data analysis: studying a sample is often the most practical way to understand the data, especially when the dataset is large.

R provides the standard function sample() for drawing a sample from a vector or a dataset. Many business and data analysis problems require sampling. The sample is drawn at random, either with or without replacement, as the sections below illustrate.



Let’s dive into the topic!

Syntax of sample() in R


sample(x, size, replace = FALSE, prob = NULL)
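  • x – a vector of elements to choose from. To sample from a dataset, sample its row indices, as shown in a later section.
  • size – the number of items to draw. It defaults to the length of x.
  • replace – whether to sample with replacement (TRUE) or without replacement (FALSE, the default).
  • prob – an optional vector of probability weights, one for each element of x.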
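The following is a minimal sketch of how the four arguments fit together. The vector `fruits` is made up for illustration. It also shows a documented quirk: when x is a single number n >= 1, sample() draws from 1:n rather than from that one value.

```r
# Hypothetical vector, used only for illustration
fruits <- c("apple", "banana", "cherry", "date")

# Named arguments make the role of each parameter explicit
picked <- sample(x = fruits, size = 2, replace = FALSE, prob = NULL)
print(picked)      # two distinct fruits, in random order

# With only x given, size defaults to length(x): a random permutation
shuffled <- sample(fruits)
print(shuffled)

# Quirk: a single number n >= 1 is treated as the range 1:n
one_of_ten <- sample(10, 1)
print(one_of_ten)  # some value between 1 and 10
```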


Taking samples with replacement

You may wonder what sampling with replacement means.

When you sample from a vector or other data and specify replace=TRUE (or T), the same value can be drawn more than once.

The example below illustrates this.


#sample range is 1 to 5; with no size given, all 5 values are returned in random order
x <- sample(1:5)
#prints the samples
x
Output -> 3 2 1 5 4

#sample range is 1 to 5 and the number of samples is 3
x <- sample(1:5, 3)
#prints the samples (3 samples)
x
Output -> 2 4 5

#sample range is 1 to 5 and the number of samples is 6
x <- sample(1:5, 6)
#shows an error, because the range contains only 5 numbers (1:5)
Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

#specifying replace=TRUE or T allows repetition of values, so the function can generate 6 samples in the range 1 to 5. Here 2 and 4 are repeated.
x <- sample(1:5, 6, replace=T)
x
Output -> 2 4 2 2 4 3
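As a small sketch of how to check for repeated values programmatically, the snippet below uses the base R functions anyDuplicated() and table(). Drawing 6 values from only 5 candidates must repeat at least one of them, so the check always succeeds here.

```r
# Draw 6 values from 1:5 with replacement
x <- sample(1:5, 6, replace = TRUE)
print(x)

# 6 draws from 5 possible values must contain at least one repeat
has_repeats <- anyDuplicated(x) > 0
print(has_repeats)   # TRUE

# table() counts how often each value was drawn
print(table(x))
```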


Samples Without Replacement in R

In this case, we take samples without replacement, as shown below.

To sample without replacement, pass the argument replace=F (or FALSE), which is also the default. No element can then be drawn more than once.


#samples without replacement
x <- sample(1:8, 7, replace=F)
x
Output -> 4 1 6 5 3 2 7

#asking for 9 samples from 8 values fails without replacement
x <- sample(1:8, 9, replace=F)
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'

#here the size of the sample is equal to the length of the range (5)
x <- sample(1:5, 5, replace=F)
x
Output -> 5 4 1 3 2
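The sketch below illustrates two consequences of sampling without replacement: drawing every element yields a permutation of the input, and an oversized request raises an error that you can catch with tryCatch(). The helper `safe_sample()` is a hypothetical name introduced here, not part of base R.

```r
# Sampling all 8 values without replacement gives a random permutation of 1:8
perm <- sample(1:8, 8, replace = FALSE)
is_permutation <- identical(sort(perm), 1:8)
print(is_permutation)   # TRUE

# safe_sample() is a hypothetical helper: it returns NULL instead of
# stopping the script when the requested sample is too large
safe_sample <- function(x, size) {
  tryCatch(
    sample(x, size, replace = FALSE),
    error = function(e) {
      message("Sampling failed: ", conditionMessage(e))
      NULL
    }
  )
}

too_big <- safe_sample(1:8, 9)   # NULL, with a message
ok <- safe_sample(1:8, 7)        # 7 distinct values
print(ok)
```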


Taking samples using the function set.seed()

As you may have noticed, the samples are random and change each time you run the code. If you want the same samples on every run, use the set.seed() function.

set.seed() – sets the seed of R's random number generator, so that the code that follows produces the same sequence of random values every time it is run with the same seed.

Execute the code below to get the same random samples each time.


#set the seed
set.seed(5)
#takes the random samples with replacement
sample(1:5, 4, replace=T)
2 3 1 3

set.seed(5)
sample(1:5, 4, replace=T)
2 3 1 3

set.seed(5)
sample(1:5, 4, replace=T)
2 3 1 3
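Rather than comparing printed output by eye, you can check reproducibility with identical(). The sketch below also suggests that the seed fixes the whole sequence of draws that follow it, not only the next call. The function `draw_twice()` is a hypothetical helper written for this illustration.

```r
set.seed(5)
first <- sample(1:5, 4, replace = TRUE)

set.seed(5)                      # reset to the same seed
second <- sample(1:5, 4, replace = TRUE)

same_after_reset <- identical(first, second)
print(same_after_reset)          # TRUE

# draw_twice() is a hypothetical helper: one seed, two consecutive draws
draw_twice <- function(seed) {
  set.seed(seed)
  list(sample(1:5, 4, replace = TRUE), sample(1:5, 4, replace = TRUE))
}

same_sequence <- identical(draw_twice(5), draw_twice(5))
print(same_sequence)             # TRUE
```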


Taking the sample from a dataset

In this section, we are going to take samples from a dataset in RStudio.

This code samples 10 row indices from the ‘ToothGrowth’ dataset and uses them to display the corresponding rows. In this way, you can take a sample of any required size from a dataset.


#samples 10 row indices from the 'ToothGrowth' dataset
df <- sample(1:nrow(ToothGrowth), 10)
df
--> 53 12 16 26 37 27  9 22 28 10
#displays the 10 sampled rows
ToothGrowth[df,]
    len supp dose
53 22.4   OJ  2.0
12 16.5   VC  1.0
16 17.3   VC  1.0
26 32.5   VC  2.0
37  8.2   OJ  0.5
27 26.7   VC  2.0
9   5.2   VC  0.5
22 18.5   VC  2.0
28 21.5   VC  2.0
10  7.0   VC  0.5
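The two steps above (sample row indices, then subset the data frame) can be wrapped in a small function. This is only a sketch: `sample_rows()` is a hypothetical helper, not a base R function. It uses sample.int(), the base function that sample() calls internally, which draws from 1:n.

```r
# sample_rows() is a hypothetical helper: returns n randomly chosen rows
sample_rows <- function(data, n) {
  idx <- sample.int(nrow(data), n)   # n distinct row numbers from 1:nrow(data)
  data[idx, , drop = FALSE]          # keep the result a data frame
}

tooth_sample <- sample_rows(ToothGrowth, 10)
print(tooth_sample)
```

The row names of the result show which rows of the original dataset were picked.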


Taking the samples from the dataset using the set.seed() function

In this section, we use the set.seed() function to make the sample drawn from a dataset reproducible.

Execute the code below to sample rows from the dataset after calling set.seed().


#sets the seed
set.seed(10)
#takes a sample of 10 row indices from the iris dataset
x <- sample(1:nrow(iris), 10)
x
--> 137  74 112  72  88  15 143 149  24  13
#displays the 10 rows
iris[x, ]
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
137          6.3         3.4          5.6         2.4  virginica
74           6.1         2.8          4.7         1.2 versicolor
112          6.4         2.7          5.3         1.9  virginica
72           6.1         2.8          4.0         1.3 versicolor
88           6.3         2.3          4.4         1.3 versicolor
15           5.8         4.0          1.2         0.2     setosa
143          5.8         2.7          5.1         1.9  virginica
149          6.2         3.4          5.4         2.3  virginica
24           5.1         3.3          1.7         0.5     setosa
13           4.8         3.0          1.4         0.1     setosa

You get the same rows every time you execute this code, because set.seed() is called with the same seed before sampling.
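As a sketch of one way to use a reproducible sample, the snippet below keeps the sampled rows in one data frame and the remaining rows in another, using negative indices. The variable names `sampled`, `remaining` and `x_again` are chosen here for illustration.

```r
set.seed(10)
x <- sample(1:nrow(iris), 10)

sampled <- iris[x, ]       # the 10 sampled rows
remaining <- iris[-x, ]    # negative indices drop the sampled rows

# Re-running with the same seed selects the same row indices
set.seed(10)
x_again <- sample(1:nrow(iris), 10)
same_rows <- identical(x, x_again)
print(same_rows)           # TRUE

print(nrow(sampled))       # 10
print(nrow(remaining))     # 140
```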



Generating a random sample using sample() in R

Let’s understand this concept with the help of a problem.

Problem: A gift shop has decided to give a surprise gift to one of its customers. For this purpose, it has collected some names. The task is to choose one name at random from the list.

Hint: use the sample() function to draw a random sample.

As shown below, each run of this code draws a participant name at random, so the result can differ from run to run.


#creates a vector of names and draws one name at random from it
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'),1)
--> "Rossie"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'),1)
--> "Jolie"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'),1)
--> "jack"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'),1)
--> "Edwards"
sample(c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie'),1)
--> "Kyle"
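Storing the names in a variable avoids retyping the vector on every call. The sketch below also draws three different winners in a single call, which relies on the default of sampling without replacement. The variable names `customers`, `winner` and `winners` are introduced here for illustration.

```r
# Names from the example above, stored once
customers <- c('jack', 'Rossie', 'Kyle', 'Edwards', 'Joseph',
               'Paloma', 'Kelly', 'Alok', 'Jolie')

# One winner
winner <- sample(customers, 1)
print(winner)

# Three winners: without replacement (the default), nobody is picked twice
winners <- sample(customers, 3)
print(winners)
```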


Taking samples by setting the probabilities

With the help of the above examples, you have seen how to generate random samples and how to extract randomly chosen rows from a dataset.

R also lets you set the probability of drawing each value through the prob argument, which helps with many problems. Let’s see how it works with the help of a simple example.

Consider a company that manufactures 10 watches, each of which has a 20% chance of being defective. The code below simulates this. Because each draw is random, a given run does not necessarily contain exactly 2 defective watches.


#draws 10 watches, each 'Good' with probability 80% and 'Defective' with probability 20%
sample(c('Good','Defective'), size=10, replace=T, prob=c(.80,.20))
"Good"      "Good"      "Good"      "Defective" "Good"      "Good"
"Good"      "Good"      "Defective" "Good"

You can also try different probabilities, as shown below.

sample(c('Good','Defective'), size=10, replace=T, prob=c(.60,.40))
--> "Good"      "Defective" "Good"      "Defective" "Defective" "Good"     "Good"      "Good"      "Defective" "Good"
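With only 10 draws, the observed share of defective watches can be far from the probability you set. A rough sketch with a much larger sample shows the share settling near the target. The sample size of 10000 and the seed are arbitrary choices made for this illustration. The weights in prob do not have to sum to one, so c(4, 1) expresses the same 80/20 split.

```r
set.seed(1)   # arbitrary seed, only to make the sketch reproducible

# Many draws with an 80/20 split
watches <- sample(c('Good', 'Defective'), size = 10000,
                  replace = TRUE, prob = c(.80, .20))
defective_share <- mean(watches == 'Defective')
print(defective_share)    # close to 0.20

# Weights need not sum to one: 4 to 1 is the same 80/20 split
watches_w <- sample(c('Good', 'Defective'), size = 10000,
                    replace = TRUE, prob = c(4, 1))
defective_share_w <- mean(watches_w == 'Defective')
print(defective_share_w)  # also close to 0.20
```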


Wrapping up

In this tutorial, you have learned how to take samples from a vector and from a dataset, with or without replacement. The set.seed() function is helpful when you need the same sequence of samples on every run.

Try taking samples from the various datasets available in R. You can also import CSV files and take samples from them with probability adjustments, as shown above.

More study: R documentation

Translated from: https://www.journaldev.com/39307/sample-in-r
