How to use sample() to take samples in R

Let's look at one of R's most frequently used functions, sample(). In data analysis, taking a sample of the data is one of the most common tasks an analyst performs. Working with a sample is often the best way to study and understand the data, especially when the dataset is large.

R offers the standard function sample() to take a sample from a dataset. Many business and data analysis problems require sampling from the data. The sample is drawn at random, either with or without replacement, as illustrated in the sections below.

Let's get into the topic!

Syntax of sample() in R


sample(x, size, replace = FALSE, prob = NULL)

x – a vector of elements to sample from (or a single positive integer n, meaning 1:n).
size – the sample size, that is, the number of items to draw.
replace – whether to sample with replacement (TRUE) or without replacement (FALSE, the default).
prob – an optional vector of probability weights, one for each element of x.
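
As a quick illustration of the arguments listed above, the following sketch calls sample() with the arguments named explicitly. The variable names (values, picked, shuffled, one_to_five) and the seed are chosen here for illustration only.

```r
# Illustrative data; the names and the seed are assumptions for this sketch
set.seed(1)
values <- c(10, 20, 30, 40, 50)

# Draw 3 of the 5 values without replacement (the default)
picked <- sample(x = values, size = 3, replace = FALSE)
picked

# With size omitted, sample() returns all elements in random order
shuffled <- sample(values)
shuffled

# A single positive integer n as x means "sample from 1:n"
one_to_five <- sample(5)
one_to_five
```

Naming the arguments is optional, but it makes the call easier to read once replace and prob are also in play.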

Taking samples with replacement

You may wonder what taking samples with replacement means.

When you take samples from a vector or other data and specify replace=TRUE (or T), the function is allowed to pick the same value more than once.

The example below illustrates this.


# sample from 1 to 5; with no size given, all 5 values come back in random order
x <- sample(1:5)
# print the sample
x
# Output -> 3 2 1 5 4

# sample from 1 to 5, taking 3 values
x <- sample(1:5, 3)
# print the 3 sampled values
x
# Output -> 2 4 5

# sample from 1 to 5, asking for 6 values
x <- sample(1:5, 6)
# this fails: without replacement, only 5 values (1:5) are available
# Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE'

# replace = TRUE (or T) allows values to repeat, so 6 values can be drawn from 1 to 5
# here 2 and 4 are repeated
x <- sample(1:5, 6, replace = TRUE)
x
# Output -> 2 4 2 2 4 3
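
To check whether a sample drawn with replacement actually contains repeats, one option is to count the drawn values with table(). This is a minimal sketch; the seed and the variable name counts are illustrative.

```r
# Seed chosen arbitrarily so the sketch is repeatable
set.seed(42)
x <- sample(1:5, 6, replace = TRUE)
x

# Count how often each value was drawn
counts <- table(x)
counts

# Six draws from only five distinct values must contain at least one repeat
any(counts > 1)
```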

Samples Without Replacement in R

In this case, we take samples without replacement. The argument replace=F (that is, FALSE, which is also the default) is used, and it does not allow values to be repeated. The whole concept is shown below.


# samples without replacement
x <- sample(1:8, 7, replace = FALSE)
x
# Output -> 4 1 6 5 3 2 7

# asking for 9 values from a range of only 8 fails
x <- sample(1:8, 9, replace = FALSE)
# Error in sample.int(length(x), size, replace, prob) :
#   cannot take a sample larger than the population when 'replace = FALSE'

# here the sample size equals the length of the range, so every value appears exactly once
x <- sample(1:5, 5, replace = FALSE)
x
# Output -> 5 4 1 3 2
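
If the requested size might exceed the number of available values, the error shown above can be caught with tryCatch(). The following is only a sketch of one possible fallback (returning the whole population in random order); the names population, requested, and result are assumptions for this example.

```r
# Hypothetical inputs for this sketch
population <- 1:8
requested <- 9

result <- tryCatch(
  sample(population, requested, replace = FALSE),
  error = function(e) {
    # Fall back to the largest sample that is possible without replacement
    message("Falling back: ", conditionMessage(e))
    sample(population, length(population), replace = FALSE)
  }
)
result
```

Whether falling back is appropriate depends on the analysis; in many cases letting the error stop the script is the safer choice.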

Taking samples using the function set.seed()

As you may have noticed, samples are random and change each time you draw them. If you don't want a different sample on every run, you can use the set.seed() function.

set.seed() – sets the seed of R's random number generator, so that the same sequence of random values is produced each time the code is run with the same seed.

This is illustrated below. Run the code to get the same random sample each time.


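# set the seed
set.seed(5)
# take a random sample with replacement
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3

# resetting the same seed reproduces the same sample
set.seed(5)
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3

set.seed(5)
sample(1:5, 4, replace = TRUE)
# Output -> 2 3 1 3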
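
Rather than comparing the printed values by eye, the reproducibility can be checked with identical(). This is a small sketch; the variable names first, second, and third are illustrative.

```r
set.seed(5)
first <- sample(1:5, 4, replace = TRUE)

# Resetting the same seed puts the generator back in the same state
set.seed(5)
second <- sample(1:5, 4, replace = TRUE)

identical(first, second)  # TRUE

# Without resetting the seed, the generator simply continues from its
# current state, so this draw is not tied to the first two
third <- sample(1:5, 4, replace = TRUE)
third
```

Note that the seed has to be set immediately before each call that should be reproduced, as the third draw shows.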

Taking the sample from a dataset

In this section, we take a sample of rows from a dataset in RStudio.

The code below samples 10 row numbers from the built-in 'ToothGrowth' dataset and then uses them to display the corresponding rows. In this way, you can take a sample of any required size from a dataset.


# draw 10 random row numbers from the 'ToothGrowth' dataset
rows <- sample(1:nrow(ToothGrowth), 10)
rows
# --> 53 12 16 26 37 27  9 22 28 10

# display the 10 sampled rows
ToothGrowth[rows, ]
#     len supp dose
# 53 22.4   OJ  2.0
# 12 16.5   VC  1.0
# 16 17.3   VC  1.0
# 26 32.5   VC  2.0
# 37  8.2   OJ  0.5
# 27 26.7   VC  2.0
# 9   5.2   VC  0.5
# 22 18.5   VC  2.0
# 28 21.5   VC  2.0
# 10  7.0   VC  0.5
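
The two steps above (draw row numbers, then index the data frame) can be wrapped in a small function. Note that sample_rows() is a hypothetical helper written for this sketch, not a base R function.

```r
# sample_rows() is a hypothetical helper, defined here for illustration only
sample_rows <- function(data, n) {
  # seq_len() also behaves sensibly if the data frame has zero rows
  idx <- sample(seq_len(nrow(data)), n)
  # drop = FALSE keeps the result a data frame even with a single column
  data[idx, , drop = FALSE]
}

tooth_sample <- sample_rows(ToothGrowth, 10)
tooth_sample
```

Because the rows are drawn without replacement, no row of the dataset can appear twice in the result.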

Taking the samples from the dataset using the set.seed() function

In this section, we use set.seed() together with sample() to take a reproducible sample of rows from a dataset. Run the code below to draw the sample from the iris dataset.


# set the seed
set.seed(10)
# draw 10 random row numbers from the iris dataset
x <- sample(1:nrow(iris), 10)
x
# --> 137  74 112  72  88  15 143 149  24  13

# display the 10 sampled rows
iris[x, ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
# 137          6.3         3.4          5.6         2.4  virginica
# 74           6.1         2.8          4.7         1.2 versicolor
# 112          6.4         2.7          5.3         1.9  virginica
# 72           6.1         2.8          4.0         1.3 versicolor
# 88           6.3         2.3          4.4         1.3 versicolor
# 15           5.8         4.0          1.2         0.2     setosa
# 143          5.8         2.7          5.1         1.9  virginica
# 149          6.2         3.4          5.4         2.3  virginica
# 24           5.1         3.3          1.7         0.5     setosa
# 13           4.8         3.0          1.4         0.1     setosa

You get the same rows each time you run this code, because set.seed() puts the random number generator into the same state before the sample is drawn.

Generating a random sample using sample() in R

Let's understand this with the help of a problem.

Problem: A gift shop has decided to give a surprise gift to one of its customers. For this purpose, it has collected some names. The task is to choose one name at random from the list.

Hint: use the sample() function to generate random samples.

As you can see below, every time you run this code, it picks one name at random from the list of participants.


# create a vector of names
customers <- c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie')

# draw one name at random; each run can give a different name
sample(customers, 1)
# --> "Rossie"
sample(customers, 1)
# --> "Jolie"
sample(customers, 1)
# --> "jack"
sample(customers, 1)
# --> "Edwards"
sample(customers, 1)
# --> "Kyle"
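
If the shop wanted several distinct winners instead of one, the same call works with a larger size. This is a sketch that goes slightly beyond the original problem; the variable names winner and winners are illustrative.

```r
customers <- c('jack','Rossie','Kyle','Edwards','Joseph','Paloma','Kelly','Alok','Jolie')

# One winner, as in the problem above
winner <- sample(customers, 1)
winner

# Three winners; replace = FALSE (the default) guarantees that
# nobody is drawn twice
winners <- sample(customers, 3)
winners
```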

Taking samples by setting the probabilities

With the help of the above examples, you have seen how to generate random samples and how to extract a random set of rows from a dataset.

R also lets you set the probability with which each value is drawn, which helps with many problems. Let's see how it works with a simple example.

Consider a company that manufactures 10 watches, where each watch has a 20% chance of being defective. Let's illustrate this with the code below.


# draw 10 watches, each 'Good' with probability 0.80 and 'Defective' with probability 0.20
sample(c('Good','Defective'), size = 10, replace = TRUE, prob = c(0.80, 0.20))
# --> "Good"      "Good"      "Good"      "Defective" "Good"      "Good"
#     "Good"      "Good"      "Defective" "Good"

You can also try different probabilities, as shown below.

sample(c('Good','Defective'), size = 10, replace = TRUE, prob = c(0.60, 0.40))
# --> "Good"      "Defective" "Good"      "Defective" "Defective" "Good"
#     "Good"      "Good"      "Defective" "Good"
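
With only 10 draws, the share of defective watches will not always match the stated probability exactly. A rough way to see the probabilities at work is to draw a much larger sample and look at the observed proportions. The sample size of 10000, the seed, and the variable names below are assumptions made for this sketch.

```r
# Seed and sample size chosen arbitrarily for this illustration
set.seed(123)
watches <- sample(c('Good','Defective'), size = 10000,
                  replace = TRUE, prob = c(0.80, 0.20))

# Observed share of each outcome; these should be close to 0.80 and 0.20
proportions <- prop.table(table(watches))
proportions
```

The prob argument is treated as a vector of weights, so the values do not have to sum to one.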

Wrapping up

In this tutorial, you have learned how to take samples from a vector or a dataset, with or without replacement. The set.seed() function is helpful when you need to reproduce the same sequence of samples.

Try taking samples from the various datasets available in R. You can also import CSV files and take samples from them, using probability weights as shown above.

More study: R documentation

Translated from: https://www.journaldev.com/39307/sample-in-r
