NOTES

1 The basic problem of hypothesis testing

1.1 Example

year   mean weight of newborns   sample size        SD
1989   3190 g                    ∞ (population)     80 g
1990   3210 g                    100                NA

The mean weight of the 100 newborns sampled in 1990 is 20 g higher than the 1989 mean.

The question is what this 20 g difference shows. One explanation is that it is caused by the randomness of sampling; the other is that babies born in 1990 really are heavier than those born in 1989.

To settle this question, let $\mu_0$ denote the mean weight of babies in 1989 and $\mu$ that of 1990. We hypothesize that $\mu=\mu_0$ and use the 100 samples from 1990 to test whether the hypothesis holds.

Here:
Null hypothesis: $H_0: \mu = 3190\ \mathrm{g}$
If the null hypothesis does not hold, we need an alternative hypothesis:
Alternative hypothesis: $H_1: \mu \ne 3190\ \mathrm{g}$

1.2 Two types of error

$\alpha$ error (Type I): $H_0$ is true but is rejected.
$\beta$ error (Type II): $H_0$ is false but is accepted.

1.3 The procedure of hypothesis testing

  1. State the hypotheses:
    $H_0: \mu = 3190\ \mathrm{g}$
    $H_1: \mu \ne 3190\ \mathrm{g}$
  2. Compute the statistic ($\sigma$ is known, sample size > 30):
    $z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{3210-3190}{80/\sqrt{100}} = 2.5$
  3. Since $2.5 > z_{\alpha/2} = 1.96$, $z$ falls in the critical region, so we reject $H_0$.
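The arithmetic of step 2 and the comparison of step 3 can be checked numerically. The exercises below use R; this quick sketch only needs the Python standard library:

```python
from math import sqrt
from statistics import NormalDist

# numbers from the example above
x_bar, mu0, sigma, n = 3210, 3190, 80, 100
z = (x_bar - mu0) / (sigma / sqrt(n))        # test statistic
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided 5% critical value, about 1.96
reject = abs(z) > z_crit                     # True -> reject H0
print(z, round(z_crit, 2), reject)           # 2.5 1.96 True
```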

1.4 Test regions

  1. two-tailed test:
    $H_0: \mu = \mu_0$
    acceptance region: $|z| < z_{\alpha/2}$
  2. one-tailed test:
    $H_0: \mu \ge \mu_0$ or $\mu \le \mu_0$
    acceptance region: $z > z_\alpha$ or $z < z_{1-\alpha}$ (here $z_\alpha$ denotes the $\alpha$-quantile of the standard normal distribution)

2 Tests for parameters

2.1 One sample test

large samples ($\sigma$ known):
$z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$

$\sigma$ unknown $\to$ estimate it with $s$:

  • small samples:
    $t=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$
  • large samples:
    $z=\frac{\bar{x}-\mu_0}{s/\sqrt{n}}$
Which statistic to use:

  • SD known → z test
  • SD unknown, n > 30 → z test
  • SD unknown, n < 30 → t test
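As a small illustration of the σ-unknown, small-sample case, here is the t statistic computed for a made-up sample (the data are hypothetical, chosen only so the numbers are easy to follow):

```python
from math import sqrt
from statistics import mean, stdev

sample = [3.1, 3.3, 3.0, 3.4, 3.2, 3.3, 3.1, 3.2]  # hypothetical small sample (n < 30)
mu0 = 3.0                                          # reference value under H0
n = len(sample)
# sigma unknown and n < 30: use the t statistic with s in place of sigma
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
print(round(t, 3))
```

The resulting t would then be compared against the t distribution with n - 1 degrees of freedom rather than the normal distribution.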

2.2 Two samples test

  • The test of the difference between two means
  1. $\sigma_1$, $\sigma_2$ are known:
    $z=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
  2. $\sigma_1$, $\sigma_2$ are unknown:
    2.1 small samples, and $\sigma_1=\sigma_2$:
    $t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
    here:
    $s_p^2=\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}$
    2.2 If it is clear that $\sigma_1\ne\sigma_2$, the t-test formula is quite different, so it is important to judge whether the standard deviations of the two samples are equal.
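For reference (this formula is not given in the notes above, but it is the standard unequal-variance test and what R's t.test computes by default with var.equal = FALSE), Welch's statistic replaces the pooled SD with the separate sample variances:

```latex
t=\frac{(\bar{x}_1-\bar{x}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}},
\qquad
\nu\approx\frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}
```

where $\nu$ is the Welch–Satterthwaite approximation to the degrees of freedom.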

2.3 Paired-sample test

The samples are related; test the difference between the pairs using the statistic appropriate to the sample size.

Exercise

One sample t-test

1 Data import and exploration

Import data

> mice = read.csv('mice.csv')
> str(mice)
'data.frame': 100 obs. of  2 variables:
 $ ID     : Factor w/ 100 levels "A1","B0","B1",..: 45 51 1 65 30 48 41 62 24 95 ...
 $ T.level: num  0.916 0.978 1.448 0.757 0.849 ...

Explore T-level data

x = mice$T.level
h = hist(x)
xfit = seq(min(x, na.rm = T), max(x, na.rm = T), length = 200)
yfit = dnorm(xfit, mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
yfit = yfit*max(h$counts)/max(yfit)
lines(xfit, yfit, col = 'blue', lwd = 2)

2 Check assumptions

Normality of data

qqnorm(x)
qqline(x) # looks good
> shapiro.test(x) # not significant

	Shapiro-Wilk normality test

data:  x
W = 0.98737, p-value = 0.4633

# so accept the data as being normal
> y = pnorm(summary(x), mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
> ks.test(x, y) # even if the ks-test returned a significant result

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.75333, p-value = 0.0008108
alternative hypothesis: two-sided


The QQ-plot looks pretty good, and the Shapiro-Wilk test is not significant, so we can accept the data as being normal, even though the Kolmogorov-Smirnov test returned a significant result.

3 Carry out one-sample t-test

> t.test(x, mu = 1.12)

	One Sample t-test

data:  x
t = 1.8215, df = 99, p-value = 0.07154
alternative hypothesis: true mean is not equal to 1.12
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

Here $H_0$ is $\mu=\mu_0=1.12$. The p-value 0.07154 is greater than 0.05, so we can accept the null hypothesis, though the p-value is close to 0.05 and the support is not strong.

If we let mu = 1.17

> t.test(x, mu = 1.17)

	One Sample t-test

data:  x
t = 0.049708, df = 99, p-value = 0.9605
alternative hypothesis: true mean is not equal to 1.17
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

The 95 percent confidence interval runs from 1.115 to 1.227. If we instead let mu = 1.1:

> t.test(x, mu = 1.1)

	One Sample t-test

data:  x
t = 2.5303, df = 99, p-value = 0.01297
alternative hypothesis: true mean is not equal to 1.1
95 percent confidence interval:
 1.115409 1.227396
sample estimates:
mean of x 
 1.171403

We have to reject the null hypothesis.

4 Illustrate the result

In this case we wanted to know if our sampled data was different from a particular reference value ($\mu_0$). If the 95% confidence interval of the mean includes the reference value, then there is no significant difference. A good way to illustrate this result is therefore a barplot of the mean value with error bars for the 95% confidence interval.

mean.T = mean(x)
result.T = t.test(x, mu = 1.12)
bp = barplot(mean.T, ylim = c(0, 1.4), col = 'red', asp = 1.5)
arrows(bp, result.T$conf.int[1], bp, result.T$conf.int[2], code = 3, angle = 90)
abline(h = 1.12, lty = 2)
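The decision rule just described (accept $H_0$ when the 95% CI covers $\mu_0$) can be sketched as a small function. This is a large-sample normal approximation in Python with made-up data, not the exact t-based interval that R's t.test reports:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def ci_covers(sample, mu0, alpha=0.05):
    """Large-sample (normal) CI for the mean; True if mu0 lies inside it."""
    n = len(sample)
    half = NormalDist().inv_cdf(1 - alpha / 2) * stdev(sample) / sqrt(n)
    m = mean(sample)
    return m - half <= mu0 <= m + half

data = [1.1, 1.2] * 50        # hypothetical sample with mean 1.15
print(ci_covers(data, 1.15))  # True: mu0 inside the interval, accept H0
print(ci_covers(data, 1.12))  # False: mu0 outside, reject H0
```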

Two-sample t-test

1 Data import and exploration

> tits = read.csv('tits.csv')
> str(tits)
'data.frame': 90 obs. of  3 variables:
 $ SPE: Factor w/ 2 levels "BM","TX": 2 2 2 2 2 2 1 2 2 2 ...
 $ wei: num  17.3 17.2 14.5 17.8 18 18.3 9.3 15.1 NA 15.8 ...
 $ egg: int  10 6 9 4 7 8 10 9 10 10 ...
> boxplot(egg~SPE, data = tits) # difference in egg number between species


A two-sample t-test is appropriate in this case since we want to compare two different (and unrelated) groups, Blue tits (BM) and Great tits (TX).

2 Check assumptions

Same procedure as before.

The results for both BM and TX do not pass formal normality tests.

Then, as mentioned before, we should consider whether the two SDs are different. This step has no counterpart in the one-sample t-test.

We use leveneTest() within the car package.

leveneTest(wei~SPE, data=tits)
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  2.3485 0.1305
##       62

The null hypothesis is that the variances are equal. Here p = 0.13 > 0.05, so we accept the null hypothesis: the variances are not significantly different, and homogeneity of variance is fulfilled.
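To see what leveneTest() is computing, here is a from-scratch sketch of its default statistic, the Brown-Forsythe variant of Levene's test (center = median): absolute deviations from each group's median are compared with a one-way ANOVA F ratio. Python sketch with hypothetical groups:

```python
from statistics import mean, median

def levene_stat(groups):
    """Brown-Forsythe F statistic (center = median), as in car::leveneTest's default."""
    # absolute deviations of each observation from its group median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    zbars = [mean(zi) for zi in z]
    grand = mean([v for zi in z for v in zi])
    k = len(groups)
    N = sum(len(g) for g in groups)
    # between-group vs within-group variability of the deviations
    between = sum(len(zi) * (zb - grand) ** 2 for zi, zb in zip(z, zbars)) / (k - 1)
    within = sum((v - zb) ** 2 for zi, zb in zip(z, zbars) for v in zi) / (N - k)
    return between / within

# two groups with identical spread -> F is exactly 0
print(levene_stat([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]]))  # 0.0
```

A large F (small p) would indicate unequal variances, in which case Welch's test should be used instead of the pooled t-test.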

3 Carry out two-sample t-test

So we assume the two samples have the same SD; in t.test(), set var.equal = T.

> t.test(wei~SPE, data = tits, var.equal = T)

	Two Sample t-test

data:  wei by SPE
t = -16.402, df = 62, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.009003 -5.486166
sample estimates:
mean in group BM mean in group TX 
        11.16111         17.40870

Here the null hypothesis is $\mu_1-\mu_2=0$. The p-value is < 0.05, so the weight of the two species differs significantly.
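The pooled statistic that var.equal = T selects is exactly the formula from section 2.2 above; this Python sketch (hypothetical numbers, not the tits data) makes the computation explicit:

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x1, x2):
    """Equal-variance two-sample t statistic, i.e. t.test(..., var.equal = TRUE)."""
    n1, n2 = len(x1), len(x2)
    # pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(round(pooled_t([1, 2, 3], [2, 3, 4]), 4))  # -1.2247
```

The statistic is then referred to the t distribution with n1 + n2 - 2 degrees of freedom, which is where the df = 62 in the R output comes from.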

4 Illustrate result of t-test

mean.wei = aggregate(list(weight=tits$wei), by=list(species=tits$SPE), mean, na.rm = T)
SE = function(x) sd(x, na.rm = T)/(sqrt(sum(!is.na(x))))
se.wei = aggregate(list(SE = tits$wei), by = list(species=tits$SPE), FUN=SE)
wei.dat = cbind(mean.wei, SE=se.wei$SE)
bp = barplot(weight~species, data=wei.dat, ylim=c(0,20), col='purple')
arrows(bp, wei.dat$weight-wei.dat$SE, bp, wei.dat$weight+wei.dat$SE, code=3, angle=90)

Paired-samples t-test

This test is most appropriate when you want to compare two dependent values. An example could be a measurement made on the same individuals before and after an experimental manipulation.

1 Data import and exploration

We want to test whether one type of tree covers a larger proportion of the total area than the other.

forest = read.csv('forest.csv')
hist(forest$oak)
hist(forest$bir)

2 Check assumptions

Test the difference

> diff = forest$oak - forest$bir
> hist(diff)
> shapiro.test(diff)

	Shapiro-Wilk normality test

data:  diff
W = 0.92191, p-value = 0.2662

> x = diff
> y = pnorm(summary(x), mean = mean(x, na.rm = T), sd = sd(x, na.rm = T))
> ks.test(x, y)

	Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.83333, p-value = 0.002875
alternative hypothesis: two-sided

> qqnorm(x)
> qqline(x)

3 Carry out the paired-samples t-test

Add the argument paired = TRUE:

> t.test(forest$oak, forest$bir, paired = T)

	Paired t-test

data:  forest$oak and forest$bir
t = -1.4412, df = 12, p-value = 0.1751
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.14934135  0.03043097
sample estimates:
mean of the differences 
            -0.05945519

There is no significant difference.
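The paired test is mathematically the same as a one-sample t-test on the within-pair differences, which this Python sketch (made-up before/after numbers, not the forest data) makes explicit:

```python
from math import sqrt
from statistics import mean, stdev

# hypothetical before/after measurements on the same five sites
before = [2.0, 2.3, 1.9, 2.5, 2.1]
after  = [2.2, 2.4, 2.0, 2.8, 2.3]
d = [a - b for a, b in zip(after, before)]
t = mean(d) / (stdev(d) / sqrt(len(d)))  # one-sample t on the differences, mu0 = 0
print(round(t, 3))
```

This is why checking normality of the differences (as done above with shapiro.test(diff)) is the right assumption check for a paired test, rather than normality of each column separately.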

4 Illustrate the result

We could create a scatterplot of oak vs. birch with a 1:1 reference line, where sites falling above the 1:1 line have an excess of oak and sites below it have an excess of birch. In cases where there is a significant difference, most of the points will lie on one side of the line. The advantage of this approach is that it also illustrates the variation among sites.

plot(forest$oak~forest$bir, xlab='birch', ylab='oak', pch=16)
abline(0,1)
